Text files sent to you from another platform do not execute correctly in sqlplus
Oracle version – Generic
Files that have been saved in one of the Unicode encodings may contain non-printable characters that interfere with sqlplus' interpretation of SQL statements.
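A quick way to check a script before feeding it to sqlplus is to count the lines that contain anything other than printable ASCII, spaces and tabs. This is a minimal sketch for a POSIX shell. `suspect_lines` is a hypothetical helper, not part of any package. Setting `LC_ALL=C` makes grep look at individual bytes, so both the carriage-returns of Windows line endings and the bytes of a UTF-8 Byte Order Mark are counted. Legitimate accented characters in string literals would be counted as well.

```shell
#!/bin/sh
# suspect_lines FILE: print how many lines of FILE contain a byte that is not
# printable ASCII, a space or a tab (hypothetical helper). Under LC_ALL=C grep
# works byte by byte, so Windows carriage-returns and UTF-8 BOM bytes count.
suspect_lines() {
    LC_ALL=C grep -c '[^[:print:][:blank:]]' "$1"
}

# Example: a clean file, a Windows-style file and a file starting with a BOM
tmp=$(mktemp -d)
printf 'select 1\tfrom dual;\nexit;\n'            > "$tmp/clean.sql"
printf 'select 1 from dual;\r\nexit;\r\n'         > "$tmp/crlf.sql"
printf '\357\273\277select 1 from dual;\nexit;\n' > "$tmp/bom.sql"
for f in clean crlf bom
do
    echo "$f.sql: $(suspect_lines "$tmp/$f.sql")"   # clean 0, crlf 2, bom 1
done
rm -r "$tmp"
```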
Files that are transferred between a Windows environment and a Unix environment may be affected by two issues in particular:
1. End-of-line characters
2. Byte Order Mark characters
In Unix environments the end of a line is marked by a single character: the newline character, also known as line-feed, 0x0a (control-J). When output goes to a terminal, the terminal driver knows that it must also supply a carriage-return character, 0x0d (control-M). In Windows, by contrast, files explicitly contain both the carriage-return and the line-feed characters at the end of each line. [ For those not familiar with typewriters, a carriage-return is the control character that moves the cursor to the start of the line. A line-feed moves the cursor down one line. ]
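Dumping the bytes of a line makes the difference visible. A minimal sketch using only od and tr; `hexbytes` is a hypothetical helper name:

```shell
#!/bin/sh
# hexbytes: print the bytes read from stdin as one run of lowercase hex digits
hexbytes() {
    od -An -tx1 | tr -d ' \n'
}

# The same SQL line, terminated Unix-style and then Windows-style
printf 'exit;\n'   | hexbytes; echo    # 657869743b0a
printf 'exit;\r\n' | hexbytes; echo    # 657869743b0d0a
```

The Windows line carries the extra `0d` (carriage-return) just before the `0a` (line-feed).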
The difference between the two conventions can be very annoying: Unix files opened in Windows can appear as one continuous line, while Windows files on Unix show ^M at the end of each line, or worse! There are two ways round it:
A. When transferring files between environments, do so in text mode. Most transfer programs, such as ftp and WinSCP, have a text mode for this (ftp calls it ascii); they add or remove the carriage-returns so you don't have to.
B. Use a conversion utility such as dos2unix.
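Where dos2unix is not installed, sed can do the same job. A sketch, assuming a POSIX sed; `to_unix` and `to_dos` are hypothetical helper names. The carriage-return is put into a variable with printf because `\r` inside a sed regular expression is a GNU extension. `tr -d '\r'` is a shorter alternative, but it removes every carriage-return in the file, not just those at line ends.

```shell
#!/bin/sh
# Line-ending conversion with standard tools only (hypothetical helpers)
cr=$(printf '\r')   # a literal carriage-return character

# to_unix IN OUT: strip one carriage-return from the end of every line
to_unix() {
    sed "s/${cr}\$//" "$1" > "$2"
}

# to_dos IN OUT: make every line end in CR LF; existing CRs are stripped
# first, so a file that is already in DOS format is not doubled up
to_dos() {
    sed -e "s/${cr}\$//" -e "s/\$/${cr}/" "$1" > "$2"
}
```

For example, `to_unix script.sql script_unix.sql` writes a copy of the file with Unix line endings.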
For files that have been saved in one of the Unicode encodings, there may be a few bytes at the start of the file indicating the byte order (endianness) of its contents. This is called the Byte Order Mark (BOM), for example:
```
# od -bc fred.txt
0000000 357 273 277 115 171 040 156 141 155 145 040 151 163 040 106 162
        357 273 277   M   y       n   a   m   e       i   s       F   r
0000020 145 144 012
          e   d  \n
```
Notice the three bytes, octal 357, 273 and 277 (hex EF BB BF), at the very start of the file above? This is the UTF-8 encoding of the Byte Order Mark. I'll explain ...
Unicode is a coding system that encompasses the characters of most of the world's character sets. It has 17 planes (numbered 0 to 16), each holding 65,536 (0x10000) code points, making a total of 1,114,112 different code points (from 0 to 0x10FFFF). Most of the time it is sufficient to deal only with plane 0, the Basic Multilingual Plane (BMP).
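The arithmetic behind those figures can be checked in the shell. A small sketch, where `plane` is a hypothetical helper: since every plane holds 0x10000 code points, a code point's plane number is just the code point shifted right by 16 bits.

```shell
#!/bin/sh
# Unicode's code space: 17 planes (0 to 16) of 0x10000 code points each
planes=17
per_plane=$((0x10000))                                  # 65,536
total=$((planes * per_plane))
echo "total code points: $total"                        # 1114112
printf 'highest code point: U+%X\n' $((total - 1))      # U+10FFFF

# plane CP: print the plane a code point belongs to (hypothetical helper)
plane() {
    echo $(( $1 >> 16 ))
}
plane 0x20AC     # euro sign     -> 0 (the BMP)
plane 0x1F600    # an emoji      -> 1
```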
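These code points can be encoded using 8-bit, 16-bit or even 32-bit code units. The encodings are called Unicode Transformation Formats, and they give us UTF-8, UTF-16 and UTF-32. UTF-8 uses one to four bytes per character, UTF-16 uses one or two 16-bit units (two for characters outside the BMP), and UTF-32 always uses a single 32-bit unit.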
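The length rules of the three encoding forms are simple enough to write down directly. A sketch; `utf_lengths` is a hypothetical helper that assumes its argument is a valid code point (0 to 0x10FFFF, not a surrogate) and prints the number of bytes it takes in UTF-8, UTF-16 and UTF-32:

```shell
#!/bin/sh
# utf_lengths CP: bytes needed for code point CP in UTF-8, UTF-16 and UTF-32
utf_lengths() {
    c=$(( $1 ))
    if   [ "$c" -lt $((0x80)) ];    then u8=1    # ASCII
    elif [ "$c" -lt $((0x800)) ];   then u8=2
    elif [ "$c" -lt $((0x10000)) ]; then u8=3    # rest of the BMP
    else                                 u8=4    # planes 1 to 16
    fi
    # UTF-16 needs a surrogate pair (two 16-bit units) outside the BMP
    if [ "$c" -lt $((0x10000)) ]; then u16=2; else u16=4; fi
    echo "$u8 $u16 4"                            # UTF-32 is always 4 bytes
}

utf_lengths 0x41      # A          -> 1 2 4
utf_lengths 0xE9      # e-acute    -> 2 2 4
utf_lengths 0x20AC    # euro sign  -> 3 2 4
utf_lengths 0x1F600   # an emoji   -> 4 4 4
```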
In an encoding whose code units are more than one byte wide, such as UTF-16, the bytes of each unit can be stored in one of two orders, known as big-endian (most significant byte first) and little-endian (least significant byte first).
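iconv can show both byte orders for the same character. This sketch assumes an iconv, such as the GNU libc one, that accepts the names UTF-8, UTF-16BE and UTF-16LE; the explicit BE and LE names are used so that iconv does not pick a byte order (or possibly add a BOM) on its own. `hexbytes` is a hypothetical helper.

```shell
#!/bin/sh
# hexbytes: print stdin's bytes as one run of lowercase hex digits
hexbytes() {
    od -An -tx1 | tr -d ' \n'
}

# U+00E9 (e-acute), supplied as its UTF-8 bytes, octal 303 251
e_acute='\303\251'
printf "$e_acute" | iconv -f UTF-8 -t UTF-16BE | hexbytes; echo   # 00e9
printf "$e_acute" | iconv -f UTF-8 -t UTF-16LE | hexbytes; echo   # e900
```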
When a file is transferred from one platform to another, it is important to know which way round the bytes were stored, and if there is no other indication, a Byte Order Mark (BOM) is added to the start of the file to show this. The BOM is always the character U+FEFF, so if the byte order is swapped it reads as 0xFFFE, which is not a valid character; the reader can therefore tell which order was used. In UTF-16 it appears as the bytes FE FF (big-endian) or FF FE (little-endian).
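Because the mark is always U+FEFF, the first few bytes of a file are enough to tell which encoding form and byte order were used: FE FF for UTF-16 big-endian, FF FE for UTF-16 little-endian, and EF BB BF for UTF-8. A sketch of such a check, assuming an od that supports the POSIX `-A`, `-t` and `-N` options; `detect_bom` is a hypothetical helper. The UTF-32 marks are tested first because the little-endian UTF-32 mark begins with the little-endian UTF-16 one, so a UTF-16LE file whose first character is U+0000 would be misreported; that ambiguity is inherent in the marks.

```shell
#!/bin/sh
# detect_bom FILE: print the encoding signalled by a Byte Order Mark at the
# start of FILE, or "none" (hypothetical helper)
detect_bom() {
    head4=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case $head4 in
        0000feff*) echo UTF-32BE ;;   # UTF-32 first: ff fe 00 00 starts with ff fe
        fffe0000*) echo UTF-32LE ;;
        efbbbf*)   echo UTF-8 ;;
        feff*)     echo UTF-16BE ;;
        fffe*)     echo UTF-16LE ;;
        *)         echo none ;;
    esac
}

# Example: the letter A written four different ways
tmp=$(mktemp -d)
printf '\376\377\000A'   > "$tmp/be.txt"      # UTF-16BE with BOM
printf '\377\376A\000'   > "$tmp/le.txt"      # UTF-16LE with BOM
printf '\357\273\277A\n' > "$tmp/u8.txt"      # UTF-8 with BOM
printf 'A\n'             > "$tmp/plain.txt"   # no BOM
for f in be le u8 plain
do
    echo "$f.txt: $(detect_bom "$tmp/$f.txt")"
done
rm -r "$tmp"
```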
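This is all well and good, but UTF-8 works one byte at a time (its code unit is a single byte), so there is no endianness to signal. Nevertheless, some Windows programs still write the BOM, encoded in UTF-8 as the three bytes EF BB BF, at the start of UTF-8 files (Notepad, at least in older versions of Windows, is one such program), causing interoperability issues like the sqlplus problem described above.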
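The three bytes seen in the od dump are nothing more than U+FEFF passed through the normal UTF-8 rule, which writes a code point between U+0800 and U+FFFF as `1110xxxx 10xxxxxx 10xxxxxx`. A sketch of that rule in shell arithmetic; `utf8_3` is a hypothetical helper and handles only that three-byte range:

```shell
#!/bin/sh
# utf8_3 CP: print the UTF-8 bytes (lowercase hex) of a code point in the
# range U+0800..U+FFFF, laid out as 1110xxxx 10xxxxxx 10xxxxxx where the
# x bits are the 16 bits of the code point, high bits first
utf8_3() {
    c=$(( $1 ))
    printf '%02x %02x %02x\n' \
        $(( 0xE0 | (c >> 12) )) \
        $(( 0x80 | ((c >> 6) & 0x3F) )) \
        $(( 0x80 | (c & 0x3F) ))
}

utf8_3 0xFEFF    # ef bb bf  (octal 357 273 277, as in the od dump above)
utf8_3 0x20AC    # e2 82 ac  (the euro sign, for comparison)
```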
The simplest fix is to process the files with a script. I have written one that does a bit more than the possibly familiar dos2unix utility, called (get ready to groan!) dos3unix ...
```ksh
#!/bin/ksh
#--------------------------------------------------------------------------------
# File:    dos3unix
# Purpose: Remove the UTF-8 Byte Order Mark and Windows style carriage-returns
# Usage:   dos3unix [-n] file [file ...]
# Notes:   The original file is preserved as file- (i.e. with a hyphen appended)
#--------------------------------------------------------------------------------
doit=true                 # -n sets this to false: check and report only
while getopts n name
do
    case $name in
    n)  doit=false ;;
    *)  cat >&2 << eof
dos3unix: Usage
# dos3unix [-n] file [file...]
-n check and report only, no changes made
eof
        exit 1 ;;
    esac
done
shift $((OPTIND - 1))

cr=$(printf '\r')         # a literal carriage-return, so grep/sed need no GNU extensions
for f in "$@"
do
    [ -f "$f" ] || { echo "$f: not a regular file" >&2; continue; }
    ty=$( file -b "$f" )
    # n: number of lines ending in a carriage-return
    n=$( grep -c "${cr}\$" "$f" )
    # bom: the first three bytes in hex; efbbbf is the UTF-8 Byte Order Mark
    bom=$( od -An -tx1 -N3 "$f" | tr -d ' \n' )

    if [[ "$bom" == efbbbf && "$n" == 0 ]]
    then
        $doit && mv "$f" "${f}-"
        # drop only the first three bytes; EF/BB/BF bytes elsewhere are real UTF-8 data
        $doit && tail -c +4 "${f}-" > "$f"
        echo "$f: unBOM: $ty"
    elif [[ "$bom" == efbbbf ]]
    then
        $doit && mv "$f" "${f}-"
        $doit && tail -c +4 "${f}-" | sed "s/${cr}\$//" > "$f"
        echo "$f: dos2unix-unBOM: $ty"
    elif [[ "$n" == 0 ]]
    then
        echo "$f: $ty"
    else
        $doit && mv "$f" "${f}-"
        $doit && sed "s/${cr}\$//" "${f}-" > "$f"
        echo "$f: dos2unix: $ty"
    fi
done
# End-of-file dos3unix
```
Simply copy and paste the script above into a file called dos3unix, make it executable and then put it somewhere in your PATH, e.g.
```shell
mkdir -p ~/bin
PATH=$PATH:$HOME/bin
chmod +x dos3unix
mv dos3unix ~/bin
```
To make the PATH change permanent, add the PATH line to your ~/.profile.