UTF-8: Insert Byte Order Mark

From n0r1sk software solutions

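UTF-8 in general does not need a BOM (Byte Order Mark), because it has no byte order to signal. Nevertheless, some libraries and applications that read or write files expect it. In UTF-16 and UTF-32 the BOM is much more important: it is commonly used to tell the reader whether the data is big-endian or little-endian.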

More information can be found on Wikipedia.[1][2]
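In UTF-8 the BOM is always the same three bytes, EF BB BF (the character U+FEFF encoded as UTF-8). Besides using the file command, you can simply dump the first bytes of a file to see them. A minimal sketch; the file name demo.txt is only an example:

```shell
#!/bin/bash
# Write a small example file that starts with a UTF-8 BOM (EF BB BF).
# Octal escapes are used because every printf implementation understands them.
printf '\357\273\277Hello\n' > demo.txt

# Print the first three bytes in hexadecimal; a UTF-8 BOM shows up as "ef bb bf".
head -c 3 demo.txt | od -An -tx1
```

od prints the bytes in hexadecimal, so a file with a UTF-8 BOM starts with ef bb bf.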

Show the difference

The following command shows which encoding a file uses.

Command:

file mytest.file

Output without BOM:

mytest.file: UTF-8 Unicode text

Output with BOM:

mytest.file: UTF-8 Unicode (with BOM) text
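The exact text that file prints can differ between versions (newer releases report UTF-8 text without a BOM as "Unicode text, UTF-8 text"), so scripts that compare the whole string may stop matching after an upgrade. A minimal sketch of a check that looks at the first three bytes instead; has_bom is a hypothetical helper, not part of file or uconv:

```shell
#!/bin/bash
# has_bom FILE: exit status 0 if FILE starts with the UTF-8 BOM (EF BB BF),
# 1 otherwise. Hypothetical helper, not part of the original script.
has_bom()
{
        # Dump the first three bytes as hex and strip spaces/newlines
        [ "$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

For example: has_bom mytest.file && echo "with BOM". This only detects the BOM. Unlike file, it does not tell whether the rest of the file is valid UTF-8.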

The script to add the BOM (Byte Order Mark)

Prerequisites

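The following software package installs the user-space utility "uconv".[3] On newer Debian and Ubuntu releases, uconv itself is shipped in the icu-devtools package, which can also be installed directly.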

apt-get install libicu-dev
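To convert a single file, you can call uconv directly. The sketch below wraps that call in a hypothetical helper add_bom. If uconv is not installed, the helper falls back to prepending the three BOM bytes with printf and cat. The fallback is a different technique from the one this article uses: it does not check that the input is valid UTF-8, and it adds a second BOM to a file that already has one.

```shell
#!/bin/bash
# add_bom IN OUT: write IN to OUT with a UTF-8 BOM in front of it.
# Hypothetical helper: prefers uconv (ICU) and falls back to printf + cat,
# which only prepends the three bytes and does NOT validate IN as UTF-8.
add_bom()
{
        if command -v uconv >/dev/null 2>&1; then
                # Read and write UTF-8 explicitly instead of relying on the locale
                uconv -f utf-8 -t utf-8 --add-signature "$1" > "$2"
        else
                # EF BB BF written as octal escapes, followed by the original bytes
                { printf '\357\273\277'; cat -- "$1"; } > "$2"
        fi
}
```

For example: add_bom mytest.file mytest-bom.file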

Script

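The script duplicates the directory tree below BASEDIR into TARGETDIR. Files that the file command reports as UTF-8 text without a BOM are converted with uconv, and all other files are copied unchanged. The source directory itself is not modified. If you like, you can pass BASEDIR and TARGETDIR to the script as parameters instead of setting them inside it.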

Be aware: The script deletes the target directory on each run!
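Because the target directory is deleted on every run, you should reject obviously dangerous values before rm -Rf runs, especially when the directories come from parameters. The following is a minimal sketch of such a header, which could replace the two fixed assignments at the top of the script below. The defaults are the paths used in this article. The helper check_dirs and the specific checks are assumptions, not part of the original script.

```shell
#!/bin/bash
# Hypothetical header: take the directories from the command line and fall
# back to the paths used in this article when no arguments are given.
BASEDIR=${1:-/root/messages}
TARGETDIR=${2:-/tmp/messages}

# check_dirs SRC DST: return 1 if DST must not be removed with rm -Rf.
check_dirs()
{
        local src dst
        if [ -z "$1" ] || [ ! -d "$1" ]; then
                echo "Source directory '$1' does not exist" >&2
                return 1
        fi
        # Absolute, symlink-free path of the source
        src=$(cd -- "$1" && pwd -P)
        dst=$2
        # Resolve the target as well if it already exists
        if [ -d "$dst" ]; then
                dst=$(cd -- "$dst" && pwd -P)
        fi
        case "$dst" in
                ""|/)
                        echo "Refusing to use '$2' as target directory" >&2
                        return 1 ;;
        esac
        # Deleting the target must never delete the source
        case "$src/" in
                "$dst"/*)
                        echo "Target '$2' is or contains the source '$1'" >&2
                        return 1 ;;
        esac
        return 0
}
```

In the script you would call check_dirs "$BASEDIR" "$TARGETDIR" || exit 1 right before rm -Rf.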

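#!/bin/bash
# Copy the tree below BASEDIR to TARGETDIR. Files that "file" reports as
# UTF-8 text without a BOM get a BOM added by uconv; all others are copied.

BASEDIR=/root/messages
TARGETDIR=/tmp/messages

# Start every run with an empty target directory
rm -Rf "$TARGETDIR"
mkdir -p "$TARGETDIR"

# $1 = current directory relative to BASEDIR ("." for BASEDIR itself), so the
# full tree is reproduced instead of flattening it into one directory level
function RecursiveConvert()
{
        local rel="$1"
        local f OUTPUT
        for f in *
        do
                # In an empty directory the pattern stays a literal "*"
                [ -e "$f" ] || continue
                if [ -d "$f" ]; then
                        echo "Directory: $rel/$f"
                        mkdir -v "$TARGETDIR/$rel/$f"
                        (cd "$f" && RecursiveConvert "$rel/$f")
                else
                        echo "File: $rel/$f"
                        # -b (brief) prints the type without the "filename:" prefix
                        OUTPUT=$(file -b "$f")
                        echo "$OUTPUT"
                        # Older versions of file print the first string, newer ones the second
                        case "$OUTPUT" in
                                "UTF-8 Unicode text"|"Unicode text, UTF-8 text")
                                        echo "UNICODE WITHOUT BOM"
                                        echo "Converting....."
                                        uconv -f utf-8 -t utf-8 --add-signature "$f" > "$TARGETDIR/$rel/$f"
                                        echo "......done!"
                                        ;;
                                *)
                                        echo "Other file encoding $OUTPUT"
                                        echo "Copying....."
                                        cp -v "$f" "$TARGETDIR/$rel/$f"
                                        echo ".....done"
                                        ;;
                        esac
                fi
        done
}

(cd "$BASEDIR" && RecursiveConvert .)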
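After a run, you may want to see which files in the target directory did not get a BOM. Because the script compares the output of file against a fixed string, these are the files that file did not report as plain UTF-8 text. Examples are pure ASCII files (reported as "ASCII text") and files with CRLF line endings. A minimal sketch; list_without_bom is a hypothetical helper name:

```shell
#!/bin/bash
# list_without_bom DIR: print every regular file below DIR whose first three
# bytes are not the UTF-8 BOM (EF BB BF). Hypothetical helper for checking the
# result of a run; file names containing newlines are not supported.
list_without_bom()
{
        find "$1" -type f | while IFS= read -r f
        do
                if [ "$(head -c 3 -- "$f" | od -An -tx1 | tr -d ' \n')" != "efbbbf" ]; then
                        printf '%s\n' "$f"
                fi
        done
}
```

For example: list_without_bom /tmp/messages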