Tuesday 26 July 2011

Null/special characters

Like my LaTeX post, this is just going to be a home for scripting tricks. I'll start it off with one that took me the whole day to figure out.

awk and grep don't like 'binary files', by which I mean files with null characters in them. I didn't even realise there were null characters in the file until I ran grep on it... it wasn't my file.

There are ways to continue to use grep and diff on binary files:
grep --binary-files=text
diff --text

In the awk I was using, a null/special character is treated as the end of the line, so anything after it is lost. Two ways to get around this are:
  1. rev - for example, if you want the last four columns of a file, but they're after a special character, just use rev twice: rev ${file} | awk '{print $1, $2, $3, $4}' | rev
  2. perl - you can do anything in perl, apparently. I just hate the language; it looks like the kind of poetry hard drive pixies would write after way too much whisky. Neither tr nor sed worked for me, but the perl syntax is almost identical to that of sed: perl -pe 's/\000//g' ${file}