transposing csv for gnuplot

categories: oneliner

I recently got a csv exported from an OpenOffice spreadsheet with its data arranged in rows rather than in columns as gnuplot likes it. It seems that gnuplot (intentionally) lacks the ability to parse data arranged in rows, so I had to transpose (swap rows and columns of) my input csv into a form gnuplot likes.

Transposing whitespace-delimited text can be done with awk, but csv is a bit more complex since it allows quotes and escapes. So I needed a solution that actually understands how to read csv.

This turned out to be so simple and minimalistic that I had to post the resulting oneliner that did the job for me:

python -c'import csv,sys;csv.writer(sys.stdout,csv.excel_tab).writerows(map(None,*list(csv.reader(sys.stdin))))'

This will read input csv from stdin and output the transpose to stdout. The transpose is done by using:

map(None, *thelist)

Another way to do the transpose in python is by using:

zip(*thelist)

But this solution doesn't handle rows of different lengths well: zip truncates every row to the length of the shortest one, whereas map(None, ...) pads shorter rows with None.
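
Note that map(None, ...) works like this only on Python 2. On Python 3 the same padding behaviour can be had from itertools.zip_longest, roughly like this (filling missing cells with empty strings):

python3 -c'import csv,sys,itertools;csv.writer(sys.stdout,csv.excel_tab).writerows(itertools.zip_longest(*list(csv.reader(sys.stdin)),fillvalue=""))'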

In addition, the oneliners above output the csv tab-delimited instead of comma-separated, which is what gnuplot likes, by using the excel_tab dialect in the csv.writer.

Tab-delimited output becomes problematic when some of the input values in between are empty. Not because the csv would be transposed incorrectly, but because gnuplot collapses consecutive whitespace into a single column separator. There are several ways around that problem. Either insert "-" instead of an empty cell in the output:

python -c'import csv,sys; csv.writer(sys.stdout, csv.excel_tab).writerows(map(lambda *x:map(lambda x:x or "-",x),*list(csv.reader(sys.stdin))))'

Or output a comma-delimited csv and tell gnuplot that the input is comma-delimited:

python -c'import csv,sys;csv.writer(sys.stdout).writerows(map(None,*list(csv.reader(sys.stdin))))'

And then in gnuplot:

set datafile separator ","

converting filename charset

categories: oneliner

So I was copying files onto a vfat-formatted USB stick when this error popped up:

Invalid or incomplete multibyte or wide character

The same issue occurred with cp, rsync and tar. I figured this was just an issue with stupid vfat, so I ran mkfs.ntfs on the drive and tried again - same issue.

While I thought that at least NTFS would allow any character except NULL, I was proven wrong. Looking it up on Wikipedia confirmed that NTFS file names are indeed limited to UTF-16 code points - in contrast to ext2/3/4, ReiserFS, Btrfs, XFS and surely more, which allow any byte except NULL.

What my file names contained was a number of ISO 8859-1 (latin1) characters like 0xe8 (è), 0xe0 (à), 0xf9 (ù) or 0xec (ì). It might also have been Windows-1252, but apparently no bytes between 0x80 and 0x9F appeared.

This is how I finally batch-converted the character set of the filenames:

convmv -f latin1 -t utf8 --notest *.flac
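
Under the hood this amounts to decoding the raw filename bytes as latin1 and re-encoding them as UTF-8. A rough Python 3 sketch of the same idea (convmv is smarter about it, e.g. it refuses to rename anything until --notest is given) would blindly re-encode every .flac name in the current directory:

python3 -c'import os; [os.rename(n, n.decode("latin1").encode("utf8")) for n in os.listdir(b".") if n.endswith(b".flac")]'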

audio conversion, splitting, tagging

categories: oneliner

I had some operas lying around, among others Wagner's "Ring des Nibelungen", Puccini's "La Boheme" and Verdi's "La Traviata". Sadly, some of them were not stored as flac but as Monkey's Audio (extension .ape), which is a non-free lossless codec, but fortunately ffmpeg includes a GPL'd decoder. Additionally, some CDs were stored as a single file with a cue sheet next to them.

So my task was: convert ape to flac, split the audio according to the information in the cue sheets and tag everything correctly.

These were the lines I used to split the audio:

ffmpeg -i CD1.ape CD1.wav
shnsplit -o flac -f CD1.cue -t "%a %n %t" CD1.wav
rm *pregap.flac CD1.wav CD1.ape
cuetag CD1.cue *.flac

First the ape is converted to wav so that shnsplit can read it. shnsplit then takes the timing information from the cue sheet and splits the wav into chunks, which it conveniently also converts to flac on the fly. Sadly it doesn't tag those files, so that is done afterwards by cuetag with information from the cue sheet. You need the shntool and cuetools packages.

Other data already existed as separate tracks and only had to be converted from ape to flac. Sadly flac offers no means of batch conversion, but a for loop and basename achieve the same effect. Conveniently, ffmpeg also copies the tags from the ape files over to the new flac files.

for f in *.ape; do b=`basename $f .ape`; ffmpeg -i "$b".ape "$b".flac; done

The resulting flac files were about 3-5% larger than the ape files, which is a totally acceptable tradeoff for being able to ditch a non-free format.

thomasg pointed out to me that with zsh there is an even neater way to do this loop, without basename(1) (the :r modifier strips the extension) and without some of the shell-loop keywords:

for f in *.ape; ffmpeg -i $f $f:r.flac

converting base64 on the commandline

categories: oneliner

There is an interesting number of ways to do this. In each pair, the first line encodes and the second decodes.

using python:

python -m base64 -e < infile > outfile
python -m base64 -d < infile > outfile

using openssl (-a is short for -base64):

openssl enc -a -e < infile > outfile
openssl enc -a -d < infile > outfile

using base64 (part of the coreutils package):

base64 < infile > outfile
base64 -d < infile > outfile

using perl:

perl -MMIME::Base64 -ne 'print encode_base64($_)' < infile > outfile
perl -MMIME::Base64 -ne 'print decode_base64($_)' < infile > outfile

using uuencode (but adds a header and footer):

uuencode -m - < infile > outfile
uudecode < infile > outfile
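
using python's base64 module directly (a rough Python 2 sketch; unlike the -m variant above it reads the whole input into memory and writes the encoded data as one long unwrapped line):

python -c'import base64,sys;sys.stdout.write(base64.b64encode(sys.stdin.read()))' < infile > outfile
python -c'import base64,sys;sys.stdout.write(base64.b64decode(sys.stdin.read()))' < infile > outfile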

convert mailman archive to mbox

categories: oneliner

The mailman mailing list manager allows you to download monthly archives in "Gzip'd Text" format. This format is not mbox, but it can easily be turned into mbox with the following simple line of sed:

sed 's/^\(From:\? .*\) \(at\|en\) /\1@/'

This makes it much easier to browse the past emails of an archive, e.g. using mutt, or to reply to past emails with a proper Message-ID so as not to break threads.
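
Put together, and assuming the downloaded archive has been saved as archive.txt.gz (a placeholder name), the conversion plus a quick look with mutt could be:

zcat archive.txt.gz | sed 's/^\(From:\? .*\) \(at\|en\) /\1@/' > archive.mbox
mutt -f archive.mbox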
