transposing csv for gnuplot

categories: oneliner

I recently got a csv that was exported from an openoffice spreadsheet with the data arranged in rows and not in columns as gnuplot likes it. It seems that gnuplot (intentionally) lacks the ability to parse data in rows instead of columns. Hence I had to switch rows and columns (transpose) of my input csv so that gnuplot would accept it.

Transposing whitespace delimited text can be done with awk but csv is a bit more complex as it allows quotes and escapes. So a solution had to be found which understood how to read csv.

This turned out to be so simple and minimalistic that I had to post the resulting oneliner that did the job for me:

python -c'import csv,sys;csv.writer(sys.stdout,csv.excel_tab).writerows(map(None,*list(csv.reader(sys.stdin))))'

This will read input csv from stdin and output the transpose to stdout. The transpose is done by using:

map(None, *thelist)

Another way to do the transpose in python is by using:

zip(*thelist)

But zip stops at the shortest row, so this solution doesn't handle rows of different lengths well.
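Note that map(None, ...) only exists in Python 2. As a minimal sketch for Python 3, assuming itertools.zip_longest as the replacement (it is not part of the original oneliner) and some made-up sample rows, the difference between the two transposes looks like this:

```python
from itertools import zip_longest

# Hypothetical sample data: the second row is shorter than the first.
rows = [["a", "b", "c"], ["1", "2"]]

# zip stops at the shortest row, so the "c" column is silently dropped.
truncated = [list(col) for col in zip(*rows)]

# zip_longest pads short rows, here with an empty string.
padded = [list(col) for col in zip_longest(*rows, fillvalue="")]

print(truncated)
print(padded)
```

The empty string as fill value mirrors what the csv writer does with the None padding that Python 2's map(None, ...) produces.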

In addition, the solution above uses the excel_tab dialect in the csv.writer, so the output is tab delimited, as gnuplot likes it, instead of comma delimited.

The solution above is problematic when some of the input values in between are empty. This is not because the csv would be transposed incorrectly but because gnuplot collapses consecutive whitespace characters into one. There are several solutions to that problem. Either, instead of an empty cell, insert "-" in the output:

python -c'import csv,sys; csv.writer(sys.stdout, csv.excel_tab).writerows(map(lambda *x:map(lambda x:x or "-",x),*list(csv.reader(sys.stdin))))'

Or output a comma delimited csv and tell gnuplot that the input is comma delimited:

python -c'import csv,sys;csv.writer(sys.stdout).writerows(map(None,*list(csv.reader(sys.stdin))))'

And then in gnuplot:

set datafile separator ","
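For Python 3, the variant with the "-" placeholder could be sketched roughly like this. The function name transpose_csv and the sample input are made up for illustration, and zip_longest stands in for Python 2's map(None, ...):

```python
import csv
import io
from itertools import zip_longest


def transpose_csv(text, placeholder="-"):
    """Transpose CSV text and return it tab delimited, filling empty cells."""
    rows = list(csv.reader(io.StringIO(text)))
    out = io.StringIO()
    writer = csv.writer(out, csv.excel_tab, lineterminator="\n")
    for column in zip_longest(*rows, fillvalue=""):
        # Replace empty or missing cells so gnuplot still sees a column there.
        writer.writerow([cell or placeholder for cell in column])
    return out.getvalue()


# Hypothetical input: the second row has an empty cell and is one cell short.
print(transpose_csv("x,1,2,3\ny,4,\n"), end="")
```

To use it as a filter, pass sys.stdin.read() as the text and write the result to sys.stdout.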

xen hypervisor on qemu kvm and domu nfs boot with vde

categories: debian

Let me share how to set up xen inside qemu with kvm support, with domus booting over nfs from the qemu host, and how to connect multiple of those instances together using vde networking. The debian installer, the debian inside qemu, the domus and debootstrap will use my local apt-cacher setup at port 3142.

This setup is based on Debian wheezy (testing at this point). To install testing, grab the latest debian installer business card image from here:

wget http://cdimage.debian.org/cdimage/daily-builds/daily/arch-latest/amd64/iso-cd/debian-testing-amd64-businesscard.iso

Then create a disk image for qemu to use as its harddisk. A sparse 3000 MiB file can be created with dd:

dd if=/dev/zero of=disk.img bs=1 count=1 seek=3000MiB
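The dd call above works by seeking past the end of an empty file and writing a single byte, which leaves a hole before it. A rough Python sketch of the same idea, assuming a filesystem that supports sparse files (the function name and the temporary path are made up for illustration):

```python
import os
import tempfile


def create_sparse_image(path, size_bytes):
    """Create a file of size_bytes without writing any data blocks."""
    with open(path, "wb") as image:
        # Extending a file with truncate() leaves a hole on filesystems
        # that support sparse files; no zeros are actually written.
        image.truncate(size_bytes)


# Hypothetical usage: a 3000 MiB image in a temporary directory.
image_path = os.path.join(tempfile.mkdtemp(), "disk.img")
create_sparse_image(image_path, 3000 * 1024 * 1024)
print(os.path.getsize(image_path))
```

The apparent size is reported by ls -l while du shows how little space is actually allocated.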

Using the debian installer to set up the system is preferable to creating a rootfs with debootstrap because the xen hypervisor wants to be booted by grub. Qemu does not yet support booting the xen hypervisor directly, the way it can boot a linux kernel with the -kernel option. Grub installation and partitioning is most easily done by just using d-i.

To automate the installation I'm using the following preseed file:

d-i debian-installer/locale string en_US
d-i console-keymaps-at/keymap select us
d-i keyboard-configuration/xkb-keymap select us
d-i netcfg/choose_interface select auto
d-i netcfg/get_hostname string debian
d-i netcfg/get_domain string 
d-i mirror/country string manual
d-i mirror/http/hostname string 10.0.2.2:3142
d-i mirror/http/directory string /ftp.de.debian.org/debian
d-i mirror/suite string wheezy
d-i mirror/udeb/suite string wheezy
d-i passwd/root-login boolean true
d-i passwd/make-user boolean false
d-i passwd/root-password password root
d-i passwd/root-password-again password root
d-i clock-setup/utc boolean true
d-i time/zone string UTC
d-i clock-setup/ntp boolean true
d-i partman-auto/method string regular
d-i partman-auto/choose_recipe select atomic
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i base-installer/install-recommends boolean false
d-i base-installer/kernel/image select none
tasksel tasksel/first multiselect 
d-i pkgsel/include string xen-linux-system-amd64 xen-tools xen-utils
d-i finish-install/reboot_in_progress note

It answers all questions for debconf so that no user input is needed. It will tell d-i to use the apt-cacher setup on the qemu host (10.0.2.2 is the default ip address of the host from inside qemu when using usermode networking), will install wheezy, will set the root password to "root", will only create a single partition for / on the virtual harddrive, will not install recommends and not install the "standard" tasksel target but will install the xen hypervisor and some xen tools. I uploaded the file to http://mister-muffin.de/debian/preseed3.txt

To make qemu use virtualization features of the host cpu, one only needs to install the qemu-kvm package.

apt-get install qemu-kvm

From that point on, qemu will automagically use kvm. The speedup gained is tremendous. Instead of taking 13 minutes to boot my xen dom0 (ridiculous) the machine would boot up in only 3 minutes (workable).

Now start qemu, giving it disk.img as the harddisk and debian-testing-amd64-businesscard.iso as the boot medium in the cd drive.

qemu-system-x86_64 -m 1024 -hda disk.img -cdrom debian-testing-amd64-businesscard.iso

The isolinux boot menu will pop up. Choose "Advanced options" and then select "Automated install" and press [TAB] to edit the boot commandline. Append the preseed url for debconf like this to the end:

preseed/url=http://mister-muffin.de/debian/preseed3.txt

After hitting enter the system will install by itself. After it is done it will automatically reboot into debian. The hypervisor will not boot by default (bug#603832) so change the grub priority from inside the virtual machine by doing:

mv /etc/grub.d/10_linux /etc/grub.d/21_linux
update-grub

You also do not want to continue using qemu in graphical mode but want to connect to the virtual machine via serial. To do so, let a tty spawn on the serial line in inittab:

echo "T0:23:respawn:/sbin/getty -L ttyS0 9600 vt100" >> /etc/inittab

Compared to the graphical SDL display, the serial console has several advantages: easy copy and paste, the same keyboard layout as the host, the screensaver is not deactivated, a proper console font, terminal and window manager integration, no grabbing and ungrabbing, and a working system bell.

To also get qemu, kernel and init output on serial add the following options to /etc/default/grub:

GRUB_CMDLINE_LINUX="console=ttyS0"
GRUB_TERMINAL=serial

And run update-grub again.

Now qemu can be started like this:

qemu-system-x86_64 -m 1024 -hda disk.img -nographic

It will automatically boot the xen hypervisor and attach a tty to serial when the boot is finished.

Once it is, configure xen. Activate bridged networking by uncommenting the following line in the xen config:

(network-script network-bridge)

For bridging to work, xen will need the brctl utility of the bridge-utils package (bug#648816).

apt-get install bridge-utils

Since the debian mirror is still the same as during installation, the apt-cacher setup from back then will still be used which makes installation of additional packages extremely fast.

There are then several ways to create a new domu. The easiest one is to just call:

xen-create-image --hostname=vm01 --dir=/root --dhcp --noswap --size=400Mb

This command will first run debootstrap and then configure the result of it. Since the debian mirror of the host system is chosen as the default, debootstrap will run reasonably fast.

A faster way is to have a tarball ready that contains the result of a debootstrap run and then call xen-create-image with the --install-method=tar option.

So either inside or outside qemu (outside is naturally faster) run debootstrap like this:

debootstrap --variant=minbase wheezy target-directory http://127.0.0.1:3142/ftp.de.debian.org/debian

Tar it, put it inside the virtual machine and then inside qemu:

xen-create-image --hostname=vm01 --dir=/root --dhcp --noswap --size=400Mb --install-method=tar --install-source=/root/vm01.tar

This command will unpack the tarball into a disk image and then configure it.

Instead of running xen-create-image inside qemu, you can also run it on the qemu host, which will be faster. But if you do not want to nfs boot and instead boot from the disk image it creates, don't forget to copy the xen domu config it creates into the virtual machine.

Instead of xen-create-image you can also do all steps manually. So first run debootstrap as usual and then do some basic configuration. The most important part would be the activation of the xen tty in inittab. xen-create-image will call a number of hook scripts which do this configuration. Those hooks can also be run manually on a manually created debootstrap root directory like this:

export verbose=true
export hostname=foobar
export dhcp=true
export mirror=http://127.0.0.1:3142/ftp.de.debian.org/debian
export dist=wheezy
for script in `find /usr/lib/xen-tools/debian.d -type f ! -name '90-make-fstab' | sort -n`; do
        $script /home/josch/debian-wheezy
done

Editing fstab is not strictly needed and only hurts when using nfs boot. Don't execute the fstab hook when using nfs and generally have a look at each of the hooks to find out what they do. It is safer to run xen-create-image, but to understand what it does, look into the hooks in /usr/lib/xen-tools/debian.d.

Also as a note, you can always mount the qemu image using:

mount -o loop,offset=1048576 disk.img /mnt

The offset of the first partition can be found out using fdisk on disk.img.
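The offset of 1048576 used above is simply the start sector reported by fdisk (2048 for this image) multiplied by the sector size of 512 bytes. As a sketch, the same number can be read directly from the partition table, assuming the image carries a classic MBR (msdos) partition table and 512-byte sectors; the helper name and the fabricated MBR below are only for illustration:

```python
import struct

SECTOR_SIZE = 512  # assumption: the image uses 512-byte logical sectors


def first_partition_offset(mbr):
    """Return the byte offset at which the first MBR partition starts."""
    if mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    # The partition table starts at byte 446; each entry is 16 bytes and
    # stores its starting sector as a little-endian 32-bit value at byte 8.
    (start_sector,) = struct.unpack_from("<I", mbr, 446 + 8)
    return start_sector * SECTOR_SIZE


# Hypothetical MBR whose first partition starts at sector 2048.
fake_mbr = bytearray(512)
struct.pack_into("<I", fake_mbr, 446 + 8, 2048)
fake_mbr[510:512] = b"\x55\xaa"
print(first_partition_offset(bytes(fake_mbr)))
```

For a real image, pass the first 512 bytes of disk.img instead of the fabricated sector.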

In our setup we want to boot the domu from a root directory which is served by the host of the virtual machine. Doing so will just require a proper xen configuration and no disk space on the hypervisor side is used.

Either create a configuration from scratch or use:

xen-create-nfs --hostname=vm01 --dhcp --nfs_server=10.0.2.2 --nfs_root=/srv/nfs/vm01 --memory=128

And then edit the result so that it looks like this:

    kernel     = '/boot/vmlinuz-3.0.0-1-amd64'
    ramdisk    = '/boot/initrd.img-3.0.0-1-amd64'
    vcpus      = '1'
    memory     = '128'
    name       = 'vm01'
    hostname   = 'vm01'
    dhcp       = 'dhcp'
    vif        = [ '' ]
    nfs_server = '10.0.2.2'
    nfs_root   = '/srv/nfs/vm01'
    root       = '/dev/nfs'
    extra      = 'boot=nfs root=/dev/nfs'

You should only have to add the 'extra' option as this is important for the initrd to boot from nfs.

On your host do:

apt-get install nfs-kernel-server

And then add a directory serving a rootfs in /etc/exports:

/srv/nfs 127.0.0.1(rw,sync,no_subtree_check,no_root_squash,insecure)

These options will only allow localhost to access it. The insecure option is necessary because of the choice of port the initrd will connect from.

The rootfs can be created using debootstrap and then running the hooks as explained above, or by taking a filesystem image that was created by xen-create-image and extracting its contents to the directory exported as an nfs share as this image will already contain the modifications needed for a proper boot.

In qemu you can now start the vm and connect to it using:

xm create /etc/xen/vm01.cfg -c

Disconnect from the console using the ctrl+] escape, as in telnet. To reconnect, use:

xm console vm01

To connect multiple qemu instances, each running a hypervisor, so that they can all access the internet and talk to each other, the most convenient setup is VDE networking. It is a network bridge implemented in userspace (no superuser privileges required) that connects the machines together (bridging them) using socket communication. Together with the slirp module this bridge is connected to the outside world, and slirp can even provide dhcp to the qemu instances connected to it.

To start vde:

vde_switch

The -daemon switch can be used to send the process into the background and the -sock switch can be used to supply a socket different from the default /tmp/vde.ctl.

To start slirp:

slirpvde -dhcp

The --daemon option will send it to the background as well, while the --sock option allows supplying a custom control socket.

After the bridge is set up and slirp is connected to it, start qemu like this:

qemu-system-x86_64 -m 1024 -hda disk.img -nographic -net nic,macaddr=XX:XX:XX:XX:XX:XX -net vde,sock=/tmp/vde.ctl

Where XX:XX:XX:XX:XX:XX is a unique mac address that can be generated by doing:

printf 'DE:AD:BE:EF:%02X:%02X\n' $((RANDOM%256)) $((RANDOM%256))
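The same address scheme can be sketched in Python, which may be handy when the qemu instances are started from a script. The function name is made up for illustration; the fixed DE:AD:BE:EF prefix is taken from the printf call above, and its first octet marks the address as locally administered and unicast:

```python
import random


def random_mac(prefix="DE:AD:BE:EF"):
    """Return the prefix followed by two random octets."""
    return "%s:%02X:%02X" % (prefix, random.randrange(256), random.randrange(256))


mac = random_mac()
print(mac)
```

With only two random octets there are 65536 possible addresses, so when starting many instances it is worth checking the generated addresses for duplicates.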

The ip addresses supplied by the slirp dhcp will be the same as with qemu usermode networking, so the setup from above doesn't change.


adblocking with a hosts file

categories: blog

Naturally adblock plus is a must-have extension for firefox but other programs displaying websites might not offer such a facility.

To block ads on any application accessing the internet, the use of a hosts file which redirects requests to certain hostnames to 127.0.0.1 (which will refuse incoming connections) provides a universal method to get rid of advertisements.

The question is how to obtain a list of malicious hosts.

Searching around revealed three lists that seemed to be well-maintained:

  • http://winhelp2002.mvps.org/hosts.htm
  • http://pgl.yoyo.org/adservers/
  • http://someonewhocares.org/hosts/

The corresponding hosts-file entries can be found under these urls respectively:

  • http://winhelp2002.mvps.org/hosts.txt
  • http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext
  • http://someonewhocares.org/hosts/hosts

I also looked into the adblock plus filter rules but they mostly contain expressions for the path, query and fragment part of URIs and not so much hostnames. This makes sense because using its syntax adblock plus is able to block with much more accuracy than just blocking whole domains.

Now I wanted a combined list of them without duplicates so I cleaned them up using the following sed expression:

sed 's/\([^#]*\)#.*/\1/;s/[ \t]*$//;s/^[ \t]*//;s/[ \t]\+/ /g'

It removes comments, whitespace at the beginning and end of the line and reduces any additional whitespace (between ip and hostname) to only one space. I would then run the output through sort and uniq and append the result to my /etc/hosts.
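The whole cleanup, including the sort and uniq step, can also be sketched in a few lines of Python. This is only an illustration with made-up sample entries; unlike the sed expression it also drops lines that end up empty:

```python
import re


def clean_hosts(lines):
    """Strip comments, normalize whitespace and return sorted unique entries."""
    entries = set()
    for line in lines:
        line = line.split("#", 1)[0]                 # drop comments
        line = re.sub(r"[ \t]+", " ", line).strip()  # collapse whitespace
        if line:
            entries.add(line)
    return sorted(entries)


# Hypothetical sample input in the style of the lists above.
sample = [
    "# comment line\n",
    "127.0.0.1\t ads.example.com   # inline comment\n",
    "127.0.0.1 ads.example.com\n",
    "  127.0.0.1 tracker.example.net\n",
]
print("\n".join(clean_hosts(sample)))
```

Feeding it the concatenated lines of all three downloaded lists gives one deduplicated block that can be appended to /etc/hosts.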

What is still problematic about this approach is that if one doesn't have a service bound to 127.0.0.1:80, every application trying to establish a TCP connection to it will pointlessly wait for localhost to respond until the timeout is reached. To avoid this and immediately send a tcp RST when the browser is redirected to 127.0.0.1 while trying to retrieve an advertisement, I use the following iptables rule:

iptables -A INPUT -i lo -p tcp -m tcp --dport 80 -j REJECT --reject-with tcp-reset

Some hosts that you might also want to add to your /etc/hosts because they are there to track users are:

127.0.0.1 www.google-analytics.com
127.0.0.1 auto.search.msn.com
127.0.0.1 ad.doubleclick.net
127.0.0.1 google-analytics.com
127.0.0.1 stat.livejournal.com
127.0.0.1 stats.surfaid.ihost.com
127.0.0.1 ads.imeem.com

They are not included by default in the lists above because it might break some websites if they were.

EDIT (2012-05-21)

I forgot to include port 443 for https in the iptables rule above. For example google uses https for googleadservices.com and others might too, so don't forget to also reset connections to port 443 by adding the same rule with --dport 443.


gnuplot live update

categories: blog

When you have data that is not generated instantly but trickles in one point at a time (either because the calculation is difficult or because the data source produces new datapoints over time) and you still want to visualize it with gnuplot, the easiest option is to hit the refresh button in the wx interface every time you want to refresh the data.

A cooler way is to make gnuplot automagically "autoupdate" the data it displays whenever new data is available.

Let's simulate a data source with this python script, which just outputs coordinates on a circle with radius one around the origin and waits for 0.5 seconds after every step.

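from math import pi, sin, cos
from time import sleep

# 16 points on the unit circle, one every 0.5 seconds
for alpha in [2*pi*i/16.0 for i in range(16)]:
    # formatted print works with both python 2 and python 3
    print("%s %s" % (sin(alpha), cos(alpha)))
    sleep(0.5)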

The magic is done by this shell script, which does nothing more than read lines from stdin, append them to a buffer and print the buffer wrapped in gnuplot syntax every time new input is received.

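# resend the whole buffer to gnuplot every time a new line arrives
while read -r line; do
    lines="$lines$line\n"
    echo 'plot "-"'
    # %b expands the \n separators regardless of the shell's echo
    printf '%b' "$lines"
    echo "e"
done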
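The buffering logic of the shell script can also be written as a small Python generator, which makes it easy to see what exactly is sent to gnuplot after each new data point. This is only a sketch; the function name and the sample input are made up:

```python
def gnuplot_frames(lines):
    """Yield one complete inline-data plot command per input line."""
    seen = []
    for line in lines:
        seen.append(line.rstrip("\n"))
        # 'plot "-"' tells gnuplot to read data inline until a lone "e".
        yield "\n".join(['plot "-"'] + seen + ["e"]) + "\n"


# Hypothetical input; in real use this would be sys.stdin.
for frame in gnuplot_frames(["0.0 1.0\n", "0.38 0.92\n"]):
    print(frame, end="")
```

When piping the frames into gnuplot, stdout has to be flushed after every frame, otherwise the plot only updates once the output buffer fills up.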

These two scripts can now be used like this:

python -u circle.py | sh live-gnuplot.sh | gnuplot -p

The -u option has to be passed to python to enable unbuffered output. The -p option for gnuplot makes the plot window remain open even after the main gnuplot process has exited.

Since gnuplot would auto-scale the coordinate axes according to the current input with this setup, a more beautiful output with a fixed size coordinate frame is produced by:

{ echo "set xrange [-2:2]"; echo "set yrange [-2:2]"; python -u circle.py | sh live-gnuplot.sh; } | gnuplot -p

This way you can also pass more options like title, style, terminal or others.


persistent NAT setup

categories: blog

I often have some tablet or smartphone running linux connected to my usb port, and I always have to look up how to enable NAT-ing on my laptop so that those devices can access the internet via the usb-ethernet connection. So here is how to do it:

sysctl net.ipv4.ip_forward=1
iptables -t nat -A POSTROUTING -j MASQUERADE

But I will not need this reminder anymore because this is how to make the setup persistent across reboots:

$ echo net.ipv4.ip_forward = 1 > /etc/sysctl.d/ip_forward.conf
$ cat /etc/iptables/rules.v4
*nat
-A POSTROUTING -j MASQUERADE
COMMIT

The content of /etc/iptables/rules.v4 is normally generated via iptables-save, so some extra work might be needed to combine this with any existing content. The initscript required to run iptables-restore with this file as input is provided by the iptables-persistent package.
