Mister Muffin Blog

windows xp on qemu

Thu, 18 Aug 2011 20:33 categories: blog

Works like a breeze. Just note NOT to use a qcow disk image (it will be horribly slow), and use the qemu options shown further down to select network and sound hardware that Windows knows about out of the box.

To create a disk image use one of the following commands depending on whether your system has fallocate support or not:

$ dd if=/dev/zero of=windows.img bs=1 count=1 seek=3000MiB
$ fallocate -l 3000MiB windows.img

With the dd variant, if your filesystem supports sparse files, your image will not immediately occupy its full size on your disk (fallocate, in contrast, reserves the space up front). Then start qemu like this:

$ qemu-system-x86_64 -k en-us -enable-kvm -hda windows.img \
> -cdrom windows_xp.iso -net nic,model=rtl8139 -net user \
> -soundhw ac97 -m 1024

For bare Windows XP you do not need to specify the -m 1024 option. Windows XP will be quite happy with the default of 128 MiB RAM. But given the amount of RAM current hosts have available I usually throw in a bit extra.
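To see the sparse-file effect mentioned for the disk image, one can compare a file's apparent size with the space actually allocated for it. The following is only a rough sketch for Python 3 on a Unix-like host: the helper names `create_sparse_image`, `disk_usage` and `demo` are made up for this example, and it assumes that `st_blocks` counts 512-byte units, as it does on Linux.

```python
import os
import tempfile


def create_sparse_image(path, size_bytes):
    """Create a file of size_bytes without writing any data blocks."""
    with open(path, "wb") as f:
        # Extending an empty file via truncate() leaves a hole on
        # filesystems that support sparse files.
        f.truncate(size_bytes)


def disk_usage(path):
    """Return (apparent size, allocated bytes) for path (Unix only)."""
    st = os.stat(path)
    # Assumption: st_blocks is counted in 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512


def demo(size_bytes=1024 * 1024):
    """Create a small test image in a temporary directory and measure it."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "windows.img")
        create_sparse_image(path, size_bytes)
        return disk_usage(path)


if __name__ == "__main__":
    apparent, allocated = demo()
    print("apparent size:", apparent, "allocated:", allocated)
```

On a filesystem with sparse-file support the allocated figure should stay far below the apparent size until the guest starts writing to the image.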

My roommate akira used this setup to connect to a university distance course which required the students to use the proprietary Adobe Connect software to connect to the classroom. It turns out that with the above setup, the speaker and microphone forwarding between the host and the Windows XP guest worked out of the box. Getting a USB webcam to work turned out to be a bit more tricky but can be accomplished by adding the following to the qemu invocation above:

$ qemu-system-x86_64 [...] \
> -readconfig /usr/share/doc/qemu-system-common/ich9-ehci-uhci.cfg \
> -device usb-host,vendorid=0x046d,productid=0x0825,id=webcam,bus=ehci.0

This will attach the host's USB device with id 046d:0825 (a Logitech webcam in this case) to the qemu guest. It doesn't even seem to be necessary to unload the kernel module responsible for the webcam (uvcvideo in this case) from the host. The guest seems to be able to cooperate well with it.

The interesting bit of the above invocation is the -readconfig argument which points to /usr/share/doc/qemu-system-common/ich9-ehci-uhci.cfg, a hardware configuration for qemu written by Gerd Hoffmann. It creates a USB 2.0 adapter with companion USB 1.0 controllers as a multifunction device on one of the guest's PCI slots. This has the advantage that attaching any USB device from the host to the guest bus ehci.0 will work no matter whether the device is USB 1.0 or 2.0. If you know what you are doing you can always specify -usb or -device usb-ehci,id=ehci, depending on the USB standard of your device, and then attach it to the right bus. But the -readconfig solution will work out of the box in both cases.

You can read more about qemu and USB in the excellent documentation, also written by Gerd Hoffmann, which can be found at /usr/share/doc/qemu-system-common/usb2.txt.gz or at docs/usb2.txt in the qemu git.

In case you want to read data from the guest with the virtual machine switched off you can mount the disk image you created earlier by doing:

mount -o loop,offset=32256 windows.img windows

Or find out the offset with fdisk: switch display units to sectors with 'u' and print the partition table with 'p'. Then multiply the start sector of the partition by the sector size to get the offset. With a default Windows XP install it will be 63 sectors times 512 bytes = 32256 bytes.
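The same arithmetic can be done without fdisk by reading the partition table directly. The sketch below (Python 3) is an illustration, not a replacement for fdisk: it assumes a classic MBR partition table with 512-byte sectors, ignores extended partitions, and the function names `partition_offsets` and `read_offsets` are invented for this example.

```python
import struct

SECTOR_SIZE = 512  # assumption: 512-byte sectors, as in the calculation above


def partition_offsets(mbr):
    """Return the byte offsets of the non-empty primary partitions in an MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature (0x55AA) found")
    offsets = []
    for n in range(4):
        # Four 16-byte partition entries start at byte 446.
        entry = mbr[446 + 16 * n:446 + 16 * (n + 1)]
        partition_type = entry[4]
        # Bytes 8..11 hold the first sector (LBA), little endian.
        start_lba = struct.unpack_from("<I", entry, 8)[0]
        if partition_type != 0:  # type 0 marks an unused entry
            offsets.append(start_lba * SECTOR_SIZE)
    return offsets


def read_offsets(path):
    """Read the first sector of a disk image and return its partition offsets."""
    with open(path, "rb") as f:
        return partition_offsets(f.read(512))
```

Calling `read_offsets("windows.img")` on an image like the one described above should return a list whose first element is the value to pass as `offset=` to mount.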

python for-loop scope and nested functions

Sun, 14 Aug 2011 13:57 categories: blog

I recently stumbled over some nasty problems with respect to python scopes in for loops and nested functions therein.

Firstly, try out this snippet:

>>> a = []
>>> for i in range(10):
...     a.append(lambda: i)
>>> for f in a: f()

While I expected it to print 0-9 it printed 9 ten times. The reason is twofold.

Firstly, it might not be obvious, but the following should not be surprising:

>>> for i in range(10): pass
>>> i
9

That is, the loop variable is not local to the for loop. No new scope is created for loops in python. Only classes and functions get their own scope.

Secondly, python binds names late: a function or lambda looks up a variable like i when it is called, not when it is defined. The following might be a bit more surprising:

>>> i = 0
>>> f = lambda: i
>>> i = 1
>>> f()
1

Since there is no scope for loops, the following will also print 81 ten times:

>>> a = []
>>> for i in range(10):
...    j = i**2
...    a.append(lambda: j)
>>> for f in a: f()

The problem of course presented itself to me in a much weirder manner, which made it take quite some time until I figured out the root cause. My problem was that I indeed expected the first example to print the numbers 0-9, which it doesn't for the reasons explained above.

What I was struggling with were gtk and dbus callbacks. My code looked like this:

for iface in interfaces:
    def on_success_cb(msg):
        print iface
    iface.MyDBusMethod(reply_handler=on_success_cb)

This of course printed the last value iface had in this loop on every invocation of the reply_handler. On a side note, it is surprising to see how sparsely documented the use of reply_handler and error_handler in dbus python is and how seldom it seems to be used.

Another piece of the same code looked like this:

for func in ["RequestScan", "EnableTechnology", "DisableTechnology"]:
    button = gtk.Button(func)
    def button_onclick(button, event):
        print func
    button.connect("button_press_event", button_onclick)
    hbox.pack_start(button, False, False, 0)

And of course every time one of the differently named buttons was clicked it would print "DisableTechnology".

So how to fix it?

Let's see how to fix the first example:

>>> lst = []
>>> for i in range(10):
...     lst.append(lambda j=i: j)
>>> for f in lst: f()

What's the difference? The lambda now has its own local variable j, and its default value is taken from i at the moment the lambda is created. In contrast to i, the scope of j is local to the lambda. This will now successfully print 0-9.
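The default-argument trick works because python evaluates default values once, when the lambda (or def) statement is executed, and not on every call. Here is a small sketch for Python 3 that puts both behaviours side by side. The function name `build` and the list names are only there for illustration.

```python
def build(n=3):
    """Build two lists of lambdas inside a loop and return them."""
    late, early = [], []
    for i in range(n):
        late.append(lambda: i)       # i is looked up when the lambda is called
        early.append(lambda j=i: j)  # the current i is stored as the default of j
    return late, early


late, early = build()
late_results = [f() for f in late]    # every lambda sees the final value of i
early_results = [f() for f in early]  # every lambda kept its own value

# Caveat: j is still an ordinary parameter, so a caller can override it.
overridden = early[0](42)

print(late_results, early_results, overridden)
```

The caveat in the last lines is the reason this trick is a poor fit when someone else decides which arguments the callback gets called with.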

But this doesn't help me with my above dbus and gtk callback problems as I can't freely change the function signature for the callbacks. So what to do?

The solution Michael 'emdete' Dietrich pointed me to was to just use a wrapper function around my code which gets the loop variables as its arguments. By doing so, the value of the loop variable gets bound to a name in the function scope and will not be changed there by subsequent loop iterations.

>>> lst = []
>>> for i in range(10):
...     def bind(j):
...             lst.append(lambda: j)
...     bind(i)
>>> for f in lst: f()

>>> lst = []
>>> def bind(j):
...     lst.append(lambda: j)
>>> for i in range(10):
...     bind(i)
>>> for f in lst: f()

I agreed with emdete that the second variant looks cleaner. The first variant would have made sense if loops had their own scope but hey, they haven't. Using the second variant also avoids confusion about which variable is being used.

So now my code looks like this:

def bind(iface):
    def on_success_cb(msg):
        print iface
    iface.MyDBusMethod(reply_handler=on_success_cb)
for iface in interfaces:
    bind(iface)

and this:

def bind(func):
    button = gtk.Button(func)
    def button_onclick(button, event):
        print func
    button.connect("button_press_event", button_onclick)
    return button
for func in ["RequestScan", "EnableTechnology", "DisableTechnology"]:
    button = bind(func)
    hbox.pack_start(button, False, False, 0)
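To try the wrapper-function pattern without gtk or dbus installed, the callbacks can be simulated with plain functions. This is a sketch for Python 3 under that assumption: `make_handler`, `demo` and the `log` list are hypothetical stand-ins for the real button and signal machinery.

```python
def make_handler(name, log):
    """Wrapper function: name is local to this call, so each handler keeps its own."""
    def on_click():
        log.append(name)
    return on_click


def demo():
    """Create one handler per name, then invoke them like button clicks."""
    log = []
    handlers = []
    for func in ["RequestScan", "EnableTechnology", "DisableTechnology"]:
        handlers.append(make_handler(func, log))
    for handler in handlers:  # stands in for the user clicking each button
        handler()
    return log


if __name__ == "__main__":
    print(demo())
```

Each handler keeps the name it was created with, because every call to `make_handler` creates a fresh scope for `name`.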

To me it always appeared unintuitive that scope is limited to functions and not extended to code blocks. But on the other hand it mirrors what happens in other languages when the loop variable is declared outside the loop:

int i;
for(i=0; i<10; i++);
printf("%d\n", i);

There was a big discussion (64 mails) with several possible solutions on the python-ideas list three years ago: http://mail.python.org/pipermail/python-ideas/2008-October/002109.html

But it doesn't seem as if anything caught on. Hence, for the time being one has to create just another function for new scope. Works for me. Still, even though I understand why and how it works, I have trouble finding the first example intuitive. I would still expect it to work differently.

convert mailman archive to mbox

Tue, 26 Jul 2011 11:36 categories: oneliner

The mailman mailing list manager allows downloading monthly archives in "Gzip'd Text" format. This format is not mbox but can easily be turned into it by the following simple line of sed:

sed 's/^\(From:\? .*\) \(at\|en\) /\1@/'

This makes it much easier to browse past emails of an archive, e.g. using mutt, or to reply to past emails with a proper Message-ID so as not to break threads.
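For those who would rather do the conversion from a script, the sed expression translates almost literally into Python's re module. This is a sketch for Python 3. The names `fix_from_line` and `to_mbox` are invented here, and the sample addresses in the usage lines are made up.

```python
import re

# The same pattern as the sed expression, written in Python regex syntax.
FROM_RE = re.compile(r"^(From:? .*) (at|en) ")


def fix_from_line(line):
    """Turn 'From[:] user at host ...' into 'From[:] user@host ...'."""
    # count=1 mirrors sed, which replaces one match per line without the g flag.
    return FROM_RE.sub(r"\1@", line, count=1)


def to_mbox(text):
    """Apply the fix to every line of an unpacked archive."""
    return "".join(fix_from_line(line) for line in text.splitlines(keepends=True))


if __name__ == "__main__":
    print(fix_from_line("From: josch at example.org (josch)"))
    print(fix_from_line("Subject: look at this"))
```

Like the sed version, the greedy `.*` means that the last " at " or " en " on a From line is the one that gets replaced, so a display name containing " at " would need extra care.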

OMG WTF PDF

Fri, 15 Jul 2011 23:07 categories: code

This is the title of a great talk at the last Chaos Communication Congress in Berlin (27c3).

When writing my own pdf parser for a homework assignment that I put way too much ambition into, I encountered all of what is mentioned in that talk and was also able to realize how bad the situation really is. I just deleted three paragraphs of this post where I started to rant about how frickin bad the pdf format is. How unworkable it is and how literally impossible to perfectly implement. But instead, just watch the video of the talk and make sure to remember that it is even worse than Julia Wolf is able to make clear in the 1h she was given.

So after I stopped myself from spreading another wave of my pdf hate over the internets, let's look at the issue at hand:

I wanted to sign up online for a new contract with my bank. One of the requirements was that I enter a numeric code that was supposedly only accessible to me once I had printed out a pdf document. You heard me right - the pdf document only contained a gray box where the number was supposed to be, and only upon printing was it supposed to reveal itself. I still have no clue how this is supposed to work, but I assume it is some weird javascript (yes, pdf can contain javascript) or a proprietary forms extension. Or maybe even flash (yes, the acrobat reader contains an implementation of flash). Or it might just be native bytecode (yes, you can put native, platform-specific bytecode into a pdf that the reader will then execute for you - isn't it great?).

Needless to say, no pdf renderer I had at hand (I tried poppler-based programs and mupdf) was able to give me the number, not even when trying to print, where the magic was supposed to happen. I was already down to setting up a qemu instance to install windows xp so that I could install acrobat reader, open the document and print it to another pdf to finally see that number. Then I thought again and added some code to my pdf parser that allowed me to investigate that pdf more thoroughly. And indeed, just by chance, I spotted a number in the annotation area of the document which looked just like the six-digit number I needed. Tried it and voila, it worked.

This is the snippet I uncompressed from the pdf to (just by chance) find the number I was looking for. The 000000 below stands in for the actual number I needed.

6 0 obj
<<
  /DA (/Arial 14 Tf 0 g)
  /Rect [ 243.249 176.784 382.489 210.297 ]
  /FT /Tx
  /MK <<
    /BG [ 0.75 0.75 0.75 ]
  >>
  /Q 1
  /P 4 0 R
  /AP <<
    /N 7 0 R
  >>
  /V (000000)
  /T (Angebotskennnummer)
  /Subtype /Widget
  /Type /Annot
  /F 36
  /Ff 1
>>
endobj

So let me say: WTF? My bank not only requires me to resort to one specific pdf implementation (namely the acrobat reader by adobe) but also requires me to first pay a US-based company for an operating system that this reader software works on? Or am I really supposed to go through the raw pdf source by hand?? Bleh...

Also, don't ask for my code - it's super dirty and unreadable. Instead look at the mupdf project. It supplies a renderer which is massively superior to poppler in terms of speed (even suitable for embedded devices) and comes with a program called pdfclean which does the same thing my program did, so it would have gotten me the number I needed as well.
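Once the objects are available as uncompressed text, finding such a form field no longer requires a full parser. The following sketch (Python 3) is deliberately naive and is not how my parser or mupdf works: it assumes the object text is already uncompressed, that /T and /V are simple literal strings without nested parentheses or escapes, and the function name `field_values` is made up.

```python
import re

# Uncompressed object text, shortened from the dump shown above.
OBJ_TEXT = """6 0 obj
<<
  /DA (/Arial 14 Tf 0 g)
  /FT /Tx
  /V (000000)
  /T (Angebotskennnummer)
  /Subtype /Widget
  /Type /Annot
>>
endobj
"""

# /T is the field name and /V the field value; only plain (...) strings match.
NAME_RE = re.compile(r"/T\s*\(([^()]*)\)")
VALUE_RE = re.compile(r"/V\s*\(([^()]*)\)")


def field_values(text):
    """Return (name, value) pairs for objects that carry both /T and /V."""
    pairs = []
    for body in text.split("endobj"):
        name = NAME_RE.search(body)
        value = VALUE_RE.search(body)
        if name and value:
            pairs.append((name.group(1), value.group(1)))
    return pairs


if __name__ == "__main__":
    print(field_values(OBJ_TEXT))
```

Anything beyond such a quick look (hex strings, escaped parentheses, compressed object streams) is exactly where a real tool like mupdf earns its keep.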

timestamp counter

Fri, 15 Jul 2011 14:08 categories: code

I recently discovered the timestamp counter instruction, which solved a problem I had: I needed to accurately benchmark a very small piece of code, but putting it in a loop made gcc optimize it away with -O3.

static __inline__ unsigned long long getticks(void)
{
     unsigned a, d;
     asm volatile("rdtsc" : "=a" (a), "=d" (d));
     return ((unsigned long long)a) | (((unsigned long long)d) << 32);
}

More code for other architectures as well can be found here.

When using this snippet one has to take care that the code stays on the same processor, that the processor doesn't change its clock speed, and that the system is not hibernated or suspended in between.
