welcome to gobject introspection

categories: blog

So I was writing a quick python/gtk/webkit application for my own personal pleasure, starting with the usual:

import gtk
import gobject
import webkit
import pango

After about 500 LOC the interface was already working pretty nicely, so I began adding some more application logic, starting with figuring out how to properly do asynchronous http requests with my gobject main loop.

Threading was of course not an option; it had to be a simple event-based solution. Gobject provides gobject.io_add_watch to react to activity on a socket, but there was no library in sight to parse the http communication going over the socket connection.

At this point let me also briefly express my dislike for the synchronous nature of urllib/urllib2. In my eyes this kind of behaviour is unacceptable for network-based I/O, and it is one reason why I recently had a look at node.js.

But back to the topic. After some searching I found out that one can use libcurl together with gobject callbacks, so using this pycurl example as a basis I wrote the following snippet, which fetches a couple of http resources in parallel, asynchronously:

import os, sys, pycurl, gobject
from cStringIO import StringIO

sockets = set()
running = 1

urls = ("http://curl.haxx.se","http://www.python.org","http://pycurl.sourceforge.net")

def socket(event, socket, multi, data):
    if event == pycurl.POLL_REMOVE: sockets.remove(socket)
    elif socket not in sockets: sockets.add(socket)

m = pycurl.CurlMulti()
m.setopt(pycurl.M_PIPELINING, 1)
m.setopt(pycurl.M_SOCKETFUNCTION, socket)
m.handles = []
for url in urls:
    c = pycurl.Curl()
    c.url = url
    c.body = StringIO()
    c.http_code = -1
    m.handles.append(c)
    c.setopt(c.URL, c.url)
    c.setopt(c.WRITEFUNCTION, c.body.write)
    m.add_handle(c)

while (pycurl.E_CALL_MULTI_PERFORM==m.socket_all()[0]): pass

def done():
    for c in m.handles:
        c.http_code = c.getinfo(c.HTTP_CODE)
        m.remove_handle(c)
        c.close()
    m.close()

    for c in m.handles:
        data = c.body.getvalue()
        print "%-53s http_code %3d, %6d bytes" % (c.url, c.http_code, len(data))
    exit()

def handler(sock, *args):
    while True:
        (ret, running) = m.socket_action(sock, 0)
        if ret != pycurl.E_CALL_MULTI_PERFORM: break
    if running == 0: done()
    return True

for s in sockets: gobject.io_add_watch(s, gobject.IO_IN | gobject.IO_OUT | gobject.IO_ERR, handler)

gobject.MainLoop().run()

This works nicely, and I would have stuck with it, had larsc not suggested using libsoup together with gobject introspection for the python binding.

Of course I could have kept using pycurl because curl is cool, but every python binding to a C library adds another point of possible failure or outdatedness when upstream changes.

This issue is now nicely handled by using gobject introspection, or pygobject in the case of python. What it does is use so-called "typelibs" to dynamically generate a binding to any gobject-based code. Typelibs are generated from gir files, which are XML representations of the library API.

In Debian the typelibs are stored in /usr/lib/girepository-1.0/, and even if you don't know the mechanism you will probably already have lots of definitions in this directory. Additional files are installed with gir packages like gir1.2-gtk-3.0. Typelibs are already available for all kinds of libraries like clutter, gconf, glade, glib, gstreamer, gtk, pango, gobject and many more.
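Out of curiosity, the namespaces already installed can be listed with a few lines of python (a quick sketch; the directory path is the Debian one mentioned above and may differ on other distributions):

```python
import os

# Debian's typelib directory; other distributions may use another path
TYPELIB_DIR = "/usr/lib/girepository-1.0"

typelibs = []
if os.path.isdir(TYPELIB_DIR):
    # each Foo-1.0.typelib file corresponds to one importable namespace
    typelibs = sorted(name[:-len(".typelib")]
                      for name in os.listdir(TYPELIB_DIR)
                      if name.endswith(".typelib"))

print("\n".join(typelibs))
```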

To use them, my import line now looks like this:

from gi.repository import Gtk, GObject, GdkPixbuf, Pango, WebKit

This also solves the problem I laid out above: grabbing data over http from within a gobject event loop.

from gi.repository import Soup

Soup can do that, but there is no "real" python binding for it. With pygobject one doesn't need a "real" binding anymore: I just import it as shown above and voilà, I can interface with the library from my python code!

Converting my application from the normal gtk/gobject/pango/webkit bindings to their pygobject counterparts was also a piece of cake and took me under an hour, including figuring out how to do it. A really good writeup about how to do it can be found here. For some initial cleanup, this regex-based script comes in surprisingly handy as well.


first steps with gta04

categories: blog

apt-get install emdebian-archive-keyring
echo deb http://www.emdebian.org/debian/ squeeze main >> /etc/apt/sources.list
apt-get update
apt-get install gcc-4.4-arm-linux-gnueabi
git clone git://neil.brown.name/gta04 gta04-kernel
cd gta04-kernel/
git checkout merge
export ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- INSTALL_MOD_PATH=modules_inst
make distclean && make gta04a3_defconfig && make uImage -j 5 && make modules -j 5 && make modules_install
tar -C modules_inst -czf modules.tgz .

/usr/share/doc/python-serial/examples/miniterm.py --lf -b 115200 /dev/ttyUSB0

git clone --depth 0 git://neil.brown.name/gta04 gta04-kernel-neil
git fetch origin; git reset --hard origin/merge

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git gta04-kernel-neil
git remote add -t merge gta04 git://neil.brown.name/gta04
git fetch gta04
git checkout -b foo gta04/merge


discovering tcc

categories: blog

Today I just discovered another great piece of software by Fabrice Bellard: tcc.

It is orders of magnitude faster than gcc: lots of pieces of my code (albeit very small ones) compiled 10-20 times faster.

While it is ANSI C compliant but not fully ISO C99 compliant, all of my code that I tried it on compiled happily. Checking the documentation, the missing ISO C99 parts turned out to be those I wouldn't use anyway (complex and imaginary numbers and variable-length arrays).

Apart from being small and fast there are two killer features: C scripting support and dynamic code generation through libtcc.

By using the shebang line

#!/usr/local/bin/tcc -run

and setting the executable bit, one can now "run" C source files just like scripts. tcc will compile and execute the code on the fly without even creating temporary files, keeping everything in memory.

Since tcc is also extremely fast, there is only a small disadvantage compared to shell code: a simple helloworld.c "script" was "executed" in just 0.08 seconds, where an equivalent shell script took 0.03 seconds. The example "script" from the manpage reads:

#!/usr/bin/tcc -run
#include <stdio.h>

int main()
{
    printf("Hello World\n");
    return 0;
}

Another feature that is just the logical consequence of the above is tcc's ability to read C source from standard input and compile and run it on the fly:

echo 'main(){puts("hello");}' | tcc -run -

Now to the second amazing thing: dynamic code generation on the fly. This is achieved using libtcc, with which one can dynamically generate and compile C code through library calls and execute it right away from memory.

The following example program shows how to achieve this (inspired by libtcc_test.c from the tcc source):

#include <stdlib.h>
#include <stdio.h>
#include "libtcc.h"

int add(int a, int b) { return a + b; }

char my_program[] =
    "int fib(int n) {\n"
    "    if (n <= 2) return 1;\n"
    "    else return fib(n-1) + fib(n-2);\n"
    "}\n"
    "int foobar(int n) {\n"
    "    printf(\"fib(%d) = %d\\n\", n, fib(n));\n"
    "    printf(\"add(%d, %d) = %d\\n\", n, 2 * n, add(n, 2 * n));\n"
    "    return 1337;\n"
    "}\n";

int main(int argc, char **argv)
{
    TCCState *s;
    int (*foobar_func)(int);
    void *mem;

    s = tcc_new();
    tcc_set_output_type(s, TCC_OUTPUT_MEMORY);

    /* compile the generated code into the tcc state */
    tcc_compile_string(s, my_program);

    /* make add() from this program callable from the generated code */
    tcc_add_symbol(s, "add", add);

    /* first call returns the required size, second relocates into mem */
    mem = malloc(tcc_relocate(s, NULL));
    tcc_relocate(s, mem);

    /* retrieve a pointer to the freshly compiled function */
    foobar_func = tcc_get_symbol(s, "foobar");

    /* the state can go away; the compiled code lives on in mem */
    tcc_delete(s);

    printf("foobar returned: %d\n", foobar_func(32));

    free(mem);
    return 0;
}

Two use cases for tcc immediately come to my mind. Firstly, there is an amazing movement going on that creates music from C one-liners. erlehman started a project on github where he is gathering a number of such one-liners. The workflow is to first generate the C source code by plugging the single line of algorithm into a simple for-loop wrapper, compile that source with gcc and then pipe the output of the resulting executable into aplay or sox. Using tcc, one could instead pipe the generated code straight into tcc, which would compile AND execute it in one step. It would be faster than with gcc and would require no intermediate source files or executables: a single pipeline does everything.
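As a sketch of that pipeline (the formula is one of the well-known bytebeat one-liners; the exact wrapper shape and the gen.py name are my own assumptions):

```python
# plug a music one-liner into a for-loop wrapper; the result can be piped
# straight into tcc, e.g.: python gen.py | tcc -run - | aplay
formula = "(t*(t>>5|t>>8))>>(t>>16)"  # a well-known bytebeat one-liner

wrapper = """#include <stdio.h>
int main(void) {
    int t;
    for (t = 0; ; t++)
        putchar(%s);
    return 0;
}
""" % formula

print(wrapper)
```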

Secondly, there is a project I'm working on at Jacobs called Flowy, where I struggle to optimize the performance of a parser for a processing language for network flow records. Performance is already quite good, and dynamic code generation has always been an option to increase it further, but it would have been quite messy if done with gcc. With libtcc I would be able to dynamically construct the rules and execute them with just a few library calls, without the complexity of invoking gcc as an external executable.


use /dev/shm

categories: blog

TODO: I have to use /dev/shm more often.

With even my laptop having a few gigs of RAM, it is such a convenient and, most importantly, fast scratch space, and still I'm not in the habit of using it everywhere it would come in handy.

For example, when building a rootfs with multistrap, doing the build in /dev/shm reduces the overall time for the build to finish from 23 minutes to under 14 minutes!
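The pattern is simple enough to wire into any build script; a sketch (the directory prefix is arbitrary): use /dev/shm as scratch space when it exists, otherwise fall back to the normal temp directory:

```python
import os
import tempfile

# use the RAM-backed tmpfs at /dev/shm when available, otherwise fall
# back to the regular (likely disk-backed) temp directory
scratch_root = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
workdir = tempfile.mkdtemp(prefix="scratch-", dir=scratch_root)
print(workdir)
```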


clearing caches

categories: blog

For benchmarking purposes it makes sense to clear the caches the linux kernel maintains for us. An additional sync beforehand makes sure that everything is committed to disk (drop_caches will not free dirty objects).

sync
sudo sysctl vm.drop_caches=3

or

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null

This will drop the pagecache, dentries and inodes (a value of 1 drops only the pagecache, 2 only dentries and inodes, 3 both).
