So I was writing a quick python/gtk/webkit application for my own personal pleasure, starting with the usual:
import gtk
import gobject
import webkit
import pango
Once the interface was working pretty nicely, about 500 LOC later, I began adding more application logic, starting with figuring out how to properly do asynchronous HTTP requests from within my gobject main loop.
Threading is of course not an option; it had to be a simple event-based solution. Gobject provides gobject.io_add_watch to react to activity on a socket, but there was no library in sight that would parse the HTTP communication over such a socket connection.
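Under the hood, io_add_watch just asks the main loop to poll a file descriptor and invoke a callback when it becomes readable or writable. Stripped of gobject, the pattern looks roughly like this stdlib sketch (the names such as on_readable and watches are mine, not an API):

```python
import select
import socket

# a connected pair of sockets so we can trigger I/O activity ourselves
a, b = socket.socketpair()

# fd -> callback table, mimicking what gobject.io_add_watch registers
watches = {}
received = []

def on_readable(sock):
    received.append(sock.recv(4096))
    return True  # returning True keeps the watch installed, as in gobject

watches[a.fileno()] = on_readable

b.send(b"hello")

# one iteration of the "main loop": poll the watched fds and dispatch
readable, _, _ = select.select(list(watches), [], [], 1.0)
for fd in readable:
    watches[fd](a)
```

This polling-and-dispatch loop is what gobject runs for you; the hard part that remained was speaking HTTP over the watched sockets.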
At this point let me also briefly express my dislike for the synchronous nature of urllib/urllib2. In my eyes this kind of behaviour is unacceptable for network-based I/O, and it is one reason why I recently had a look at node.js.
But back to the topic. After some searching I found that one can use libcurl together with gobject callbacks, so using this pycurl example as a basis I wrote the following snippet, which fetches a couple of HTTP resources in parallel in an asynchronous fashion:
import os, sys, pycurl, gobject
from cStringIO import StringIO

sockets = set()
running = 1
urls = ("http://curl.haxx.se", "http://www.python.org", "http://pycurl.sourceforge.net")

# called by libcurl whenever a socket should be watched or removed
def socket(event, socket, multi, data):
    if event == pycurl.POLL_REMOVE:
        sockets.remove(socket)
    elif socket not in sockets:
        sockets.add(socket)

m = pycurl.CurlMulti()
m.setopt(pycurl.M_SOCKETFUNCTION, socket)
m.handles = []
for url in urls:
    c = pycurl.Curl()
    c.url = url
    c.body = StringIO()
    c.http_code = -1
    c.setopt(c.URL, c.url)
    c.setopt(c.WRITEFUNCTION, c.body.write)
    m.handles.append(c)
    m.add_handle(c)

# kick the transfers off until libcurl has registered its sockets
while pycurl.E_CALL_MULTI_PERFORM == m.socket_all()[0]:
    pass

def done():
    for c in m.handles:
        c.http_code = c.getinfo(c.HTTP_CODE)
    for c in m.handles:
        data = c.body.getvalue()
        print "%-53s http_code %3d, %6d bytes" % (c.url, c.http_code, len(data))
    sys.exit(0)

def handler(sock, *args):
    while True:
        (ret, running) = m.socket_action(sock, 0)
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    if running == 0:
        done()
    return True

for s in sockets:
    gobject.io_add_watch(s, gobject.IO_IN | gobject.IO_OUT | gobject.IO_ERR, handler)

gobject.MainLoop().run()
This works nicely, and I would have stuck with it had larsc not suggested using libsoup together with gobject introspection for the Python binding.
Of course I could have kept using pycurl, because curl is cool, but every Python binding to a C library adds another point of possible failure or outdatedness when upstream changes.
This issue is now nicely handled by gobject introspection, or pygobject in the case of Python. What it does is use so-called "typelibs" to dynamically generate a binding to any gobject-based code. Typelibs are generated from GIR files, which are XML representations of the library API.
In Debian the typelibs are stored in /usr/lib/girepository-1.0/, and even if you don't know the mechanism you will probably already have lots of definitions in this directory. Additional typelibs are installed through gir packages like gir1.2-gtk-3.0. They are already available for all kinds of libraries: clutter, gconf, glade, glib, gstreamer, gtk, pango, gobject and many more.
To use them, my import line now looks like this:
from gi.repository import Gtk, GObject, GdkPixbuf, Pango, WebKit
This also solves the problem I laid out above: grabbing data over HTTP from within a gobject event loop.
from gi.repository import Soup
Soup can do that, but there is no "real" Python binding for it. With pygobject one doesn't need a "real" binding anymore: I just import it as shown above and voilà, I can use the library from my Python code!
Converting my application from the plain gtk/gobject/pango/webkit bindings to their pygobject counterparts was also a piece of cake; learning how to do it and actually doing it took less than an hour. A really good writeup about how to do it can be found here. For some initial cleanup this regex-based script comes in surprisingly handy as well.