convert mailman archive to mbox
Tue, 26 Jul 2011 11:36 categories: oneliner
The mailman mailing list manager lets you download monthly archives in "Gzip'd Text" format. This format is not mbox, but the following simple line of sed turns it into mbox:
sed 's/^\(From:\? .*\) \(at\|en\) /\1@/'
This makes it much easier to browse past emails of an archive, e.g. using mutt, or to reply to past emails with a proper Message-ID so as not to break threads.
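For anyone who would rather do this from a script, here is a minimal Python sketch of the same substitution. The names `deobfuscate` and `convert_archive` are made up for illustration, and the latin-1 encoding is an assumption chosen only so that arbitrary bytes survive the round trip.

```python
import gzip
import re

# Same pattern as the sed one-liner: on lines starting with "From " or
# "From: ", turn the last " at " (or " en ") into "@".
FROM_LINE = re.compile(r"^(From:? .*) (at|en) ", re.MULTILINE)


def deobfuscate(text):
    """Restore the '@' in the From lines of a mailman text archive."""
    return FROM_LINE.sub(r"\1@", text)


def convert_archive(src_path, dst_path):
    """Read a gzipped mailman archive and write it out as an mbox file.

    Hypothetical helper; latin-1 is assumed so that no byte fails to decode.
    """
    with gzip.open(src_path, "rt", encoding="latin-1") as src:
        text = src.read()
    with open(dst_path, "w", encoding="latin-1") as dst:
        dst.write(deobfuscate(text))
```

Only lines that start with From are touched, so an "at" in the message body stays as it is.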
OMG WTF PDF
Fri, 15 Jul 2011 23:07 categories: code
This is the title of a great talk at the last Chaos Communication Congress in Berlin (27C3).
When writing my own pdf parser for a homework assignment that I put way too much ambition into, I encountered everything mentioned in that talk and came to realize how bad the situation really is. I just deleted three paragraphs of this post where I started to rant about how frickin bad the pdf format is, how unworkable it is and how literally impossible to implement perfectly. Instead, just watch the video of the talk and remember that it is even worse than Julia Wolf was able to make clear in the one hour she was given.
So now that I have stopped myself from spreading another wave of my pdf hate over the internets, let's look at the issue at hand:
I wanted to sign up online for a new contract with my bank. One of the requirements was that I enter a numeric code that was supposedly only accessible to me once I printed out a pdf document. You heard me right - the pdf document only contained a gray box where the number was supposed to be, and only upon printing would it reveal itself. I still have no clue how this is supposed to work, but I assume it is some weird javascript (yes, pdf can contain javascript) or a proprietary forms extension. Or maybe even flash (yes, the acrobat reader contains an implementation of flash). Or it might just be native bytecode (yes, you can put native, platform specific bytecode into a pdf that the reader will then execute for you - isn't it great?). Needless to say, no pdf renderer I had at hand (I tried poppler based programs and mupdf) was able to give me the number - not even when printing, where the magic was supposed to happen.
I was already down to setting up a qemu instance to install Windows XP, so that I could install acrobat reader, open the document and print it to another pdf to finally see that number. Then I thought again and added some code to my pdf parser that allowed me to investigate that pdf more thoroughly. And indeed, just by chance, I spotted a number in the annotation area of the document which looked just like the six digit number I needed. I tried it and voilà, it worked.
This is the snippet I uncompressed from the pdf to (just by chance) find the number I was looking for. The 000000 is a placeholder for the number I actually needed.
6 0 obj
<<
  /DA (/Arial 14 Tf 0 g)
  /Rect [ 243.249 176.784 382.489 210.297 ]
  /FT /Tx
  /MK <<
    /BG [ 0.75 0.75 0.75 ]
  >>
  /Q 1
  /P 4 0 R
  /AP <<
    /N 7 0 R
  >>
  /V (000000)
  /T (Angebotskennnummer)
  /Subtype /Widget
  /Type /Annot
  /F 36
  /Ff 1
>>
endobj
So let me say: WTF? My bank not only requires me to resort to one specific pdf implementation (namely the acrobat reader by adobe) but also requires me to first pay a US-based company for an operating system that the reader software runs on? Or am I really supposed to go through the raw pdf source by hand?? Bleh...
Also, don't ask for my code - it's super dirty and unreadable. Instead, look at the mupdf project. It supplies a renderer which is massively superior to poppler in terms of speed (even suitable for embedded devices) and comes with a program called pdfclean, which does the same thing my program did and would equally have let me get the number I needed.
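Once the pdf is decompressed, finding such a hidden value does not have to happen by chance. Below is a rough Python sketch, not my original parser, that lists form field names (/T) and values (/V) per object. The function name `form_field_values` is made up. It assumes the objects are already uncompressed and the strings are plain literals without escapes or nested parentheses, which a real pdf does not guarantee.

```python
import re

# One "N G obj ... endobj" block at a time (non-greedy, spans newlines).
OBJ_RE = re.compile(rb"\d+ \d+ obj(.*?)endobj", re.DOTALL)
# Literal strings only: no escapes, no nested parentheses, no hex strings.
NAME_RE = re.compile(rb"/T\s*\(([^()]*)\)")
VALUE_RE = re.compile(rb"/V\s*\(([^()]*)\)")


def form_field_values(pdf_bytes):
    """Map field name (/T) to field value (/V) for each object that has both."""
    fields = {}
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(1)
        name = NAME_RE.search(body)
        value = VALUE_RE.search(body)
        if name and value:
            key = name.group(1).decode("latin-1")
            fields[key] = value.group(1).decode("latin-1")
    return fields
```

Run on the object shown above, this would return the field Angebotskennnummer together with its value.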
timestamp counter
Fri, 15 Jul 2011 14:08 categories: code
I recently discovered the timestamp counter instruction (rdtsc on x86). It solved a problem where I had to accurately benchmark a very small piece of code, but putting it in a loop made gcc optimize it away at -O3.
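/* Read the x86 timestamp counter with the rdtsc instruction. */
static __inline__ unsigned long long getticks(void)
{
    unsigned a, d;
    /* rdtsc puts the low 32 bits into EAX and the high 32 bits into EDX */
    asm volatile("rdtsc" : "=a" (a), "=d" (d));
    return ((unsigned long long)a) | (((unsigned long long)d) << 32);
}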
More code, including versions for other architectures, can be found here.
When using this snippet, one has to make sure that the code stays on the same processor, that the processor doesn't change its clock speed and that the system is not hibernated or suspended in between.
scriptreplay in javascript
Tue, 05 Jul 2011 09:19 categories: code
TL;DR: http://mister-muffin.de/scriptreplay/
Using terminal applications instead of GUI applications has the definite speed advantage of not having to spend time moving a cursor to a 2D coordinate on the screen but instead just doing a 2mm downward motion with one or two fingers. I wonder whether anything other than terminal applications will allow me to interact with my computer faster before direct neural interfaces are invented.
I really like terminal applications - not only because of the speed advantage but also because they offer much more real estate in terms of usable screen space, as they don't waste space on stuff like buttons, menu bars or all those pixels spent on separators and the space between UI elements. Starting to use the Pentadactyl Firefox extension was not only a huge browsing speed improvement for me, but I could also finally use my whole frickin 1920x1080 pixels for viewing the website (well, except for a 19 pixel status bar at the bottom).
Now let's finally come to the topic of this post: scriptreplay in javascript.
script(1) and scriptreplay(1), being part of the bsdutils package in debian/ubuntu and the util-linux package in rpm based distributions, are probably already installed on your system and are among those really handy tools you did not know of before even though they were always there. script is a program that you can use to capture a terminal session or console application output, whereas scriptreplay is able to replay that session by using a timing file that script is able to output on stderr. Even without the timing file, the typescript includes all terminal interaction and is readily readable with a text editor, printable, or easily uploadable to a pastebin (no more selecting parts of your terminal window with your mouse and copy-pasting that into the browser). On its own it is useful for documenting a process - for example if you want to show others how a bug happened on your system or for a homework submission you have to hand in.
A very powerful feature is the mentioned timing file, which you can capture by redirecting the stderr output of script into a file. With scriptreplay you can then watch your terminal interaction replayed in real time.
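To make the replay idea concrete, here is a small Python sketch of what a scriptreplay-like tool does. It assumes the classic timing format as I understand it: one "delay bytecount" pair per line, with the first line of the typescript being a "Script started on ..." header that the timing data does not cover. The names `parse_timing` and `replay` are made up for this example.

```python
import sys
import time


def parse_timing(timing_text):
    """Return a list of (delay_seconds, byte_count) pairs."""
    steps = []
    for line in timing_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        steps.append((float(parts[0]), int(parts[1])))
    return steps


def replay(typescript, steps, out=None, sleep=time.sleep):
    """Write the typescript bytes to out, pausing as the timing data says."""
    if out is None:
        out = sys.stdout.buffer
    # Assumption: the first line is a header that is not part of the timing.
    pos = typescript.find(b"\n") + 1
    out.write(typescript[:pos])
    for delay, count in steps:
        sleep(delay)
        out.write(typescript[pos:pos + count])
        out.flush()
        pos += count


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python replay.py <timingfile> <typescript>
    with open(sys.argv[1]) as timing_file:
        timing = parse_timing(timing_file.read())
    with open(sys.argv[2], "rb") as typescript_file:
        replay(typescript_file.read(), timing)
```

The sleep function is passed in as a parameter so that the replay speed can be changed or the pauses skipped entirely.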
Because it would be kinda tedious to share your typescript and timing file over a pastebin, so that the other party would have to download them manually and use scriptreplay to watch them, I imagined a kind of youtube for terminal sessions. This would also solve the problem of all those youtube videos that are screen captures of terminal windows - needlessly encoding text as moving images and thereby not only destroying the ability to copy&paste things but also needlessly increasing the file size.
I remember having seen such a website years ago but cannot find it again. As a proof of concept I prepared the following website:
http://mister-muffin.de/scriptreplay/
I will probably never have the time to make a real web service out of it, but maybe some of this is useful to others.