text to speech

Timestamp: 2011-05-21 02:44
Tags: oneliner

I’m planning to go biking and swimming again in the summer. What bugs me about it is that during both activities I can’t do much more than just that. I can’t work on my projects or read a book. What I could do is listen to music or to an audio book. But what I would like to do most is continue reading my Perry Rhodan books. There are audio recordings of them, but unfortunately those are edited and not a 1:1 copy of the books. Since my goal is to read all of the books without leaving anything out, that was unacceptable.

My idea was to see how far text-to-speech synthesizers have come. To raise the bar, I was looking for a good German synthesizer, since the Perry Rhodan novels are written in German. After some searching I found that a company called ivona apparently provides the best solution for my problem at the moment.

But of course there were additional issues. For one thing, the company doesn’t release their product for Linux, so I had two options: installing Windows in qemu and running their software there, or figuring out how to repurpose their online demo application.

While I was setting up an emulated Windows XP in the background, I tried to figure out how the web frontend of ivona.com worked. After a few hours of trial and error I had a solution which, contrary to my initial belief, did not even require any reverse engineering of Flash binaries.

# The trailing backslashes join the wrapped lines into one single-line string.
string="Wenn die Testpiloten und Spezialisten des Linearkommandos von dem \
neuartigen Kompensationskonverter zur Errichtung eines aus Sechsdimensional \
übergeordneten Feldlinien bestehenden Kompensatorfeldes sprachen, dann gab sich \
niemand mehr die Mühe, die zungenbrecherischen Begriffe exakt auszusprechen."
# chk is the Base64 of the text with '+' -> '-', '/' -> '_' and '=' -> '.'
chk=$(printf '%s' "$string" | perl -MMIME::Base64 -0777 -pe \
  '$_ = encode_base64($_, ""); tr{+/=}{-_.}')
# The response contains a loadAudioFile('...') call; sed extracts its argument for mplayer.
mplayer "$(curl --data-urlencode "tresc=$string" --data synth=1 \
  --data "chk=$chk" --data voice=20 \
  --header "X-Requested-With: XMLHttpRequest" \
  http://www.ivona.com/index.php | sed "s/.*loadAudioFile('\(.*\)').*/\1/")"

The X-Requested-With header is required. Instead of 20, the voice POST parameter can be set to 21 for a female German voice. Other numbers select other languages.
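The chk parameter in the request above is the least obvious part, so here is a small sketch of just that step without perl. The function name ivona_chk is made up for illustration, and it assumes a base64 command (for example from coreutils) is installed. It produces the same encoding as the perl snippet: Base64 of the text, line breaks removed, then '+', '/' and '=' replaced by '-', '_' and '.'.

```shell
# Hypothetical helper (name is an assumption): compute the chk value for a text.
# Assumes a `base64` command is available on the system.
ivona_chk() {
  # Encode the text, drop the line wrapping that base64 adds, then swap the
  # three characters. The hyphen is placed last in the second set so that tr
  # treats it neither as an option nor as a range.
  printf '%s' "$1" | base64 | tr -d '\n' | tr '/=+' '_.-'
}
```

For example, ivona_chk "Hallo" prints SGFsbG8. (the trailing dot replaces the Base64 padding character).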

Interestingly enough, the 250 character limit seems to be enforced only on the client side. As the request above shows, it is no problem at all to submit longer phrases. Instead, the server simply stops converting at some seemingly random point, after around 23-26 seconds of speech or 180-200 KB. But maybe this is useful to somebody anyway.
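Since the server cuts off long input, one way around it would be to feed it a book in small pieces. The following is only a sketch under assumptions: the function name chunk_text is made up, and I am assuming that pieces of at most 250 characters stay below the server-side cutoff, which I have not measured. It collapses all whitespace to single spaces and lets fold break the text at spaces.

```shell
# Hypothetical helper (name is an assumption): read text on stdin and print it
# as lines of at most $1 columns (default 250), breaking only at spaces.
chunk_text() {
  # Collapse every run of whitespace into one space, add a final newline so
  # the last chunk is a complete line, then let fold break at blanks.
  { tr -s '[:space:]' ' '; echo; } | fold -s -w "${1:-250}"
}
```

Each output line could then be sent as its own request, for example with a loop like chunk_text 250 < chapter.txt | while IFS= read -r chunk; do ...; done, where chapter.txt stands in for whatever text file is to be converted.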