David T. Crosby | Voice Recognition in Linux

As I write my book, I’ve been looking for ways to speed up the process. I tend to write the same way I speak, so I wondered whether I could “talk out” my book. I looked around for voice recognition software on Linux, and was saddened to see that the technology isn’t quite up to snuff. That’s not to say it won’t be ready soon, though. To help you out, I’ve picked out and reviewed the voice recognition projects that are most likely to succeed.

Simon

This project seems the most interested in giving the normal end user a helping hand. Unfortunately, it is just too buggy to be seriously considered right now. I tried 0.2, but it crashed within a few seconds of starting. I also tried the alpha build of 0.3, but was unable to get it to build properly.

Fortunately, the Simon team seems interested in having decent documentation, and is showing regular activity. If they can focus on the stability issues, they might be able to take the lead by virtue of better usability.

Sphinx

This project seems to have the technical know-how, and is being spearheaded by Carnegie Mellon University. Powerful? Apparently. Intuitive? Not a chance.

I was really disappointed to find that despite the massive brainpower in this group, they haven’t made anything to help the ordinary end user who wants to try it out. A simple tutorial that takes someone from installation to a working setup would be incredibly helpful, as would a GUI for configuring it.

Julius

They’ve got a regular release cycle, it built fine on my machine, and it looks like it supports on-the-fly recognition from a microphone. And… I’m lost.

Like Sphinx, Julius falls into the trap of being too complex and not helping out the casual user. Two-pass large-vocabulary speech recognition sounds pretty awesome, but I’d be much happier if I knew how to use the software at all, and if there were a GUI to help out. They seem to be in the process of translating their documentation into English, but there’s no saying how long that will take. Also, they’ve written their own license for the source, which is likely to scare away third-party developers.

Some parting thoughts

It’s become clear to me that this is a field that could use a lot of help. When IBM pulled ViaVoice support for Linux, it seems to have set voice recognition on the platform back by a few years. However, the existing tools aren’t bad - far from it, really. With enough spit and polish, I don’t see why these three projects couldn’t start making serious advances towards regular home use. Front-ends, packaging, and documentation aren’t the sexiest parts of voice recognition, but a little bit of effort in those areas will carry it much further than constant tweaking of the backend.
