Thursday, 22 October 2015
I have an idea for something I want to build, which will involve a speech recognition component, written in Java and a Hidden Markov Model, written in Python. So that means I have to integrate components written in two different languages. What's the best way of doing it? One way would be to run Python on the JVM. There is a Python implementation for the JVM, Jython, but from what I've heard it's painfully slow. Since I'm aiming for something as close to real time as possible, it's unlikley to meet my needs. It did occur to me that there could be a faster way to run Python on the JVM. Pypy is a self-hosting, JIT-compliled implementation of Python, which is much faster than the reference implementation. If its code generation phase were modified to emit Java Bytecode, then Pypy could run on the JVM. This approach, which I call Jypy, would be a worthwhile project for somebody who knows Java Bytecode. Unfortunately, I'm not that person. However, I then thought about the architecture of my project. I'd already realised that it would have to be organised as a number of concurrent processes, communicating via pipes. I then realised that meant that I didn't need to run Python on the JVM at all. The Java and Python components could each run in their own processes, and didn't need to share any resources. The only integration I needed was pipes. You know the sense of delight when you realise that something complicated is actually simple? That's how I felt when I worked that out.