Erlang Binary Performance
I was benchmarking egeoip today, which is my from-scratch Erlang geolocation library. It uses the MaxMind GeoLite City database, which has implementations in a bunch of other languages so it's great to compare with. The results were rather surprising to me, because I hadn't previously done any benchmarking of Erlang performance.
The test environment is a 2 GHz MacBook Pro, Mac OS X 10.4.7, Erlang R11B-1 w/ HiPE enabled, Python 2.4.3 (using MaxMind's GeoIP Python API, which is written in C). I do have other processes running (namely iTunes), but the benchmark is fair because the background load is consistent throughout the tests.
- Erlang, BEAM: ~13k geolocations/sec
- Python/C: ~18k geolocations/sec
- Erlang, HiPE: ~44k geolocations/sec
As you can see, Erlang holds its own against Python w/ C extensions, and it can mop the floor with it when using the HiPE compiler. Erlang clearly kicks some serious ass at working with binaries, both in syntax and performance. The only work I had to do to make it faster was to compile with c(egeoip, [native]) in the shell.
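For anyone who wants to reproduce numbers like these, here is a minimal sketch of a throughput harness. It is not the code used for the figures above; the module name bench and the function rate/2 are made up for illustration, and it times any one-argument fun rather than assuming a particular egeoip API.

```erlang
-module(bench).
-export([rate/2]).

%% Call Fun once for each element of Args and return calls per second.
%% Fun is whatever lookup is being measured; nothing here is egeoip-specific.
rate(Fun, Args) when is_function(Fun, 1), is_list(Args) ->
    %% timer:tc/3 returns {ElapsedMicroseconds, Result}.
    {Micros, ok} = timer:tc(lists, foreach, [Fun, Args]),
    %% Avoid dividing by zero on very short runs.
    length(Args) * 1000000 / max(Micros, 1).
```

Passing the lookup function as Fun and a long list of addresses as Args, once with the library compiled normally and once with [native], should give a BEAM vs HiPE comparison along the lines of the one above.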
Note that I've only been using Erlang for a few weeks and have not done any profiling or performance tuning at all beyond what I assumed would be the fastest way given the documentation I had read.
Update
After looking at Shark results across the two implementations, it seems that the GeoIP API default settings are pessimistic for benchmarking purposes: most of the Python/C time was spent in syscalls (Erlang appeared to spend most of its time in GC). A fairer comparison uses the GeoIP memory cache option, which performs far better.
- Python/C (Memory Cache): ~117k geolocations/sec
This is a lot more in line with what I expected, but I'm still impressed that Erlang w/ HiPE can get nearly 40% of the speed of C when scanning through a 25MB array of bytes. I'm pretty sure I can make some algorithmic improvements to the code (which the C implementation may or may not do), so we'll see how close I can get.
Update
After spending a while with eprof doing some profile-driven optimizations, I was able to considerably speed up the Erlang code. The biggest BEAM optimization was moving the giant tuples out of function bodies. Apparently BEAM is rather naive about that and, in certain cases, actually creates and garbage collects them on every call. The other optimizations were a faster search for null terminators and a hyper-optimized fast path for IPv4 string to long conversion.
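To give a feel for the last two optimizations, here is a rough sketch of what a single-pass IPv4 string to long conversion and a null terminator search can look like. This is not egeoip's actual code; the module name ipscan and both function names are invented for the example.

```erlang
-module(ipscan).
-export([ip2long/1, cstring/1]).

%% Convert a dotted-quad string such as "10.0.0.1" to an integer.
%% Walks the character list once, with no intermediate tokens.
ip2long(Str) when is_list(Str) ->
    ip2long(Str, none, 0, 0).

%% Args: remaining chars, current octet (none until a digit is seen),
%% number of octets completed so far, accumulated value.
ip2long([C | T], none, N, Acc) when C >= $0, C =< $9 ->
    ip2long(T, C - $0, N, Acc);
ip2long([C | T], Oct, N, Acc)
  when is_integer(Oct), C >= $0, C =< $9, Oct * 10 + (C - $0) =< 255 ->
    ip2long(T, Oct * 10 + (C - $0), N, Acc);
ip2long([$. | T], Oct, N, Acc) when is_integer(Oct), N < 3 ->
    ip2long(T, none, N + 1, (Acc bsl 8) bor Oct);
ip2long([], Oct, 3, Acc) when is_integer(Oct) ->
    {ok, (Acc bsl 8) bor Oct};
ip2long(_, _, _, _) ->
    error.

%% Split a binary at its first zero byte: {BeforeNull, AfterNull}.
cstring(Bin) when is_binary(Bin) ->
    cstring(Bin, 0).

cstring(Bin, Pos) ->
    case Bin of
        <<Str:Pos/binary, 0, Rest/binary>> -> {Str, Rest};
        <<_:Pos/binary, _, _/binary>> -> cstring(Bin, Pos + 1);
        _ -> error   % reached the end without finding a terminator
    end.
```

For example, ipscan:ip2long("192.168.1.1") returns {ok, 3232235777}, and ipscan:cstring(<<"US", 0, "rest">>) returns {<<"US">>, <<"rest">>}. Anything that is not a well-formed dotted quad falls through to the error clause.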
Given the API I could cheat and parse out some of the data when the user asks for it, rather than at record fetch time. This would make the benchmark incredibly fast, but it would be an unfair comparison with the Python/C version. I'll probably end up doing that anyway, since I'm typically looking for just the country of an IP address.
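A minimal sketch of that "parse on demand" idea follows. The record layout here (one country index byte followed by a null-terminated city name) is a simplified, hypothetical stand-in rather than the real database format, and the module and function names are invented.

```erlang
-module(lazyrec).
-export([fetch/2, country/1, city/1]).

%% Hypothetical record layout, for illustration only:
%% one country-index byte followed by a null-terminated city name.

%% Fetch time: only slice the record out of the database binary.
fetch(Db, Offset) when is_binary(Db), is_integer(Offset) ->
    <<_:Offset/binary, Raw/binary>> = Db,
    {raw, Raw}.

%% Decoding happens only when a field is actually requested.
country({raw, <<Index, _/binary>>}) ->
    Index.

city({raw, <<_Index, Rest/binary>>}) ->
    %% Take everything up to the first zero byte.
    [Name | _] = binary:split(Rest, <<0>>),
    Name.
```

With this shape, a caller who only wants the country pays for a single byte match, and the string fields are never scanned unless city/1 is called.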
I still haven't really done any algorithmic optimizations to the lookup, but here are the numbers:
- Erlang, BEAM: ~44k geolocations/sec
- Erlang, HiPE: ~64k geolocations/sec
This brings the BEAM performance up to about 38% of Python/C with the memory cache (~44k vs ~117k) and HiPE up to about 55% (~64k vs ~117k). Not bad!