A few notes:
- Although all the engines currently represented are Java-based, the framework is designed to support engines written in any language. Engines run in separate processes, and all communication happens through either the file system or standard output.
- There's a separate mechanism for testing correctness. This is less polished and not yet represented in the output.
- Collected data is stored in CSV-like files. There are some built-in tools for analyzing this data, like the one that generated the web pages I linked to above.
- Engines are versioned; you could collect statistics from different versions of an engine over time and compare them.
- Similarly, game IDs include hashes of their contents, so if a game changes, results from different versions won't get mixed together. All games come via the game repositories on ggp.org, including its mirrors of the Dresden and Stanford Gamemaster game servers.
- Games that are known to have bugs are excluded via a blacklist. (STANFORD/kono might need to go on the blacklist; an earlier version is already on it.)
- To capture the effects of caching and JIT compiler optimizations, each game runs in each engine for 30 seconds before performance statistics are recorded. At that rate, a full run takes a few hours per engine, and a few days to collect the complete set of results.
- There are many more engines I hope to add, especially those used by Fluxplayer, CadiaPlayer, and Sancho. The current set consists mostly of those that were easiest to add.
- This was inspired by a paper in the GIGA '13 proceedings by Yngvi Björnsson and Stephan Schiffel on the relative performance of GDL interpreters. Hopefully this will make future comparison efforts much easier and encourage wider adoption of faster engines.
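The process-isolation model in the second note could be sketched roughly as follows. This is an illustrative assumption, not the framework's actual code: the command line, the `key=value` output format, and the helper name `read_engine_stats` are all hypothetical, but the idea — launch the engine as a child process and parse whatever statistics it prints to standard output — is language-agnostic by construction.

```python
import subprocess
import sys

def read_engine_stats(cmd):
    """Hypothetical sketch: run an engine in its own process and parse
    key=value statistics it prints to standard output."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    stats = {}
    for line in proc.stdout.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # keep only well-formed key=value lines
            stats[key.strip()] = value.strip()
    return stats

# Stand-in for a real engine binary: a child process printing one statistic.
fake_engine = [sys.executable, "-c", "print('statesPerSecond=1234')"]
stats = read_engine_stats(fake_engine)
```

Because the harness only sees a process and its output stream, a C++ or Prolog engine could plug in exactly the same way as a Java one.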
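The content-hashed game IDs mentioned above might look something like this sketch. The function name and ID format are invented for illustration; the point is simply that hashing the rulesheet makes the ID change whenever the game does.

```python
import hashlib

def game_version_id(name: str, rules: str) -> str:
    """Illustrative only: derive a version-stamped game ID by hashing the
    game's rulesheet, so results from edited games are never conflated."""
    digest = hashlib.sha256(rules.encode("utf-8")).hexdigest()[:12]
    return f"{name}-{digest}"

# Two different rulesheets for the "same" game yield distinct IDs:
v1 = game_version_id("kono", "(role white) (role black) ...")
v2 = game_version_id("kono", "(role white) (role black) (fixed) ...")
assert v1 != v2
```

The same hashing also gives engine-version comparisons a stable key: old results remain attributed to exactly the rulesheet they were measured against.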
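The 30-second warm-up measurement could be sketched as a fixed wall-clock budget rather than a fixed iteration count — an assumption about the approach, not the framework's actual implementation; `run_for_duration` and its callback are hypothetical names.

```python
import time

def run_for_duration(step, seconds=30.0):
    """Hypothetical sketch: repeatedly run one unit of work (e.g. a random
    playout) for a fixed wall-clock budget, so JIT compilation and caches
    have time to take effect, then report the average rate."""
    iterations = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        step()
        iterations += 1
    return iterations / seconds  # average iterations per second

# e.g. rate = run_for_duration(engine.random_playout, 30.0)
```

A time budget makes slow and fast engines comparable on the same games, at the cost noted above: 30 seconds per game per engine adds up to hours per engine and days for the full matrix.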