19 October 2015

Chess Engines : Fishtest

In this weekly series on Modern Chess Engines, last seen in Bitboards, I've covered most of the topics that caught my eye in the original video. One topic, specific to Stockfish, took up the last third of the video. It started,

Unit tests: It's hard to unit test a chess engine. You can certainly test that the moves it generates are valid or that it doesn't make terrible moves, but you can't test strategy because you don't know what the correct move is yourself. So we say 'Goodbye!' to unit tests and instead we have the Fishtest framework.

Wikipedia explains,

Stockfish • Since 2013, Stockfish has been developed using a distributed testing framework named Fishtest, where volunteers are able to donate CPU time for testing improvements to the program. Changes to game-playing code are accepted or rejected based on the results of playing tens of thousands of games on the framework against an older version of the program, using sequential probability ratio testing. Tests on the framework are verified using the chi-squared test, and the test is deemed reliable only if the resulting p-value is not statistically significant.
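To make "sequential probability ratio testing" a little more concrete, here is a minimal sketch of Wald's SPRT applied to match results. It is a simplification, not Fishtest's actual code: it counts only decisive games (draws are ignored), uses the logistic Elo model to turn Elo differences into win probabilities, and the hypotheses `elo0`/`elo1` and error rates `alpha`/`beta` are illustrative parameters of my own choosing. The idea is that after every batch of games you update a log-likelihood ratio and stop as soon as it crosses either bound, so clearly good or clearly bad patches are decided quickly.

```python
import math


def elo_to_score(elo: float) -> float:
    """Expected score for a given Elo difference (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def sprt(wins: int, losses: int, elo0: float, elo1: float,
         alpha: float = 0.05, beta: float = 0.05) -> str:
    """Wald's SPRT on decisive games only (ignoring draws is a simplification).

    H0: the patch is worth elo0 Elo; H1: it is worth elo1 Elo.
    Returns 'H1' (accept the patch), 'H0' (reject it) or 'continue'.
    """
    p0 = elo_to_score(elo0)  # win probability per decisive game under H0
    p1 = elo_to_score(elo1)  # win probability per decisive game under H1
    # Log-likelihood ratio of H1 versus H0 for the observed wins and losses.
    llr = wins * math.log(p1 / p0) + losses * math.log((1 - p1) / (1 - p0))
    lower = math.log(beta / (1 - alpha))   # crossing below: accept H0
    upper = math.log((1 - beta) / alpha)   # crossing above: accept H1
    if llr >= upper:
        return "H1"
    if llr <= lower:
        return "H0"
    return "continue"


if __name__ == "__main__":
    # Hypothetical tallies, testing "no gain" (0 Elo) against "+10 Elo".
    for w, l in [(100, 100), (1000, 800), (800, 1000)]:
        print(w, l, sprt(w, l, elo0=0, elo1=10))
```

With these settings, 100 wins and 100 losses leaves the test undecided ('continue'), 1000 wins against 800 losses accepts the patch ('H1'), and the reverse rejects it ('H0').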

As of April 2015, the framework has used a total of more than 250 years of CPU time to play more than 165 million chess games. After the inception of Fishtest, Stockfish incurred an explosive growth of 120 Elo points in just 12 months, propelling it to the top of all major rating lists.
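For a sense of scale, under the standard logistic Elo model (an assumption on my part about how to read the figure, since rating lists differ in detail) the expected score E of a player rated D points above an opponent is:

```latex
E = \frac{1}{1 + 10^{-D/400}}, \qquad
D = 120 \;\Rightarrow\; E = \frac{1}{1 + 10^{-0.3}} \approx \frac{1}{1.501} \approx 0.666
```

So a 120-point gain means the new version would be expected to score about two thirds of the points against the version from a year earlier.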

The italics are mine. From chessprogramming.wikispaces.com:-

Stockfish Testing Framework • Fishtest is a web application written by Gary Linscott, mainly in Python under the Pyramid Application Development Framework, to distribute games across different machines to reduce test latency and increase throughput. Started in early 2013 with Stockfish 3.0, Fishtest has hundreds of contributors (as of May 2014, 744 testers and 52 developers) active in testing ideas and tweaks, to make Stockfish the strongest open-source, or even overall, chess program in the world.

How does this work in practice? I'll look at that in my next post in this series.
