11 January 2019

AlphaZero Match Conditions

Last week's post, Insights on AlphaZero, featuring comments on Talkchess.com by DeepMind's Matthew Lai, ended with the remark,

by matthewlai >> Tue Dec 11, 2018 12:13 pm • [DeepMind] decided to release games from the start position and TCEC positions as the main result of the chess part of the paper because start position is more scientifically pure (they were actually playing the game of chess, not a game that's just like chess except you are forced to start from these positions), and from TCEC openings we show that we can play well even in openings that it wouldn't normally play.

The context of that statement is best understood through a couple of DeepMind's published papers. The following chart is from the paper published in Science magazine, 'A general reinforcement learning algorithm...' by David Silver et al.; see last month's post AlphaZero Is Back! (December 2018) for a link to the paper.


Fig. 2. Comparison with specialized programs.

The extended caption to the chart (the portions relevant to chess) explained,

(A) Tournament evaluation of AlphaZero in chess, shogi, and Go in matches against, respectively, Stockfish, Elmo, and the previously published version of AlphaGo Zero that was trained for 3 days. In the top bar, AlphaZero plays white; in the bottom bar, AlphaZero plays black. Each bar shows the results from AlphaZero’s perspective: win (W; green), draw (D; gray), or loss (L; red).

(B) Scalability of AlphaZero with thinking time compared with Stockfish and Elmo. Stockfish and Elmo always receive full time (3 hours per game plus 15 s per move); time for AlphaZero is scaled down as indicated.

(C) Extra evaluations of AlphaZero in chess against the most recent version of Stockfish at the time of writing and against Stockfish with a strong opening book. [...]

(D) Average result of chess matches starting from different opening positions, either common human positions or the 2016 TCEC world championship opening positions [...]

A further explanation was provided in the 'Supplementary Materials' referenced at the end of the paper.

Match conditions • We measured the head-to-head performance of AlphaZero in matches against each of the above opponents. Three types of match were played: starting from the initial board position (the default configuration, unless otherwise specified); starting from human opening positions; or starting from the 2016 TCEC opening positions.

The majority of matches for chess, shogi and Go used the 2016 TCEC superfinal time controls: 3 hours of main thinking time, plus 15 additional seconds of thinking time for each move. We also investigated asymmetric time controls, where the opponent received 3 hours of main thinking time but AlphaZero received only a fraction of this time. [...]

Matches consisted of 1,000 games, except for the human openings (200 games as black and 200 games as white from each opening) and the 2016 TCEC openings (50 games as black and 50 games as white from each of the 50 openings). The human opening positions were chosen as those played more than 100,000 times in an online database: 365Chess.com.
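The numbers in those match conditions are easier to appreciate with a little arithmetic. The following is a minimal sketch in Python, not anything taken from the paper: the function names are my own, the 60-move game length is an arbitrary example, and the `fraction` parameter (which scales both the main time and the increment) is only an assumption about how the asymmetric time controls might be modeled.

```python
# Arithmetic implied by the quoted match conditions (illustrative only).

MAIN_TIME_S = 3 * 60 * 60   # 3 hours of main thinking time, in seconds
INCREMENT_S = 15            # 15 additional seconds per move


def thinking_time_seconds(moves, fraction=1.0):
    """Total thinking time one side receives over `moves` of its own moves.

    `fraction` is a hypothetical knob for the asymmetric matches; scaling
    the increment together with the main time is an assumption.
    """
    return fraction * (MAIN_TIME_S + INCREMENT_S * moves)


def games_per_opening(as_white, as_black):
    """Games played from a single opening position, counting both colors."""
    return as_white + as_black


# 2016 TCEC openings: 50 openings, 50 games as white and 50 as black each.
tcec_games = 50 * games_per_opening(50, 50)

# Human openings: 200 games per color from each opening. The number of
# human openings is not stated in the quoted text, so no total is computed.
human_games_per_opening = games_per_opening(200, 200)

print(thinking_time_seconds(60))   # time budget for a 60-move game, in seconds
print(tcec_games)
print(human_games_per_opening)
```

Under these assumptions, the TCEC-openings match is five times the size of the default 1,000-game match, and the per-move increment adds only a small amount to the 3-hour main allotment in a game of ordinary length.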

Now that we're up to speed on match conditions, let's look more closely at the impact of an opening book. I'm a big fan of chess960 and was struck by a curious comment in the matthewlai quote that opened this post:-

[The] start position is more scientifically pure (they were actually playing the game of chess, not a game that's just like chess except you are forced to start from these positions)

The context was pre-selected opening variations that arise from the traditional start position (RNBQKBNR), but the comment could just as easily apply to any of the 959 other chess960 start positions. I'll try to come back at some time to the chess960 aspect. That quote, which was addressing the use of Brainfish and the entirety of which can be found in the 'Insights' post, provoked another Q&A dialog:-

by matthewlai >> Tue Dec 11, 2018 12:51 pm • Q: I don't remember what Cerebellum book lines are chosen by what UI, but using for SF8 a regular polyglot opening book like the small, but good BookX.bin and with Lc0 [Leela] without any book in Cutechess-Cli, I did get very varied openings. And a decrease of Lc0 strength of at least 50 Elo points compared to just playing from Initial Board position, but at short time controls. I think in Cutechess-Cli one has a random seed for a .bin book, but I don't remember well now. • A: It's true, there are many books that we could have chosen from. The problem is there wasn't one that we thought everyone will be happy with, and we do have a lot of "critics" (in quotes because they aren't the useful kind of critics) who will probably go into any book we choose, find a line that they think is bad using whatever their preferred analysis method is on the day, and say we deliberately chose that book because it's bad.

In the end we settled on Brainfish book just because it's actually generated by Stockfish, so it's about as "pure" as it gets. I do believe that opening book will help quite a bit at short time control, since "intuitive play" is what AZ/Lc0 are good at. Though at the time controls we played with, SF does make some very reasonable opening moves (at least when it would still be in most books), and I'm not sure if it would have made a lot of difference.
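To put the '50 Elo points' mentioned in the question into perspective, a rating gap can be converted to an expected score with the standard logistic Elo formula. The sketch below is my own illustration (the function name is hypothetical) and says nothing about how the tests in that dialog were actually scored.

```python
def expected_score(elo_diff):
    """Expected score (between 0 and 1) for the side that is `elo_diff`
    points stronger, using the standard logistic Elo formula."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))


# A 50-point advantage corresponds to an expected score of roughly 57%.
print(round(expected_score(50), 3))
```

In other words, a 50 Elo drop is noticeable over a long match, but small enough that it takes many games to measure reliably.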

Since that entire dialog is riddled with chess engine jargon, I'll come back to it in another post.
