05 October 2020

Stockfish NNUE - Three Early Threads

Continuing with Stockfish NNUE Dev (September 2020), I closed the post saying,

Long story short : NNUE gets to the essence of handcrafted evaluation functions. [...] I'll come back to the subject after I've had some time to orient myself. I hope the time will be well spent.

First, here's a Discord search term for locating the discussion on a specific day. This particular expression goes to early discussions of Stockfish NNUE development:-

in: #general-nnue-dev during: 2020-06-06

Those discussions lead to a Talkchess thread that started a week earlier: Stockfish NN release - NNUE (talkchess.com).

So a year [ago] somebody ported Shogi NN called NNUE (efficiently updateable neural network backwards) to SF10 as a proof of concept. He released the binaries and instructions after I asked him a few days ago.

Another discussion leads to the Stockfish Fishcooking forum, also from end-May: Time for NN eval?:-

Talkchess.com: 'Stockfish NN release - NNUE'; Maybe we should try to take a look at this?

To go back even further, it helps to be familiar with Hisayori Noda (chessprogramming.org):-

Hisayori Noda, a Japanese mathematician, computer scientist, and computer Shogi and chess programmer, who introduced NNUE to Stockfish, resulting in Stockfish NNUE.

The Chessprogramming wiki also gives a diagram of NNUE's architecture for evaluation, aka HalfKP.

Source: Stockfish NNUE (chessprogramming.org)

So there we have it: a specific NN architecture, a record of its early training & validation on chess positions, its integration into Stockfish, and samples of its early games against other engines. It's extraordinary how quickly the NNUE concept evolved.


Later: After I wrote the post, I spent some time trying to decipher the title of the chart, 'NNUE HalfKP 256x2-32-32', which refers to the structure of the four layers shown in the chart. The number 41024 is explained in the link to Chessprogramming.org under the chart:-

41024 = 64 * 641.
64 comes from the number of the cells where King may exist.

641 = 64 * 5 * 2 + 1.
64 here comes from the number of the cells where a piece other than King may exist.
5 is the number of piece types other than King.
2 is the number of colors, White and Black.
1 is a captured piece.
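The breakdown above can be turned into an index formula. Here is a sketch under my own naming assumptions; the exact ordering of piece types and colours inside Stockfish's HalfKP layout may differ:-

```python
def halfkp_index(own_king_sq, piece_sq, piece_type, piece_color):
    """Map (own King square, piece square, piece type, piece colour)
    to one of the 41024 = 64 * 641 input slots.

    Squares are 0..63; piece_type is 0..4 (the five types other than
    the King); piece_color is 0 or 1. Names and ordering are my own
    assumptions, not the exact Stockfish layout.
    """
    # 641 = 64 * 5 * 2 + 1; slot 0 of each 641-slot block is the
    # '+1', a Shogi legacy meaning a captured (absent) piece.
    piece_index = (piece_type * 2 + piece_color) * 64 + piece_sq + 1
    return own_king_sq * 641 + piece_index

# With this ordering the index spans exactly 0..41023:
# halfkp_index(0, 0, 0, 0) == 1
# halfkp_index(63, 63, 4, 1) == 41023
```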

As for 'HalfKP', I found an explanation in the Stockfish Discord group (11 June):-

KP is combination of own King - piece position [plus] combination of opponent King - piece position.
HalfKP is combination of own King - piece position only.
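The 'efficiently updatable' part of the acronym refers to how the first layer is computed from those inputs. A sketch of the idea, with dummy weights standing in for trained ones:-

```python
HIDDEN = 256  # width of each perspective's accumulator

def weight_row(feature):
    # Dummy deterministic first-layer weights, standing in for the
    # trained weights of a real network.
    return [(feature + j) % 7 for j in range(HIDDEN)]

def accumulate(active_features):
    # Full recomputation: sum the weight rows of every active
    # (value-1) input feature.
    acc = [0] * HIDDEN
    for f in active_features:
        row = weight_row(f)
        for j in range(HIDDEN):
            acc[j] += row[j]
    return acc

def update(acc, removed, added):
    # Incremental update after a move: subtract the rows of features
    # that disappeared, add the rows of features that appeared,
    # instead of recomputing the whole sum.
    for f in removed:
        row = weight_row(f)
        for j in range(HIDDEN):
            acc[j] -= row[j]
    for f in added:
        row = weight_row(f)
        for j in range(HIDDEN):
            acc[j] += row[j]
    return acc
```

One such accumulator is kept for each side's perspective; the two 256-wide vectors together account for the '256x2' in the chart title.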

This leads to questions that I wasn't able to answer. The Chessprogramming chart shows both 'Side to move' and 'Side not to move'. Since both Kings are on distinct squares, are both Kings included in the 64*641 array? If so, why call it 'HalfKP'? If not, where is the opposing King in the 64*5*2 array? I hope this will become clearer as I increase my understanding of NNUE.

Another question concerns the big 64*641 array. Why not use a smaller '769 = 64 * 6 * 2 + 1' array? Is there some advantage to isolating the King in a higher-level array where all cells but one are zero? Whatever the structure, the most important factor in creating the network is the large number of training positions with known evaluations attached to them.
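For scale, here is the arithmetic behind the two sizes. The note on sparsity is my own observation, not a definitive answer to the question:-

```python
# One 641-slot block per own-King square.
halfkp_inputs = 64 * (64 * 5 * 2 + 1)   # 41024
# A plain piece-square-colour encoding (all six piece types, plus one).
simple_inputs = 64 * 6 * 2 + 1          # 769

# At most 30 non-King pieces are ever on the board (32 pieces minus
# the two Kings), so at most 30 of the 41024 inputs are 1 at once;
# the King's square only selects which 641-slot block those pieces
# fall in. The big array therefore stays extremely sparse, at some
# cost in weights but little in evaluation time.
max_active_features = 30
```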

One other point worth mentioning: the acronym 'crep' is sometimes mentioned as an enhancement. It refers to 'castling rights, en passant', which are also attributes of many chess positions.
