Benchmarks with more than one thread on an Apple M1

Maybe I was too stupid, or just not that involved in chess engine testing, but only after a little research in the Stockfish 13 source code did I figure out how to run the benchmark with more than one thread:

benchmark.cpp:

/// setup_bench() builds a list of UCI commands to be run by bench. There
/// are five parameters: TT size in MB, number of search threads that
/// should be used, the limit value spent for each position, a file name
/// where to look for positions in FEN format, the type of the limit:
/// depth, perft, nodes and movetime (in millisecs), and evaluation type
/// mixed (default), classical, NNUE.
///
/// bench -> search default positions up to depth 13
/// bench 64 1 15 -> search default positions up to depth 15 (TT = 64MB)
/// bench 64 4 5000 current movetime -> search current position with 4 threads for 5 sec
/// bench 64 1 100000 default nodes -> search default positions for 100K nodes each
/// bench 16 1 5 default perft -> run a perft 5 on default positions

bench 64 [Number of threads]
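To make the argument order concrete, here is a minimal sketch that only prints the command lines for 1, 2, 4 and 8 threads, without starting an engine. The helper name bench_args and the ./stockfish path are assumptions for illustration. The order (TT size first, then thread count) follows the source comment above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stockfish): builds the bench arguments.
# $1 = number of threads, $2 = TT size in MB (defaults to 64, as in this post).
bench_args() {
    threads=$1
    hash_mb=${2:-64}
    printf 'bench %s %s\n' "$hash_mb" "$threads"
}

# Print the command for each thread count. ./stockfish is an assumed path.
for t in 1 2 4 8; do
    echo "./stockfish $(bench_args "$t")"
done
```

Leaving out the remaining parameters means the engine falls back to its defaults, which according to the source comment is a search of the default positions up to depth 13.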

Update: A smart guy in the talkchess forum gave me some advice, so I changed the settings. Suppressing the output and repeating the bench a few times should give more reliable results. I also deleted the lines for engines where I wasn’t sure the benchmarks are comparable. This test is really only meant to show roughly where the M1 sits in the hardware ranking, but my first try was probably too sloppy.

In addition, I had some fun tonight digging up long-forgotten knowledge: I built a small script that runs each bench ten times and calculates the average.

#!/bin/zsh

########################################################
# usage: bench.sh [engine] [number of threads]         #
# for the Honey family:                                #
# bench.sh [engine] [number of threads] 13 [true|false]#
########################################################

# Start from a clean temp file and make sure the output directory exists.
rm -f bench.tmp
mkdir -p benchmarks

# ${1:t} is the engine's file name without its directory part, so that
# calling the script with a path like ./stockfish still gives a valid name.
out="benchmarks/$2-threads-${1:t}.txt"

# Run the bench ten times. Search output on stdout is discarded;
# the summary written to stderr is collected in bench.tmp.
i=0
while [ $i -lt 10 ]; do
    "$1" bench 64 $2 $3 $4 1>/dev/null 2>>bench.tmp
    let i=i+1
done

# Keep only the number at the end of each "Nodes/second" line.
grep 'second' bench.tmp | sed -e 's/.*[ \t]//' > "$out"

sum=$(awk '{sum+=$1} END{print sum}' "$out")

avg=$((sum / 10))
echo $avg

rm bench.tmp
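The extraction and averaging steps can also be done in a single awk pass. The following is a sketch only, assuming the engine's summary contains lines of the form "Nodes/second    : N". The function name average_nps and the three sample values are made up for illustration and are not measurements.

```shell
#!/bin/sh
# Hypothetical helper: reads bench summaries on stdin and prints the
# average of the last field of every "Nodes/second" line.
average_nps() {
    awk '/Nodes\/second/ { sum += $NF; n++ }
         END { if (n > 0) printf "%d\n", sum / n }'
}

# Made-up sample input in the assumed summary format.
printf '%s\n' \
    'Nodes/second    : 2000000' \
    'Nodes/second    : 2200000' \
    'Nodes/second    : 2400000' | average_nps
# prints 2200000
```

Dividing by the number of matching lines instead of a fixed 10 keeps the average correct if one of the runs fails and produces no summary.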

So here you go. The numbers are nodes per second during the benchmark test:

Engine	1 thread	2 threads	4 threads	8 threads
amoeba-3.2	2,850,225	5,569,507	10,999,492	14,946,973
Black-Diamond-v13 (classical)	3,002,359	6,147,954	12,277,000	17,303,000
Black-Diamond-v13 (nnue)	1,962,538	4,108,865	8,159,830	11,206,000
cfish-12	2,399,213	4,766,505	9,387,348	12,803,333
cfish-17022021	2,715,285	5,446,693	10,684,916	15,042,888
corchess-nnue-1.3	2,327,636	4,630,767	9,150,978	12,950,410
crystal-3.1	2,149,787	4,630,295	9,382,968	13,315,242
Honey-v13 (classical)	2,024,407
Honey-v13 (nnue)	1,317,226
Oki-Maguro (classical)	3,205,223	6,622,363	13,317,000	18,754,000
Oki-Maguro (nnue)	2,189,189	4,505,202	9,075,129	13,401,000
RubiChess-2.1-dev (classical)	4,801,175	7,664,483	13,296,037	19,547,936
RubiChess-2.1-dev (nnue)	2,140,918	3,628,272	6,797,738	9,257,131
stockfish-12	2,252,653	4,517,488	8,924,218	12,138,502
stockfish-13	2,320,252	4,557,075	9,414,908	13,183,130
stockfish-13-osx*	1,464,662	2,978,481	6,003,444	8,495,631
sugar-AI-1.50	2,290,729	4,573,459	9,050,257	12,646,313
sugar-AI-ICCF-140a	2,149,396	4,424,464	9,032,318	13,010,932

*x86_64 binaries
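One way to read the table is as a scaling factor: the nodes per second at N threads divided by the single-thread value. A small sketch, with speedup as a hypothetical helper name, using the stockfish-13 row from the table as input:

```shell
#!/bin/sh
# Hypothetical helper: $1 = single-thread nps, $2 = multi-thread nps.
# Thousands separators are stripped so table values can be pasted directly.
speedup() {
    base=$(printf '%s' "$1" | tr -d ',')
    multi=$(printf '%s' "$2" | tr -d ',')
    awk -v b="$base" -v m="$multi" 'BEGIN { printf "%.2f\n", m / b }'
}

# stockfish-13: 1 thread vs. 8 threads
speedup 2,320,252 13,183,130
# prints 5.68
```

For this row, going from one to eight threads gives roughly 5.7 times the single-thread node rate.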

For some engines I haven’t yet figured out how to run a multi-threaded benchmark.

acepoint's home
