MatPlus.Net

Website founded by
Milan Velimirović
in 2006

23:06 UTC

ISC 2024


	Headlines
	Forum*
	Fellows
	Members
	DL Archive
	Links

CHESS
SOLVING
Tournaments
Rating lists
1-Jul-2024

B P C F

MatPlus.Net

Forum

General

Creation of benchmark(s) for software solving orthodox moremovers

You can only view this page!

(1) Posted by Marcin Sterkowiec [Tuesday, Sep 17, 2024 11:50]

Creation of benchmark(s) for software solving orthodox moremovers

I was thinking a bit about suggestion by Hauke Reddmann to create benchmark(s) for software solving orthodox moremovers (in this thread: https://matplus.net/start.php?px=1726562371&app=forum&act=posts&fid=gen&tid=3191 )

"I would extend the suggestion to compare:
Compare *all* problem solving engines (might include databases, might include
a standard LiChess Stockfish engine) for various aspects. There is the classic
Alybadix, there is the completely unknown MateMaster (which I still use
for composing #2, it's a bit easier than Popeye/Olive) etc..."

After some consideration it seems quite a large task involving a few methodological and pracical issues. Here are my thoughts:

As suggested by Hauke we should try to test various aspects separately, for example:
1) Test speed (in brute-force sense) on a random set of #4-#9 compositions (#4-#6 all solutions; #7-#9 single solution; maybe also: #4/#7 more pieces, #6/#9 fewer pieces on board)
2) Test speed/cleverness on a random set of #15-#30 (this would have to be a bit larger set than in point (1) to avoid "random cleverness")
3) Test "magic capabilities" on a set of #31+

The main methodological issue is selection of problems: I think it has to be a random selection - but from which set?
(This randomness should let avoid issues with testing on problems for which programs were finetuned - I think every program has such a set)
I think for the beginning we should select from "sound" problems (no duals/no multiple solutions; once done, we might add more complexity)

Practical issues I can see:
a) Every program has some prerequisites (not installation alone). For example J.G.Island has a set of 5-piece tablebase that has to be downloaded separately (~4GB). As far as I know Gustav may require manual setting of parameters of analysis.
b) The benchmark should be done by someone who has access to full/commercial versions of all the programs in the benchmark

(Read Only)

pid=26209

(2) Posted by Marcin Sterkowiec [Thursday, Sep 19, 2024 20:35]

I have created a procedure for the first test (to test speed in brute-force sense).
The procedure and first tentative results (comparison of J.G.Island vs Popeye) can be found here: https://jgisland.pl/benchmark/procedure.txt
(Of course 1-thread Popeye that focuses on versatility couldn't be expected to be faster, although in case of the first problem - #4 - it indeed was slightly faster)

My question is whether such a procedure (with 10 sets of problems https://jgisland.pl/benchmark/sets.zip and a Python script rand.py https://jgisland.pl/benchmark/rand.py.txt to randomly draw 10 compositions from these sets) can generally be accepted as sufficient. If so, we might try to test a broader range of programs - and later also run the second version of the test (speed/cleverness using new sets of #15-30).

(Read Only)

pid=26211

(3) Posted by Viktoras Paliulionis [Thursday, Sep 19, 2024 20:41]

I think the test set should be representative enough to include different styles of problems with a variety of motives, for example:
1) Flight giving,
2) Underpromotion,
3) Zugzwang, tempo maneuvers,
4) Anticipatory white sacrifice.

The main purpose of the chess composer tool should be to search for cooks and duals. The program should show all variations and duals. If the program does not have such capabilities, then it is not very useful for testing chess problems. Perhaps, such programs should not even be included in the comparison.

(Read Only)

pid=26212

(4) Posted by Marcin Sterkowiec [Friday, Sep 20, 2024 00:07]

Yes, for sure this procedure can be much better - just as you suggested I have added five additional sets of problems to cover the motives you wrote about: see https://jgisland.pl/benchmark/procedureUpdated.txt and https://jgisland.pl/benchmark/setsExtended.zip
My intention is to create a simple procedure that would be just good enough, that might be generally considered to provide unbiased results (unbiased towards any of the programs)
Some of these problems within this random test suite (#7+) are intended to be solved in a single solution mode (MaxSolutions 1) in order to keep the overall time of the test in bounds.

As far as the second issue you mentioned is concerned, I think that the main goal is always to find a solution (and duals, if any) in the first move.
In my opinion showing all variants in case of moremovers (particularly those very long ones) may be not so much valuable (too much information at a time).
So from my perspective it seems just enough if this information is accessible on demand after the first/main search. Note that J.G.Island has an option to save all this information (it is called a "temporary register") to a disk file.
You can set the limit the size of the file - you can share this file. It is a powerful option - in my opinion much more powerful that just displaying a very, very long multiline text.

But indeed J.G.Island (in its first/main run) does not look for duals in moves other than the first one. At this moment of the analysis the program assumes that the overall speed is much more important for user than details about all the possible late duals (if they exist, if needed, they can be found later, during "interactive" analysis - usually it is a very fast operation since it uses all the gathered data - the temporary register).
Technically, first you make a black move/response (or wait for the best defence suggested by program) and then right click 'Start' button to force finding all the solutions for white in the second or later moves (clicking left mouse button would mean search for a single solution only). You can also find this information in the tooltip of the Start button (alternatively, you can use items from menu "Problem")

(Read Only)

pid=26213

(5) Posted by Viktoras Paliulionis [Friday, Sep 20, 2024 10:23]

To ensure unbiased results, all programs being compared must provide the same output. If one program does not search for duals, and another does, then it is clear that the one that does less work is likely to be significantly faster. As far as I know, neither Popeye nor WinChloe have an option to not search for duals, but only to refuse to search for short variations.

(Read Only)

pid=26215

(6) Posted by Joost de Heer [Friday, Sep 20, 2024 11:20]

For the Popeye output: I am in favour for an 'ignoreDualisticBlackDefenses' option which won't print black moves which have multiple white replies. That should clean up the output significantly, and if you want to see details why a certain defense is not printed, you can always test the intermediate position.

(Read Only)

pid=26216

(7) Posted by Viktoras Paliulionis [Friday, Sep 20, 2024 13:55]

Such an option would clean up the output, but would not significantly reduce the solving time, since the program would still have to look for duals.

(Read Only)

pid=26217

(8) Posted by Marcin Sterkowiec [Friday, Sep 20, 2024 17:55]

In my opinion we should simply run all the compared programs with all the available options for speed. Of course every engine has its specifics, it's obious that it should not be evaluated by this single criterion (speed in orthodox moremovers), but still a benchmark for speed is a... benchmark for speed. The comparison table may also have a clearly visible column saying which additional features are included within solving time (e.g. "search for late duals included [green]/not included[red]". That's how I see it... Probably we should wait for more feedback, including Hauke, who is probably the most interested (this thread may be considered his initiative)

(Read Only)

pid=26218

(9) Posted by Viktoras Paliulionis [Friday, Sep 20, 2024 22:58]

I did a test by solving one chess problem using different programs.

[7, '4n3/4K3/3BpNp1/1R1bP3/pPPk1pr1/2R3pQ/2PPr2n/q4Bb1']
---------- MaxSolutions 1 ----------
1. Gustav: 0.13 s (+all duals, automatic parameters, 8MB)
2. WinChloe: ≈3 s (4 threads), ≈4 s (2 threads), ≈10 s (1 thread) (+dual searching, 1GB)
3. J.G. Island: 9.92 s (4 threads), 16.98 s (2 threads), > 30 s (1 thread) (no dual searching, default parameters, 1GB)
4. Popeye: 451.12 s (+dual searching, 1GB)
---------- All solutions ----------
1. Gustav: 0.20 s (+dual searching, automatic parameters, 8MB)
2. WinChloe: 6.81 s (4 threads), 11.32 s (2 threads), 18.47 s (1 thread) (+dual searching, 1GB)
3. J.G. Island: > 30 s (8 threads) (no dual searching, default parameters, 1GB)
4. Popeye: 1207.76 s (+dual searching, 1GB)

WinChloe has a parameter "Stop solving if >= ... solutions", but it doesn't seem to work, so the time for 1 solution is approx.
Free J.G. Island has a time limit (30s), so some times are unknown.

(Read Only)

pid=26219

(10) Posted by Hauke Reddmann [Saturday, Sep 21, 2024 11:45]

Same problem, different solvers:

- MateMaster: Isn't built for that; "2%" after 2 hours or so, so expect a few days.
- LiChess Stockfish: Isn't built for that either, shows "#7" on level 99 in a few minutes.
- Hauke, intelligent mode: Isn't built for that anyway, but plays the key in a second
purely by instinct, then slowly sees why, would need a few minutes to sort details :-)

(Read Only)

pid=26220

(11) Posted by Marcin Sterkowiec [Saturday, Sep 21, 2024 14:56]

I have a feeling that we slightly moved out of our path - we took an arbitrary problem (from the 15 drawn) which is not consistent with the procedure intended to provide unbiased comparison. I hope we do want to reach this goal (i.e. a pretty good benchmark).

Another thing is that most probably (I may be wrong) Gustav is not solving in this case but taking the solution from database.
Of course, many solutions - including this one - are available immediately (e.g. by searching by FEN the following html: https://jgisland.pl/download/reports/moremovers_978g_vs_972o_byAuthor.html ) but this is rather not the point.

J.G.Island also has an option to immediately show the solution of all previously solved moremovers (#7+) - it is available by clicking either a small orange book located near Start button (if single solution available), or a small stack of books (if all solutions available).
(It's also worth noting that J.G.Island stores information about all solved problems in its log file - by default this log file is located in folder C:Program FilesJGIsland_ChessMoremovers)

Last but not least we should also be aware that parameters of CPU might have significant effect on solution time.
For example J.G.Island's default selection of threads adjusts to the number of processors/cores. Starting from 6 CPUs the default selection is 4^2 threads. Such selection may improve performance significantly.
Thus, we may also consider a separate benchmark for 4-core (older) and 8-core (more modern) CPUs - although it is debatable - as a matter of fact I think 8 cores/processors should be preferred (as anyone interested in speed would simply select a fast/modern machine).

(Read Only)

pid=26221

(12) Posted by Marcin Sterkowiec [Wednesday, Sep 25, 2024 19:10]

Maybe Olaf Jenkner could clarify on this? Because such an option to save and (re)use some data of already solved problems would indeed be valuable for users, but still for benchmarking should rather be off - we should assume that we are dealing with a new/unknown problem/position.

The only reasonable exception I can see are endgame tablebases.
J.G.Island - except for endgame tablebases - actually contains also custom tablebases (for positions like http://www.yacpdb.org/?id=89976 or https://pdb.dieschwalbe.de/search.jsp?expression=PROBID='P1200254') - their usage during benchmark test may be debatable. Anyway I would assume that any kind of tablebases that covers all combinations of movable pieces (while move/capture of any other one would prevent mate) should be OK for benchmarking.

(Read Only)

pid=26222

(13) Posted by Olaf Jenkner [Wednesday, Sep 25, 2024 23:38]

What has to be clarified?
Gustav does not (re)use some data of already solved problems.
Solved and stored problems are shown immediatly on a gray screen.

(Read Only)

pid=26223

(14) Posted by Marcin Sterkowiec [Thursday, Sep 26, 2024 10:14]

Thank you, Olaf. Now it's clear.

Could you maybe also confirm that the following solution time (0.2 sec for 'All solutions') looks correct as for a typical machine/CPU (parameters of benchmark machine not specified above by Viktoras) :
#7 '4n3/4K3/3BpNp1/1R1bP3/pPPk1pr1/2R3pQ/2PPr2n/q4Bb1' http://www.yacpdb.org/?id=306402
---------- All solutions ----------
1. Gustav: 0.20 s (+dual searching, automatic parameters, 8MB)

(I am asking because for me it looks much too fast as for the run without (re)using any data from previous analyses)

(Read Only)

pid=26224

(15) Posted by Viktoras Paliulionis [Thursday, Sep 26, 2024 19:32]

I used a typical PC with an Intel Core i7-8700 @ 3.20 GHz. Without optimization, the solution time with Gustav (64 MB) is 22.4 s, which is comparable to WinChloe (1GB, 1 thread) - 18.5 s.
Optimization significantly speeds up solving but does not ensure that all solutions and duals are found.

(Read Only)

pid=26225

(16) Posted by Olaf Jenkner [Thursday, Sep 26, 2024 19:40]

Yes, I get nearly the same computation time.
Automatic parameters are made for a quick solving. Sometimes cooks or duals are found with these parameters.
It is not a brute force test.

(Read Only)

pid=26226

(17) Posted by Marcin Sterkowiec [Thursday, Sep 26, 2024 22:55]

I have a few questions/doubts about "automatic parameters" in Gustav:
- is time used for finding of automatic parameters included in solving time?
- you write that "sometimes cooks or duals are found with these parameters" - at first glance the word "sometimes" seems to me to be in contradiction with "All solutions"; I would expect "always" (of course as far as duals in the first move are concerned, not late duals)

Is there maybe some brochure/information available that would explain more thoroughly the concept of "automatic parameters"? I must admit that my initial intuition is that it may be some "hash" of solution data (or a strong indication based on the known solution) and, as such, should rather be excluded from benchmarking procedure - please correct me, in case I am wrong here.

(Read Only)

pid=26227

(18) Posted by Olaf Jenkner [Thursday, Sep 26, 2024 23:46]

Of course soving time is the time elapsed between click at the solution button
and display of the solution.

"All solutions" are all solutions found with the same parameters like the first solution.

See the excerpt from the German manual:

7. Parameter für die Lösungssuche

7.1 Alle Lösungen / eine Lösung

Im linken Teil des Hauptfensters wird eingestellt, ob Gustav alle Lösungen
suchen soll oder nach der Ermittlung der ersten die Berechnung
beenden soll. Die Begrenzung auf eine Lösung kann unter Umständen
sehr viel Zeit sparen, weil sonst erst alle weißen Anfangszüge untersucht
werden, bevor die Varianten berechnet werden.
Der Abbruch der Berechnung im Modus alle Lösungen nach Auffinden
des Schlüsselzugs hat zur Folge, daß die weiteren Züge der Lösung
nicht angezeigt werden. Im Modus eine Lösung wird die Berechnung
sofort nach dem Finden des ersten Lösungszugs beendet. Zum Prüfen
von Schachproblemen müssen natürlich alle Lösungen gesucht werden.
Bei vollzügigen Varianten werden auch alle Duale ermittelt. Diese werden
aber nicht automatisch erkannt, sondern müssen anhand der Lösungsnotation
gesucht werden.

7.2 Zugzahl unbekannt

Wenn die Zugzahl unbekannt ist oder eine kürzere Lösung als die vom
Autor angegebene zu vermuten ist, kann man den Schalter Zugzahl unbekannt
aktivieren. Gustav rechnet dann die Aufgabe zuerst mit der
Zugzahl 1 und erhöht diese solange um 1, bis die Lösung gefunden wurde
oder die Zugzahl der Aufgabe erreicht ist.
Der Vorteil davon ist, daß die Berechnung wesentlich schneller geht,
falls die Lösung tatsächlich in weniger Zügen gefunden wird. Nachteilig
ist die etwas längere Berechnungsdauer in den Fällen, wo die Lösung
nicht vor der angegebenen Zugzahl gefunden wird. Außerdem ist zu beachten,
daß die Berechnung bei der Zugtiefe, wo die Lösung gefunden
wird, beendet wird. Längere Lösungen werden dann nicht gefunden.

Gustav zeigt jeweils die benötigte Rechenzeit für die vorangegangenen
Suchtiefen an. Damit ist eine grobe Abschätzung für den restlichen Zeitaufwand
möglich.

7.3 Lösungsparameter

Langzügige Schachaufgaben lassen sich meist nur in vernünftiger Zeit
lösen, wenn man nicht alle infrage kommenden weißen Züge untersucht.
Die Art der Einschränkungen werden durch die Lösungsparameter
festgelegt. Für Hilfsmatts bzw. Hilfspatts bietet Gustav außer dem
Mattfeld keine speziellen Berechnungsmodi an.

7.3.1 Automatische Parameter

Die Berechnung mit eingeschalteter Automatik ist die einfachste Art,
die Lösung eines Schachproblems zu finden. Gustav ermittelt selbständig
die Parameter, um möglichst schnell die Lösung zu erhalten. Die
Parameter, bei der die Lösung gefunden wurde, werden unter dem Lösungsfeld
angezeigt. Es ist zu beachten, daß unter der Einstellung alle
Lösungen weitere Lösungen nur gefunden werden, wenn die automatisch
ermittelten Parameter dafür ausreichen. Die Einstellung Zugzahl
unbekannt ist zusammen mit automatische Parameter nicht möglich.

7.3.2 Brute Force

Bei dieser Einstellung wird die Aufgabe ohne Einschränkungen berechnet,
weshalb hier die Lösezeit am größten ist. Eine 100-prozentige Prüfung
auf Korrektheit kann nur mit Brute Force erfolgen. Die meisten
Nebenlösungen werden jedoch bei relativ starken Einschränkungen gefunden.

7.3.3 Eigene Parameter

Durch geschickte Wahl dieser Einstellungen läßt sich die Lösungssuche
erheblich beschleunigen. Gustav schränkt den Suchbaum ein, indem nur
weiße Züge untersucht werden, die bestimmte Kriterien erfüllen. Dazu
gibt es 7 Möglichkeiten:

1. Begrenzung der Fluchtfelder des schwarzen Königs:

Wenn der schwarze König nach dem weißen Zug mehr Felder betreten
könnte als im Eingabefeld Fluchtfelder festgelegt sind, wird
dieser Zug nicht untersucht.

2. Betrachtung von Zügen, die forciert Matt drohen:

Im Eingabefeld Drohtiefe kann man angeben, in wieviel Zügen
Weiß Matt bzw. Patt drohen muß. Dabei gelten nur Zugfolgen als
Drohung, bei denen Weiß ständig Schach gibt. Der Wert 0 deaktiviert
diese Einschränkung.

3. Begrenzung der schwarzen Gegenzüge:

Im Eingabefeld maximale Zugzahl läßt sich festlegen, wieviele
Züge Schwarz nach einem weißen Zug ausführen darf. Das bedeutet,
daß weiße Züge, die dem Schwarzen zu viel Züge lassen,
gar nicht erst untersucht werden. Für diese starke Einschränkung
kann man eine bestimmte Zahl von Ausnahmen festlegen. Außerdem
kann man angeben, ab welcher Zugnummer und bis zu welcher
Zugnummer die maximale Zugzahl gelten soll.

4. Festlegen des Mattfeldes:

Wenn klar ist, auf welchem Feld, welcher Zeile oder Reihe der König
mattgesetzt wird, kann man durch die Vorgabe des Mattfeldes
die Lösungssuche beschleunigen.

5. Festlegen der Figuren, in die Weiß umwandeln kann:

Falls die Umwandlung in bestimmte Figuren nicht sinnvoll erscheint,
kann man diese ausschließen.
6. Festlegen der Figuren, die Weiß schlagen darf:
Wenn bei einem Selbstmatt/-Patt das Matt/Patt von einer bestimmten
Figur zu erwarten ist, kann man festlegen, daß diese Figur
nicht geschlagen werden darf. Das erspart unnötigen Rechenauf-
wand für Stellungen ohne diese Figur.

7. Matt möglich durch Umwandlungsfigur:

Wenn bei einem Selbstmatt Schwarz nur noch Bauern hat, welche
alle auf der zweiten Reihe stehen, ist nur noch ein Matt durch eine
Umwandlungsfigur möglich. Da dies sehr unwahrscheinlich ist,
sucht Gustav bei deaktiviertem Parameter in solchen Positionen
nicht weiter.

8. Höchstzahl der weißen Steine:

Der Parameter steht nur für Selbstpatts zur Verfügung. Er gibt
an, wieviel Steine Weiß in der Pattstellung noch haben darf. Das
spart Zeit, weil schon mehrere Züge vor dem Ende erkannt wird,
daß kein solches Patt mehr möglich ist.Wenn diese Einschränkung
nicht gewünscht ist, wird 0 eingetragen.
Diese 8 Möglichkeiten zur Einschränkung der zu untersuchenden weißen
Züge lassen sich beliebig kombinieren.

(Read Only)

pid=26228

No more posts

MatPlus.Net

Forum

General

Creation of benchmark(s) for software solving orthodox moremovers