Chess Database Analysis
Brief Analysis of Personal Chess Database
The book “Chess Two New Rating Systems: The 2022 Best Players in History (English Edition)” by Hindemburg Melão Jr., available on Amazon.com, employs a methodology similar to the one used in this report. He also found a strong relationship between “hit count”, in the sense used in this report, and chess engine Elo strength.
This report is based on an analysis of 84 of my games on chess.com from recent months. I analyzed every game from move 5 onward (2,569 positions in total) with a very strong Stockfish engine at its maximum strength setting, using a large analysis depth and a generous time limit, to find the "perfect" move in each position. I then measured the hit count of each of a series of weaker engines, that is, how often each one's move matched one of the reference engine's top choices, and applied optimized weights to the hits: the best move scores 1, the second-best move scores 1, and the third-best move scores 0.55. In this particular analysis I suppressed the performance of the three Stockfish engines rated 1950-2050 by reducing their analysis depth on each move, and I removed the highest-rated Stockfish engine (3644 Elo) because it was not performing as well as it should.
The rating estimate based on my own weighted hit count is approximately my current rating on chess.com. This suggests that my rating accurately reflects my performance: I am not playing at a much higher or lower level than my rating indicates. The picture changes sharply when I run the analysis on moves 10-14 only. My estimated rating there is very low, which means I tend not to find the right moves at this stage of the game. I need to work on that.
Here is the optimized graph for moves 10-14 only (inclusive). As can be seen, my rating is lower than that of the lowest-rated engine, suggesting that I am very poor at selecting the "best" move at this stage of the game.
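The weighted hit score described above can be sketched in a few lines of Python. This is only an illustration, not the script used for this report: the function name, the use of UCI move strings, and the example positions are assumptions made for the sketch, and it presumes the reference engine's three best moves for each position have already been collected.

```python
# Weights taken from the text: best move 1, second-best move 1, third-best move 0.55.
RANK_WEIGHTS = (1.0, 1.0, 0.55)


def weighted_hit_score(played_moves, reference_top_moves):
    """Return the average weighted hit per position.

    played_moves: one move per position (hypothetical UCI strings) chosen by
        the player or engine being measured.
    reference_top_moves: for each position, the reference engine's moves
        ordered best first; only the first three are considered.
    """
    if len(played_moves) != len(reference_top_moves):
        raise ValueError("both inputs must cover the same positions")
    if not played_moves:
        raise ValueError("at least one position is required")

    total = 0.0
    for move, top_moves in zip(played_moves, reference_top_moves):
        for weight, candidate in zip(RANK_WEIGHTS, top_moves):
            if move == candidate:
                total += weight
                break  # a move can match at most one rank
    return total / len(played_moves)


# Made-up example: four positions with hits at rank 1, rank 2, none, and rank 3.
example_played = ["e2e4", "g1f3", "a2a3", "d2d4"]
example_reference = [
    ["e2e4", "d2d4", "g1f3"],
    ["d2d4", "g1f3", "c2c4"],
    ["d2d4", "c2c4", "g1f3"],
    ["e2e4", "c2c4", "d2d4"],
]
print(weighted_hit_score(example_played, example_reference))
```

Computing the same score for each engine of known rating and for my own moves gives the hit proportions that the rating estimate is read from.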
Using SQL, I filtered my games to see whether the length of a game made any difference to my win percentage. It often appears to, but the difference turns out not to be significant. Because the SQL results looked interesting, I decided to do a more thorough analysis using Python and a statistical significance test, the chi-square test of independence, which is a good test for determining whether there is a relationship between categorical variables. I got a little help with the syntax to perform the tests. I also recognized that by checking many ranges of game length I could find a "significant" result purely by chance, so I applied two corrections for multiple comparisons, the Benjamini-Hochberg false discovery rate (FDR) correction and the stricter Bonferroni correction, to ensure I am not falsely claiming significance where there is none. This is what I found:
Essentially, there is no statistically significant relationship between the length of the game and my percentage of wins. Even though a relationship appears to be there, either none exists or I have not yet collected enough data to demonstrate one.
This was the output of some SQL commands. Although it looks meaningful, I have already shown that the relationship is not statistically significant, so no conclusion can be drawn from it yet. Once I have played and analyzed more games, I may be able to draw a meaningful conclusion from this apparent relationship. Hopefully, by then I will have stopped making foolish errors in the opening and losing the game, in which case there never will be a relationship between game length and win rate.
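For readers who want to see the mechanics of the testing procedure described above, here is a minimal sketch using only the Python standard library. It is not the script I ran: the function names are invented for illustration, the win/loss counts and p-values are made-up numbers rather than my results, and it assumes each game-length range is compared against all other games in a 2x2 table (wins and losses, inside and outside the range). Libraries such as SciPy provide equivalent tests and would normally be the better choice.

```python
import math


def chi_square_2x2(wins_in, losses_in, wins_out, losses_out):
    """Pearson chi-square test of independence on a 2x2 table.

    No continuity correction is applied. Returns (statistic, p_value).
    """
    table = [[wins_in, losses_in], [wins_out, losses_out]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one game")

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg (FDR) adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up counts: 10 wins and 20 losses inside a range, 20 wins and 10 losses outside.
statistic, p_value = chi_square_2x2(10, 20, 20, 10)
print(statistic, p_value)

# Made-up raw p-values for four game-length ranges.
raw_p_values = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(raw_p_values))
print(benjamini_hochberg(raw_p_values))
```

A range counts as significant only if its adjusted p-value stays below the chosen threshold (commonly 0.05). Bonferroni is the stricter of the two corrections, so a range can pass Benjamini-Hochberg and still fail Bonferroni.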
Upon analyzing my most recent chess games, I got a slightly higher estimate of my rating, consistent with my increased rating on chess.com and my efforts to improve my play. The only problem was that the engines I used to make the estimate produced a very narrow range of performance, which is not ideal: a rating range of 1000 points corresponded to hit proportion scores of only 0.6 to 0.7. It would be better to have a wide range of performances from a wide range of engine strengths. I tried to achieve this with node limits, specific rating limits, time limits, depth limits, and combinations of these. All of these efforts failed, so I decided to switch my measurement method.
I decided to use a different metric, "average centipawn loss", to estimate my performance rating. This is a widely accepted metric in the chess community and not nearly as controversial a method for estimating one's performance. I used Google Gemini to help build a script that analyzes my games with the Stockfish engine for centipawn loss (CPL) and writes my results to an output file. This is what resulted:
ACPL (Average Centipawn Loss) Report for: Desjardins373
Calculated ACPL: 54.09
Analyzed Moves: 3683
Estimated Skill Level: Expert / Club Player Level
What is ACPL?
Average Centipawn Loss (ACPL) is a metric used to measure the accuracy of chess moves. A "centipawn" is 1/100th of a pawn. ACPL represents the average number of centipawns you lost per move compared to the engine's best move. A lower ACPL means your moves were closer to the computer's top choices, indicating higher accuracy.
Practical ACPL Tiers (Perspective):
These ranges are approximations, particularly for games with faster time controls, but they provide a useful reference point for self-assessment.
- 10-20: Super Grandmaster (2700+ Elo): World-class precision, frequently dropping into single digits in top-level matches.
- 20-25: Grandmaster (2500-2700 Elo): Exceptional consistency and a low frequency of significant errors.
- 25-35: Master (2200-2400 Elo): Strong, professional-level play, indicating a solid performance.
- 30-60: Expert / Club Player (1800-2200 Elo): Represents strong amateur and tournament players. Averages in the 50s and 60s are common, especially in faster time controls.
- 60-120: Intermediate Player (1200-1800 Elo): A typical range for the large cohort of intermediate players. Games are often decided by more obvious tactical mistakes.
- 100+: Novice / Beginner (<1200 Elo): Fundamental blunders are common. Scores of 150 or higher are commonplace. An ACPL over 300 suggests a player is very new to the game.
Note: These ranges are approximate and can be affected by the complexity of the games, time control, and the depth of the analysis engine.
So, in contrast to many claims by friends and family that I am "...like a chess master", the measurements don't bear this out: I fall significantly short of playing at that level of skill.
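The ACPL definition in the report above comes down to simple arithmetic, sketched here in Python. This is not the Gemini-assisted script itself: the function name and the sample evaluations are made up for illustration. It assumes the engine evaluation after the best move and after the played move is already available for every position, in centipawns from the point of view of the player who moved. Clamping negative losses to zero and the optional cap on a single move's loss are assumptions of this sketch as well.

```python
def average_centipawn_loss(best_evals, played_evals, cap=None):
    """Average the per-move centipawn loss.

    best_evals: evaluation (centipawns, mover's point of view) after the
        engine's best move in each position.
    played_evals: evaluation after the move actually played.
    cap: optional upper limit on the loss counted for any single move.
    """
    if len(best_evals) != len(played_evals):
        raise ValueError("both inputs must cover the same moves")
    if not best_evals:
        raise ValueError("at least one move is required")

    losses = []
    for best, played in zip(best_evals, played_evals):
        # A played move cannot beat the engine's best, so clamp at zero.
        loss = max(0, best - played)
        if cap is not None:
            loss = min(loss, cap)
        losses.append(loss)
    return sum(losses) / len(losses)


# Made-up evaluations for four moves: losses of 0, 40, 100 and 10 centipawns.
sample_best = [30, 50, -20, 100]
sample_played = [30, 10, -120, 90]
print(average_centipawn_loss(sample_best, sample_played))
print(average_centipawn_loss(sample_best, sample_played, cap=50))
```

A cap limits how much one large blunder can dominate the average, which is one reason ACPL figures from different tools are not directly comparable.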
Scripts used for this analysis
I hope you enjoyed this analysis. Please feel free to send any comments, suggestions, thoughts, criticisms or insights to andyhayles@gmail.com.