Dear chess friends,
The first release of the CGR database is available! Check here!
But I have a few questions for you. If we want to discuss them together, the easiest way would be to join the Yahoo! group dedicated to the CGR.
Do you prefer the current filter on games (this database only includes games where BOTH players are rated 2200+ Elo), or would you prefer something like “all games where at least one of the players is rated 2200+ Elo”? I have found MANY games where a 2200+ player faced a much weaker opponent (e.g. simultaneous exhibitions, historical games, games against “famous” rated players, many Chess Olympiad games where the weaker player came from a small country and was rated below 2000, etc.). Do you really need a Karpov game played against a 10-year-old kid rated 1281 at a simultaneous exhibition???
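For the technically curious, here is a minimal Python sketch of the two filters. It assumes each game’s header tags have already been parsed into a dict (parsing isn’t shown), reads the standard WhiteElo/BlackElo tags, and treats a missing or non-numeric rating (such as “?”) as unrated. The function names are made up for illustration; this is not the actual CGR build code.

```python
from typing import Mapping, Optional

MIN_ELO = 2200  # rating threshold used by the current CGR filter


def elo(tags: Mapping[str, str], key: str) -> Optional[int]:
    """Return the rating stored under `key`, or None if it is missing or not a number."""
    value = tags.get(key, "").strip()
    return int(value) if value.isdecimal() else None


def both_players_rated(tags: Mapping[str, str], min_elo: int = MIN_ELO) -> bool:
    """Current rule: BOTH WhiteElo and BlackElo must be at least min_elo."""
    ratings = [elo(tags, "WhiteElo"), elo(tags, "BlackElo")]
    return all(r is not None and r >= min_elo for r in ratings)


def either_player_rated(tags: Mapping[str, str], min_elo: int = MIN_ELO) -> bool:
    """Alternative rule: at least ONE of the two players must be at least min_elo."""
    ratings = [elo(tags, "WhiteElo"), elo(tags, "BlackElo")]
    return any(r is not None and r >= min_elo for r in ratings)


# A simul game against a 1281-rated kid (the 2700 is a hypothetical value).
simul = {"WhiteElo": "2700", "BlackElo": "1281"}
print(both_players_rated(simul), either_player_rated(simul))  # False True
```

In other words, the simul game is dropped by the current rule and kept by the alternative one.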
Do you want games played by computer chess programs? I was thinking of adding the WhiteIsComp/BlackIsComp tag wherever it is missing, to clearly identify those games/players. What do you think? That way you could filter out those games if you don’t want them.
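Here’s roughly what adding the missing tags and filtering on them could look like, as a sketch only. It assumes tags parsed into a dict, a hand-maintained set of engine names (KNOWN_ENGINES is hypothetical; nothing like it is described above), and the value “Yes” for the tag, which is a common convention but an assumption here.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical list of engine names; a real one would have to be maintained by hand.
KNOWN_ENGINES = {"Stockfish 8", "Komodo 10"}


def add_comp_tags(tags: Mapping[str, str], engines: Iterable[str] = KNOWN_ENGINES) -> Dict[str, str]:
    """Return a copy of tags with WhiteIsComp/BlackIsComp added where missing."""
    engine_set = set(engines)
    out = dict(tags)
    for color in ("White", "Black"):
        key = f"{color}IsComp"
        # Never overwrite a tag that is already present in the original game.
        if key not in out and out.get(color, "") in engine_set:
            out[key] = "Yes"
    return out


def is_computer_game(tags: Mapping[str, str]) -> bool:
    """True if either side is flagged as a computer, so the game can be filtered out."""
    return any(tags.get(k, "").strip().lower() == "yes" for k in ("WhiteIsComp", "BlackIsComp"))


game = add_comp_tags({"White": "Stockfish 8", "Black": "Carlsen, Magnus"})
print(game.get("WhiteIsComp"), game.get("BlackIsComp"))  # Yes None
print(is_computer_game(game))  # True
```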
I have kept every optional/non-standard tag (i.e. any tag not defined in the Seven Tag Roster) from the original games. I think the more info you have, the better. For instance, perhaps you’d be interested to know that this “new move” of Carlsen was played in a 5-minute game?
I have removed the “(wh)” and “(bl)” suffixes from all entries in the White/Black tags. This info was idiotic/useless, as the player’s color is already identified by the PGN tag itself! Do you know of any other idiotic/useless/annoying things you’ve seen in PGN files?
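A cleanup like this boils down to a small regular expression. The sketch below assumes tags parsed into a dict and strips a trailing “(wh)” or “(bl)” (case-insensitive, with any surrounding spaces) from the White and Black tags; the function names are illustrative, not the actual CGR code.

```python
import re
from typing import Dict, Mapping

# A trailing "(wh)" or "(bl)", plus surrounding whitespace, in any letter case.
COLOR_SUFFIX = re.compile(r"\s*\((?:wh|bl)\)\s*$", re.IGNORECASE)


def strip_color_suffix(name: str) -> str:
    """Remove a redundant color suffix from a player name."""
    return COLOR_SUFFIX.sub("", name)


def clean_player_tags(tags: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of tags with the suffix removed from White and Black."""
    out = dict(tags)
    for key in ("White", "Black"):
        if key in out:
            out[key] = strip_color_suffix(out[key])
    return out


print(clean_player_tags({"White": "Kasparov, Garry (wh)", "Black": "Karpov, Anatoly (bl)"}))
# {'White': 'Kasparov, Garry', 'Black': 'Karpov, Anatoly'}
```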
I have not included games with no result (the “*” termination marker), as they usually indicate an unfinished (or in-progress) correspondence game. Can you think of any other reason why a game would end with no result? Even the death of a player has to produce a result in chess!
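In PGN, the “*” marker is what the standard uses for a game in progress, abandoned, or with an unknown result. A minimal filter, assuming tags parsed into a dict and treating a missing Result tag as “*”, could look like this (illustrative names, not the CGR code):

```python
from typing import Iterable, Iterator, Mapping

# The three real results defined by the PGN standard; "*" means no result.
FINISHED_RESULTS = {"1-0", "0-1", "1/2-1/2"}


def finished_games(games: Iterable[Mapping[str, str]]) -> Iterator[Mapping[str, str]]:
    """Yield only games whose Result tag is a real result (a missing tag counts as "*")."""
    for tags in games:
        if tags.get("Result", "*").strip() in FINISHED_RESULTS:
            yield tags


games = [{"Result": "1-0"}, {"Result": "*"}, {"Result": "1/2-1/2"}, {}]
print(len(list(finished_games(games))))  # 2
```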
It’s been known for a while that games from the CCRL (Computer Chess Rating Lists) use a “custom” round number that many chess database programs don’t seem to like. Besides, importing CCRL games will often trigger the famous “Round Name limit of 262143 exceeded” error in Scid or Scid vs PC. So I have decided to replace the round number in the CCRL games with the default value of “?”. Does anyone have a problem with this? Do you have any ideas, suggestions or comments on this?
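Judging by the error message, Scid caps the number of distinct round names, so every unique custom round string presumably counts toward that limit; collapsing them all to “?” leaves just one. Here is a sketch of the replacement, assuming tags parsed into a dict. How CCRL games are recognized is a guess on my part: the sketch looks for “CCRL” in the Event tag, which may not match every CCRL file.

```python
from typing import Dict, Mapping


def is_ccrl_game(tags: Mapping[str, str]) -> bool:
    """Guess: treat a game as CCRL if its Event tag mentions "CCRL" (an assumption)."""
    return "CCRL" in tags.get("Event", "").upper()


def reset_ccrl_round(tags: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of tags with Round set to the PGN "unknown" value for CCRL games."""
    out = dict(tags)
    if is_ccrl_game(out):
        out["Round"] = "?"
    return out


# Illustrative tag values only.
print(reset_ccrl_round({"Event": "CCRL 40/40", "Round": "512.3"})["Round"])  # ?
print(reset_ccrl_round({"Event": "Tata Steel Masters", "Round": "5"})["Round"])  # 5
```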
Another option I wasn’t sure about is how to package the releases. Do you prefer the games split into smaller PGN files (e.g. by ECO classification: A, B, C, D, E) or just one big PGN file? Keep in mind that soon (I hope) we will hit Scid’s limit of 16 million games… For those who use other chess database software, are there similar limits? And do you want the database in multiple Zip files or just one Zip file?
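Splitting by ECO would be fairly simple. The sketch below assumes each game comes as a pair of parsed tags and raw PGN text, groups games by the first letter of the ECO tag (anything without a valid A to E code goes to “unclassified”), and writes one file per group. The cgr_A.pgn naming scheme is hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List, Mapping, Tuple

ECO_LETTERS = ("A", "B", "C", "D", "E")


def split_by_eco(games: Iterable[Tuple[Mapping[str, str], str]]) -> Dict[str, List[str]]:
    """Group raw PGN game texts by the first letter of their ECO tag."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for tags, pgn_text in games:
        eco = tags.get("ECO", "").strip().upper()
        letter = eco[:1] if eco[:1] in ECO_LETTERS else "unclassified"
        buckets[letter].append(pgn_text)
    return dict(buckets)


def write_buckets(buckets: Dict[str, List[str]], out_dir: str) -> None:
    """Write one PGN file per bucket, e.g. cgr_A.pgn (hypothetical naming scheme)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    for letter, texts in buckets.items():
        # Games in a PGN file are separated by a blank line.
        (target / f"cgr_{letter}.pgn").write_text("\n\n".join(texts) + "\n", encoding="utf-8")


# Placeholder texts stand in for full PGN games.
games = [({"ECO": "B90"}, "game 1"), ({"ECO": "E60"}, "game 2"), ({"ECO": "B12"}, "game 3"), ({}, "game 4")]
print({k: len(v) for k, v in split_by_eco(games).items()})  # {'B': 2, 'E': 1, 'unclassified': 1}
```

One file per letter keeps each file well below any per-database limit, at the cost of having to import several files to get everything.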
Finally, as a side note, the next release will probably be another FULL one, as I have another 83G of PGN games ready! I have kept the 206G that went into this first release, because I am trying to automate the whole process and keep manual intervention to a minimum. I’ll keep you posted on how it goes when I add the extra 83G and rebuild the database from the raw files once again!!
If you have a minute or two, send me an email and tell me what you want, what you like, what you don’t like, your ideas, suggestions, even complaints! Especially complaints! Even better, join the Yahoo! group!
Please also take a few seconds to answer my poll. The database is released in PGN format for now, for drive space/bandwidth reasons and also because every chess database program can read this format. But if 99% of you are using Scid, I could well end up releasing the database in Scid format!
And if you have an extra 5 minutes, send me any games you have collected since January 1st 2017 (don’t bother sending me files from any website on Lars Balzer’s list; I have already collected 90% of them)! Read the instructions here.
The more new games I get, the bigger this database will get, and the faster it will grow!