Prepare for defeat against Transformers on the Lichess chess platform
In a groundbreaking study, researchers have successfully trained large transformer models to play chess without relying on memorization or explicit search, marking a significant shift in the field of artificial intelligence.
The study, based on ChessBench, a dataset of 10 million human chess games, used a combination of supervised fine-tuning and reinforcement learning (RL) to train the transformer models. The models were rewarded with either a sparse reward (a binary indicator of whether the predicted move matched the optimal move) or a dense reward (a continuous score from an expert critic network that estimates the win probability after making a move).
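To make the two reward schemes concrete, here is a minimal sketch in Python. The `critic_win_prob` callable is a hypothetical stand-in for the paper's expert critic network, not an API from the study:

```python
def sparse_reward(predicted_move: str, optimal_move: str) -> float:
    """Binary reward: 1.0 only when the predicted move matches the optimal one."""
    return 1.0 if predicted_move == optimal_move else 0.0


def dense_reward(board_fen: str, predicted_move: str, critic_win_prob) -> float:
    """Continuous reward: the critic's estimated win probability after the move."""
    return critic_win_prob(board_fen, predicted_move)
```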
The base transformer model was first fine-tuned on large datasets of chess games with Stockfish annotations, learning to predict moves. Reinforcement learning fine-tuning was then applied using Group Relative Policy Optimization (GRPO). The critic network, another transformer trained on 15 billion Stockfish-annotated state-action pairs, provided an expert "teacher" signal that guided the student model during RL, a form of knowledge distillation.
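The core of GRPO is a group-relative advantage: several candidate moves are sampled per position, each is scored, and each score is normalized against the group's mean and standard deviation, removing the need for a separate learned value baseline. A minimal sketch (the reward values are illustrative):

```python
import numpy as np


def grpo_advantages(group_rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantage: normalize each sampled move's reward
    against its group's mean and standard deviation."""
    return (group_rewards - group_rewards.mean()) / (group_rewards.std() + 1e-8)


# Example: dense-reward scores for four moves sampled from the policy.
print(grpo_advantages(np.array([0.62, 0.55, 0.48, 0.20])))
# Above-average moves get positive advantages and are reinforced.
```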
The transformer-based models achieved near-grandmaster-level ratings, with action-value prediction performing best. They came close to matching AlphaZero and Stockfish without using any search during play, and they handled novel board positions, indicating that the transformer learned strategy rather than relying on memorized moves.
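Search-free play of this kind reduces to scoring every legal move with the model and taking the argmax; there is no game-tree lookahead. A minimal sketch using the python-chess library, where `predict_win_prob` is a hypothetical stand-in for the trained transformer:

```python
import chess  # pip install python-chess


def pick_move(board: chess.Board, predict_win_prob) -> chess.Move:
    """Return the legal move with the highest predicted win probability,
    evaluating each move once with the model and performing no search."""
    return max(board.legal_moves,
               key=lambda m: predict_win_prob(board.fen(), m.uci()))
```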
Despite this strong performance, however, the models still fell short of engines like Stockfish when quick tactical calculation was required. The study suggests that architectural innovations or more diverse datasets could help close the remaining gap.
The potential applications of this technology extend beyond games to real-world planning scenarios where generalization and adaptability are crucial. More broadly, the study reveals how far transformers can go in mastering chess through generalization rather than memorization.
The study also indicates that the weak chess skill of pretrained LLMs arises mainly from weak internal reasoning rather than from a lack of search, suggesting that focused training can boost reasoning ability. Incorporating the expert critic's dense reward vastly improves playing strength, because the model receives graded credit for strong but suboptimal moves instead of blindly mimicking a single target move.
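A toy comparison (the numbers are invented for illustration) shows why: under the sparse reward, a near-optimal move and a blunder both score zero, while the dense reward preserves the ordering between them:

```python
# Hypothetical critic win probabilities after three candidate moves.
candidates = {"optimal": 0.64, "near_optimal": 0.61, "blunder": 0.12}

for name, win_prob in candidates.items():
    sparse = 1.0 if name == "optimal" else 0.0  # exact match or nothing
    dense = win_prob                            # graded credit for quality
    print(f"{name:>12}  sparse={sparse:.1f}  dense={dense:.2f}")
```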
In summary, training large transformers to play chess without memorization or explicit search relies on reinforcement learning with a dense, engine-derived evaluation reward, which lets the model internalize strategic reasoning. The ChessBench study highlights that such models can approximate expert-level strategic play through careful reward engineering and knowledge distillation from expert critics, signaling a shift away from traditional search-based chess AI paradigms.
Artificial intelligence, in the form of transformer models, has now been shown to play chess at near-grandmaster level using a combination of supervised fine-tuning, reinforcement learning, and knowledge distillation from an expert critic network. This demonstrates the potential of these models to master complex strategies through generalization rather than memorization, with implications reaching beyond games to real-world planning scenarios.