Meta’s Cicero chatbot can probably beat you at Diplomacy • The Register
Meta researchers have developed an artificial intelligence system called Cicero that can play the classic strategy game Diplomacy at a level comparable to most human players.
It’s a major achievement in natural language processing, and one that could help people forget last week’s debut of Galactica, a large language model that Meta engineers trained on scientific papers, which presented falsehoods as facts and was taken offline after three days of criticism from the scientific community.
Developed in the 1950s and currently published by Hasbro, Diplomacy focuses on communication and negotiation between players assuming the roles of seven European powers at the beginning of the 20th century. Some players consider it an ideal way to lose friends.
The game simulates the capture of territories on a map of Europe. Instead of taking turns, players write down their moves in advance and execute them simultaneously. To avoid having their moves blocked by an opponent’s countermove, players communicate privately with each other. They discuss possible coordinated actions and then commit their moves to paper, keeping or breaking the commitments they made to other players.
Diplomacy’s focus on communication, trust, and betrayal makes it a different challenge from games that focus more on rules and resources like Chess and Go. Cicero is essentially a chatbot that can negotiate with other Diplomacy players to make effective plays.
Screenshot of Cicero dialogue
“For decades, Diplomacy has been viewed as a near-impossible grand challenge in AI because it requires players to master the art of understanding other people’s motivations and perspectives, making complex plans and adjusting strategies, and then using natural language to make agreements with other people, convince them to form partnerships and alliances, and more,” Meta explained in a blog post.
“Cicero is so effective at using natural language to negotiate with people in Diplomacy that they often prefer working with Cicero over other human participants.”
Cicero is based on a BART-like language model with 2.7 billion parameters, pre-trained on text from the web and fine-tuned on a dataset of more than 40,000 Diplomacy games played online on webDiplomacy.net. These games contained more than 12 million messages exchanged between players.
The AI agent’s dialogue output is tied to its strategic reasoning module, which creates “intents” that represent a possible set of moves by the various players.
“To generate the intents for the dialogue and to select the final actions for each turn, Cicero runs a strategic reasoning engine that predicts other players’ policies (i.e., a probability distribution over actions) for the current turn based on the state of the board and the mutual dialogue, and then chooses a policy for the current turn that best responds to the predicted policies of the other players,” the Meta researchers explain in a research article published in Science.
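The "predict, then best-respond" idea in that description can be illustrated with a toy example. The sketch below is not Cicero's code: the action names, the payoff numbers, and the helper functions `expected_values` and `best_response` are hypothetical, and a single opponent with two actions stands in for the six other powers. It only shows how a predicted probability distribution over an opponent's actions turns into a choice of one's own action.

```python
def expected_values(payoff, opponent_policy):
    """Expected utility of each of our actions.

    payoff[a][b] is our (hypothetical) utility when we play a and the
    opponent plays b; opponent_policy[b] is the predicted probability of b.
    """
    return {
        a: sum(opponent_policy[b] * utility for b, utility in row.items())
        for a, row in payoff.items()
    }


def best_response(payoff, opponent_policy):
    """Pick the action with the highest expected utility."""
    values = expected_values(payoff, opponent_policy)
    return max(values, key=values.get)


# Illustrative numbers only: attacking pays off if the other player
# keeps a promise of support, and backfires if they betray us.
payoff = {
    "attack": {"support": 3.0, "betray": -2.0},
    "hold": {"support": 1.0, "betray": 0.0},
}

trusting = {"support": 0.7, "betray": 0.3}
wary = {"support": 0.2, "betray": 0.8}

print(best_response(payoff, trusting))  # attack
print(best_response(payoff, wary))      # hold
```

The same board position yields different moves depending on how likely the opponent is judged to keep their word, which is why the prediction step has to take the dialogue into account.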
While AI agents for games like chess can be trained through self-play using reinforcement learning, modeling Diplomacy’s cooperative play required a different technique. According to Meta, the classic approach would involve supervised learning, through which an agent would be trained using labeled data from previous Diplomacy games. But supervised learning alone produced a gullible AI agent that was easily manipulated by lying players.
As such, Cicero includes an iterative planning algorithm called piKL, which is used to refine an initial prediction of other players’ policies and planned moves based on dialogue between the bot and other players. The algorithm tries to improve expected sets of moves for other players by evaluating different choices that would lead to better outcomes.
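The article does not spell out piKL's update rule. As a rough illustration, KL-regularized planning of this kind is commonly written as a policy proportional to `anchor(a) * exp(Q(a) / lam)`, where the anchor is a human-like policy learned by imitation, `Q(a)` is the estimated value of action `a`, and `lam` controls how strongly the result is pulled back toward the anchor. The sketch below shows a single such step under that assumption. The function name, the action names, and the numbers are hypothetical, and the real algorithm iterates this over all players rather than applying it once.

```python
import math


def kl_regularized_policy(anchor, action_values, lam):
    """One KL-regularized step: policy(a) is proportional to
    anchor(a) * exp(Q(a) / lam).

    A large lam keeps the result close to the anchor policy; a small lam
    lets the action values dominate. Illustrative sketch only.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    # Subtract the maximum value before exponentiating for numerical stability.
    best = max(action_values.values())
    weights = {
        a: anchor[a] * math.exp((action_values[a] - best) / lam)
        for a in anchor
    }
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


# Hypothetical numbers: imitation alone says the player will mostly keep
# their promise, but betrayal has the higher estimated value for them.
anchor = {"support": 0.8, "betray": 0.2}
values = {"support": 1.0, "betray": 2.0}

for lam in (100.0, 1.0, 0.05):
    policy = kl_regularized_policy(anchor, values, lam)
    print(lam, {a: round(p, 3) for a, p in policy.items()})
```

With a large `lam` the prediction stays near the imitation-learned anchor, and with a small `lam` it shifts toward the higher-value action, which is one way to make a model trained purely on human games less gullible.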
In a statement, Diplomacy world champion Andrew Goff praised Cicero’s dispassionate approach to the game. “Many human players will soften their approach or be motivated by revenge, but Cicero never does,” Goff said. “It just plays the situation as it sees it. So it’s ruthless in executing its strategy, but it’s not ruthless in a way that annoys other players.”
Cicero anonymously played 40 games of Diplomacy in a “Blitz” league on webDiplomacy.net between August 19 and October 13, 2022, finishing in the top 10 percent of participants who played more than one game. And among the 19 who played five or more games, Cicero finished second. Across all 40 games, Cicero’s average score was 25.8 percent, more than double the 12.4 percent average among its 82 opponents.
While Cicero still makes some mistakes, Meta’s experts believe their research will prove useful for other applications, like chatbots that can hold long-running conversations, or video game characters that understand player motivations and can interact more effectively as a result.
Cicero’s code was released under an open-source license in hopes that the AI developer community could continue to improve it.
https://www.theregister.com/2022/11/23/metas_cicero_chatbot_can_probably/