Hold ’Em or Fold ’Em? This A.I. Bluffs With the Best

Pluribus, a poker-playing algorithm, can beat the world’s top human players, proving that machines, too, can master our mind games.

Artificial intelligence has come a long way since 1979, when this card-playing robot pulled a straight flush. Credit: David Cooper/Toronto Star, via Getty Images

In his 14 years on the professional poker circuit, Darren Elias had never faced anyone who played with so little fear.

A typical poker player, when dealt two Jacks — one faceup, the other hidden, a hand neither good nor bad — would proceed with caution. But not Mr. Elias’s opponent, who seemed to know exactly what to do. Even when Mr. Elias decided to bluff, betting as if he held a strong hand, his opponent effectively called him on it: charging ahead, matching each bet with what seemed to be complete confidence, and winning.

Even more remarkable: This opponent was a machine.

The automated poker player, called Pluribus, was designed by researchers at Carnegie Mellon University in Pittsburgh and the Facebook artificial intelligence lab in New York City. In a paper published on Thursday in Science, and with the World Series of Poker underway in Las Vegas, the researchers described how Pluribus recently bested Mr. Elias and several other elite professionals in a multiplayer game of no-limit Texas Hold ’Em, the most popular and complex form of poker.

The achievement marks another notable milestone in the progress of artificial intelligence. Over the past 30 years, researchers have built systems that beat the best players at checkers, chess, Go, even Jeopardy. But unlike these games, poker is based on hidden information. Each player holds cards that opponents can’t see. The best players must master ways of uncovering what their opponents are hiding, while keeping their own secrets safe.

Darren Elias, a four-time World Poker Tour champion. “Pure numbers and percentages,” he said of Pluribus, the new A.I. poker robot. “It is solving the game itself.” Credit: Facebook AI Research

As Mr. Elias realized, Pluribus knew when to bluff, when to call someone else’s bluff and when to vary its behavior so that other players couldn’t pinpoint its strategy. “It does all the things the best players in the world do,” said Mr. Elias, 32, who has won a record four titles on the World Poker Tour. “And it does a few things humans have a hard time doing.”

Experts believe the techniques that drive this and similar systems could be used in Wall Street trading, auctions, political negotiations and cybersecurity, activities that, like poker, involve hidden information. “You don’t always know the state of the real world,” said Noam Brown, the Facebook researcher who oversaw the Pluribus project.

Two years ago, Mr. Brown and his collaborator, Tuomas Sandholm, a computer scientist at Carnegie Mellon, built a system that could beat top professionals in one-on-one games of Texas Hold ’Em. Pluribus has extended the feat to multiplayer poker, a far more complex problem.

Other A.I. systems have managed to solve complex games that involve at least some hidden information. Two top artificial intelligence labs, for instance, have built systems that can beat the world’s best players at three-dimensional video games like Dota 2, Quake and StarCraft. But these systems did not have to compete with multiple players at once.

Pluribus learned the nuances of Texas Hold ’Em by playing trillions of hands against itself. After each hand was done, it would evaluate each decision, determining whether a different choice would have produced a better result.

Mr. Brown called this process “counterfactual regret minimization,” and compared it to the way humans learn the game. “One player will ask another, ‘What would you have done if I had raised here instead of called?’”
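To make the idea of learning from regret concrete, here is a minimal sketch in Python. It is not Pluribus's code and not full counterfactual regret minimization, which walks a poker game tree. It shows only regret matching, the simpler building block, applied to rock-paper-scissors as a stand-in game. The payoff table, the function names, the iteration count and the deliberately lopsided starting regret are all assumptions made for illustration. Two copies of the learner play each other; after every round each one asks how much better each alternative action would have done, adds that to a running regret total, and then favors actions in proportion to their positive regret.

```python
# Illustrative sketch only: regret matching in self-play on rock-paper-scissors.
# All names and numbers here are hypothetical, not taken from Pluribus.

# PAYOFF[mine][theirs] is my payoff; actions are 0=rock, 1=paper, 2=scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        # No action is regretted yet, so fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def action_values(opponent_strategy):
    """Expected payoff of each of my actions against the opponent's mix."""
    return [
        sum(PAYOFF[mine][theirs] * opponent_strategy[theirs] for theirs in range(3))
        for mine in range(3)
    ]


def train(iterations=50_000):
    """Run self-play and return each player's average strategy."""
    # Player 0 starts with an arbitrary bias toward rock so the dynamics
    # are not trivially stuck at uniform play from the first round.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            realized = sum(p * v for p, v in zip(strategies[player], values))
            for action in range(3):
                # Regret: how much better this action would have done
                # than the mix the player actually used this round.
                regrets[player][action] += values[action] - realized
                strategy_sums[player][action] += strategies[player][action]
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, average in enumerate(train()):
        print(player, [round(p, 3) for p in average])
```

Although player 0 begins by favoring rock, both players' average strategies drift toward playing each action about a third of the time, the unexploitable mix for this game. That mirrors, at toy scale, the self-play behavior the researchers describe.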

Unlike systems that can master three-dimensional video games like Dota and StarCraft — systems that need weeks or even months to train to play against humans — Pluribus trained for only about eight days on a fairly ordinary computer at a cost of about $150. The hard part was creating the detailed algorithm that analyzed the results of each decision. “We’re not using much computing power,” Mr. Brown said. “We can cope with hidden information in a very particular way.”

In the end, Pluribus learned to apply complex strategies, including bluffing and random behavior, in real time. Then, when playing against human opponents, it would refine these strategies by looking ahead to possible outcomes, as a chess player might. This spring, the researchers tested the system in games in which a single human professional played against five separate instances of Pluribus.

In that format, Mr. Elias was unimpressed. “You could find holes in the way it played,” he said; among other bad habits, Pluribus tended to bluff too often. But after taking suggestions from him and other players, the researchers modified and retrained the system. In subsequent games against top professionals, Mr. Elias said, the system seemed to have reached superhuman levels.

The system did not play for real money. But if the chips had been valued at a dollar apiece, Pluribus would have won about $1,000 an hour against its elite opponents. “At this point, you couldn’t find any holes,” Mr. Elias said.

All the matches were played online, so the system was not deciphering the emotions or physical “tells” of its human opponents. The success of Pluribus showed that poker can be boiled down to nothing but math, Mr. Elias said: “Pure numbers and percentages. It is solving the game itself.”

Will the same prove true beyond the poker table? Perhaps. Michael Wellman, a professor of computer science and engineering at the University of Michigan who specializes in A.I. poker, is working to apply similar methods to cybersecurity, which is often a cat-and-mouse game along the lines of poker.

“The attacker and the defender have limited knowledge of what each other is doing,” Dr. Wellman said. “They are playing games with each other.”

Of course, the real world is even more complex than a game of no-limit Texas Hold ’Em. Cybersecurity is not governed by clear-cut rules and point systems. As researchers struggle to cope with the added complexity, it is unclear whether these techniques will actually work in the chaos of reality.

“This is a promise that is yet to be kept,” Dr. Wellman said.


Cade Metz is a technology correspondent with The New York Times, covering artificial intelligence, driverless cars, robotics, virtual reality, and other emerging areas. Previously, he was a senior staff writer with Wired magazine. He recently signed with Penguin Dutton in the United States and Random House in the United Kingdom to write a non-fiction narrative about the tiny clan of A.I. visionaries who are rapidly changing our world.