Libratus Poker Paper

Posted By admin On 30/03/22

Time to pack it in. The only hurdle left for computers to overcome in order to take over the Earth is sentience. This week, the Libratus Poker AI, developed by Tuomas Sandholm and Noam Brown at Carnegie Mellon University, completed a total destruction of four professional poker players in heads-up No-Limit Hold’em. This comes just two years after a previous iteration of the AI – Claudico – lost a similar contest.

The four players taking on the electronic card sharp were Jimmy Chou, Dong Kim, Jason Les, and Daniel McAulay. Each played 30,000 hands against the computer from January 11th through January 30th.

In order to weed out some of the luck involved with heads-up No-Limit Hold’em, a few special rules were implemented. First, the players and the AI were given 20,000 chips at the start of each hand. Blinds were 50/100. By resetting the chip stacks every hand, players – living or otherwise – had plenty of room to make plays and good runs by a player couldn’t snowball into a big stack versus small stack scenario.

Ever wondered how Libratus, the celebrated poker playing (and winning) AI software from Carnegie Mellon University, outsmarts its opponents? Turns out Libratus uses a three-pronged strategy which its inventors share in a paper published online yesterday in Science – Superhuman AI for heads-up no-limit poker: Libratus beats top professionals.

  1. In this paper we introduce Libratus (12), an AI that takes a distinct approach to addressing imperfect-information games. In a 20-day, 120,000-hand competition featuring a $200,000 prize pool, it.
  2. The players lost to Libratus in a heads-up poker match of 120,000 hands altogether. Libratus finished ahead by a collective $1,766,250 in chips. Dong Kim, Jimmy Chou, Daniel McAulay and Jason Les were the pros who participated in the event.
  3. After a 20-day marathon challenge at the Rivers Casino in Pittsburgh, Pennsylvania, the results of Brains Vs. Artificial Intelligence: Upping the Ante shocked the whole poker community when the artificial intelligence (AI) “Libratus” beat four of the best poker players in the world heads-up in No-Limit Texas Hold’em.
  4. Libratus: The Superhuman AI for No-Limit Poker (Demonstration). Noam Brown, Computer Science Department, Carnegie Mellon University, noamb@cs.cmu.edu; Tuomas Sandholm, Computer Science Department, Carnegie Mellon University, and Strategic Machine, Inc., Sandholm@cs.cmu.edu. Abstract: No-limit Texas Hold’em is the most popular variant of poker in the world.

Additionally, hands were mirrored, meaning that pairs of players received reversed hands. For example, if Chou was dealt 2-7 offsuit and Libratus got pocket Queens in one hand, McAulay would be dealt Queens and the AI would get 2-7 in a mirrored hand. This way, hands were distributed evenly (though the deal was not predetermined), so neither the humans nor Libratus could benefit from getting a sick run of cards.
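To see why mirroring helps, here is a rough Python sketch. It is a toy model, not the match software: it assumes each hand's result for the AI is a skill edge plus card luck plus some play noise, and that a mirrored hand simply flips the sign of the card luck. The function name and every number in it are made up for illustration.

```python
import random
import statistics

def pair_totals(num_pairs, edge, luck_sd, noise_sd, mirrored, seed=0):
    """Return the AI's combined result for each pair of hands (toy model)."""
    rng = random.Random(seed)
    totals = []
    for _ in range(num_pairs):
        luck_first = rng.gauss(0, luck_sd)
        # Mirrored: the second table gets the same cards with seats swapped,
        # so the card luck flips sign. Otherwise it is drawn independently.
        luck_second = -luck_first if mirrored else rng.gauss(0, luck_sd)
        first = edge + luck_first + rng.gauss(0, noise_sd)
        second = edge + luck_second + rng.gauss(0, noise_sd)
        totals.append(first + second)
    return totals

# Hypothetical parameters, in chips per hand.
independent = pair_totals(5_000, edge=15, luck_sd=1_000, noise_sd=300, mirrored=False)
mirrored = pair_totals(5_000, edge=15, luck_sd=1_000, noise_sd=300, mirrored=True)

spread_independent = statistics.stdev(independent)
spread_mirrored = statistics.stdev(mirrored)
print(spread_independent, spread_mirrored)
```

In this simplified model the spread of the mirrored totals is far smaller, because the card luck cancels within each pair and only the play noise remains. Real poker is messier, since the cards also change how each side plays, but the idea is the same.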

Finally, once players were all-in with a call before the river, no more cards were dealt. In these situations, the winning probabilities were calculated and players received a percentage of the pot corresponding to their equity in the hand. One would think that this would mean that there would be fewer chances taken on all-in calls, but at the same time it also meant that nobody could get lucky and suck-out on an all-in.
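The equity rule is simple arithmetic, and a small Python sketch makes it concrete. The function name and the 75/25 equity split are hypothetical; the 40,000-chip pot assumes both sides shoved their full 20,000-chip starting stacks.

```python
def split_pot_by_equity(pot, equities):
    """Give each player a share of the pot equal to their win probability."""
    if abs(sum(equities.values()) - 1.0) > 1e-9:
        raise ValueError("equities must sum to 1")
    return {player: pot * equity for player, equity in equities.items()}

# Hypothetical all-in: both sides put in 20,000 chips, human is a 75% favorite.
shares = split_pot_by_equity(40_000, {"human": 0.75, "Libratus": 0.25})
net_human = shares["human"] - 20_000
print(shares, net_human)
```

Here the favorite books a 10,000-chip profit every time, instead of winning 20,000 three times out of four and losing 20,000 the fourth time. The long-run average is the same, but the luck of the runout is gone.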

Essentially, the purpose of all the special rules was to remove as much chance as possible so that skill could be more accurately measured.

When the contest was over, it was…no contest. Libratus crushed, winning a total of $1,766,250 from the four players. That’s $14.72 per hand. Dong Kim did the best – or least poorly – losing $85,649. Following him were McAulay with a $277,657 loss, Chou with a $522,857 loss, and Les with an $880,087 loss.
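Those figures are easy to check. The short Python snippet below uses only numbers quoted in this article (the four losses, the 120,000 hands, and the 100-chip big blind); the milli-big-blinds-per-hand conversion is a common way poker results are reported and is added here for context.

```python
# Individual losses as reported above, in chips.
losses = {"Kim": 85_649, "McAulay": 277_657, "Chou": 522_857, "Les": 880_087}
TOTAL_HANDS = 120_000
BIG_BLIND = 100

total = sum(losses.values())
per_hand = total / TOTAL_HANDS
mbb_per_hand = per_hand / BIG_BLIND * 1000  # milli-big-blinds per hand

print(total)               # 1766250
print(round(per_hand, 2))  # 14.72
print(mbb_per_hand)
```

The four losses do add up to the reported total, and the win rate works out to roughly 147 thousandths of a big blind per hand.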

According to PokerListings.com, this wasn’t dumb luck for Libratus. The probability of the four men outplaying the computer yet still losing that much money is between 0.0001 and 0.54 percent.

Libratus is powered by a $9.65 million supercomputer called “Bridges.” It has trained itself, playing billions of hands to tweak its strategy.

That strategy? Well, of course we don’t know what it is, exactly. At the same time, there is no specific, set strategy that the computer uses. Instead, it examines the specific scenario it is facing in a hand and makes one of several moves. For example, there might be a situation where Libratus has 4-5 pre-flop and its opponent made a min-raise. In that case, it might be programmed to re-raise by a certain amount half the time, re-raise by another amount 25 percent of the time, call 15 percent of the time, and fold 10 percent of the time. After each match, it analyzed the results and adjusted its strategy to try to find the optimal line.
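A mixed strategy like the one in that example is just a weighted random choice. The Python sketch below is only an illustration of the idea, not how Libratus is implemented; the action names are made up, and the probabilities are the hypothetical ones from the paragraph above.

```python
import random
from collections import Counter

# Hypothetical mix for one situation (4-5 pre-flop facing a min-raise).
MIXED_STRATEGY = {
    "reraise_size_a": 0.50,
    "reraise_size_b": 0.25,
    "call": 0.15,
    "fold": 0.10,
}

def choose_action(strategy, rng):
    """Pick one action at random, weighted by the strategy's probabilities."""
    actions = list(strategy)
    weights = list(strategy.values())
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded so the run is repeatable
SAMPLES = 10_000
counts = Counter(choose_action(MIXED_STRATEGY, rng) for _ in range(SAMPLES))
frequencies = {action: counts[action] / SAMPLES for action in MIXED_STRATEGY}
print(frequencies)
```

Over many hands the observed frequencies settle near the target percentages, but on any single hand an opponent cannot tell which action is coming.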

Poker-playing AIs typically perform well against human opponents when the play is limited to just two players. Now Carnegie Mellon University and Facebook AI research scientists have raised the bar even further with an AI dubbed Pluribus, which took on 15 professional human players in six-player no-limit Texas Hold 'em and won. The researchers describe how they achieved this feat in a new paper in Science.

Playing more than 5,000 hands each time, five copies of the AI took on two top professional players: Chris 'Jesus' Ferguson, six-time winner of World Series of Poker events, and Darren Elias, who currently holds the record for most World Poker Tour titles. Pluribus defeated them both. It did the same in a second experiment, in which Pluribus played five pros at a time, from a pool of 13 human players, for 10,000 hands.

Co-author Tuomas Sandholm of Carnegie Mellon University has been grappling with the unique challenges poker poses for AI for the last 16 years. No-Limit Texas Hold 'em is a so-called 'imperfect information' game, since some cards are hidden (each player's hole cards are concealed from opponents); the 'no-limit' part means there are no restrictions on the size of the bet one can make. By contrast, in chess and Go, the state of the board and of every piece is known to all the players. Poker players can (and do) bluff on occasion, so it's also a game of misleading information.

Claudico begat Libratus

In 2015, Sandholm's early version of a poker-playing AI, called Claudico, took on four professional players in heads-up Texas Hold 'em—where there are only two players in the hand—at a Brains vs. Artificial Intelligence tournament at the Rivers Casino in Pittsburgh. After 80,000 hands played over two weeks, Claudico didn't quite meet the statistical threshold for declaring victory: the margin must be large enough that there is 99.98% certainty that the AI's victory is not due to chance.

Sandholm et al. followed up in 2017 with another AI, dubbed Libratus. This time, rather than focusing on exploiting its opponents' mistakes, the AI focused on improving its own play–apparently a more reliable approach. 'We looked at fixing holes in our own strategy because it makes our own play safer and safer,' Sandholm told IEEE Spectrum at the time. 'When you exploit opponents, you open yourself up to exploitation more and more.' The researchers also upped the number of games played to 120,000.

The AI prevailed, even though the four human players tried to conspire against it, coordinating on making strange bet sizes to confuse Libratus. As Ars' Sam Machkovech wrote at the time, 'Libratus emerged victorious after 120,000 combined hands of poker played against four human online-poker pros. Libratus' $1.7 million margin of victory, combined with so many hands, clears the primary bar: victory with statistical significance.'

But Libratus was still playing against one other player in heads-up action. A far more challenging conundrum is playing poker with multiple players. So Pluribus builds on that earlier work with Libratus, with a few key innovations to allow it to come up with winning strategies in multiplayer games.

Sandholm and his former graduate student, Noam Brown—who is now working on his PhD with the Facebook Artificial Intelligence Research (FAIR) group—employed 'action abstraction' and 'information abstraction' approaches to reduce how many different actions the AI must consider when devising its strategy. Whenever Pluribus reaches a point in the game when it must act, it forms a subgame—a representation that provides a finer-grained abstraction of the real game, akin to a blueprint, according to Sandholm.

'It goes back a few actions and does a type of game theoretical reasoning,' he said. Each time, Pluribus must come up with four continuation strategies for each of the five human players via a new limited-lookahead search algorithm. This comes out to 'four to the power of six million different continuation strategies overall,' per Sandholm.

Like Libratus, Pluribus does not use poker-specific algorithms; it simply learns the rules of this imperfect information game and then plays against itself to devise its own winning strategy. In doing so, Pluribus figured out on its own that it was best to play a mixed strategy and stay unpredictable, which matches the conventional wisdom among today's top human players. 'We didn't even say, "The strategy should be randomized,"' said Sandholm. 'The algorithm automatically figured out that it should be randomized, and in what way, and with what probabilities in what situations.'
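How can self-play discover randomization without being told to randomize? The article does not name the learning algorithm, so the Python sketch below is not Pluribus's code. It is a toy example, assuming plain regret matching (a simple technique from the same family as the counterfactual regret minimization methods common in poker AI research) applied to rock-paper-scissors. All names and starting values are illustrative.

```python
ACTIONS = ("rock", "paper", "scissors")
# PAYOFF[a][b] is the payoff for playing action a against action b.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1 / 3] * 3
    return [p / total for p in positive]

def self_play(iterations):
    """Run two regret-matching players against each other; return average strategies."""
    # Deliberately lopsided start: player 0 leans rock, player 1 leans paper.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure action against the opponent's mix.
            values = [sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)]
            expected = sum(strategies[player][a] * values[a] for a in range(3))
            for a in range(3):
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

average_strategies = self_play(50_000)
print(dict(zip(ACTIONS, average_strategies[0])))
```

Nothing in the code says "play randomly," yet both players' average strategies drift toward an even one-third split, which is the unexploitable way to play rock-paper-scissors. Poker is vastly larger, but the same principle is at work: the randomization emerges from the learning process.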

No limping

Pluribus actually confirmed one bit of conventional poker-playing wisdom: it's just not a good idea to 'limp' into a hand, that is, calling the big blind rather than folding or raising. The exception, of course, is if you're in the small blind, when mere calling costs you half as much as the other players. But while human players typically avoid so-called 'donk betting'—in which a player ends one round with a call but starts the next round with a bet—Pluribus placed donk bets far more often than its human opponents.

So, 'In some ways, Pluribus plays the same way as the humans,' said Sandholm. 'In other ways, it plays completely Martian strategies.' Specifically, Pluribus makes unusual bet sizes and is better at randomization.

'Its major strength is its ability to use mixed strategies,' said Elias. 'That's the same thing that humans try to do. It's a matter of execution for humans—to do this in a perfectly random way and to do so consistently. Most people just can't.'

'It was incredibly fascinating getting to play against the poker bot and seeing some of the strategies it chose,' said Michael 'Gags' Gagliano, another participating poker player. 'There were several plays that humans simply are not making at all, especially relating to its bet sizing. Bots/AI are an important part in the evolution of poker, and it was amazing to have first-hand experience in this large step toward the future.'

This type of AI could be used to design drugs to take on antibiotic-resistant bacteria, for instance, or to improve cybersecurity or military robotic systems. Sandholm cites multi-party negotiation or pricing—such as Amazon, Walmart, and Target trying to come up with the most competitive pricing against each other—as a specific application. Optimal media spending for political campaigns is another example, as well as auction bidding strategies. Sandholm has already licensed much of the poker technology developed in his lab to two startups: Strategic Machine and Strategy Robot. The first startup is interested in gaming and other entertainment applications; Strategy Robot's focus is on defense and intelligence applications.

Potential for fraud

When Libratus beat human players in 2017, there were concerns about whether poker could still be considered a skill-based game and whether online games in particular would soon be dominated by disguised bots. Some took heart in the fact that Libratus needed major supercomputer hardware to analyze its game play and figure out how to improve its play: 15 million core hours and 1,400 CPU cores during live play. But Pluribus needs much less processing capability, completing its blueprint strategy in eight days using just 12,400 core hours and 28 cores during live play.
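To put the gap in perspective, the ratios can be worked out from the figures quoted above. This Python snippet uses only those numbers; the "average cores busy" line assumes the eight days of blueprint training ran around the clock.

```python
libratus = {"core_hours": 15_000_000, "live_cores": 1_400}
pluribus = {"core_hours": 12_400, "live_cores": 28}

training_ratio = libratus["core_hours"] / pluribus["core_hours"]
live_ratio = libratus["live_cores"] / pluribus["live_cores"]

# Assumption: training ran continuously for the full eight days.
TRAINING_DAYS = 8
average_cores_busy = pluribus["core_hours"] / (TRAINING_DAYS * 24)

print(round(training_ratio), live_ratio, round(average_cores_busy, 1))
```

By these numbers Pluribus needed roughly 1,200 times less compute for its blueprint and 50 times fewer cores at the table, which puts it within reach of a single well-equipped server rather than a supercomputer.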

So is this the death knell for skill-based poker? Well, the algorithm was so successful that the researchers have decided not to release its code. 'It could be very dangerous for the poker community,' Brown told Technology Review.

Sandholm acknowledges the risk of sophisticated bots swarming online poker forums, but destroying poker was never his aim, and he still thinks it's a game of skill. 'I have come to love the game, because these AIs have really shown there's a whole additional depth to the game that humans haven't understood, even brilliant professional players who have played millions of hands,' he said. 'So I'm hoping this will contribute to the excitement of poker as a recreational game.'

DOI: Science, 2019. 10.1126/science.aay2400.
