AI Learns to Play Pokémon Red Across 50,000 Hours of Gameplay

Almost ten years ago, the online phenomenon “Twitch Plays Pokémon” brought together over a million people to play a single game of Pokémon Red, with each player’s typed commands controlling the same pixelated avatar. Since then, technology has evolved like a Magikarp into a Gyarados, raising a new question: can an AI play Pokémon Red on its own?

How does the AI learn to play Pokémon Red?

For the past few years, Seattle-based software engineer Peter Whidden has been training a reinforcement learning algorithm to navigate the classic first game in the Pokémon series; in that time, the AI has played more than 50,000 hours of the game. Whidden posted a 33-minute YouTube video about the AI’s development, which had passed 2.6 million views nine days after it went up.

“What’s been super fun to see is how many people are engaging with it,” Whidden told TechCrunch. He has shared his code on GitHub, along with instructions for running and training the AI. “There’s a ton of people that seem really interested in actually doing this process of creating or designing.” One fan even got his code running on Pokémon Crystal, another Game Boy classic.

The AI’s reinforcement model runs on points: it earns rewards for leveling up its Pokémon, exploring new areas, winning battles, and defeating gym leaders, and over many games it learns to favor whatever earns it more points. These incentives don’t always line up with actual progress through the game, but the AI’s failures are oddly charming, which is probably part of why Whidden’s video went viral.
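To make the idea of point-based rewards concrete, here is a minimal sketch of how such a reward could be computed, assuming the game state can be summarized by party levels, areas explored, battles won, and gym leaders defeated. The `GameSnapshot` fields, the weights, and the “reward equals change in total points” design are all hypothetical illustrations, not Whidden’s actual code or values.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical weights for illustration; the article does not give Whidden's values.
LEVEL_WEIGHT = 1.0
EXPLORE_WEIGHT = 0.5
BATTLE_WEIGHT = 2.0
GYM_WEIGHT = 10.0


@dataclass
class GameSnapshot:
    """Hypothetical summary of the progress signals read from the game at one step."""
    party_levels: list[int]
    areas_seen: int
    battles_won: int
    gym_leaders_defeated: int


def score(s: GameSnapshot) -> float:
    """Weighted point total for one snapshot of the game."""
    return (
        LEVEL_WEIGHT * sum(s.party_levels)
        + EXPLORE_WEIGHT * s.areas_seen
        + BATTLE_WEIGHT * s.battles_won
        + GYM_WEIGHT * s.gym_leaders_defeated
    )


def step_reward(prev: GameSnapshot, curr: GameSnapshot) -> float:
    """Reward for one step: the points gained (or lost) since the previous snapshot."""
    return score(curr) - score(prev)


if __name__ == "__main__":
    before = GameSnapshot(party_levels=[5], areas_seen=3, battles_won=0, gym_leaders_defeated=0)
    after = GameSnapshot(party_levels=[6], areas_seen=4, battles_won=1, gym_leaders_defeated=0)
    print(step_reward(before, after))  # 3.5: one level, one new area, one battle won
```

Rewarding the change in points, rather than the raw total, means the agent is paid for making progress and penalized whenever a tracked quantity goes down.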

What are the AI’s challenges, failures, and successes?

In one of its attempts, the AI simply stands in Pallet Town, the first place you visit in the game, and stares at the water. It gets stuck in an area with animated water, grass, and NPCs who pace back and forth. Because these animations keep changing what is on screen, every individual frame appears to the AI to be a novel experience, even though it is sitting motionless and hasn’t even caught its first Pokémon.
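Here is one way a “new screen” reward could fall into this trap, sketched under the assumption that novelty is judged by counting how many pixels differ from every frame seen before. The tiny eight-pixel frames, the `FrameNoveltyReward` class, and the threshold values are hypothetical; the article only says that every frame appeared novel to the AI.

```python
Frame = tuple[int, ...]  # hypothetical: a tiny screen as a flat tuple of pixel values


def pixels_changed(a: Frame, b: Frame) -> int:
    """Count the pixels that differ between two equally sized frames."""
    return sum(1 for x, y in zip(a, b) if x != y)


class FrameNoveltyReward:
    """Pays `bonus` whenever a frame differs from every frame seen so far
    by more than `threshold` pixels, then remembers that frame."""

    def __init__(self, threshold: int, bonus: float = 1.0) -> None:
        self.threshold = threshold
        self.bonus = bonus
        self.seen: list[Frame] = []

    def __call__(self, frame: Frame) -> float:
        if all(pixels_changed(frame, old) > self.threshold for old in self.seen):
            self.seen.append(frame)
            return self.bonus
        return 0.0


def standing_still(steps: int) -> list[Frame]:
    """The agent never moves: four static scenery pixels plus four
    'water' pixels whose animation changes every step."""
    scenery = (0, 0, 0, 0)
    return [scenery + (t, t, t, t) for t in range(steps)]


def total_reward(threshold: int, steps: int = 6) -> float:
    """Total novelty reward collected while the agent stands still."""
    novelty = FrameNoveltyReward(threshold)
    return sum(novelty(frame) for frame in standing_still(steps))


if __name__ == "__main__":
    print(total_reward(threshold=1))  # 6.0: every animated frame counts as "new"
    print(total_reward(threshold=4))  # 1.0: only the first frame counts
```

With a low threshold, the shimmering water alone is enough to earn a reward on every step without moving. Raising the threshold above the size of the animated region is one possible fix, though a threshold that is too high could also hide genuinely new places.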

Clearly, this AI is in no hurry to “catch ’em all.” It’s simply enjoying the beauty of the Kanto region (or perhaps it’s taking an ethical stance against forcing these adorable little animals to fight each other… who knows).

“So, according to our own objective, just hanging out and admiring the scenery is more rewarding than exploring the rest of the world,” Whidden says in the video. “This is a paradox that we encounter in real life: curiosity leads us to our most important discoveries, but at the same time, it makes us vulnerable to distractions and gets us into trouble.”

The AI continues to tug at our heartstrings: later, it goes through a traumatic event at the Pokémon Center. The AI’s success is measured in part by the total level of all the Pokémon in its party. So when the AI visits a Pokémon Center and mashes enough buttons at the PC to deposit a Pokémon into storage, that level sum drops sharply, sending a strong negative signal to the AI.
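A short, hypothetical calculation shows how lopsided this signal can be. The sketch below scores only the level component of the reward, as the change in the party’s total level between steps; the party and its levels are made up for illustration.

```python
def party_level_sum(party: list[int]) -> int:
    """Total level of every Pokémon currently in the party."""
    return sum(party)


def level_reward(prev_party: list[int], curr_party: list[int]) -> int:
    """Change in the party's total level since the previous step."""
    return party_level_sum(curr_party) - party_level_sum(prev_party)


if __name__ == "__main__":
    party = [14, 9, 7]           # hypothetical party: Squirtle plus two catches
    after_level_up = [15, 9, 7]  # Squirtle gains one level in battle
    after_deposit = [14]         # two Pokémon end up in PC storage

    print(level_reward(party, after_level_up))  # 1
    print(level_reward(party, after_deposit))   # -16
```

In this example, a single trip to the PC undoes the equivalent of sixteen hard-won level-ups in one step.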

“It doesn’t have emotions like a human does, but a single event with an extreme reward value can still leave a lasting impact on its behavior,” Whidden goes on to explain. “In this case, losing its Pokémon only one time is enough to form a negative association with the whole Pokémon Center, and the AI will avoid it entirely in all future games.”
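The lasting effect Whidden describes can be illustrated with a deliberately simplified, hypothetical agent that keeps a running value estimate for each of two made-up actions and always picks the higher one. This is not Whidden’s training algorithm; the action names, learning rate, and rewards are assumptions chosen to make the effect easy to see.

```python
ACTIONS = ("enter_pokemon_center", "walk_past")  # hypothetical action set


class GreedyValueAgent:
    """Hypothetical agent: one running value estimate per action, always greedy."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.lr = learning_rate
        self.values = {a: 0.0 for a in ACTIONS}

    def choose(self) -> str:
        # Purely greedy: pick the highest estimate; ties go to the first action listed.
        return max(ACTIONS, key=lambda a: self.values[a])

    def update(self, action: str, reward: float) -> None:
        # Move the estimate a fraction of the way toward the observed reward.
        self.values[action] += self.lr * (reward - self.values[action])


def reward_for(action: str, center_visits: int) -> float:
    # Hypothetical rewards: the first Pokémon Center visit goes wrong (a Pokémon
    # ends up in storage); any later visit would actually help (+5).
    if action == "enter_pokemon_center":
        return -100.0 if center_visits == 0 else 5.0
    return 1.0


def run(episodes: int = 5) -> tuple[list[str], dict[str, float]]:
    """Play several episodes and record which action the agent took each time."""
    agent = GreedyValueAgent()
    history: list[str] = []
    center_visits = 0
    for _ in range(episodes):
        action = agent.choose()
        agent.update(action, reward_for(action, center_visits))
        if action == "enter_pokemon_center":
            center_visits += 1
        history.append(action)
    return history, agent.values


if __name__ == "__main__":
    history, values = run()
    print(history)  # ['enter_pokemon_center', 'walk_past', 'walk_past', 'walk_past', 'walk_past']
    print(values["enter_pokemon_center"])  # -50.0, never revised
```

Because the agent never returns to the Pokémon Center, it never observes the better outcome later visits would bring, so its pessimistic estimate is never corrected. Real reinforcement learning agents usually keep some randomness in their choices, which gives them a chance to revisit and fix such estimates; this sketch leaves that out on purpose.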

Image Source: Peter Whidden on YouTube

What are the implications and future directions of this experiment?

Despite its apparent trauma and its appreciation for Pallet Town’s pretty pixels, the AI is still just a computer program. Because it can’t read or interpret in-game dialogue, early versions got stuck at an early checkpoint in the story. In Pokémon Red, when you reach the second town, Viridian City, you are handed a parcel to bring back to Professor Oak in Pallet Town.

However, the AI had difficulty backtracking to deliver the parcel, which made further progress impossible. As a workaround, Whidden started each game from a point after the parcel had already been delivered, with Squirtle as the AI’s starter Pokémon, because the early game is generally easier with a Water-type at your disposal.

“In the video, [the AI] gets as far as Mt. Moon, between the first and second gym,” Whidden told TechCrunch. Caves in early Pokémon games are notoriously difficult to navigate, even for human players. However, after Whidden recently adjusted some of the rewards in his code and tried a different learning algorithm, the AI made it out of the cave and reached Cerulean City.

Other researchers have used reinforcement learning to study AI in games; DeepMind’s AlphaGo, for example, was the first computer program to defeat a professional Go player. But much of what made Whidden’s video go viral is his knack for explaining unfamiliar concepts through a familiar medium: Pokémon.