
September 14, 2024 2:35am

This looks like the video game Doom, but it is actually the output of a diffusion model.

Not only that, but the idea here isn't just to generate video that looks indistinguishable from Doom gameplay. It's to create a "game engine" that actually lets you play the game. In fact, this diffusion model "game engine" is called "GameNGen", pronounced "game engine".

To do this, they actually built two neural networks. The first is a reinforcement learning agent that plays the actual game of Doom. As it plays, its gameplay (the screen frames it sees and the actions it takes) is recorded and handed to the second neural network as training data. In this manner, the first neural network can generate effectively unlimited training data for the second.
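The data pipeline described above can be sketched roughly like this. It's a toy built on assumptions: `ToyDoomEnv`, `agent_policy`, and the action list are hypothetical stand-ins. The "frames" are small dicts rather than images, and the "agent" picks random actions instead of being a trained RL policy. The point is only the shape of the loop: the agent plays, and every step becomes a (frame, action, next frame) example for the second network.

```python
import random
from dataclasses import dataclass

# Hypothetical action set; the agent's real action space isn't described here.
ACTIONS = ["forward", "back", "turn_left", "turn_right", "fire", "noop"]


@dataclass
class ToyDoomEnv:
    """Stand-in for the real game. A 'frame' is just a dict of on-screen values."""
    ammo: int = 50
    health: int = 100
    t: int = 0

    def reset(self):
        self.ammo, self.health, self.t = 50, 100, 0
        return self.render()

    def render(self):
        return {"ammo": self.ammo, "health": self.health, "t": self.t}

    def step(self, action):
        self.t += 1
        if action == "fire" and self.ammo > 0:
            self.ammo -= 1
        return self.render()


def agent_policy(frame, rng):
    # Placeholder for the trained RL agent: here it just picks a random action.
    return rng.choice(ACTIONS)


def collect_transitions(num_episodes, steps_per_episode, seed=0):
    """Let the agent play and record (frame, action, next_frame) training examples."""
    rng = random.Random(seed)
    env = ToyDoomEnv()
    dataset = []
    for _ in range(num_episodes):
        frame = env.reset()
        for _ in range(steps_per_episode):
            action = agent_policy(frame, rng)
            next_frame = env.step(action)
            dataset.append((frame, action, next_frame))
            frame = next_frame
    return dataset


data = collect_transitions(num_episodes=2, steps_per_episode=10)
print(len(data))  # 20 transitions for the diffusion model to learn from
```

Because the agent can keep playing indefinitely, you can call `collect_transitions` with as many episodes as you like, which is the sense in which the training data is "unlimited".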

The second neural network is the actual diffusion model. They started with Stable Diffusion 1.4, a diffusion model "conditioned on" text, which is what lets it generate images from a text prompt. They ripped out the text conditioning and replaced it with conditioning on two things: "actions", which are the button presses and mouse movements you make to play the game, and the previous frames of video.

Inside, the diffusion model builds up a "latent state" that represents the state of the game. Sort of. That's the idea, anyway; in practice it doesn't do a great job of it. It's good at remembering state that is actually shown on the screen (health, ammo, available weapons, etc.), because at every time step it's fed roughly the previous 3 seconds of video frames, along with the actions taken, to generate the next frame. It's not so good at remembering anything that goes off the screen. Oh, I should also mention: this diffusion model runs fast enough to generate images at real-time video frame rates.
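Here's a toy sketch of why on-screen state survives and off-screen state doesn't, assuming the model conditions on a fixed-size sliding window of recent (frame, action) pairs. `toy_predict_next` is a hypothetical stand-in for the diffusion model, and the 64-frame window (about 3 seconds at 20 fps) is an assumption for illustration. Ammo is drawn on every frame, so it is always carried forward; the key is only visible in the first frame, so once that frame slides out of the window, nothing the model can see records it.

```python
from collections import deque

CONTEXT_LEN = 64  # assumption for illustration: ~3 seconds of history at 20 fps


def toy_predict_next(context, action):
    """Hypothetical stand-in for the diffusion model's next-frame prediction.

    It may read only the (frame, action) pairs still inside `context`.
    """
    last_frame, _ = context[-1]
    ammo = last_frame["ammo"]
    if action == "fire" and ammo > 0:
        ammo -= 1  # HUD values are on screen, so they carry forward frame to frame
    # The player has turned away, so the key is no longer drawn in new frames.
    return {"ammo": ammo, "key_visible": False}


def key_still_known(context):
    # Off-screen facts survive only while some frame that showed them is in the window.
    return any(frame["key_visible"] for frame, _ in context)


def rollout(num_steps, context_len=CONTEXT_LEN):
    context = deque(maxlen=context_len)  # oldest pairs fall off automatically
    frame = {"ammo": 50, "key_visible": True}  # first frame: key on screen
    history = []
    for t in range(num_steps):
        action = "fire" if t % 10 == 0 else "turn_left"
        context.append((frame, action))
        frame = toy_predict_next(context, action)  # generated frame is fed back next step
        history.append((t, frame["ammo"], key_still_known(context)))
    return history


history = rollout(100)
print(history[63], history[64])  # (63, 43, True) (64, 43, False)
```

With the 64-frame window, the key is still "known" at step 63 and forgotten at step 64, while the ammo count stays consistent the whole time.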

Because it doesn't use the actual Doom game engine's state code, or otherwise represent the game state with conventional code, but instead represents state inside the neural network (and does so imperfectly for anything that goes off the screen), it seems like real Doom when humans play it for short periods. Played over any extended length of time, though, humans can tell it's not real Doom.

GameNGen - Michael Kan

#solidstatelife #ai #genai #computervision #diffusionmodels #videogames #doom