ArcadeDreamer

A 2D World Model for Atari Games | November 2025

Overview

ArcadeDreamer is a learned world model that generates Atari gameplay frame-by-frame through neural network prediction—not by running game code. The model observes real gameplay, learns the underlying dynamics, and then "dreams" entirely new sequences by predicting what each next frame should look like given the current state and player input.

Demo

Every frame below is generated by the model, not rendered by a game engine.

[Animated GIF: ArcadeDreamer Breakout demo]

Architecture

The project employs two main neural network components (a code sketch of both follows the list):

  • Variational Autoencoder (VAE) — Compresses each game frame into a compact 64-dimensional latent vector with a convolutional encoder whose channel widths grow 32→64→128→256
  • GRU-based Dynamics Model — Predicts the next latent state from the current latent and the player's action, using a GRU with 256 hidden units over an 18-action discrete space (the full Atari action set)
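
For concreteness, here is a minimal PyTorch sketch of both components. It assumes 64×64 RGB input frames and stride-2 convolutions; the class names, kernel sizes, and preprocessing are illustrative assumptions, not the repository's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 64   # latent size stated above
NUM_ACTIONS = 18  # full Atari discrete action set

class FrameVAE(nn.Module):
    """Convolutional VAE: 64x64 RGB frame <-> 64-dim latent (assumed input size)."""
    def __init__(self):
        super().__init__()
        # Encoder: channel widths 32-64-128-256; each stride-2 conv halves resolution.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),     # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),    # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 8
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 4
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(256 * 4 * 4, LATENT_DIM)
        self.fc_logvar = nn.Linear(256 * 4 * 4, LATENT_DIM)
        # Decoder mirrors the encoder with transposed convolutions.
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256 * 4 * 4),
            nn.Unflatten(1, (256, 4, 4)),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

class DynamicsModel(nn.Module):
    """GRU that predicts the next latent given the current latent and a one-hot action."""
    def __init__(self, hidden_size=256):
        super().__init__()
        self.gru = nn.GRUCell(LATENT_DIM + NUM_ACTIONS, hidden_size)
        self.head = nn.Linear(hidden_size, LATENT_DIM)  # hidden state -> next latent

    def forward(self, z, action, h):
        a = F.one_hot(action, NUM_ACTIONS).float()
        h = self.gru(torch.cat([z, a], dim=-1), h)
        return self.head(h), h
```

Note the division of labor: the encoder runs once on a real seed frame, and every subsequent prediction happens purely in the 64-dimensional latent space, which is what keeps long rollouts cheap.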

Features

  • Data collection from Atari games via Gymnasium and ale-py (sketched in code after this list)
  • VAE training for compressed frame representations
  • Dynamics prediction for forecasting game states in latent space
  • Dream generation for creating simulated game sequences as animated GIFs
  • Interactive gameplay with side-by-side comparison of real vs. predicted frames
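
The repository's actual collection code isn't shown here, so the following is a sketch under assumptions: a random policy, the ALE/Breakout-v5 environment id, and a collect_frames helper that is purely illustrative.

```python
import gymnasium as gym
import numpy as np
import ale_py

gym.register_envs(ale_py)  # expose the ALE/* environment ids to Gymnasium

def collect_frames(game="ALE/Breakout-v5", n_frames=50_000, seed=0):
    """Roll a random policy and record (frame, action) pairs for training."""
    env = gym.make(game)
    obs, _ = env.reset(seed=seed)
    frames, actions = [], []
    for _ in range(n_frames):
        action = env.action_space.sample()          # random policy stand-in
        frames.append(obs)                          # raw RGB frame (210x160x3)
        actions.append(action)
        obs, _, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    env.close()
    # Frames would still need resizing/normalizing to the VAE's input size.
    return np.stack(frames), np.array(actions)
```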

Training Pipeline

The workflow follows this sequence: collect 50,000 training frames per game, train the VAE encoder/decoder for 50 epochs, train the dynamics model for 100 epochs, then generate dream sequences or play interactively.
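
As a sketch of the final step, the rollout below seeds the dynamics model with a single encoded real frame, then dreams forward in latent space and decodes each predicted latent into an image. It reuses the illustrative FrameVAE and DynamicsModel classes from the architecture sketch; the dream helper and the imageio usage are assumptions, not the project's exact API.

```python
import torch
import imageio

@torch.no_grad()
def dream(vae, dynamics, seed_frame, actions, out_path="dream.gif"):
    """Encode one real frame, roll the dynamics model forward in latent space,
    and decode every predicted latent back into an image."""
    vae.eval(); dynamics.eval()
    x = torch.as_tensor(seed_frame).unsqueeze(0)   # (1, 3, 64, 64), floats in [0, 1]
    z = vae.fc_mu(vae.encoder(x))                  # seed latent: the posterior mean
    h = torch.zeros(1, 256)                        # initial GRU hidden state
    frames = []
    for a in actions:                              # scripted or recorded action ids (0..17)
        z, h = dynamics(z, torch.tensor([a]), h)   # predict the next latent
        img = vae.decoder(z).squeeze(0)            # decode it into a frame
        frames.append((img.permute(1, 2, 0).numpy() * 255).astype("uint8"))
    imageio.mimsave(out_path, frames)              # write the rollout as a GIF
```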