Writing an emulator: the first pixel

Last time, we turned a tiny program that crashed into a smallish program that loops forever. Woo! (I want to take a brief moment to apologize if the couple of previous articles felt horribly sluggish. The truth is: early development of that emulator was sluggish. It did take weeks to get to the point where something visually interesting happened on screen, but for it to work at all, the emulator needed a functional CPU, so that’s what we started with.)

But now, we need more than a CPU: something in that Game Boy is using hardware register 0xff44 somehow and the CPU clearly needs it to break out of the infinite loop we left it in.

To the Pan Docs!

FF44 – LY – LCDC Y-Coordinate (R)

The LY indicates the vertical line to which the present data is transferred to the LCD Driver. The LY can take on any value between 0 through 153. The values between 144 and 153 indicate the V-Blank period.

This is part of a larger section describing the video display. But now we have to step back and take a broader look at what we’re facing. We could cheat again and come up with a quick counter that would simply update that LY register, but what would be the point of a video display that displays nothing?

Besides, how does the Game Boy display pixels anyway?

To answer that question, we’ll need to look away from the Pan Docs and go back to the Ultimate Game Boy Talk. It’s a goldmine of information, and I spent quite a lot of time watching the technical part about the Pixel Processing Unit (or PPU) that takes care of displaying things on the Game Boy’s LCD screen.

Pretty much everything we need is in there, but that’s a lot at once, between tiles, tile maps, scrolling, sprites… In fact, the part most relevant to us now is the very beginning of the bit about horizontal timing.

[Figure: PPU states and timings when displaying a single frame. Source: The Ultimate Game Boy Talk (44:39)]

The picture above contains the answer to our initial question: the CPU is waiting for the PPU to be done displaying the 144th (and last) scanline of the current frame. It will then update video registers during the VBlank period, when it’s safe to do so.

First, we can already see there are going to be several states the PPU can be in. Second, we’re finally talking about timings.

All right! I love automata!

I didn’t really care how slow or fast our emulator’s CPU was working before (that really depends on your own computer’s CPU: my home computer can run the emulator a bit faster than an actual Game Boy; my laptop, on the other hand…) and I’m not worried about the PPU’s speed either, right now. At best, it’ll be noticeably faster than on the original Game Boy. At worst, its speed will be measured in seconds per frame (the home computer and laptop confirm both statements). Still, as long as we get to see a scrolling logo in the end, I’ll be happy!

(I also haven’t yet managed to come up with a satisfactory way to run the emulator at a speed that feels natural, so we’ll have to make do.)

Right, if it’s not about speed, then what? Remember how we broadly described the emulator’s CPU as an infinite loop in which we do stuff with memory? The PPU also runs in an infinite loop of its own, which basically scans the video RAM and outputs pixels accordingly from top to bottom. 160 pixels wide, 144 pixels high, over and over, approximately 60 times per second.

So now we have two separate objects in our emulator running their own business in a loop. While it might not be obvious at all right now, if we want to achieve any level of accuracy, we’re going to have to find a way to synchronize the CPU and PPU’s operations to a certain degree.

As you saw earlier, we can already split PPU operations into four big states, each taking a fixed amount of cycles to complete (if you’ve watched the whole video you know it’s more complicated than that but, again, it will be enough for the time being). Then, the current state changes and the PPU spends the next few cycles doing something else and so on.

This is easily represented as a finite-state automaton, or finite-state machine. In terms of code, this mostly amounts to a big switch/case within a function that’s called every clock cycle, or tick or whatever we call it.

[Figure: PPU states and transitions.]

Well, we also need to split the CPU’s endless loop into similar states, each taking a fixed amount of cycles to complete. But seeing how much we have to cover today, I’m going to keep that for a future article, so you know what we’ll do instead: cheat again. We’ll consider that the CPU’s only state is… to execute an instruction, exactly as it’s been doing so far. We’re basically converting the body of the CPU’s execution loop into a Tick() method.

Timings, ticks, clocks and cycles

(If the title above sounds like a spoonerism to you, then you’re very welcome and my work here is done!)

Implementing a PPU as a set of states is one thing, but what is all this timing jargon?

Like a lot of electronic devices [citation needed], the Game Boy is driven by a clock that generates a signal alternating between 0 and 1. Each transition (in the rest of the article I’ll refer to those as “ticks”) lets each component in the whole system perform a tiny atomic (as in “as small as to be indivisible”) operation. The Game Boy’s clock does that 4 194 304 times per second.

You’d expect our various components to do one little thing each tick. However due to the way the hardware works, some memory circuits being faster than others and such, not all parts in our emulator will work at the same rate.

The CPU, for instance, is comparatively faster than RAM. Therefore, reading data from memory, like the LD instruction does, takes several ticks, during which the CPU can do nothing but wait until it’s done accessing memory. In the end, a single atomic instruction of the CPU still takes four ticks.

Documentation gets a little confusing at this point. Some define that four-tick period as a cycle, sometimes a machine cycle. Some use the term “clock” or “clock cycle” instead of tick… What matters is that we now want to write a PPU class that will have to work on its own by only getting its Tick() method called over and over in an infinite loop.

We’ll also want to rework our CPU structure to work in a similar way. This will allow us to “run” those two components somewhat in parallel and, hopefully, be accurate enough to run most games, someday.
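To make the idea a bit more concrete, here is a minimal sketch of two components fed by the same clock. The Ticker interface and the counter type are hypothetical names made up for this illustration; they are not part of the example program, where the CPU and PPU simply expose their own Tick() methods.

```go
package main

import "fmt"

// Ticker is a hypothetical interface for anything driven by the system clock.
type Ticker interface {
	Tick()
}

// counter is a stand-in component that only completes a unit of work every
// `period` ticks and waits the rest of the time.
type counter struct {
	period int // Number of ticks needed for one unit of work.
	ticks  int // Ticks elapsed since the last unit of work.
	steps  int // Units of work completed so far.
}

// Tick advances the component by exactly one clock tick.
func (c *counter) Tick() {
	c.ticks++
	if c.ticks == c.period {
		c.ticks = 0
		c.steps++
	}
}

// run feeds the same n clock ticks to every component, one tick at a time.
func run(n int, components ...Ticker) {
	for i := 0; i < n; i++ {
		for _, c := range components {
			c.Tick()
		}
	}
}

func main() {
	slow := &counter{period: 4} // Something needing four ticks per step.
	fast := &counter{period: 1} // Something working on every tick.
	run(16, slow, fast)
	fmt.Println(slow.steps, fast.steps) // 4 16
}
```

Nothing actually runs in parallel here: components take turns within each tick, which is good enough as long as each of them only does one tick's worth of work per call.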

But now is the time to roll our sleeves up and make pixels happen!

A quick and dirty PPU

A state machine is really easy to implement. Store a state number in a persistent variable. Call a function repeatedly. In that function, have a switch/case block do what your machine must do in the span of one tick, then store a new value in your state variable if needed. That’s it!

Here is an empty implementation of the PPU’s state machine.

// PPUState is an enum-like type to define all admissible states for the PPU.
type PPUState uint8

// Possible PPU states. Values don't really matter, they start from zero.
const (
    OAMSearch PPUState = iota
    PixelTransfer
    HBlank
    VBlank
)

type PPU struct {
    LY uint8 // Number of the line currently being displayed.

    state PPUState // Current state of the state machine.
    ticks uint     // Clock ticks counter for the current line.

    x uint8 // Number of pixels already output in the current scanline.
}

// Tick advances the PPU state one step. Each `case` block will eventually
// contain code that will take various amounts of ticks to complete before
// updating the PPU state.
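func (p *PPU) Tick() {
    // Count the ticks spent in the current scanline. The states below rely
    // on this counter to know when to move on.
    p.ticks++

    switch p.state {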
    case OAMSearch:
        // TODO: collect sprite data here.
        p.state = PixelTransfer

    case PixelTransfer:
        // TODO: push pixel data to display.
        p.state = HBlank

    case HBlank:
        // TODO: wait, then go back to sprite search for next line, or vblank.
        if p.LY == 144 {
            p.state = VBlank
        } else {
            p.state = OAMSearch
        }

    case VBlank:
        // TODO: wait, then go back to sprite search for the top line.
        p.state = OAMSearch
    }
}

As always, this is a simplified, stripped-down version. The example program’s source code is more complete and far too heavily commented. I also won’t cover the memory access part of the PPU, but it’s also implemented as an Addressable so the CPU can read the LY register.

We’ll go over each state in order and see what needs to be done for now. I’ll leave a lot of TODO comments that we’ll address in due time.

Object Attribute Memory (OAM) search

We won’t need sprite support to run the boot ROM, so the OAM search phase will be spent counting ticks. (Actually, in about half of the states in which it can be, the PPU will just be counting ticks to give the CPU time to do crucial calculations before the next frame is displayed.)

    case OAMSearch:
        // In this state, the PPU would scan the OAM (Objects Attribute Memory)
        // from 0xfe00 to 0xfe9f to mix sprite pixels in the current line later.
        // This always takes 40 ticks.
        if p.ticks == 40 {
            // TODO: set up the pixel fetcher here.
            p.state = PixelTransfer
        }

Pixel transfer

Pixel transfer involves the pixel fetcher, which is by far the most complicated bit in the PPU. We’ll put it aside for the time being and pretend it’s shifting out a pixel every tick, but we will come back and fix it in a bit.

    case PixelTransfer:
        // TODO: Fetch pixel data into our pixel FIFO.
        // TODO: Put a pixel (if any) from the FIFO on screen.
        // Check when the scanline is complete (160 pixels).
        p.x++
        if p.x == 160 {
            p.state = HBlank
        }

Horizontal blanking

HBlank happens when all 160 pixels in a scanline have been output to the screen. This is where we can update the PPU’s LY register, which is what we initially set out to do, remember? Other than that, this is also a waiting state where we’ll simply count ticks.

    case HBlank:
        // A full scanline takes 456 ticks to complete. At the end of a
        // scanline, the PPU goes back to the initial OAM Search state.
        // When we reach line 144, we switch to VBlank state instead.
        if p.ticks == 456 {
            p.ticks = 0
            p.LY++
            if p.LY == 144 {
                p.state = VBlank
            } else {
                p.state = OAMSearch
            }
        }

Vertical blanking

VBlank happens when all 144 scanlines in a frame have been output to the screen. Again, the PPU will just wait the equivalent of 10 more lines worth of ticks before going back to the initial OAM search state.

    case VBlank:
        // Wait ten more scanlines before starting over.
        if p.ticks == 456 {
            p.ticks = 0
            p.LY++
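            // LY runs from 144 to 153 during VBlank (ten lines), so we only
            // wrap around once line 153 is over.
            if p.LY == 154 {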
                p.LY = 0
                p.state = OAMSearch
            }
        }

And that’s a frame! Well, on paper at least. We’ve got a shell of a PPU, but still no pixels yet. We need to zoom in and look at how the PPU’s fetcher works.
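As a sanity check, the numbers we have so far can be put together to see how long a frame lasts. This is only a back-of-the-envelope sketch using the values quoted in this article (456 ticks per line, 144 visible lines, 10 lines of VBlank, 4 194 304 ticks per second); the constant and function names are mine.

```go
package main

import "fmt"

const (
	clockHz      = 4194304 // System clock ticks per second.
	ticksPerLine = 456     // Ticks spent on each scanline.
	visibleLines = 144     // Scanlines actually displayed.
	vblankLines  = 10      // Extra lines' worth of ticks spent in VBlank.
)

// ticksPerFrame returns the number of clock ticks in one full frame.
func ticksPerFrame() int {
	return ticksPerLine * (visibleLines + vblankLines)
}

// framesPerSecond returns the resulting refresh rate in Hz.
func framesPerSecond() float64 {
	return float64(clockHz) / float64(ticksPerFrame())
}

func main() {
	fmt.Println(ticksPerFrame())            // 70224
	fmt.Printf("%.2f\n", framesPerSecond()) // 59.73
}
```

That is where the "approximately 60 times per second" figure comes from.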

Let’s play fetch

I will — you guessed it — refer once again to the Ultimate Game Boy Talk (49:17). Fetching pixels from the Game Boy’s video RAM is a slightly convoluted process for reasons that are explained in the video. I will just be covering the bits we need to display (and eventually scroll) the background map.

What’s a tile map?

The Game Boy does not actually work with individual pixels, but with 8×8 pixel tiles. The PPU walks through video RAM as if it was a large 32×32 grid where each square is just a number that refers to a tile — we’ll call this a tile ID.

If we don’t count sprites, which are a special case, the Game Boy can handle two different tile maps: the background map, which is displayed by default, and an optional window that can overlap part of the screen.

The principle is the same for both: for a given pixel to be displayed in a scanline, the fetcher will, in order:

  • Find out in which tile map that pixel is. In the boot ROM case, it’s always going to be the background map.
  • Find out on which square in that map the pixel is. This will give it a tile ID to look up.
  • Find out on which horizontal line in the associated tile that pixel is.
  • Read the necessary bytes containing graphics data for the 8 pixels in that tile line.
  • Store those 8 pixels in a FIFO that the PPU will read from to shift pixels out to the display.
[Figure: Fetcher principle. Source: The Ultimate Game Boy Talk (49:17)]

There is obviously more to it (there are plenty of annoying parameters to reading data from a tile map: several memory zones where tiles can be defined, different ways to address them in memory, etc. We’ll just ignore all of it today!), but what I just listed is enough to get started.

We’ll treat the background map as a single, contiguous memory zone from 0x9800 to 0x9bff (source: VRAM Background Maps, Pan Docs), where each byte contains an ID, which we’ll use as a mere offset in the other memory zone where tiles are defined.

Speaking of, we’ll also assume tile data is stored from address 0x8000 to 0x8fff (source: VRAM Tile Data in the Pan Docs, though I felt the Ultimate Game Boy Talk was clearer on this). If you remember, this is the memory zone that the boot ROM code sets to zero at the beginning of the boot process.

With these addresses in mind, we can then start computing where a given pixel’s data should be read from.

A bunch of offsets

It’s a little misleading to refer to individual pixels. We’re actually always going to read 8 pixels at a time from memory. Those 8 pixels are stored as two consecutive bytes in video RAM, which is why a full tile takes 16 bytes: two for each row of 8 pixels.

Figuring out those bytes’ address requires computing a couple offsets:

  • Which square in the tile map corresponds to the current X and Y position. This will be our tile.
  • Which row in that square corresponds to the current Y position. This will be the row of pixels we want to fetch from memory.
[Figure: How the memory addresses accessed by the fetcher are computed.]

The operations shown above should be inserted at the end of the PPU’s OAM Search state. The offsets we compute will be enough to get the fetcher started on a new scanline.

    if p.ticks == 40 {
        // Initialize pixel transfer state.
        p.x = 0
        tileLine := p.LY % 8
        tileMapRowAddr := 0x9800 + (uint16(p.LY/8) * 32)
        p.Fetcher.Start(tileMapRowAddr, tileLine)

        p.state = PixelTransfer
    }
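To see what these two offsets look like for actual scanlines, here is a small standalone sketch of the same computation. The rowOffsets function and the bgMapBase constant are names I made up for the illustration; the arithmetic is the one used in the snippet above, still ignoring scrolling.

```go
package main

import "fmt"

// bgMapBase is the start of the background map we assume throughout.
const bgMapBase uint16 = 0x9800

// rowOffsets returns the address of the first tile ID in the map row that
// contains scanline ly, and which row of pixels to fetch from those tiles.
func rowOffsets(ly uint8) (mapRowAddr uint16, tileLine uint8) {
	tileLine = ly % 8                        // Row inside the 8x8 tile.
	mapRowAddr = bgMapBase + uint16(ly/8)*32 // 32 tile IDs per map row.
	return mapRowAddr, tileLine
}

func main() {
	for _, ly := range []uint8{0, 7, 8, 75, 143} {
		addr, line := rowOffsets(ly)
		fmt.Printf("LY=%3d -> map row at 0x%04x, tile line %d\n", ly, addr, line)
	}
}
```

For instance, scanline 75 falls in the tenth row of tiles (75/8 = 9), which starts at 0x9800 + 9×32 = 0x9920, and uses the fourth row of pixels (75%8 = 3) of each tile in that row.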

You’ll notice I just introduced a new Fetcher property in our PPU. It’s time to finally come back to the PPU’s pixel transfer state and see what really happens there.

Another state machine

We could probably have implemented the fetcher logic inside the PPU itself, but the fetcher is another state machine of its own. Implementing it as such will make it easier to synchronize with the rest of the PPU.

The code structure will be very similar to the PPU’s. We’ll just add a Start() function to initialize the fetcher at the start of a new scanline and the related variables to the Fetcher structure.

// FetcherState is an enum-like type to define all admissible states for the
// PPU's fetcher.
type FetcherState uint8

// Possible Fetcher states. Values don't really matter, they start from zero.
const (
    ReadTileID FetcherState = iota
    ReadTileData0
    ReadTileData1
    PushToFIFO
)

// Fetcher reading tile data from VRAM and pushing pixels to a FIFO queue.
type Fetcher struct {
    FIFO     FIFO         // Pixel FIFO that the PPU will read.
    mmu      *MMU         // Reference to the global MMU.
    ticks    int          // Clock cycle counter for timings.
    state    FetcherState // Current state of our state machine.

    // Starting parameters and work variables omitted...
}

(Like I did with the PPU, I omitted some of the fields for clarity. Check the example program’s source code for more.)

So far we’re writing this pretty much like we wrote the PPU. We just need a pointer to our MMU so the fetcher can read tile data from video RAM, and a FIFO to exchange pixel data with the PPU.

I won’t cover the FIFO structure used here. FIFO is simple enough to implement, and my implementation is a mere circular buffer. Just as you’d expect, you can call Push() and Pop() on it, it advertises its size, and items pop out in the same order they were pushed in. Also you can clear it (I’ll show you a fun bug that happens when you don’t, in a future article). Your favorite language might even already have a FIFO-like container type built-in.
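If you would rather see one spelled out, here is a minimal sketch of such a circular buffer. It is not the example program's implementation: the PixelFIFO name is hypothetical, and I am assuming a fixed capacity of 16 and uint8 items for simplicity, whereas the real FIFO stores values of any type.

```go
package main

import (
	"errors"
	"fmt"
)

// fifoCapacity is the number of pixels the queue can hold.
const fifoCapacity = 16

// PixelFIFO is a fixed-size circular buffer of color numbers.
type PixelFIFO struct {
	items [fifoCapacity]uint8
	head  int // Index of the oldest item.
	size  int // Number of items currently stored.
}

// Size returns the number of items waiting in the queue.
func (q *PixelFIFO) Size() int { return q.size }

// Push appends an item, or returns an error if the queue is full.
func (q *PixelFIFO) Push(v uint8) error {
	if q.size == fifoCapacity {
		return errors.New("fifo full")
	}
	q.items[(q.head+q.size)%fifoCapacity] = v
	q.size++
	return nil
}

// Pop removes and returns the oldest item, or an error if the queue is empty.
func (q *PixelFIFO) Pop() (uint8, error) {
	if q.size == 0 {
		return 0, errors.New("fifo empty")
	}
	v := q.items[q.head]
	q.head = (q.head + 1) % fifoCapacity
	q.size--
	return v, nil
}

// Clear discards everything in the queue.
func (q *PixelFIFO) Clear() {
	q.head, q.size = 0, 0
}

func main() {
	var q PixelFIFO
	for _, v := range []uint8{3, 1, 2} {
		_ = q.Push(v) // Cannot fail here: the queue is far from full.
	}
	first, _ := q.Pop()
	fmt.Println(first, q.Size()) // 3 2
}
```

Returning an error from Pop() when the queue is empty is what lets the caller simply skip a tick when no pixel is available yet.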

Before going over the states we’ll implement, let’s look at the “empty” state machine like we did before.

// Tick advances the fetcher's state machine one step.
func (f *Fetcher) Tick() {
    // The Fetcher runs at half the speed of the PPU (every 2 clock cycles).
    f.ticks++
    if f.ticks < 2 {
        return
    }
    f.ticks = 0 // Reset tick counter and execute next state.

    switch f.state {
    case ReadTileID:
        // TODO: Read the tile's number from the background map.
        f.state = ReadTileData0

    case ReadTileData0:
        // TODO: Read the first byte of pixel data.
        f.state = ReadTileData1

    case ReadTileData1:
        // TODO: Read the second byte of pixel data.
        f.state = PushToFIFO

    case PushToFIFO:
        // TODO: Store pixel data in the FIFO if there is enough room.
        f.state = ReadTileID
    }
}

The first six lines of that function are used to divide the PPU’s frequency by two. The PPU’s Tick() method is called every clock cycle, but we saw in the Ultimate Game Boy Talk video that the fetcher runs at half that speed, so we simply skip every other tick. (The technical reason given in the talk is that video RAM is clocked at 2 MHz while the PPU runs at 4 MHz, so each memory read takes the fetcher two ticks. Likewise, main RAM is clocked at 1 MHz, which is why the CPU needs four ticks to access memory.)

This way, the fetcher’s Tick() method also gets called every clock cycle, and the fetcher will still run at exactly the required speed (relative to the rest of the system; it might still be a pretty low speed).

All that’s left is to read the proper bytes from memory.

Initialization

The fetcher needs to know some basic offsets to get started on the proper row in our tile map. This is done in its Start() method. In the future, this method may also be called in the middle of a scanline to start reading tiles from the window map instead of the background.

// Start fetching a line of pixels starting from the given tile address in the
// background map. Here, tileLine indicates which row of pixels to pick from
// each tile we read.
func (f *Fetcher) Start(mapAddr uint16, tileLine uint8) {
    f.tileIndex = 0
    f.mapAddr = mapAddr
    f.tileLine = tileLine
    f.state = ReadTileID

    // Clear FIFO between calls, as it may still contain leftover tile data
    // from the very end of the previous scanline.
    f.FIFO.Clear()
}

Once that’s done, the fetcher will read each tile in the tile map starting from the given address until the whole line has been displayed. This is, again, a simplification that doesn’t yet take scrolling into account.

Reading the tile ID

This is a single memory access. In the present case where we’re not doing any scrolling, for the very first line in the frame, mapAddr points to the very beginning of the background map and tileIndex is zero, so the fetcher will read the ID stored in the first byte at address 0x9800.

    case ReadTileID:
        // Read the tile's number from the background map. This will be used
        // in the next states to find the address where the tile's actual pixel
        // data is stored in memory.
        f.tileID = f.mmu.Read(f.mapAddr + uint16(f.tileIndex))
        f.state = ReadTileData0

Later, when the top 8 pixels of that tile have been written to the FIFO and the state machine goes back to this state, the fetcher will read the ID stored at address 0x9801, and so on until 160 pixels have been read. This amounts to twenty tiles whose IDs are stored between 0x9800 and 0x9813.

For the second scanline, the exact same tile IDs will be read from the exact same addresses in the background map, but this time the fetcher will read tile data for the second row of pixels from the top of each tile.

Reading tile data

This operation is split in two separate states because the fetcher can only read one byte per couple of Tick() calls. Both of those states do virtually the same thing, just with two consecutive bytes.

    case ReadTileData0:
        // A tile's graphical data takes 16 bytes (2 bytes per row of 8 pixels).
        // Tile data starts at address 0x8000 so we first compute an offset to
        // find out where the data for the tile we want starts.
        offset := 0x8000 + (uint16(f.tileID) * 16)

        // Then, from that starting offset, we compute the final address to read
        // by finding out which of the 8-pixel rows of the tile we want.
        addr := offset + (uint16(f.tileLine) * 2)

        // Finally, read the first or second byte of graphical data depending on
        // what state we're in.
        data := f.mmu.Read(addr) // In next state, this will be addr+1
        for bitPos := uint(0); bitPos <= 7; bitPos++ {
            // Store the first bit of pixel color in the pixel data buffer.
            f.pixelData[bitPos] = (data >> bitPos) & 1
        }

This is merely translating those offsets from earlier into code. In the actual program, this is all inside a function that’s called in both ReadTileData0 and ReadTileData1 states. In the latter state, the function reads the second byte and updates the pixelData array instead of overwriting it.
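To illustrate how the two bytes end up as eight color numbers, here is a small standalone sketch. The first byte holds the low bit of each pixel's color and the second byte holds the high bit, with bit 7 being the leftmost pixel. The decodeRow function is a name I made up; unlike the fetcher, which stores bits from least to most significant and reverses them when pushing, it returns pixels directly from left to right.

```go
package main

import "fmt"

// decodeRow combines the two bytes describing a row of 8 pixels into 8 color
// numbers (0 to 3), ordered from the leftmost pixel to the rightmost one.
func decodeRow(low, high byte) [8]uint8 {
	var pixels [8]uint8
	for i := 0; i < 8; i++ {
		bit := uint(7 - i) // Bit 7 is the leftmost pixel.
		lo := (low >> bit) & 1
		hi := (high >> bit) & 1
		pixels[i] = hi<<1 | lo
	}
	return pixels
}

func main() {
	fmt.Println(decodeRow(0x3c, 0x7e)) // [0 2 3 3 3 3 2 0]
}
```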

Feeding the FIFO

We’re almost there, now the fetcher must simply push each pixel value to the FIFO, provided there is room in it. The FIFO’s size is fixed to 16 items so we must make sure there is enough room for at least 8 of them.

    case PushToFIFO:
        if f.FIFO.Size() <= 8 {
            // We stored pixel bits from least significant (rightmost) to most
            // (leftmost) in the data array, so we must push them in reverse
            // order.
            for i := 7; i >= 0; i-- {
                f.FIFO.Push(f.pixelData[i])
            }
            // Advance to the next tile in the map's row.
            f.tileIndex++
            f.state = ReadTileID
        }

And that’s the fetcher for now! None of the code we wrote for it is very advanced, but getting those offsets right to read the correct addresses is a pain. More about that in a couple articles.
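To get a feel for how the half-speed fetcher and the pixel-per-tick output interleave, here is a rough simulation of the simplified model described in this article. It only tracks pixel counts, not pixel values, and it assumes the fetcher is ticked before a pixel is popped on each tick. The ticksForScanline function is hypothetical and the result says nothing about real hardware timings.

```go
package main

import "fmt"

// ticksForScanline simulates the simplified fetcher and pixel output, and
// returns how many ticks are needed to shift out `width` pixels.
func ticksForScanline(width int) int {
	fifoSize := 0     // Pixels currently waiting in the FIFO.
	fetcherTicks := 0 // Divider so the fetcher acts every other tick.
	fetcherStep := 0  // 0 to 2: memory reads, 3: push to FIFO.
	x := 0            // Pixels shifted out so far.
	ticks := 0

	for x < width {
		ticks++

		// Fetcher: one step every two ticks.
		fetcherTicks++
		if fetcherTicks == 2 {
			fetcherTicks = 0
			if fetcherStep < 3 {
				fetcherStep++ // One of the three memory reads.
			} else if fifoSize <= 8 {
				fifoSize += 8 // Push a row of 8 pixels.
				fetcherStep = 0
			}
		}

		// Output: shift one pixel out per tick if any is available.
		if fifoSize > 0 {
			fifoSize--
			x++
		}
	}
	return ticks
}

func main() {
	fmt.Println(ticksForScanline(160))
}
```

In this model, the first pixel only comes out on the eighth tick (three reads and a push, two ticks each). After that, the fetcher delivers 8 new pixels every 8 ticks, which is exactly the rate at which they are consumed, so a 160-pixel line takes 167 ticks.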

Dude, where’s my pixel?

We now have pixel data — which is just a bunch of numbers between 0 and 3 — in a FIFO, ready to display… and displaying pixels on a computer screen is yet another big non-trivial thing that deserves its own article.

But since we are so close now, let’s whip up a terminal-based display. Our example program runs from the command line anyway, and it turns out there is a neat way to output Game Boy pixels to a terminal.

Enter Unicode, and its Block Elements. If you ever looked into terminal-based interfaces, chances are you’ve seen these or their ancestors. We’re especially interested in ░, ▒, ▓ and █ (a.k.a. “light shade”, “medium shade”, “dark shade” and “full block”).

We can use these characters as grayscale pixels of sorts, since we only need four shades to emulate a Game Boy screen.

But wait, there is more!

Obviously, we won’t be satisfied with a terminal-only output. We will definitely implement something serious next, with a proper window and real color pixels. You know what that means…

// Display interface supporting pixel output and palettes.
type Display interface {
    // Write outputs a pixel (defined as a color number) to the display.
    Write(color uint8)

    // HBlank is called whenever all pixels in a scanline have been output.
    HBlank()

    // VBlank is called whenever a full frame has been output.
    VBlank()
}

Yep, another interface. It will let us substitute that terminal display we’ll write next for something fancier that we’ll write later.

Write() will be called by the PPU whenever a pixel should be displayed, HBlank() when the PPU reaches the end of a scanline (which is actually only useful for our terminal display) and VBlank() will be called when the PPU is done displaying a frame.
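To show how little is needed to satisfy that interface, here is a sketch of an alternative display that just stores pixels in memory, which could be handy for tests. The FrameBuffer type is hypothetical and not part of the example program; the Display interface is repeated so the snippet stands on its own.

```go
package main

import "fmt"

const (
	screenWidth  = 160
	screenHeight = 144
)

// Display mirrors the interface defined in this article.
type Display interface {
	Write(color uint8)
	HBlank()
	VBlank()
}

// FrameBuffer is a hypothetical Display that stores pixels in memory.
type FrameBuffer struct {
	Pixels [screenHeight][screenWidth]uint8
	Frames int // Number of complete frames received so far.
	x, y   int // Position of the next pixel to be written.
}

// Write stores a pixel at the current position, ignoring any overflow.
func (fb *FrameBuffer) Write(color uint8) {
	if fb.x < screenWidth && fb.y < screenHeight {
		fb.Pixels[fb.y][fb.x] = color
		fb.x++
	}
}

// HBlank moves to the beginning of the next scanline.
func (fb *FrameBuffer) HBlank() {
	fb.x = 0
	fb.y++
}

// VBlank counts the finished frame and moves back to the top-left corner.
func (fb *FrameBuffer) VBlank() {
	fb.Frames++
	fb.x, fb.y = 0, 0
}

func main() {
	var screen Display = &FrameBuffer{}
	// Feed one fake frame: every pixel's color is its line number modulo 4.
	for y := 0; y < screenHeight; y++ {
		for x := 0; x < screenWidth; x++ {
			screen.Write(uint8(y % 4))
		}
		screen.HBlank()
	}
	screen.VBlank()

	fb := screen.(*FrameBuffer)
	fmt.Println(fb.Frames, fb.Pixels[5][0], fb.Pixels[143][159]) // 1 1 3
}
```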

The simplest screen

With that in mind, implementing the actual display is a breeze: shifting a pixel out will just mean printing out one of those block characters, HBlank will be when we need to print out a newline, and VBlank… well, here I chose to do nothing special, but we could clear the screen for the next frame, for instance.

// Console display shifting pixels out to standard output.
type Console struct {
    Palette [4]rune
}

// NewConsole returns a Console display with dark-themed unicode pixels.
// If your terminal is light-themed, reverse the order of the Palette
// array below.
func NewConsole() *Console {
    return &Console{Palette: [4]rune{'█', '▒', '░', ' '}}
}

// Write prints out a pixel from our rune palette. We actually print two
// characters to obtain a relatively square pixel.
func (c *Console) Write(colorIndex uint8) {
    // Here we just tell Printf to use argument number 1 (the character to
    // display) twice.
    fmt.Printf("%[1]c%[1]c", c.Palette[colorIndex])
}

// HBlank prints a newline to set up the console for the next scanline.
func (c *Console) HBlank() {
    fmt.Print("\n")
}

// VBlank prints a separation between each console screen frame.
func (c *Console) VBlank() {
    fmt.Print("\n === VBLANK ===\n")
    // If you want to clear the screen instead, you can try the code below:
    //fmt.Print("\033[2J")
}

If you look at the example program’s code later, you’ll notice I glossed over some code dealing with whether the display is enabled or not. I also only used three of those special block characters, and a mere space for the darkest pixels (as my terminal is using a dark color theme).

Lastly, since those block characters are pretty much rectangular (in my terminal at least; I’ve noticed the mobile version of this article shows square characters), I’m printing two at once to make the resulting pixels look square-ish instead of narrow-ish.

The downside is that a full frame will be 320×144 characters. For reference, my current terminal is only 238 characters wide and 56 characters high. Unless you use a tiny font or a gigantic screen, you’ll probably have to use your terminal’s zoom-out feature to make a single frame fit.

However, the example program also contains an HDConsole structure, which uses ANSI color codes and the upper half block character (▀) to make a frame fit in 160×72 characters instead. If your terminal supports 256-color ANSI codes, then you should give it a try.
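The trick behind that kind of display can be sketched as follows: the upper half block is drawn in the foreground color while the rest of the character cell shows the background color, so a single character carries two vertically stacked pixels. This is not the actual HDConsole code. The halfBlockCell function and the grays table are hypothetical, and the choice of shades from the 256-color grayscale ramp (232 to 255) is arbitrary.

```go
package main

import "fmt"

// grays maps the four color numbers to entries in the 24-step grayscale ramp
// of the 256-color ANSI palette (232 is darkest, 255 is lightest). Which
// color number should be light or dark is an arbitrary choice here.
var grays = [4]int{255, 248, 240, 232}

// halfBlockCell renders two vertically stacked pixels as a single character:
// the foreground color paints the upper half, the background the lower half.
func halfBlockCell(top, bottom uint8) string {
	return fmt.Sprintf("\x1b[38;5;%dm\x1b[48;5;%dm▀\x1b[0m", grays[top&3], grays[bottom&3])
}

func main() {
	// Print a tiny 4x2 pixel test pattern in a single row of 4 characters.
	topRow := []uint8{0, 1, 2, 3}
	bottomRow := []uint8{3, 2, 1, 0}
	for i := range topRow {
		fmt.Print(halfBlockCell(topRow[i], bottomRow[i]))
	}
	fmt.Println()
}
```

A display built this way has to buffer one full scanline before it can print anything, since each character needs pixels from two consecutive lines.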

Putting it all together

This is it, we have everything we need. Now we can add the missing bits to the PPU.

type PPU struct {
    LY   uint8 // Number of the scanline currently being displayed.

    // Fetcher runs at half the PPU's speed and fetches pixel data from the
    // background map's tiles, according to the current scanline. It also holds
    // the FIFO pixel queue that we will be writing out to the display.
    Fetcher Fetcher

    // Screen object implementing our Display interface. Just a text
    // console for now, but we can swap it for a proper window later.
    Screen Display

    state PPUState // Current state of the state machine.
    ticks uint     // Clock ticks counter for the current line.
    x     uint8    // Number of pixels already output for the current line.
}

Then, we’ll go over the PPU’s states again and fill in the blanks we left earlier, now that we have a proper fetcher and a real display. We already saw how the fetcher was started at the end of the OAM Search state. We just need to output pixels and call the display’s HBlank() and VBlank() methods.

Displaying the fetched pixels

The pixel transfer state needs two things: to actually fetch pixels from memory, and to shift them out to the display, which is all interleaved.

We also need to call our display’s HBlank() method when the scanline is complete. This is why we keep track of the X position.

    case PixelTransfer:
        // Fetch pixel data into our pixel FIFO.
        p.Fetcher.Tick()

        // Put a pixel from the FIFO on screen if we have any. The FIFO can
        // store any type so we explicitly cast it to uint8.
        if pixelColor, err := p.Fetcher.FIFO.Pop(); err == nil {
            p.Screen.Write(pixelColor.(uint8))
            p.x++
        }

        // Check when the scanline is complete (160 pixels).
        if p.x == 160 {
            // Switch to HBlank state. For our console screen, that means
            // printing a newline.
            p.Screen.HBlank()
            p.state = HBlank
        }

Finishing the frame

The last touch is to call VBlank() when appropriate at the very end of the HBlank state. It’s not really useful for our terminal output, but it will be the place where we’ll blit a texture to screen when we implement a real display.

    case HBlank:
        // A full scanline takes 456 ticks to complete. At the end of a
        // scanline, the PPU goes back to the initial OAM Search state.
        // When we reach line 144, we switch to VBlank state instead.
        if p.ticks == 456 {
            p.ticks = 0
            p.LY++
            if p.LY == 144 {
                p.Screen.VBlank()
                p.state = VBlank
            } else {
                p.state = OAMSearch
            }
        }

The reward

Here is the refactored main for loop to run our CPU and PPU together. As I mentioned at the beginning, I simply moved all the code that was in that loop to the CPU’s new Tick() method.

I also added the PPU as an extra address space in our MMU, made the fetcher aware of it and instantiated the screen (and thanks to our Display interface you can trivially plug the HD console display in the PPU instead).

func main() {
    boot := NewBoot("./dmg-rom.bin")     // Covers 0x0000→0x00ff and 0xff50
    ram := NewRAM(0x8000, 0xffff-0x8000) // Covers 0x8000→0xffff

    // Set up display using a text-based output.
    //screen := NewHDConsole() // Try this if your terminal supports 256 colors
    screen := NewConsole()
    ppu := NewPPU(screen) // Contains address 0xff44

    // MMU looking up addresses in the boot ROM or BOOT register first,
    // then in the PPU, then in RAM. So even if the RAM object technically
    // contains addresses shadowing the BOOT or LY registers, the boot
    // or ppu objects will take precedence.
    mmu := MMU{[]Addressable{boot, ppu, ram}}
    ppu.Fetcher.mmu = &mmu
    cpu := CPU{mmu: mmu}

    for {
        // Each component performs one tick's worth of work per iteration,
        // which approximates both of them running in parallel.
        cpu.Tick()
        ppu.Tick()
    }
}

Remember that our PPU spends a lot of its ticks idly waiting. Right now this means that our CPU, with its unique state that will advance one step for each system clock tick, is running light-speedingly faster than our more accurate PPU. It just means it’ll be spending more cycles waiting for that frame and I can live with that for the time being.

Right now, why don’t we just run that code?

[Figure: A single frame. Notice how I had to zoom out to make the whole frame fit in a terminal window.]

So hey, cool, we solved the issue from last time: the CPU did wait for a frame and then went on to… exit as it encountered the next unimplemented opcode. Also we wrote a crude but functional screen on which we displayed the aforementioned frame!

Wasn’t that worth it? Sure, it’s a little weirdly shaped, the color is actually wrong — that dark bar looks gray, it should be black — and we’re actually seeing pixels right away when the screen should initially be blank until the logo scrolls in. We’ll fix all of that!

In the meantime, I’ll be implementing the remaining CPU instructions and we’ll dive into SDL, textures and how to scroll our logo for real in the next article.

Thank you for reading!

References

You can download the example program above and run it anywhere from the command line:

$ go run the-first-pixel.go

It expects a dmg-rom.bin file to be present in the same folder.