Famicom Party

14. Sprite Movement

In the last chapter, we created background graphics to display behind our sprites. While the addition of backgrounds helps make our game look more like an actual game, it is still entirely static - no different from a still picture. In this chapter, we'll look at how to move sprites around the screen. To do so, we will need to make some changes to how our game is drawing sprites.

First, we can no longer hard-code the positions of our sprites in cartridge ROM. As a reminder, "cartridge ROM" or PRG-ROM is the read-only part of your game, and in the NES' memory map, it is located from $8000 to $ffff. All of your game's code is located there, though your game code can refer to memory addresses outside of PRG-ROM, like when we wrote sprite data to $0200-$02ff. There are many choices for where we could put our sprites' information, but the best location is "zero-page RAM".

Zero-Page RAM

A "page" of memory on the NES is a contiguous block of 256 bytes of memory. For any memory address, the high byte determines the page number, and the low byte determines the specific address within the page. As an example, the range from $0200 to $02ff is "page $02", and the range from $8000 to $80ff is "page $80".
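As a small illustration (a sketch, not part of this chapter's project), ca65's < and > operators extract the low and high byte of a value, which for an address correspond to the offset within the page and the page number:

  LDA #>$0200   ; high byte: $02, the page number
  LDA #<$0200   ; low byte: $00, the offset within the page
  LDA #>$80ff   ; high byte: $80, the page number
  LDA #<$80ff   ; low byte: $ff, the offset within the page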

What, then, is "zero-page RAM"? Page zero is the range of memory from $0000 to $00ff. What makes page zero useful for things like sprite positions is its speed. The 6502 processor has a special addressing mode for working with zero-page RAM, which makes operations on zero-page addresses faster than the same operation on other memory addresses. To use zero-page addressing, use one byte instead of two when providing a memory address. Let's look at an example:

  LDA $8000 ; "regular", absolute mode addressing
            ; load contents of address $8000 into A

  LDA $3b   ; zero-page addressing
            ; load contents of address $003b into A

  LDA #$3b  ; immediate mode addressing
            ; load literal value $3b into A

Note that you must use just one byte in order to take advantage of zero-page addressing mode. The assembler does not know anything about the memory addresses passing through it. If you were to type LDA $003b instead of LDA $3b in your assembly code, the resulting machine code would use (slower) absolute mode, even though the memory address you are loading from is located in page zero.
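To make the difference concrete, here is how the three instructions from the example above are encoded. The bytes and cycle counts in the comments come from the standard 6502 opcode table; they are properties of the processor, not of this project:

  LDA $8000 ; AD 00 80, 3 bytes, 4 cycles (absolute)
  LDA $3b   ; A5 3B,    2 bytes, 3 cycles (zero page)
  LDA #$3b  ; A9 3B,    2 bytes, 2 cycles (immediate)

The zero-page form saves one byte of ROM and one cycle every time it runs, which adds up for values that are touched every frame.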

So, using zero-page addressing gives us very fast access to 256 bytes of memory. Those 256 bytes are the ideal place to store values that your game will need to update or reference frequently: the current score, the number of lives the player has, which stage or level the player is in, and the positions of the player, enemies, projectiles, etc. Notice that I said the position of "the player", and not the positions of the individual tiles that make up the player. Any game that hopes to have more than a very small number of objects on screen at one time will need to carefully ration the use of zero-page addresses.

Let's start using zero-page RAM in our code. Because only addresses from $8000 and up are ROM (i.e., part of your actual cartridge / code that you write), we can't just write zero page values directly. Instead, we tell the assembler to reserve memory in page zero, like this:

.segment "ZEROPAGE"
player_x: .res 1
player_y: .res 1

First, we tell the assembler that we want to reserve page zero memory by using the appropriate segment from our linker config file - in this case, "ZEROPAGE". Then, for each memory range we want to reserve, we use the .res directive, followed by the number of bytes we want to reserve. Generally this will be "1" to reserve a single byte of memory, but being able to specify any number can be useful if, for example, you need to store a 16-bit number in page zero.

Note the ":" after the name of each reserved byte - these look like the labels that we've already been using, and that's because they are! When we use the name of a reserved byte later in our code, we're telling the assembler to find the memory address that corresponds to that label and replace the name with the address, exactly the same as any other label in our code.
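As a sketch of the multi-byte case mentioned above, a 16-bit value could be reserved and filled in as follows. The name player_score and the value $1234 are made up for illustration and are not part of this chapter's project, and the example assumes your linker config defines a "CODE" segment. The low byte is stored first, following the 6502's usual little-endian convention:

.segment "ZEROPAGE"
player_score: .res 2      ; hypothetical 16-bit value (2 bytes)

.segment "CODE"
  LDA #$34
  STA player_score        ; low byte goes in the first reserved byte
  LDA #$12
  STA player_score+1      ; high byte goes in the second reserved byte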

Now that we have reserved memory, we need to initialize it to a good starting value somewhere in our code. Two good options for this are either as part of the reset handler, or at the beginning of main. We'll opt for the reset handler approach here. In reset.asm, just before JMP main, add the following code:

  ; initialize zero-page values
  LDA #$80
  STA player_x
  LDA #$a0
  STA player_y

If you try to assemble this code, however (ca65 src/reset.asm), you will get an error:

Error: Symbol 'player_y' is undefined
Error: Symbol 'player_x' is undefined

Generally, reserved memory names are only valid in the same file where they are defined. In this case, we reserved player_x and player_y in our main file, but we were trying to use them in reset.asm. Thankfully, ca65 provides directives to export and import reserved zero-page memory so it can be shared between files. We'll just need to add an .exportzp directive in our main file:

.segment "ZEROPAGE"
player_x: .res 1
player_y: .res 1
.exportzp player_x, player_y

Then, in reset.asm, we can use an .importzp directive. The .importzp directive should go inside of a .segment "ZEROPAGE", even if you are not doing anything else with page zero values in that file:

.segment "ZEROPAGE"
.importzp player_x, player_y

When you assemble these files, ca65 records each file's imports and exports in the object file it produces. When you later link the object files together, the linker (ld65) matches the imports with the exports and figures out what data should go where.

Subroutines

Since we only have a limited number of zero-page addresses (256) available, we need to ration them out carefully. Instead of storing the position of every player sprite tile individually (which would take 8 bytes of zero page just for x/y positions), we will store just an overall player X and Y coordinate and offload the drawing of the actual player sprites to a subroutine. Subroutines are assembly's version of functions - named, reusable code fragments.

To create a subroutine, make a new .proc in your code. The only requirement for a subroutine is that it must end with the opcode RTS, "Return from Subroutine". To call a subroutine, use the opcode JSR, "Jump to Subroutine", followed by the name of the subroutine (whatever follows .proc).

Before we go further, let's take a look at what actually happens when we call a subroutine. Here is some example code:

1   LDA #$80
2   JSR do_something_else
3   STA $8000
4
5 .proc do_something_else
6   LDA #$90
7   RTS
8 .endproc

When this code runs, the processor first puts the literal value $80 into the accumulator. Then it calls the subroutine do_something_else. When the 6502 sees a JSR opcode, it pushes the current value of the program counter (the special register that holds the memory address of the next byte to be processed) onto the stack. A stack, in computer science, is a "last in, first out" data structure, similar to a stack of real-life plates. Adding something to the stack means putting it on top of the pile, and only the top-most element is available at any given time.

On the 6502, the stack is 256 bytes in size and is located at $0100 to $01ff. The 6502 uses a special register, the "stack pointer" (often abbreviated "S"), to indicate where the "top" of the stack is. When the system is first initialized, the value of the stack pointer is $ff. Every time something is stored on the stack, it is written to $0100 plus the value held in the stack pointer (e.g., the first write to the stack is stored at $01ff), and then the stack pointer is decremented by one. When a value is removed from the stack, the stack pointer is incremented by one. Attempting to write more than 256 items to the stack at one time causes the stack pointer to wrap around from $00 to $ff, meaning that further writes to the stack will overwrite already existing stack data. This generally-catastrophic scenario is called a stack overflow (though in the case of the 6502, it's more properly called a stack underflow).
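Here is a short sketch of that bookkeeping, using PHA and PLA (the 6502's push and pull instructions for the accumulator). It assumes the stack pointer starts at $ff, which the TXS at the top sets up explicitly; the values $11 and $22 are arbitrary:

  LDX #$ff
  TXS         ; S = $ff (empty stack)
  LDA #$11
  PHA         ; push $11, S becomes $fe
  LDA #$22
  PHA         ; push $22, S becomes $fd
  PLA         ; pull: A = $22 (last in, first out), S back to $fe
  PLA         ; pull: A = $11, S back to $ff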

So, on line 2, the processor sees a JSR opcode and stores the current value of the program counter on the stack. Then, it takes the operand of the JSR and puts that memory address into the program counter. Here, the processor skips from line 2 to line 6, and writes the literal value $90 to the accumulator. The next opcode is an RTS. When the 6502 sees an RTS, it takes the "top" value from the stack (often referred to as "popping" an item off the stack) and puts it into the program counter. Given the way the stack works, this should be the address that was "pushed onto" the stack back when the processor saw a JSR. This pulls us back to whatever code is immediately after the JSR. Here, that means STA $8000 - and the result will be writing $90 to that memory address, not $80.

Subroutines do not, by default, "save" the values of any registers either when they are called or when they return. In most higher-level programming languages, this is taken care of for you through concepts like "variable scope" or "lifetimes". In assembly, though, you must handle saving and restoring the state of all registers (including the processor status register!) if you need those values to remain the same when returning from a subroutine. In general, it is good practice to always save and restore registers when subroutines are involved. Interrupts like NMI or IRQ can be called at any time - even while you are inside of another subroutine! - and it can be difficult to accurately predict what value is in a register if your subroutines / interrupt handlers are not written in a "defensive" manner.

Subroutine Register Management

To help you save and restore the contents of registers, the 6502 provides four opcodes: PHA, PHP, PLA, and PLP. PHA and PHP are used to "push" the accumulator ("A") and processor status register ("P"), respectively, onto the stack. In the other direction, PLA and PLP "pull" the top value off of the stack and place it into the accumulator or processor status register. There are no special opcodes for the X and Y registers; to push their values, you must first transfer them into the accumulator (with TXA / TYA), and to restore them you must pull into the accumulator and then transfer again (with TAX / TAY).

Let's look at an example subroutine that uses these new opcodes:

.proc my_subroutine
  PHP
  PHA
  TXA
  PHA
  TYA
  PHA

  ; your actual subroutine code here

  PLA
  TAY
  PLA
  TAX
  PLA
  PLP
  RTS
.endproc

When my_subroutine is called (with JSR my_subroutine), the first six opcodes preserve the state of the registers on the stack before doing anything else. PHP, storing the state of the processor status register, comes first, because the processor status register is updated by most instructions - if we waited until the end to store P, it would likely be modified by the results of instructions like TXA. With the processor status register stored away on the stack, we next push the value of the accumulator, and then transfer and push the values of the X and Y registers. With everything stored on the stack, we are free to use all of the 6502's registers without worrying about what the code that called our subroutine expects to find in them.

Once the subroutine code is finished, we reverse all of the storing we did at the beginning. We restore everything in the opposite order of how we stored it, first pulling and transferring to the Y and X registers, then the accumulator, and then the processor status register. Finally, we end with RTS, which returns program flow to the point where we called the subroutine.

If you forget to include RTS at the end of your subroutine, the 6502 will not return to where the subroutine was called and will instead happily continue with the next byte after your subroutine code. The processor doesn't know anything about .procs; they are simply a tool to help you organize your code.

Your First Subroutine: Drawing the Player

Now that you've seen how subroutines work, it's time to create your own. Let's write a subroutine that draws the player's ship at a given location. To do that, we'll need to use the player_x and player_y zero-page variables we created earlier to write the appropriate bytes to memory range $0200-$02ff. Previously, we did this by storing the appropriate bytes in RODATA and copying them with a loop and indexed addressing, the same way we did with palettes. As a quick refresher, we need to write four bytes of data for each 8 pixel by 8 pixel sprite tile: the sprite's Y position, tile number, special attributes / palette, and X position. The tile number and palette for each of the four sprites that make up the player ship will not change, so let's start there. We will also save and restore the system's registers at the start and end of our subroutine.

.proc draw_player
  ; save registers
  PHP
  PHA
  TXA
  PHA
  TYA
  PHA

  ; write player ship tile numbers
  LDA #$05
  STA $0201
  LDA #$06
  STA $0205
  LDA #$07
  STA $0209
  LDA #$08
  STA $020d

  ; write player ship tile attributes
  ; use palette 0
  LDA #$00
  STA $0202
  STA $0206
  STA $020a
  STA $020e

  ; restore registers and return
  PLA
  TAY
  PLA
  TAX
  PLA
  PLP
  RTS
.endproc

The player ship uses tiles $05 (top left), $06 (top right), $07 (bottom left), and $08 (bottom right). We write those tile numbers to memory addresses $0201, $0205, $0209, and $020d, respectively, because those correspond to "byte 2" of the first four sprites. All of the player ship's tiles use palette zero (the first palette), so the code to write sprite attributes is much shorter. $0202, $0206, $020a, and $020e are the bytes immediately following the previous tile number bytes, and so they hold the attributes for each of the first four sprites. Finally, we restore all of the registers, in the opposite order of how we stored them, and use RTS to end the subroutine.

What about the location of each tile on screen? For that, we will need to use player_x, player_y, and some basic math. Let's assume, to make things easier, that player_x and player_y represent the top left corner of the top left tile of the player's ship. In our reset handler, we positioned the top left corner of the top left player ship tile at ($80, $a0). Once we have placed the top left tile, we can add eight pixels to player_x and player_y to find the positions of the other three tiles.

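In the past, we have used INC / DEC to add or subtract. When adding or subtracting more than 1, however, there are more efficient opcodes. ADC ("ADd with Carry") adds its operand, plus the current value of the carry flag, to the accumulator. To perform a plain addition, clear the carry flag with CLC ("CLear Carry") before using ADC. Its counterpart for subtraction is SBC ("SuBtract with Carry"), which is normally preceded by SEC ("SEt Carry").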

Here's what that looks like (previous code reduced to just comments):

.proc draw_player
  ; save registers
  ; store tile numbers
  ; store attributes

  ; store tile locations
  ; top left tile:
  LDA player_y
  STA $0200
  LDA player_x
  STA $0203

  ; top right tile (x + 8):
  LDA player_y
  STA $0204
  LDA player_x
  CLC
  ADC #$08
  STA $0207

  ; bottom left tile (y + 8):
  LDA player_y
  CLC
  ADC #$08
  STA $0208
  LDA player_x
  STA $020b

  ; bottom right tile (x + 8, y + 8)
  LDA player_y
  CLC
  ADC #$08
  STA $020c
  LDA player_x
  CLC
  ADC #$08
  STA $020f

  ; restore registers and return
.endproc

Remember that when you want to perform addition, first use CLC, then use ADC (unless you're trying to add something to a 16-bit number, which will be rare for now). The result of the addition can be found in the accumulator; it does not get written to player_y or player_x.
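For the 16-bit case, the carry flag is what links the two bytes together. Here is a minimal sketch, assuming a hypothetical two-byte zero-page variable named distance (low byte first) and a "CODE" segment in your linker config; neither is part of this chapter's project:

.segment "ZEROPAGE"
distance: .res 2      ; hypothetical: low byte, then high byte

.segment "CODE"
  LDA distance
  CLC                 ; clear carry before adding to the low byte
  ADC #$08
  STA distance
  LDA distance+1
  ADC #$00            ; no CLC here: this adds the carry out of the low byte
  STA distance+1

LDA and STA do not change the carry flag, so any carry produced by the first ADC is still there when the second ADC runs.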

Putting It All Together

With our subroutine written, it's time to make use of it. We already set up the initial values of player_x and player_y in the reset handler. Now, we'll call our new subroutine as part of the NMI handler, so it runs every frame:

.proc nmi_handler
  LDA #$00
  STA OAMADDR
  LDA #$02
  STA OAMDMA

  ; update tiles *after* DMA transfer
  JSR draw_player

  LDA #$00
  STA $2005
  STA $2005
  RTI
.endproc

Notice that we perform a DMA transfer of whatever is already in memory range $0200-$02ff before calling our subroutine. The amount of time you have available to complete your NMI handler is very short, so putting your DMA transfer first ensures that at least something will be drawn to the screen each frame.

Finally, we need to update player_x each frame so that our sprites will actually move around the screen. For this example, we will keep player_y the same, but we will modify player_x so that the player's ship moves to the right until it is near the right edge of the screen and then moves left until it is near the left edge of the screen. To make this easier, we'll need to store what direction the player's ship is moving in. Let's add another zero page variable, player_dir. A 0 will indicate that the player's ship is moving left, and a 1 will indicate that the player's ship is moving right.

.segment "ZEROPAGE"
player_x: .res 1
player_y: .res 1
player_dir: .res 1
.exportzp player_x, player_y

I did not export player_dir because other files do not (yet) need to access it. Now we can write the code to update player_x. We could write out this code as part of the NMI handler directly, but in anticipation of more complicated player movement in the future, let's put it into its own subroutine, update_player:

.proc update_player
  PHP
  PHA
  TXA
  PHA
  TYA
  PHA

  LDA player_x
  CMP #$e0
  BCC not_at_right_edge
  ; if BCC is not taken, player_x is $e0 or greater
  LDA #$00
  STA player_dir    ; start moving left
  JMP direction_set ; we already chose a direction,
                    ; so we can skip the left side check
not_at_right_edge:
  LDA player_x
  CMP #$10
  BCS direction_set
  ; if BCS not taken, we are less than $10
  LDA #$01
  STA player_dir   ; start moving right
direction_set:
  ; now, actually update player_x
  LDA player_dir
  CMP #$01
  BEQ move_right
  ; if player_dir minus $01 is not zero,
  ; that means player_dir was $00 and
  ; we need to move left
  DEC player_x
  JMP exit_subroutine
move_right:
  INC player_x
exit_subroutine:
  ; all done, clean up and return
  PLA
  TAY
  PLA
  TAX
  PLA
  PLP
  RTS
.endproc

This subroutine makes heavy use of the branching and comparison opcodes we saw in Chapter 11. We first load player_x into the accumulator and compare with $e0. CMP, as we learned earlier, subtracts its operand from the accumulator, but discards the result and only updates the processor status flags (such as carry and zero). We can use the resulting flags to tell us whether the value in the accumulator (in this case, player_x) was greater than, equal to, or less than CMP's operand. BCC not_at_right_edge tells the 6502 to skip ahead to not_at_right_edge if the carry flag is cleared. When performing a subtraction as part of a comparison, the 6502 first sets the carry flag, and it is only cleared if the accumulator is smaller than CMP's operand. In this case, if the accumulator is smaller than $e0, we know we are not near the right edge of the screen, so we can skip ahead to not_at_right_edge. If the accumulator is greater than or equal to $e0, the carry flag will still be set and the 6502 will continue with the next line. In that case, we are near the right edge of the screen, so we will need to update player_dir with a zero (to signify "moving left"). Then we use JMP to skip over the checks for whether or not we are near the left edge of the screen, because we already know that's not possible.
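The three possible outcomes of that comparison can be summarized with a few stand-alone instructions. This is only an illustration; the values loaded into the accumulator are arbitrary examples chosen around $e0:

  LDA #$df
  CMP #$e0   ; A < $e0: carry clear, zero clear -> BCC is taken
  LDA #$e0
  CMP #$e0   ; A = $e0: carry set, zero set     -> BCC is not taken
  LDA #$e1
  CMP #$e0   ; A > $e0: carry set, zero clear   -> BCC is not taken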

If the result of the first comparison was that player_x is not near the right edge of the screen, it's time to test if player_x is near the left edge of the screen. We compare player_x with $10, and this time we use BCS direction_set. BCS is the opposite of BCC: it will activate if the carry flag is still set, meaning the accumulator (player_x) was greater than or equal to the comparison value ($10). In that case, we are not near the left edge and can skip forward to actually updating player_x. Otherwise, we need to update player_dir to be $01, indicating "move right". Note that, with the way update_player is structured, if the player's ship is not near either edge of the screen, player_dir will not be updated, but we will still increment or decrement player_x as appropriate.

Finally, it's time to actually use the results of our edge tests. We compare player_dir with $01 and look to see if the result is zero. If it is, BEQ move_right activates and we increment player_x. Otherwise, we decrement player_x. Having performed our update, we restore all of the registers and return from the subroutine.

Let's call our new subroutine inside of the NMI handler to finish off our example project:

  ; update tiles *after* DMA transfer
  JSR update_player
  JSR draw_player

All that's left is to assemble and link the files into a NES ROM:

ca65 src/spritemovement.asm
ca65 src/reset.asm
ld65 src/reset.o src/spritemovement.o -C nes.cfg -o spritemovement.nes

If you open the resulting .nes file in an emulator, you should see the player's ship moving back and forth across the screen.

Homework

Now that you understand the basics of moving sprites around the screen, try these projects to explore and deepen your understanding.

To help you get started, you can download all of the code from this chapter.