Bare-Metal Coding: Software Optimization on the 6502 CPU
Welcome to my first blog post on Software Portability and Optimization (SPO 600)! Through this series, I’ll share my learnings and experiences as I explore the world of low-level programming and optimization. We will start by working on a 6502 emulator, which will lay the foundation for understanding more complex architectures like x86_64 and AArch64. Let’s dive into it!
We chose the 6502 CPU because of its small instruction set: it isn't a good idea to read 7000 pages of documentation just to get started. Interestingly, it is also the chip that powered the Apple II, Commodore computers, and even the Nintendo Entertainment System.
The 6502 CPU can directly address up to 64 KB of memory using 16-bit addresses. The CPU interacts with memory through various addressing modes, including immediate, zero-page, absolute, and indexed addressing.
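To make the addressing modes a little more concrete, here is a minimal sketch in Python (not 6502 assembly) of how a few of them resolve to a 16-bit address. The function names and the `memory` array are hypothetical helpers for illustration only; immediate mode is left out because its operand is the value itself rather than an address.

```python
# Toy model of how some 6502 addressing modes resolve to a 16-bit address.
# All names here are illustrative assumptions, not part of any emulator API.
memory = bytearray(0x10000)  # 64 KB address space: $0000-$FFFF


def zero_page(operand: int) -> int:
    """Zero-page: a one-byte operand addresses $0000-$00FF."""
    return operand & 0xFF


def absolute(lo: int, hi: int) -> int:
    """Absolute: two operand bytes, low byte first (little-endian)."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


def absolute_indexed(lo: int, hi: int, index: int) -> int:
    """Absolute,X or Absolute,Y: base address plus an index register."""
    return (absolute(lo, hi) + index) & 0xFFFF


def indirect_indexed(mem: bytearray, zp: int, y: int) -> int:
    """(zp),Y: read a 16-bit pointer from zero page, then add Y."""
    lo = mem[zp & 0xFF]
    hi = mem[(zp + 1) & 0xFF]
    return (((hi << 8) | lo) + y) & 0xFFFF


# A pointer to $0200 stored at $40 (low byte) and $41 (high byte).
memory[0x40] = 0x00
memory[0x41] = 0x02

print(hex(absolute_indexed(0x00, 0x02, 0x10)))    # 0x210
print(hex(indirect_indexed(memory, 0x40, 0x10)))  # 0x210
```

Both calls land on the same address, which is why a pointer in zero page and a hard-coded base address can be used interchangeably to reach the same pixel.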
We can begin writing basic programs for the 6502 using an emulator. Here is how it actually looks:
Now, we will play with the following code in the 6502 emulator.
lda #$00 ; set a pointer in memory location $40 to point to $0200
sta $40 ; ... low byte ($00) goes in address $40
lda #$02
sta $41 ; ... high byte ($02) goes into address $41
lda #$07 ; colour number
ldy #$00 ; set index to 0
loop: sta ($40),y ; set pixel colour at the address (pointer)+Y
iny ; increment index
bne loop ; continue until done the page (256 pixels)
inc $41 ; increment the page
ldx $41 ; get the current page number
cpx #$06 ; compare with 6
bne loop ; continue until done all pages
This code fills the screen with yellow. But we have to dive a little deeper to work out how long the code takes to execute, assuming a 1 MHz clock, and how much memory it uses. By my analysis, this code takes roughly 11,300 clock cycles, which is about 0.0113 seconds at 1 MHz, and occupies 25 bytes of memory.
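The timing and size estimate can be reproduced with a short calculation. The sketch below is in Python rather than assembly. The per-instruction byte and cycle counts are my assumption, taken from the standard 6502 instruction tables; it also assumes a taken branch costs 3 cycles (no page crossing), a branch that falls through costs 2, and the clock runs at 1 MHz.

```python
# (instruction, size in bytes, cycles); counts assumed from standard 6502 tables
SETUP = [
    ("lda #$00", 2, 2),
    ("sta $40", 2, 3),
    ("lda #$02", 2, 2),
    ("sta $41", 2, 3),
    ("lda #$07", 2, 2),
    ("ldy #$00", 2, 2),
]
INNER_LOOP = [
    ("sta ($40),y", 2, 6),
    ("iny", 1, 2),
    ("bne loop", 2, 3),  # 3 cycles when taken
]
OUTER_LOOP = [
    ("inc $41", 2, 5),
    ("ldx $41", 2, 3),
    ("cpx #$06", 2, 2),
    ("bne loop", 2, 3),  # 3 cycles when taken
]

PAGES = 4
PIXELS_PER_PAGE = 256
CLOCK_HZ = 1_000_000


def total_cycles(block):
    return sum(cycles for _, _, cycles in block)


def total_bytes(block):
    return sum(size for _, size, _ in block)


# On the last pixel of each page the inner bne falls through (2 cycles, not 3).
inner_cycles = PAGES * (PIXELS_PER_PAGE * total_cycles(INNER_LOOP) - 1)
# On the last page the outer bne falls through as well.
outer_cycles = PAGES * total_cycles(OUTER_LOOP) - 1

program_cycles = total_cycles(SETUP) + inner_cycles + outer_cycles
program_bytes = total_bytes(SETUP + INNER_LOOP + OUTER_LOOP)

print(program_cycles)             # 11325
print(program_bytes)              # 25
print(program_cycles / CLOCK_HZ)  # 0.011325
```

Almost all of the time is spent in the three-instruction inner loop, so that is where any optimization has to focus.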
Optimization
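How can we cut the execution time almost in half? The trick is to drop the zero-page pointer and write to all four pages in a single 256-iteration loop, using absolute indexed addressing. This is what the optimized code looks like: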
lda #$07
ldy #$00
loop:
sta $0200,y
sta $0300,y
sta $0400,y
sta $0500,y
iny
bne loop
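To check how much the optimized loop actually saves, the same kind of cycle count can be applied to it. As before, this is a Python sketch, and the byte and cycle counts are my assumption based on the standard 6502 tables (`sta` with absolute,Y addressing taken as 3 bytes and 5 cycles). The figure for the pointer-based version is my own count under the same assumptions, not a number from the original listing.

```python
# (instruction, size in bytes, cycles); counts assumed from standard 6502 tables
PREAMBLE = [
    ("lda #$07", 2, 2),
    ("ldy #$00", 2, 2),
]
LOOP = [
    ("sta $0200,y", 3, 5),
    ("sta $0300,y", 3, 5),
    ("sta $0400,y", 3, 5),
    ("sta $0500,y", 3, 5),
    ("iny", 1, 2),
    ("bne loop", 2, 3),  # 3 cycles when taken
]

ITERATIONS = 256
# Assumption: my cycle count for the pointer-based version of the fill.
POINTER_VERSION_CYCLES = 11325

# The final bne falls through, costing 2 cycles instead of 3.
loop_cycles = ITERATIONS * sum(c for _, _, c in LOOP) - 1
optimized_cycles = sum(c for _, _, c in PREAMBLE) + loop_cycles
optimized_bytes = sum(s for _, s, _ in PREAMBLE + LOOP)

print(optimized_cycles)  # 6403
print(optimized_bytes)   # 19
print(round(optimized_cycles / POINTER_VERSION_CYCLES, 3))  # 0.565
```

Under these assumptions the unrolled loop needs a little over half the cycles of the pointer-based one, and it is also a few bytes shorter.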
New and different colour
To change the colour from yellow to pale orange, we need to change the value loaded into the accumulator.
lda #$07 ; old
lda #$08 ; new
To get a different colour in each quarter of the screen, we can load the colour we want before each store.
ldy #$00
loop:
lda #$12
sta $0200,y
lda #$13
sta $0300,y
lda #$14
sta $0400,y
lda #$15
sta $0500,y
iny
bne loop
To get a random colour in each pixel, we can read the one-byte pseudo-random number generator (PRNG) at $fe before each store.
ldy #$00
loop:
lda $fe
sta $0200,y
lda $fe
sta $0300,y
lda $fe
sta $0400,y
lda $fe
sta $0500,y
iny
bne loop
Challenges
We were challenged to write a program that draws lines around the edge of the display:
A red line across the top
A green line across the bottom (which I made purple for no special reason)
A blue line down the right side
A purple line down the left side
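lda #$07 ; yellow background
ldy #$00
loop:
sta $0200,y ; fill all four pages in one pass
sta $0300,y
sta $0400,y
sta $0500,y
iny
bne loop ; Y wraps back to 0 after 256 pixels
lda #$02 ; red line across the top row ($0200-$021f)
loop2:
sta $0200,y
iny
cpy #$20 ; 32 pixels per row
bne loop2
lda #$05 ; colour for the bottom row ($05e0-$05ff)
ldy #$e0
loop3:
sta $0500,y
iny
bne loop3
lda #$00 ; pointer at $40/$41 walks down the rows, starting at $0200
sta $40
lda #$02
sta $41
loop4:
lda #$04 ; purple pixel in the left column
ldy #$00
sta ($40),y
lda #$06 ; blue pixel in the right column
ldy #$1f
sta ($40),y
lda $40 ; step the pointer to the next row (+32 bytes)
clc
adc #$20
sta $40
bne loop4 ; low byte wraps to 0 after 8 rows, so move to the next page
inc $41
lda $41
cmp #$06 ; stop after page $05
bne loop4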
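The addresses used for the border lines follow from the screen layout. As a rough sketch in Python, assuming the display is a 32x32 grid of one-byte pixels stored row by row from $0200 to $05ff (the `pixel_address` helper is hypothetical), the four edges map to memory like this:

```python
# Assumed layout: 32x32 pixels, one byte each, row-major, starting at $0200.
WIDTH = 32
HEIGHT = 32
BASE = 0x0200


def pixel_address(row: int, col: int) -> int:
    """Address of the pixel at (row, col), counted from the top-left corner."""
    return BASE + row * WIDTH + col


top = [pixel_address(0, c) for c in range(WIDTH)]
bottom = [pixel_address(HEIGHT - 1, c) for c in range(WIDTH)]
left = [pixel_address(r, 0) for r in range(HEIGHT)]
right = [pixel_address(r, WIDTH - 1) for r in range(HEIGHT)]

print(f"top:    ${top[0]:04x}-${top[-1]:04x}")        # top:    $0200-$021f
print(f"bottom: ${bottom[0]:04x}-${bottom[-1]:04x}")  # bottom: $05e0-$05ff
print(f"left:   ${left[0]:04x}, ${left[1]:04x}, ... ${left[-1]:04x}")
print(f"right:  ${right[0]:04x}, ${right[1]:04x}, ... ${right[-1]:04x}")
```

The top and bottom rows are contiguous runs of 32 bytes, so a simple indexed loop can draw them. The left and right columns are spaced 32 bytes apart and span all four pages, which is why they need a pointer that advances by $20 per row.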
Reflection
It’s really interesting to see how colours change with a few simple assembly instructions. Personally, though, I didn’t find it simple; there was a lot of trial and error. It was worth it: I am now one step deeper in understanding how registers and memory interact and how performance is determined by instruction execution time. And we learnt the 6502’s assembly syntax along the way.