Teensy Z80 Homebrew Computer – Part 6 – Asynchronous Clocking Fail

This is the sixth part of a series of posts detailing steps required to get a simple Z80 based computer running, facilitated by a Teensy microcontroller. It’s a bit of fun, fusing old and new hobbyist technologies. See Part 1, Part 2, Part 3, Part 4, and Part 5, if you’ve missed them.

Attempt 1

Making TeensyZ80 run with a faster, asynchronous clock seems a simple change at first, but it’s proving tricky. The high level plan is:

  1. The clock signal is provided by another source (an Arduino Nano at present).
  2. The MREQ and IOREQ lines are used to latch the WAIT line of the Z80 to allow the Teensy to respond to the request.
  3. The Teensy senses the WAIT line, performs any actions, and then resets the latch to bring the WAIT line high again (it’s active low).
  4. The Z80 continues as normal.

So, with some simple 74-series logic, the MREQ and IOREQ pins are NAND’d together, producing a rising edge if either Z80 output goes active low. This is fed into a 74HC74 flip-flop as its clock, with the data pin tied logic high. This allows us to connect the Z80 WAIT input to the not-Q output. The clear pin of the D-type flip-flop is connected to the Teensy, so it can reset the flip-flop and allow the WAIT line to return high, letting the Z80 continue.
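The following is a minimal host-side C model of the latch described above. It shows how the edge-triggered WAIT behaves. It is an illustration, not project code. The names `wait_latch_t`, `request_clock` and `latch_step` are hypothetical. Signals are modelled as booleans (false means an active-low line is asserted), and the 74HC74's asynchronous clear is simplified to a per-step check.

```c
#include <stdbool.h>

/* Host-side model of the first WAIT latch (illustration only).
   Z80 control lines are active low: false means "asserted". */
typedef struct {
  bool q;         /* 74HC74 Q output */
  bool prev_clk;  /* previous clock level, for rising-edge detection */
} wait_latch_t;

/* NAND of MREQ and IOREQ: high whenever either line is pulled low. */
static bool request_clock(bool mreq_n, bool ioreq_n) {
  return !(mreq_n && ioreq_n);
}

/* Advance the model one step and return the WAIT level (notQ).
   clr_n is the active-low clear driven by the Teensy. */
static bool latch_step(wait_latch_t *l, bool mreq_n, bool ioreq_n, bool clr_n) {
  bool clk = request_clock(mreq_n, ioreq_n);
  if (!clr_n) {
    l->q = false;                    /* clear overrides the clock */
  } else if (clk && !l->prev_clk) {
    l->q = true;                     /* rising edge latches D = 1 */
  }
  l->prev_clk = clk;
  return !l->q;                      /* WAIT is wired to notQ */
}
```

Stepping the model shows WAIT falling on the first request edge. After the Teensy clears the latch, WAIT stays high until the next edge, even if the request line is still low. Note that this models only the first design, which is clocked by MREQ and IOREQ alone.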

I had the Teensy set up to run an interrupt routine on a falling edge of WAIT. Sadly, this didn’t seem to work; in fact, I could not confirm the interrupt was being called at all. I’ll have to look into this in detail, but using interrupts really is an optimization in this case, so I soldiered on.

Teensy Rant

I’ve had several problems with Teensy microcontrollers during these posts. I had two units: one is completely bricked, and the other is very unstable. It seems that if pin 33 is low and configured as an input when a program is uploaded to the ARM, the Mini54 chip can fail in some way. The Mini54 chip controls the bootloading process of uploading new code, so this effectively bricks the device. The issue should really be given more prominence: if there had been an announcement stating that pin 33 should never be used in certain ways, I’d have two fully working Teensy devices. But sadly, all the documentation still describes it as a fully configurable digital pin capable of input and output.

End Teensy Rant

Instead of using an interrupt, to try to get something working, I wrote a tight loop() function that did nothing while WAIT was high. As soon as it detected a low signal, it would perform the required actions. I disabled Z80 mode-2 interrupts for now, and removed the I/O debounce code. A very simple example seemed to work, but it was still quite slow. This was despite an Arduino Nano driving the clock at around 200kHz, which is faster than what the Teensy was providing when running in synchronous mode.
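The polling approach amounts to something like the sketch below. The hardware hooks (`read_wait`, `service_bus_request`, `pulse_latch_clear`) are hypothetical stand-ins for the Teensy's pin reads and writes. They are stubbed out so the control flow can run on a PC.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the Teensy pin I/O so the loop logic can
   run on a PC. WAIT is active low. */
static bool wait_line = true;
static unsigned requests_serviced = 0;

static bool read_wait(void) { return wait_line; }
static void service_bus_request(void) { requests_serviced++; }  /* do the RAM/I/O work */
static void pulse_latch_clear(void) { wait_line = true; }       /* flip-flop cleared */

/* One iteration of the polling loop(): idle while WAIT is high,
   otherwise service the request and release the Z80.
   Returns true if a request was handled. */
static bool poll_once(void) {
  if (read_wait()) {
    return false;
  }
  service_bus_request();
  pulse_latch_clear();
  return true;
}
```

On the Arduino side, loop() is called repeatedly. Calling the equivalent of poll_once() from it therefore gives the tight polling behavior described above.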

I tried a larger example, one which printed text to the console, and it was obvious something was not right: the output slowly became corrupted. However, there were signs of promise. I was able to input a 4MHz clock, and things failed in a very similar way. Still corrupt, but the behavior was the same.

The problem

The issue was that I had failed to include the RD/WR lines from the Z80 in my latching circuit. You can see from the timing diagram that, especially in the write cases, we need to WAIT when those are active too, not just when MREQ or IOREQ are.

Attempt 2

I redesigned the latch circuit.

This worked a lot better! I could only use the I/O port which puts characters on the screen, but it was running well. My simple test program, which printed “Welcome to TeensyZ80!” in an infinite loop, was stable even at 1MHz. I’d love to break the MHz barrier for this. However, we’re still on a breadboard, and I don’t have a scope capable of inspecting the signals in the detail some of these issues require, so I may need to settle for much less. This simple test at 1MHz is therefore very encouraging. I tried clocking it at 1.5MHz, but some artifacts appeared in the printed output.

The previous design with I/O ports

When implementing my serial, display and filesystem devices, which are accessed as I/O reads/writes, I created a system which relied on implied state behind the scenes on the Teensy. For example, to set the colour of the characters being printed to the screen, two consecutive writes to the same port would write the high and low bytes. It’s even worse for the serial device, where you had to write command packets to the I/O ports, followed by a variable amount of data. I think I’m going to need to redesign all of the previous work to operate on separate ports. For example, there will be a ColourHi and a ColourLow port, which together define the 16-bit colour of the console. It’s not much work, but it is something I’d overlooked, and it will take time.
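Below is a sketch of the stateless alternative. It assumes hypothetical port numbers for ColourHi and ColourLow, since the real numbers aren't decided here. Each port owns a fixed byte of the 16-bit colour. Unlike the two-writes-to-one-port scheme, the result therefore no longer depends on how many writes preceded it.

```c
#include <stdint.h>

/* Hypothetical port numbers, for illustration only. */
#define PORT_COLOUR_HI 0x10
#define PORT_COLOUR_LO 0x11

static uint16_t console_colour = 0;

/* Each port sets one fixed byte of the 16-bit colour, so no hidden
   "which byte comes next" state is needed on the Teensy. */
static void io_write(uint8_t port, uint8_t value) {
  switch (port) {
    case PORT_COLOUR_HI:
      console_colour = (uint16_t)((console_colour & 0x00FF) | ((uint16_t)value << 8));
      break;
    case PORT_COLOUR_LO:
      console_colour = (uint16_t)((console_colour & 0xFF00) | value);
      break;
    default:
      break;  /* other ports not modelled */
  }
}
```

Repeating a write to the same port is now harmless: writing the low byte twice leaves the colour unchanged rather than shifting which byte the next write lands in.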

This is a very quick update on the Teensy Z80 work; it’s still very much ongoing. I’m also working on another project involving the miniSpartan6+ FPGA board. That’s another bit of fun – who doesn’t want to design their own processor?

Let me know any thoughts, as always, via twitter @domipheus !

Teensy Z80 Homebrew Computer – Part 5 – Implementing preemptive multithreading

This is the fifth part of a series of posts detailing steps required to get a simple Z80 based computer running, facilitated by a Teensy microcontroller. It’s a bit of fun, fusing old and new hobbyist technologies. See Part 1, Part 2, Part 3 and Part 4 if you’ve missed them.

At the moment, whilst running slowly due to the lock-step synchronous nature of the clock driving the Z80 from the Teensy, we do have a fairly well spec’d out little machine. So, in this fairly short post (it was short, then I went and implemented more than expected!), I thought I’d delve a bit into software, and in particular, multithreading.

Booting Teensy Z80, running C code

Before that, though, I wanted to share how the Teensy Z80 boots up, and how I am now using the Small Device C Compiler (sdcc) to compile my program code. The steps by which the Teensy initializes its ‘Z80 RAM’ and resets the Z80 to start executing code are as follows:

  1. The Teensy starts up with the Z80 unclocked. The Teensy has a global array to represent the Z80 RAM address space (there is no ROM). This Z80 RAM is initialized to a small bootloader binary, which is assembled at offset 0h. The bootloader usually sets the stack pointer, defines some global data such as the interrupt vector table, and then jumps to a known location higher in memory.
  2. In the setup() routine, after mounting the SD card volume, the Teensy tries to locate ‘kernel.bin’.
  3. If it’s found, it is loaded into the Z80 RAM array at a known location. If it’s not found, the RAM remains in its initial state, which at the moment simply puts ‘?’ in the top left of the screen.
  4. The last thing the Teensy setup() routine does is reset the Z80 and start clocking it, so it starts executing from PC 0x0000 when the loop() routine starts running.
  5. The Z80 is now in control.
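The RAM preparation steps can be sketched in plain C as below. This is an illustration under stated assumptions. The 32KB size comes from the 15 connected address lines. The 0x0800 load address comes from the build offset mentioned further on in this post. `prepare_z80_ram` is a hypothetical helper, not the actual Teensy sketch code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define Z80_RAM_SIZE     0x8000  /* 32KB: 15 address lines */
#define KERNEL_LOAD_ADDR 0x0800  /* where the bootloader jumps (assumed) */

static uint8_t z80_ram[Z80_RAM_SIZE];

/* Bootloader at 0x0000, then the kernel (if one was found) at
   KERNEL_LOAD_ADDR. Returns 0 on success, -1 if an image won't fit. */
static int prepare_z80_ram(const uint8_t *boot, size_t boot_len,
                           const uint8_t *kernel, size_t kernel_len) {
  memset(z80_ram, 0, sizeof z80_ram);
  if (boot_len > KERNEL_LOAD_ADDR) return -1;
  memcpy(z80_ram, boot, boot_len);
  if (kernel == NULL) return 0;  /* no kernel.bin: keep fallback code */
  if (kernel_len > Z80_RAM_SIZE - KERNEL_LOAD_ADDR) return -1;
  memcpy(z80_ram + KERNEL_LOAD_ADDR, kernel, kernel_len);
  return 0;
}
```

For example, a six-byte bootloader `31 00 80 C3 00 08` (ld sp, 8000h; jp 0800h) followed by a kernel would leave the kernel's first byte at 0x0800, exactly where the jump lands.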

Previously the initial Z80 bootloader was the whole program. My simple shell example in the previous post was implemented this way, but it was tedious needing to recompile the Teensy code and re-upload the sketch every time I made a small code change. Now the ‘kernel.bin’ binary is compiled from C using sdcc.

A concern with SDCC is that I’ve yet to find comprehensive ABI details, such as calling conventions and register use, for its Z80 backend, so I’ve just had to play it by ear. Otherwise, it does have some really nice extensions, so that ports can be represented by C variables:

__sfr __at 0x03 ioConsolePutChar;
__sfr __banked __at 0x07FFF ioVRAMBankDisable;

This is really useful, especially the __banked version, which uses the 16-bit I/O addressing explained in part 4. You can then use the names as though they were byte variables. Writing to the console is as simple as:

  ioConsolePutChar = 'H';
  ioConsolePutChar = 'i';
  ioConsolePutChar = '!';

I’ve been writing a TeensyZ80.h with all of the port definitions, but I’ve kept everything in a single C file for the following multithreading example. To build the binary, we simply compile without the C runtime startup code (crt0), with the code placed at the offset the bootloader expects (0x800 in the following cases). SDCC generates an .ihx file, which you need to convert to a binary with hex2bin. Putting that in the root of the SD card as ‘kernel.bin’ runs it automatically.

Time-slice multithreading

The multithreading I want is time-slice multithreading, where different threads only run for a certain time called the time slice, before being preemptively swapped for another thread.

The high level idea is we have the Teensy fire an interrupt to the Z80 each ‘time slice interval’ and the interrupt handler will then context switch to a new thread. That should be all we need, really. We’ll use the same mode-2 Z80 interrupts as before. For this example all other interrupt vectors have been disabled.

Global State

We need some state stored globally. For our example, we will assume a maximum of four threads. For each thread, we need to know the function it starts at, the arguments to that function, some flags, and a context containing the current running state. We throw all that in a struct, and make an array for our 4 possible threads. We will fix the main process thread as the first thread in this array. Global state like this is fine for this implementation. We can guarantee certain access patterns to ensure we don’t get any nasty race conditions. We can also define rules as to who owns, and can write, the thread structures, avoiding the need for locking.

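#define MAX_THREADS 4

typedef char zthread_t;
typedef int (*startFunc_t) (void*);

// saved running state: the registers live on the thread's own
// stack, so only the stack pointer needs to be kept here
typedef struct internal_context_s {
  unsigned short sp;
} internal_context_t;

typedef struct internal_thread_s {
  startFunc_t startFunc;       // entry point
  void* arg;                   // argument passed to startFunc
  char flags;                  // ZTHREAD_* state
  char active;
  unsigned short stack_start;  // top of this thread's 256-byte stack
  internal_context_t ctx;      // saved context
  zthread_t state_data;        // e.g. the handle being joined
} internal_thread_t;

internal_thread_t threads[MAX_THREADS];
char num_threads;
char current_thread;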

The main part of getting multithreading working in this style will be the interrupt handler which is fired every timeslice switch. The handler takes the following shape:

  • Disable interrupts
  • Save the current running state
  • Choose the next thread to run
  • Restore the state of the new thread
  • Enable interrupts
  • Return to the location where we were in the new thread to continue execution

Each thread, as you can see from the internal_thread_t structure above, has its own stack area, defined by stack_start. 256 bytes are reserved for each thread’s stack at fixed locations in Z80 RAM. To make things easier, the hl, bc, af, de, ix and iy registers will be pushed to the thread’s own stack as the context save. The stack pointer itself will be saved to the thread structure within the ctx field, through a write to a scratch memory location (i.e. ld (_stackLocationScratch), sp).

The program counter itself does not need explicit saving, as it’s already on the stack. When an interrupt is signalled on the Z80 INT pin, the current instruction completes first. The PC of the next instruction is then pushed onto the stack, and a vectored jump through the interrupt table lands you in the interrupt handler routine. We can use this fact to restore the PC incredibly simply, by returning from the interrupt routine with the stack pointer set to that of the new thread we want to execute.

The simplest interrupt routine, which will do round robin scheduling, and assumes 4 active threads, is listed below.

short stackLocationScratch;
void ihdr_timer_timeSlice( void ) __naked {
  // save state (PC is already on stack from interrupt ack)
  __asm
    push hl
    push bc
    push af
    push de
    push ix
    push iy
    ld (_stackLocationScratch), sp
    ex af, af'
  __endasm;

  // save stack to current thread ctx
  threads[current_thread].ctx.sp = stackLocationScratch;

  // Choose next thread to run (doesn't check
  // if they are in a running state)
  current_thread++;
  if (current_thread >= MAX_THREADS) current_thread = 0;

  // load stack of next thread ctx
  stackLocationScratch = threads[current_thread].ctx.sp;

  // restore registers
  __asm
    ex af, af'
    ld sp, (_stackLocationScratch)
    pop iy
    pop ix
    pop de
    pop af
    pop bc
    pop hl
    ei        ; interrupts were disabled on acknowledge
    reti      ; pops the new thread's PC
  __endasm;
}

The Z80 actually has two banks of registers internally. The exx instruction, along with the ex af, af’ instruction, swaps the current active bank. This is useful if we need lots of registers and want to avoid the stack, but it’s not essential here. If the code to choose the next thread were any more complex, we would need to load a separate stack pointer into the sp register for use by kernel routines. This avoids using up the thread stack, which may be nearly full. The restore-registers code is the mirror of the state save, so the reti instruction will find the correct PC on the stack for the thread we have swapped to. That thread, upon entry to the interrupt routine, would have had its PC pushed to its own stack.
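The save/restore sequence can be checked with a small host-side model. This is a sketch, not the SDCC code. It simulates a descending Z80-style stack in a byte array and treats register values as plain numbers. The helpers `save_context` and `restore_context` are hypothetical; they mirror the handler's push and pop order.

```c
#include <stdint.h>

/* Host-side model of the time-slice context switch: a 64KB byte
   array with a descending, little-endian Z80-style stack. */
static uint8_t mem[0x10000];
static uint16_t sp;

static void push16(uint16_t v) {
  sp = (uint16_t)(sp - 2);
  mem[sp] = (uint8_t)(v & 0xFF);                /* low byte at lower address */
  mem[(uint16_t)(sp + 1)] = (uint8_t)(v >> 8);
}

static uint16_t pop16(void) {
  uint16_t v = (uint16_t)(mem[sp] | (mem[(uint16_t)(sp + 1)] << 8));
  sp = (uint16_t)(sp + 2);
  return v;
}

enum { HL, BC, AF, DE, IX, IY, NREGS };

/* Interrupt entry pushes PC, then the handler pushes hl, bc, af, de,
   ix, iy. Returns the SP that would be stored in ctx.sp. */
static uint16_t save_context(uint16_t pc, const uint16_t regs[NREGS]) {
  push16(pc);
  for (int r = HL; r <= IY; r++) push16(regs[r]);
  return sp;
}

/* Load another thread's SP, pop iy, ix, de, af, bc, hl, then reti
   pops the PC. Returns the PC execution resumes at. */
static uint16_t restore_context(uint16_t saved_sp, uint16_t regs[NREGS]) {
  sp = saved_sp;
  for (int r = IY; r >= HL; r--) regs[r] = pop16();
  return pop16();
}
```

The model confirms that each thread's PC comes back out of its own stack. Switching SP between two saved values is therefore all the handler needs to do to resume a different thread.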

Starting Threads

Using this makes starting threads rather easy. When we create a thread, it’s flagged as ZTHREAD_NOT_STARTED, so it’s not selected in the scheduler within the interrupt handler. When the zthread_start function is called, we know the first time the thread can actually be started is when it’s selected within the interrupt handler. Looking at the handler, and how the restore of state for a thread is performed, we can construct the stack of this thread to make it look as though it was preempted exactly at the entry to the start function.

Knowing this, before setting the thread as ZTHREAD_RUNNING, we can populate the thread’s stack locations in the order the handler pops them, and let the interrupt handler take care of the rest!

Preparing the internal_thread_t structure within the zthread_start function then looks like:

  // 9 words (18 bytes) below the top of the thread's stack
  threads[handle].ctx.sp = ((unsigned short)threads[handle].stack_start)-18;
  stack = (short*)threads[handle].ctx.sp;
  stack[0] = 0; //  pop iy
  stack[1] = 0; //  pop ix
  stack[2] = 0; //  pop de
  stack[3] = 0; //  pop af
  stack[4] = 0; //  pop bc
  stack[5] = 0; //  pop hl
  stack[6] = (short)threads[handle].startFunc;  // reti pops this into PC
  stack[7] = (short)_TZL_thread_exited;         // return address for startFunc
  stack[8] = (short)threads[handle].arg;        // startFunc's argument
  threads[handle].flags = ZTHREAD_RUNNING;

With this set up, when our thread is selected to run by the round robin scheduler within the interrupt routine, the registers will all be set to 0. The return-from-interrupt instruction will then load startFunc into PC for the next instruction fetch. From there, the calling convention dictates that the return address is next on the stack, followed by the function arguments. Therefore, when startFunc() returns, the _TZL_thread_exited() function address is loaded into the PC, beginning the thread exit logic. For now, we can just ignore that function, and try out what happens if we launch some threads which simply print characters.

int startFunc_print(void* args) {
  char c = (char)args;
  while (1) {
    ioConsolePutChar = c;
  }
}

int main( int argc, char* argv[] ) {

  zthread_t threadA;
  zthread_t threadB;

  zthread_create(&threadA, startFunc_print, (char*)(short)'A');
  zthread_create(&threadB, startFunc_print, (char*)(short)'B');

  zthread_start(threadA);
  zthread_start(threadB);

  while(1) {
  }

  return 0;
}

As thread ID 0 is fixed to the ‘main thread’, we have that thread begin by calling main. We simply make a special case of this thread, and set it up manually before calling directly into it, after enabling interrupts. The interrupt handler is registered at a vector which the Teensy fires every few hundred milliseconds, so enabling interrupts starts the scheduler. Hundreds of milliseconds is a very long timeslice. However, TeensyZ80 is running very slowly in synchronous clock mode, at only tens of kilohertz, and a larger timeslice also lets us see what is happening much more clearly. (For this video, the scheduler assumes only 3 threads.)

Thread Joining

Joining threads is a basic operation that must be supported. Joining is the act of suspending one thread until another has completed or exited. We can implement this in a very simple way, by having a ZTHREAD_WAIT_JOIN state, in which the thread will not be scheduled to run, and then when other threads exit, we can check in the _TZL_thread_exited() function if threads exist in a wait state that are waiting for the thread that has just completed. If we find threads that have the ZTHREAD_WAIT_JOIN flag, with state_data set to our zthread_t handle, we can set their flag to be runnable, and clear the state_data.

void _TZL_thread_exited( void ) {
  char idx = 0;
  zthread_t thisThread = zthread_getThread();

  // if any threads are joining to us, tell them they can
  // continue now
  for (; idx < MAX_THREADS; idx++) {
    if ((threads[idx].flags == ZTHREAD_WAIT_JOIN)
      && (threads[idx].state_data == thisThread)) {
      threads[idx].flags = ZTHREAD_RUNNING;
      threads[idx].state_data = 0;
    }
  }

  // For now, just set the flag as free.
  // Really we should set as exited and we can then
  // look to get any return value.
  threads[thisThread].flags = ZTHREAD_HDL_FREE;

  // this thread ends here. halt so we can be swapped out.
  while (1) {
    __asm
      halt
    __endasm;
  }
}

Halting the Z80 means that no code will run until the timeslice interrupt fires. The halt is placed in a while(1) loop in case an interrupt that is not for the scheduler fires. We do not encounter this in our example, though.

A side effect of supporting waits is that we can now deadlock, by having two threads join each other. We could check for this directly in the join() call, but there can be chains of joins that are harder to decipher. Instead, we can add code to the scheduler that detects when there are no threads available to run, and signal a deadlock.

  // Choose next thread to run
  thread_schedule_counter = 0;
  do {
    current_thread++;
    thread_schedule_counter++;
    if (current_thread >= MAX_THREADS) {
      current_thread = 0;
    }
  } while ((threads[current_thread].flags != ZTHREAD_RUNNING)
    && (thread_schedule_counter <= MAX_THREADS));

  if (thread_schedule_counter > MAX_THREADS) {
    // swap to the kernel stack for this
    __asm
      ; load the stack pointer to the kernel stack
      ld sp, #0x07F0
    __endasm;
    panic_deadlock();
  }


The panic_deadlock() function can print a message to the user, along with some state about each thread, for easy debugging. Note that the stack pointer is moved to a safe, known location. The thread stacks may not have enough space left to call the panic function, and we may also want to inspect them later, so it’s best to leave them unchanged. The complete join function is below.

int zthread_join(zthread_t handle) {
  zthread_t thisThread = zthread_getThread();
  if (threads[thisThread].flags != ZTHREAD_RUNNING) {
    return 1;  // non-zero: only a running thread can join
  }

  // if the thread we want to join with is marked
  // as free, assume it's already exited and so
  // return. This should be the exited flag, really
  if (threads[handle].flags == ZTHREAD_HDL_FREE) {
    return 0;
  }

  if (threads[handle].flags != ZTHREAD_RUNNING) {
    if (threads[handle].flags != ZTHREAD_WAIT_JOIN) {
      return 1;  // non-zero: e.g. target not started yet
    }
  }

  threads[thisThread].state_data = handle;
  threads[thisThread].flags = ZTHREAD_WAIT_JOIN;

  // halt; the scheduler will not pick this thread again
  // until the joined thread exits and wakes it
  __asm
    halt
  __endasm;

  return 0;
}
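The scheduling, join and exit rules can be pulled together into one host-side model of the state transitions. This sketch simplifies things:

- The ZTHREAD_* flag values are arbitrary.
- Threads are plain array slots.
- There is no real halting or stack switching.
- The round-robin search is restructured as a bounded for loop. It returns -1 where the handler above would call panic_deadlock().

```c
#define MAX_THREADS 4

/* Flag values are arbitrary in this sketch; the names follow the post. */
enum { ZTHREAD_HDL_FREE, ZTHREAD_NOT_STARTED, ZTHREAD_RUNNING, ZTHREAD_WAIT_JOIN };

static char flags[MAX_THREADS];       /* all ZTHREAD_HDL_FREE initially */
static char state_data[MAX_THREADS];  /* handle being joined */

/* Round-robin search starting after `current` (and ending on it).
   Returns -1 when nothing is runnable: the deadlock case. */
static int next_runnable(int current) {
  for (int step = 1; step <= MAX_THREADS; step++) {
    int candidate = (current + step) % MAX_THREADS;
    if (flags[candidate] == ZTHREAD_RUNNING) return candidate;
  }
  return -1;
}

/* The bookkeeping part of join: mark `self` as waiting on `target`. */
static void join_wait(int self, int target) {
  state_data[self] = (char)target;
  flags[self] = ZTHREAD_WAIT_JOIN;
}

/* The exit path: wake every joiner of `self`, then free the slot. */
static void thread_exit(int self) {
  for (int i = 0; i < MAX_THREADS; i++) {
    if (flags[i] == ZTHREAD_WAIT_JOIN && state_data[i] == self) {
      flags[i] = ZTHREAD_RUNNING;
      state_data[i] = 0;
    }
  }
  flags[self] = ZTHREAD_HDL_FREE;
}
```

Two threads joining each other leaves no slot in the ZTHREAD_RUNNING state, so the search comes back empty. This mirrors the deadlock test at the end of this post, where a thread joins main while main is joining it.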


Critical Sections

There will be times when we do not want other threads to run, such as when we are manipulating multiple bytes of shared state. Examples are writing to the screen, setting the colour, and setting the row/column we are writing to; those functions are not thread safe. The join function, too, may be better within a critical section, except for the halt at the end. This ensures all threads have updated and consistent state before they have a chance to run. On the Z80, byte writes will actually be atomic, as the interrupt pin is only sampled after a whole instruction has completed.

Critical sections can be implemented very easily: we simply disable interrupts for the duration we need. This will stop all other threads running and stop things that depend on interrupts, so we need to account for that, but it’s easy to add and perfectly fine for this use case.
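A slight variation is sketched below. It remembers whether interrupts were enabled on entry, so that nested critical sections don't re-enable them too early. Here `di` and `ei` are C stand-ins for the Z80 instructions, operating on a simulated flag. The SDCC manual also describes a `__critical` keyword with similar save-and-restore semantics, which may be worth checking for the Z80 port.

```c
#include <stdbool.h>

/* Simulated interrupt-enable flag; on the Z80 these would be the
   di and ei instructions. */
static bool interrupts_enabled = true;

static void di(void) { interrupts_enabled = false; }
static void ei(void) { interrupts_enabled = true; }

/* Enter: remember the previous state, then disable interrupts. */
static bool critical_enter(void) {
  bool was_enabled = interrupts_enabled;
  di();
  return was_enabled;
}

/* Exit: only re-enable if interrupts were on when we entered, so an
   inner section does not end an outer one early. */
static void critical_exit(bool was_enabled) {
  if (was_enabled) ei();
}
```

Nested use then behaves as expected: the inner exit leaves interrupts off, and only the outermost exit turns them back on.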

The end result

We have zthread_create, zthread_start, zthread_join, the ability to create critical sections, and a round robin scheduler. The test below runs as shown in the video (apologies for the shaky-cam!).

int startFunc_print2(void* args) {
  char c = (char)args;
  short num = 400;
  while (num--) {
    ioConsolePutChar = c;
  }
  return 0;
}

int startFunc_print_deadlock(void* args) {
  char c = (char)args;
  unsigned char num = 140;  // explicitly unsigned: 140 exceeds signed char range
  while (num--) {
    ioConsolePutChar = c;
  }

  // main thread always id 0
  ASSERT(! zthread_join(0));
  return 0;
}

int main( int argc, char* argv[] ) {
  zthread_t threadA;
  zthread_t threadB;
  zthread_t threadC;

  ASSERT(! zthread_create(&threadA, startFunc_print2, (char*)(short)'A'));
  ASSERT(! zthread_create(&threadB, startFunc_print2, (char*)(short)'B'));
  ASSERT(! zthread_create(&threadC, startFunc_print2, (char*)(short)'C'));

  ASSERT(! zthread_start(threadA));
  ASSERT(! zthread_start(threadB));
  ASSERT(! zthread_start(threadC));

  ASSERT(! zthread_join(threadA));
  ASSERT(! zthread_join(threadB));
  ASSERT(! zthread_join(threadC));

  con_putString(" Thread A,B & C has exited, main thread can continue to deadlock detection test! ");

  ASSERT(! zthread_create(&threadA, startFunc_print_deadlock, (char*)(short)'!'));
  ASSERT(! zthread_start(threadA));

  ASSERT(! zthread_join(threadA));

  while(1) {
  }

  return 0;
}

Things we would want implemented next are true exiting of threads, with return-value capture. Beyond that, I’d call this a good enough implementation for Teensy Z80. I don’t think I’ll be making much use of threads in anything I write for this, especially given the current speed of the system. The next thing on my to-do list is to get Teensy Z80 faster.

Code as always is on my github. I hope you’ve been enjoying this Teensy Z80 project. If you have, let me know on twitter @domipheus!

Teensy Z80 – Part 4 – VRAM explained, display modes, simple shell.

This is the fourth part of a series of posts detailing steps required to get a simple Z80 based computer running, facilitated by a Teensy microcontroller. It’s a bit of fun, fusing old and new hobbyist technologies. See Part 1, Part 2 and Part 3, if you’ve missed them.

I mentioned ‘VRAM’ in the last post, which really was just an area of RAM which I specified to the Teensy through a port. I’ve now got something a bit more serious set up, which is completely separate from main RAM. It’s accessed via the I/O ports, after a flag has been set.

At the moment, I have all but one of the address bus pins connected on the Z80. This means I can address 32KB of RAM. The screen, which is connected to the Teensy via SPI, is a 320×240, 16-bit colour unit. Sadly, this means a full-size framebuffer for this screen would be an eye-watering 150KB! Even at half the resolution, 160×120 in full colour is 37.5KB. I cannot add the additional address bus pin for a 16-bit address space, as I’ve run out of I/Os on the Teensy. I have a single one left, and it’s needed for something I hope to explain in the next few posts. I can use a 256-colour palette, which brings the memory requirements for 160×120 down to under 19KB. That’s still a large chunk of memory which could no longer be used for programs.
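The figures above are simple arithmetic. As a quick check, here is a trivial sketch using a hypothetical helper name:

```c
#include <stdint.h>

/* Bytes needed for a w x h framebuffer at bpp bits per pixel. */
static uint32_t framebuffer_bytes(uint32_t w, uint32_t h, uint32_t bpp) {
  return w * h * bpp / 8;
}
```

The results are 153,600 bytes (150KB) for 320×240 at 16bpp, 38,400 bytes (37.5KB) for 160×120 at 16bpp, and 19,200 bytes (18.75KB) for 160×120 at 8bpp. Only the last fits comfortably alongside programs in a 32KB address space.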

VRAM as a second address space

So I decided to use 16-bit (15 in my case) I/O addressing to enable a secondary 32KB address space, to use for VRAM. The Z80 in/out instructions in which the port is given by the C register actually place register B onto the top half of the address bus, allowing access to the full address space. We have a specific entry in the standard 256-port I/O space which sets a flag. The Teensy interprets this flag as an instruction to treat all further I/O writes as writes into a special VRAM memory. I then have the highest port possible (0x7FFF) as the disable VRAM port. Reading from this port clears the flag on the Teensy, and I/O operations return to their standard behavior. This gives a completely separate memory space for VRAM, so all of main RAM remains available for programs and data.
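The port address formation can be expressed as below. This sketch assumes, as described above, that A15 is not connected, so the Teensy sees only the low 15 bits. `io_address` is a hypothetical helper.

```c
#include <stdint.h>

/* For `out (c), a` and `in a, (c)`, C drives A0-A7 and B drives
   A8-A15. With A15 unconnected, the Teensy sees only 15 bits. */
static uint16_t io_address(uint8_t b, uint8_t c) {
  uint16_t full = (uint16_t)((b << 8) | c);
  return (uint16_t)(full & 0x7FFF);
}
```

One consequence is that B values of 0x80 and above alias onto the lower half: B=0xFF, C=0xFF still arrives as 0x7FFF. The low byte is always C, which is why a standard 8-bit port check can simply mask with 0x00FF.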

There is a downside to this: I/O writes have an additional wait cycle automatically inserted, so they are slower than normal RAM writes. Additionally, things such as loading images from the SD card into VRAM would need to go via RAM, unless additional flags are added to the file system requests to specify which memory space the buffers refer to. However, I think those downsides will be insignificant once I make the Z80 clock asynchronously from the Teensy. At that point, the Z80 WAIT input is likely to force many wait states for RAM as well as for I/O operations.

On the Teensy, the code for this is very simple. We have a second global array to use as the VRAM storage, and an ioVramBankSet flag which we check on I/O operations.

#define PORT_VRAM_BANK_SET       0xC8
#define PORT_VRAM_BANK_RESET     0x7FFF

byte Z80_VRAM[Z80_VRAM_LENGTH] = {0};

// I/O request handling within loop(); other bus handling not shown
void loop() {
  unsigned short portAddress = addressBus & 0x00FF;
  if (RD_val) {
    if (ioVramBankSet && (addressBus == PORT_VRAM_BANK_RESET)) {
      // PORT_VRAM_BANK_RESET is a special case 16-bit port
      ioVramBankSet = 0;
    }
  } else if (WR_val) {
    if (ioVramBankSet) {
      // guard against offsets beyond the end of the VRAM array
      if (addressBus < Z80_VRAM_LENGTH)
        Z80_VRAM[addressBus] = dataBus;
    } else if (portAddress == PORT_VRAM_BANK_SET) {
      ioVramBankSet = 1;
    }
  }
}

The above code is really all we need for this. The upside of using this I/O style system instead of say, RAM banking, is that the instruction stream and source data can remain in standard RAM and we do not need to do any mapping of the address space which would restrict us significantly with only 15 address bits. Now we need to make use of the data which is stored in that space for graphics!

Display Modes

I mentioned earlier the amount of memory needed for various resolutions and colour depths. The simple fact is that the Teensy 3.1 microcontroller I’m using only has 64KB of RAM. Within that, we need the Z80 RAM, the VRAM, and working memory for the Teensy itself. The Teensy uses that working memory for driving the display, working with the SD card, and handling the FAT filesystem. This pretty much means 160×120 at 8bpp is the maximum we can achieve. Combined with a 256-entry palette, this gives a very generous range of colours and comes in at less than 20KB. So we’ll have the VRAM set to 20KB.

The first and most generous mode is as above: 160×120, with a 256-entry 16-bit colour palette. This is laid out in VRAM with the first 512 bytes as the palette, followed by the pixel data. This remains true for all display modes, to simplify the implementation. There are additional modes that set whether the display is stretched or not. If it is not stretched, the offset in the TFT will be configurable, so you can move the image around the screen and combine it with console text. As I write this, the following modes are supported:

  • 40×30, 16bpp
  • 48×48, 16bpp
  • 48×48, 8bpp, 16-bit palette
  • 80×60, 8bpp, 16-bit palette
  • 160×120, 8bpp, 16-bit palette
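The layout is fixed, with a 512-byte palette at 0x0000 and pixels from 0x0200, and the VRAM size is 20KB. Given that, a small table can check the space each listed mode needs. The structure and names here are hypothetical. The array order is not the mode index written to the port (80×60 is mode 6 in the Z80 example further down).

```c
#include <stdint.h>

#define VRAM_SIZE     (20u * 1024u)  /* VRAM set to 20KB */
#define PIXEL_OFFSET  0x200u         /* 256 x 16-bit palette comes first */

typedef struct {
  uint16_t width, height;
  uint8_t bpp;
} display_mode_t;

/* The modes listed above; array order is NOT the mode index. */
static const display_mode_t modes[] = {
  { 40, 30, 16 }, { 48, 48, 16 }, { 48, 48, 8 }, { 80, 60, 8 }, { 160, 120, 8 },
};

static uint32_t pixel_bytes(const display_mode_t *m) {
  return (uint32_t)m->width * m->height * m->bpp / 8;
}

/* True if palette plus pixel data for the mode fit in VRAM. */
static int mode_fits(const display_mode_t *m) {
  return PIXEL_OFFSET + pixel_bytes(m) <= VRAM_SIZE;
}
```

The 80×60 entry works out to 0x12C0 bytes, the same size passed to ram_2_vram in the Z80 code below. The largest mode uses 19,712 of the 20,480 bytes.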

The mode is set by writing an index value to an I/O port. There is also a draw port; a write to it initiates a full screen redraw, and the data bus value is ignored. I may later implement a sub-screen redraw, acting on a set area of the screen, as an optimization.

An additional feature is that the palette has an offset associated with it, which wraps around the 256 palette entries. This makes palette-shifting effects such as plasma an incredibly easy hack. It also means that when I implement modes with fewer bits per pixel, multiple palettes can be stored and switched between with a single I/O write.
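Palette rotation then reduces to an index that wraps at 256. This is a minimal sketch that assumes the offset is added to the pixel value and counted in entries; whether the Teensy adds or subtracts it isn't stated here. `lookup_colour` is a hypothetical name.

```c
#include <stdint.h>

/* Resolve an 8bpp pixel to a 16-bit colour. The offset is assumed to
   be added to the pixel value; uint8_t arithmetic wraps it at 256. */
static uint16_t lookup_colour(const uint16_t palette[256], uint8_t pixel, uint8_t offset) {
  return palette[(uint8_t)(pixel + offset)];
}
```

Incrementing the offset once per frame rotates every on-screen colour at once without touching the pixel data. The plasma loop below does this via PORT_VRAM_PALETTE_IDX.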

That is the trick used in the plasma example shown in the middle of this video. The Z80 is running slowly, at around 2kHz (remember, everything is still synchronous).

The Z80 code

The code to load pixel data into the VRAM space from RAM, and to do plasma palette cycling, is very simple.

  ld a, 6
  out (PORT_VRAM_SETMODE), a    ; 'vram' displaymode 6: 80x60

  ; we can put out BC now to write VRAM
  ld bc, 0200h                  ; pixel mem offset, after palette
  ld hl, 012c0h                 ; size  of pixel data (80x60)
  ld de, image_pixels_80x60     ; pixel data in binary section
  call ram_2_vram

  ld bc, 0000h                  ; vram offset of palette
  ld hl, 0200h                  ; size  of palette data (256 2-byte entries)
  ld de, palette_defn           ; palette data in binary section
  call ram_2_vram

  ld hl, 0
cycle_palette_idx:
  inc hl
  ld a, l
  out (PORT_VRAM_PALETTE_IDX), a     ; inc palette idx
  ld a, 0
  out (PORT_VRAM_DRAW), a            ; draw vram
  jr cycle_palette_idx

This will load the pixel and palette data which are stored in the binary already, into VRAM. The PORT_VRAM_PALETTE_IDX I/O port sets the ‘palette offset’ so it can be rotated incredibly easily, and PORT_VRAM_DRAW draws the contents of VRAM to the display given the current display mode set via the PORT_VRAM_SETMODE port.

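ram_2_vram:
  ; de = src in ram, bc = vram offset, hl = size
  push de
  push bc
  push hl
  push af
  di                            ; no interrupts while VRAM is mapped
  ld a, 1
  out (PORT_VRAM_BANK_SET), a   ; enable VRAM writes

ram_2_vram_loop:
  ld a, (de)
  out (c), a                    ; B:C forms the 16-bit VRAM address
  dec hl
  inc de
  inc bc

  ld a, h
  or l
  jr nz, ram_2_vram_loop

  ; return to non-vram
  ld bc, PORT_VRAM_BANK_RESET
  in a, (c)
  pop af
  pop hl
  pop bc
  pop de
  ei
  ret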

The ram_2_vram function shows how the VRAM memory space is enabled with the PORT_VRAM_BANK_SET port, and disabled with a PORT_VRAM_BANK_RESET read. Also note how I disable interrupts within this function. Interrupts may use I/O ports themselves (for instance, the serial receive ‘data available’ interrupt), so it’s important to disable interrupts whenever VRAM is enabled for writing. Other than that, it works a treat!

These video modes also allow for some fun error screens. For example, if the SD card is not mounted correctly, we get a nice error graphic. (On boot, the Teensy tries to locate a kernel.bin on the SD card, so it needs to be there.) This one uses the 40x30x16bpp mode, but I’ll make it a lot smaller soon with a 16-colour palette mode.

A simple shell

You can also see from the video above that I have a very basic shell working. It simply takes input characters from serial and runs programs matching those names from SD. It does no argument passing; all it checks is whether a file exists with that name. If it does, the shell loads it at offset 0x1000 of memory before calling into it. The ls, cls and plasma binaries are all simply made in Z80 assembly and have no dependencies on any features that a kernel may need to provide.

At the moment, I do have the sdcc C compiler running with a compiled ‘kernel’ which allows for some real operating system services and true program loading with arguments to main, system calls, etc. Watch this space! I’ll talk more about that soon 🙂 Code as always is on my github. A full schematic is incoming, but it’s not difficult to decipher if you want to make your own.

I hope you’ve been enjoying this Teensy Z80 project. If you have, let me know on twitter @domipheus!

Part 5 is now available!


Teensy Z80 – Part 3 – File System, SD Card, VRAM?

This is the third part of a series of posts detailing steps required to get a simple Z80 based computer running, facilitated by a Teensy microcontroller. It’s a bit of fun, fusing old and new hobbyist technologies. See Part 1 and Part 2, if you’ve missed them.

We now have the base Z80 working, along with interrupts and a display which can be manipulated in a console/terminal fashion using the Z80 I/O ports. The next step? File storage!

The obvious choice for file storage here is an SD card. It uses 3.3V logic, which is what we are running everything with. It also uses the SPI bus, which we already have set up for our LCD screen. We’d need another pin on the Teensy for the SD chip select, and also another for the MISO line reading data from the SD card, as the LCD only ever needed MOSI for input.

A peek of what is covered in this post:

Interfacing for SD cards

Now, as I keep stressing, this is just a little fun exercise. So to make things (a lot) easier, the Teensy will actually handle all of the FAT file system behind the scenes work. I’m sure I could get it all ported, or find a Z80 FAT16 implementation already, but I like the pace this project is moving at – so we will cheat more!

I exposed 3 I/O ports to the Z80. I could probably combine them, but for now, I will stick with three:

  1. Opening/closing files
  2. Reading/writing files
  3. Performing ‘nextfile’ operations on directories

I won’t go into too much detail on the Teensy side of things; the code is all available on my GitHub project page if you want to look. The Teensy code uses SdFat, which means I need to license the Teensy Z80 code under the GPL (for those who don’t know, my stance on the GPL is “ugh”, but I will abide by its demands).

I will, however, detail the I/O ports, commands and the data structures used to communicate the operations. The major file system functions – open, read/write, next – are implemented in such a way that you place required information in a section of memory, give that memory address to a port, and then tell it to execute the operation. So, for the ‘Open’ command, we set aside an area of memory with the following structure:

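  struct openfile_cmd_data {  // packed: no padding, so the offsets hold
    uint8_t  error;     // offset 0, written by the operation
    uint32_t size;      // offset 1, written by the operation
    uint8_t  type;      // offset 5, written by the operation
    uint8_t  flags;     // offset 6, read by the operation
    char     name[13];  // offset 7, read by the operation; 8.3, null-terminated
  };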

For open, we fill in the name and flags in the structure before initiating the open command. The flags say whether we are opening for reading, writing, appending, etc. The open command itself writes the error, size and type fields. Error and size are self-explanatory; the type field carries extra information, such as whether this file is actually a directory.
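Since both sides read this structure byte by byte out of Z80 RAM, it can help to see the offsets spelled out. The following C++ sketch decodes the structure from a byte buffer, reading the 32-bit size least significant byte first, as the Z80 stores it. The `OpenFileCmd` type, `readU32LE` and `readOpenFileCmd` are hypothetical names for illustration, not the project's code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decoded view of openfile_cmd_data; offsets follow the layout above.
struct OpenFileCmd {
    uint8_t error;     // offset 0
    uint32_t size;     // offsets 1..4, little-endian
    uint8_t type;      // offset 5
    uint8_t flags;     // offset 6
    std::string name;  // offsets 7..19, 8.3 name, null-terminated
};

// Reads a little-endian 32-bit value starting at mem[at].
uint32_t readU32LE(const uint8_t* mem, uint16_t at) {
    return uint32_t(mem[at]) | (uint32_t(mem[at + 1]) << 8) |
           (uint32_t(mem[at + 2]) << 16) | (uint32_t(mem[at + 3]) << 24);
}

OpenFileCmd readOpenFileCmd(const uint8_t* mem, uint16_t base) {
    OpenFileCmd cmd{};
    cmd.error = mem[base];
    cmd.size = readU32LE(mem, base + 1);
    cmd.type = mem[base + 5];
    cmd.flags = mem[base + 6];
    // At most 12 characters; name[12] is reserved for the terminator.
    for (int i = 0; i < 12 && mem[base + 7 + i] != 0; ++i)
        cmd.name += char(mem[base + 7 + i]);
    return cmd;
}
```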

In Z80 assembly, this looks like the following:

      ; definitions of ports, commands, flags required
OPEN_READ               equ 0


      ; area of memory for openfile_cmd_data
filesys_readme_open_read:
  defb 0ffh, 0,0,0,0,  0,  OPEN_READ,  'README.TXT',0,0,0

  ld a, FILESYS_OPEN_SETMEMPTR       ; the 'Set Memory Pointer' command
  out (PORT_FILESYS_OPEN_CLOSE), a   ; tell the port we are giving it memory
  ld de, filesys_readme_open_read
  ld a, e
  out (PORT_FILESYS_OPEN_CLOSE), a   ; give the port 8 bits of address
  ld a, d
  out (PORT_FILESYS_OPEN_CLOSE), a   ; give the port the other 8 bits
  ld a, FILESYS_OPEN_OPENFILE        ; the 'Open File' command (constant name assumed)
  out (PORT_FILESYS_OPEN_CLOSE), a   ; initiate the OPENFILE command
                                     ; - operation is immediate.
  ld a, (de)                         ; load the error byte
  or a
  jr nz, Lfile_fail                  ; if non-zero jump to fail handler


I’ve left out the step before this, which is opening (and closing) the root directory. It’s implemented as a special case of open where “/” is the filename. But you can see the code is very straightforward.
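On the Teensy side, the exchange used above (a command byte, then the low address byte, then the high address byte) amounts to a small per-port state machine. This is a hedged C++ sketch of one way to implement it; the command values `kSetMemPtr = 0` and `kExec = 1` and the `FilesysPort` class are assumptions for illustration, not the project's actual constants or code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical command values; the real FILESYS_* constants may differ.
constexpr uint8_t kSetMemPtr = 0;
constexpr uint8_t kExec = 1;

// Tracks one file-system port: after kSetMemPtr, the next two writes are
// taken as the low and high bytes of a structure address in Z80 RAM.
class FilesysPort {
public:
    // Called for every Z80 OUT to this port. Returns true when an execute
    // command arrives and the operation should be performed.
    bool write(uint8_t value) {
        switch (state_) {
        case State::Command:
            if (value == kSetMemPtr) state_ = State::AddrLow;
            return value == kExec;
        case State::AddrLow:
            ptr_ = value;  // low 8 bits arrive first
            state_ = State::AddrHigh;
            return false;
        case State::AddrHigh:
            ptr_ = uint16_t(ptr_ | (uint16_t(value) << 8));  // then high 8 bits
            state_ = State::Command;
            return false;
        }
        return false;
    }
    uint16_t memPtr() const { return ptr_; }

private:
    enum class State { Command, AddrLow, AddrHigh };
    State state_ = State::Command;
    uint16_t ptr_ = 0;
};
```

In real firmware, a `true` return would trigger the operation on the structure at `memPtr()`, writing results back into Z80 RAM before the Z80 continues, which is why the assembly can read the error byte immediately afterwards.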

We can now use the information written at filesys_readme_open_read to help read the file contents. The file size is written to the 4 bytes after the error byte, so we can read it to see how much memory we need to hold the whole file.

  ; assume < 256b file for now
  ld de, filesys_readme_open_read + 1
  ld a, (de)

If we read just the first byte, we get the whole file size as long as the file is less than 256 bytes, because the system is little-endian (the least significant byte comes first). We’ll assume README.TXT is that small (well, we know it is), so for simplicity we just read this one byte.
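The one-byte shortcut is easy to check on the host. This small C++ sketch stores a size least significant byte first and confirms that the first byte equals the full size exactly when the upper three bytes are zero, i.e. when the file is under 256 bytes. The helpers `writeU32LE` and `lowByteIsWholeSize` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stores a 32-bit size little-endian, as it would appear in Z80 memory.
void writeU32LE(uint8_t* out, uint32_t v) {
    for (int i = 0; i < 4; ++i) out[i] = uint8_t(v >> (8 * i));
}

// True if reading only the first byte (as the Z80 code does) yields the
// whole size, i.e. the upper three bytes are all zero.
bool lowByteIsWholeSize(const uint8_t* sizeBytes) {
    return sizeBytes[1] == 0 && sizeBytes[2] == 0 && sizeBytes[3] == 0;
}
```

For a 300-byte file the first byte would read as 44 (300 - 256), which is why the shortcut only holds for files under 256 bytes.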

With this information, we can set some memory aside to store the file contents. We could reserve the bytes on the stack or, in my case, just use a fixed address in memory where program RAM can begin. We now need to fill out a read/write request structure:

  struct read_write_command {  // packed: no padding, so the offsets hold
    uint8_t  error_code;      // offset 0, written by the operation
    uint8_t  op_type;         // offset 1, read: CMD_READ(0) or CMD_WRITE(1)
    uint16_t file_offset;     // offset 2, read; ignored if OPEN_APPEND
    uint16_t block_size;      // offset 4, read
    uint16_t mem_buffer_ptr;  // offset 6, read
  };

The file_offset is where you’d typically seek() to before a read. The block_size is the size of the read/write request, and mem_buffer_ptr should point to a further area of memory, at least block_size bytes long, for the file system to write into (or, for a write request, to read from).
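To make the field semantics concrete, here is a hedged host-side C++ sketch of how a Teensy handler might service a read request: decode the fields little-endian, seek to file_offset, and copy up to block_size bytes into Z80 RAM at mem_buffer_ptr. The `serviceRead` name, the `std::vector` standing in for an open SdFat file, and the error values (0 for success, 1 for a rejected request) are assumptions; writes and OPEN_APPEND are not handled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint8_t CMD_READ = 0;  // op_type value from the structure above

// Little-endian 16-bit read from Z80 memory.
uint16_t readU16LE(const std::vector<uint8_t>& ram, std::size_t at) {
    return uint16_t(ram[at] | (ram[at + 1] << 8));
}

// Services a read whose read_write_command starts at ram[cmdAddr].
// 'file' stands in for the contents of the currently open file.
void serviceRead(std::vector<uint8_t>& ram, std::size_t cmdAddr,
                 const std::vector<uint8_t>& file) {
    uint8_t opType = ram[cmdAddr + 1];
    std::size_t offset = readU16LE(ram, cmdAddr + 2);  // file_offset
    std::size_t size = readU16LE(ram, cmdAddr + 4);    // block_size
    std::size_t buffer = readU16LE(ram, cmdAddr + 6);  // mem_buffer_ptr

    // Writes are not handled in this sketch, so anything but a read is rejected.
    if (opType != CMD_READ || offset > file.size() || buffer + size > ram.size()) {
        ram[cmdAddr] = 1;  // error_code: rejected (value assumed)
        return;
    }
    // Copy what is available; a read past end-of-file is silently truncated.
    std::size_t count = std::min(size, file.size() - offset);
    std::copy_n(file.begin() + offset, count, ram.begin() + buffer);
    ram[cmdAddr] = 0;  // error_code: success
}
```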

filesys_read_request:
  ; error_code, op_type, file_offset_lo,file_offset_hi,block_size_lo,block_size_hi
  defb 0ffh, 0,  0,0,  0,0
  defw scratch_mem                   ; mem_buffer_ptr

scratch_mem:
  dc 128,0 ; 128 bytes of zero for scratch memory

After opening the file, we can insert the size of the file into the block_size_lo part of the data, to read the whole file starting from offset 0.

  ; assume < 256b file for now
  ld de, filesys_readme_open_read + 1
  ld a, (de)

  ld hl, filesys_read_request+4
  ld (hl), a

Now we use the same style of port i/o to provide the read/write port with the location of this structure in memory, and fire off the EXEC command to perform the operation.

  ld a, FILESYS_RW_SETMEMPTR         ; 'Set Memory Pointer' (constant name assumed)
  out (PORT_FILESYS_READ_WRITE), a   ; tell the port we are giving it memory
  ld de, filesys_read_request
  ld a, e
  out (PORT_FILESYS_READ_WRITE), a   ; low 8 bits of the command ptr
  ld a, d
  out (PORT_FILESYS_READ_WRITE), a   ; high 8 bits of the command ptr
  ld a, FILESYS_RW_EXEC              ; 'Execute' (constant name assumed)
  out (PORT_FILESYS_READ_WRITE), a   ; execute the command
                                     ; - operation is instant
  ld a, (de)                         ; load the error byte
  or a
  jr nz, Lfile_fail                  ; if non-zero jump to fail handler

        ; scratch_mem now contains file content, ascii text
  ld de, scratch_mem
  call print_string                  ; print that content to the screen
  call newline

  ld a, FILESYS_CLOSE_FILE           ; the 'Close File' command
  out (PORT_FILESYS_OPEN_CLOSE), a   ; close README.TXT

This setup allows for a decent amount of functionality. I can read files much larger than the available RAM (16KB as I write this), because the read command takes the seek position in the file as file_offset; being 16 bits, that reaches up to 64KB into a file. Writing works exactly the same way, except that scratch_mem would contain the data to write, the file would be opened for writing, and op_type would be set to CMD_WRITE.
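Reading a file larger than RAM comes down to issuing repeated reads while advancing file_offset. This C++ sketch covers only the loop arithmetic: it turns a file size into the sequence of (file_offset, block_size) requests. `chunkOffsets` is a hypothetical helper; the 128-byte chunk in the usage matches the scratch buffer above, and the 16-bit file_offset caps what can be reached at 64KB.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Splits a file of fileSize bytes into (file_offset, block_size) pairs of at
// most `chunk` bytes each: the sequence of read requests to issue.
std::vector<std::pair<uint16_t, uint16_t>> chunkOffsets(uint32_t fileSize,
                                                        uint16_t chunk) {
    std::vector<std::pair<uint16_t, uint16_t>> reads;
    if (chunk == 0) return reads;
    // file_offset is 16 bits wide, so only the first 64KB can be reached.
    if (fileSize > 0x10000) fileSize = 0x10000;
    for (uint32_t off = 0; off < fileSize; off += chunk) {
        uint32_t len = (fileSize - off < chunk) ? fileSize - off : chunk;
        reads.emplace_back(uint16_t(off), uint16_t(len));
    }
    return reads;
}
```

Each pair would be written into the file_offset and block_size fields of the request before executing it, with the data consumed out of scratch_mem between reads.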

Directory Traversal

I skipped the fact that before opening README.TXT I had to open the root directory. In this system, directories are simply files. You need to be in the correct working directory to open a file, since the directory does not form part of the open request. This way, the Teensy can keep a directory tree internally. I’ve not implemented this fully yet, as for now a flat filesystem really is enough for me.

The NEXT command allows the discovery of the files in a directory, as in most other file systems. Unlike the other file operations, it takes no arguments and simply operates on the currently open file, provided it is a directory!

  struct getnext_output {  // packed: no padding, so the offsets hold
    uint8_t  error;      // offset 0
    uint32_t filesize;   // offset 1
    uint8_t  flags;      // offset 5
    char     name[13];   // offset 6, null-terminated
  };

Since this operation only writes to memory, we can reuse the scratch_mem area from earlier and then initiate the GETNEXT command. We keep initiating GETNEXT operations until the error byte becomes non-zero.

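  ld a, FILESYS_NEXT_SETMEMPTR  ; 'Set Memory Pointer' (constant name assumed)
  out (PORT_FILESYS_NEXT), a    ; set the memory area to scratch_mem
  ld de, scratch_mem
  ld a, e
  out (PORT_FILESYS_NEXT), a    ; low 8 bits of the address
  ld a, d
  out (PORT_FILESYS_NEXT), a    ; high 8 bits of the address
  ld hl, scratch_mem            ; hl points at the error byte
  ld de, scratch_mem + 6        ; scratch_mem+6 is char name[13]
                                ; (assumes newline/print_string preserve hl, de)
getnextfile:
  call newline                  ; new line on the console
  ld a, FILESYS_NEXT_GETNEXT    ; 'Get Next' (constant name assumed)
  out (PORT_FILESYS_NEXT), a    ; initiate the GETNEXT command
  ld a, (hl)                    ; load the error byte
  or a
  jr nz, nomorefiles            ; non-zero error? no more files.
  call print_string             ; prints string at de (scratch_mem+6)
  jr getnextfile
nomorefiles:
  ld a, FILESYS_CLOSE_FILE      ; close the directory file
  out (PORT_FILESYS_OPEN_CLOSE), a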
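From the Teensy's point of view, each GETNEXT fills in one getnext_output record until it runs out of entries. Here is a hedged C++ sketch of that behaviour. The `DirEntry` and `DirCursor` types, and the error value 1 meaning "no more files", are assumptions; the 19-byte record layout follows the getnext_output structure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

struct DirEntry {
    std::string name;  // 8.3 name
    uint32_t size;
    uint8_t flags;     // meaning of the bits is assumed
};

// Hypothetical cursor over the entries of the currently open directory.
class DirCursor {
public:
    explicit DirCursor(std::vector<DirEntry> entries)
        : entries_(std::move(entries)) {}

    // Writes one 19-byte getnext_output record at out[0..18]: error = 0 with
    // the entry filled in, or error = 1 once the entries run out.
    void getNext(uint8_t* out) {
        std::fill_n(out, 19, 0);
        if (next_ >= entries_.size()) { out[0] = 1; return; }
        const DirEntry& e = entries_[next_++];
        for (int i = 0; i < 4; ++i)
            out[1 + i] = uint8_t(e.size >> (8 * i));  // filesize, little-endian
        out[5] = e.flags;
        // name[13] at offset 6: up to 12 chars, out[18] stays 0 as terminator.
        for (std::size_t i = 0; i < 12 && i < e.name.size(); ++i)
            out[6 + i] = uint8_t(e.name[i]);
    }

private:
    std::vector<DirEntry> entries_;
    std::size_t next_ = 0;
};
```

The usage loop mirrors the assembly: call GETNEXT, stop on a non-zero error byte, otherwise print the name at offset 6.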


That really is all there is to it! At present, only one file can be open at any one time. This is quite limiting, but for now, it will do. I can always add some sort of file descriptor system later.

Let’s display an image!

The filesystem test I wrote lists the root directory, reads README.TXT, writes ‘0123456’ to TEST.TXT, and then reads a file speccy.565. Speccy.565 is an image file!

This was done fairly quickly, for some visual eye candy. speccy.565 is a tiny 48×48, 16-bit colour image of a ZX Spectrum keyboard. It’s in the 565 format: 5 bits for red, 6 bits for green, and 5 bits for blue. The file has no header, and the Teensy is currently hard coded to have a 48x48x16bit ‘vram’. All I have working at the moment is a port for setting the start of this vram in memory, aligned to 256 bytes so we only need to set the top 8 bits of the address, and a port which ignores what is on the dataBus and just initiates a draw of the vram to the LCD screen. It’s pretty primitive just now, but I hope to add more display and colour modes, as at the moment this 48×48 image in vram (48 × 48 × 2 = 4,608 bytes) takes up a whopping quarter or more of the total RAM available to me!
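The 565 packing and the 256-byte-aligned VRAM address both come down to a few lines of bit arithmetic. This C++ sketch unpacks a pixel into 8-bit channels by replicating the top bits into the low bits, computes the VRAM base from the byte written to the port (the same `dataBus << 8` shift the Teensy handler uses), and checks the VRAM size. The names `Rgb888`, `unpack565`, `vramBase` and `vramBytes` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Rgb888 { uint8_t r, g, b; };

// Unpacks RRRRRGGGGGGBBBBB into 8-bit channels, replicating the high bits
// into the low bits so that full intensity maps to 255.
Rgb888 unpack565(uint16_t p) {
    uint8_t r5 = (p >> 11) & 0x1F, g6 = (p >> 5) & 0x3F, b5 = p & 0x1F;
    return {uint8_t((r5 << 3) | (r5 >> 2)), uint8_t((g6 << 2) | (g6 >> 4)),
            uint8_t((b5 << 3) | (b5 >> 2))};
}

// VRAM is 256-byte aligned, so the high byte alone locates it.
uint16_t vramBase(uint8_t portValue) { return uint16_t(portValue << 8); }

// Bytes used by a w x h image at 16 bits per pixel.
uint32_t vramBytes(uint32_t w, uint32_t h) { return w * h * 2; }
```

The Teensy code passes the 16-bit values straight to `tft.drawPixel`, so unpacking like this would only be needed for, say, previewing a .565 file on a PC.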

The code looks like this:


  ... read speccy.565 into ram at VRAM_BEGIN ...

  ld a, 0
  out (PORT_VRAM_DRAW), a           ; Draw VRAM to screen

On the Teensy, all we need is:

if (portAddress == PORT_VRAM_BUFFER_LOC) {
  // VRAM is 256-byte aligned, so the data bus carries only the top 8 bits
  ioVramBuffer = ((unsigned short)dataBus) << 8U;
} else if (portAddress == PORT_VRAM_DRAW) {
  // the data bus value is ignored; treat VRAM as 16-bit 565 pixels
  uint16_t* vram = (uint16_t*)&Z80_RAM[ioVramBuffer];

  // 48 X 48 test
  for (int y = 0; y < 48; y++)
    for (int x = 0; x < 48; x++)
      tft.drawPixel(VRAM_START_X+x, VRAM_START_Y+y, vram[y*48+x]);
}


The result:

This video shows the real-time execution of this test program on the Z80. It’s probably running at around 50 kHz; I’m now beginning to think about an asynchronous clock, but that’s for yet another post 🙂

Wrapping up

That’s it for this post. We have a decent amount of filesystem functionality available, and I’ll be using it significantly in future posts! I hope you’ve been enjoying this Teensy Z80 project. If you have, let me know on Twitter @domipheus!

Part 4 is now available!