This is part of a series of posts detailing the steps and learning undertaken to design and implement a CPU in VHDL. Previous parts are available here, and I’d recommend they are read before continuing.
Now we have text-mode HDMI/DVI-D output, it’s about time we started writing more code for TPU. However, we’ve not delved into too much detail yet about the memory subsystem – the part of the puzzle which reinterprets the various busses from the TPU module in VHDL and manages how data flows between different memories and/or mapped ‘registers’.
TPU memory interface
TPU has an address bus output, a data input bus and a data output bus. Generally CPUs have a single data bus and it’s bidirectional, but I opted for this current setup early on and have stuck with it.
For the most part, memory on the TPU ‘System on Chip’ is made up of Xilinx block RAMs. These are 2KB in size and dual-ported, allowing them to be used as VRAM and TRAM (for an explanation of TRAM, see the previous part in this series of posts). The rest of the memory subsystem is addressing logic for memory mapping the UART, switches, LEDs, and other peripheral I/O.
Because most memory blocks are 2KB, the address space is divided into 2KB banks. The top five bits of the address bus select a bank, and each bank has its own chip select line.
    -- Embedded ram
    MEM_CS_ERAM_1 <= '1' when (MEM_BANK_ID = X"0"&'0') else '0'; -- 0x00 bank
    MEM_CS_ERAM_2 <= '1' when (MEM_BANK_ID = X"0"&'1') else '0'; -- 0x08 bank
    MEM_CS_ERAM_3 <= '1' when (MEM_BANK_ID = X"1"&'0') else '0'; -- 0x10 bank
    MEM_CS_ERAM_4 <= '1' when (MEM_BANK_ID = X"1"&'1') else '0'; -- 0x18 bank

    -- system i/o maps
    MEM_CS_SYSTEM <= '1' when (MEM_BANK_ID = X"9"&'0') else '0'; -- 0x90 bank

    -- 4KB of font bitmap ram
    MEM_CS_FRAM_1 <= '1' when (MEM_BANK_ID = X"A"&'0') else '0'; -- 0xA0 bank
    MEM_CS_FRAM_2 <= '1' when (MEM_BANK_ID = X"A"&'1') else '0'; -- 0xA8 bank

    -- 4KB of text character ram
    MEM_CS_TRAM_1 <= '1' when (MEM_BANK_ID = X"B"&'0') else '0'; -- 0xB0 bank
    MEM_CS_TRAM_2 <= '1' when (MEM_BANK_ID = X"B"&'1') else '0'; -- 0xB8 bank
We have signals for the masked off address within any given bank, and the bank ID.
    -- BRAM banks are 2KB. The low 11 bits address a byte within a BRAM,
    -- and the top 5 bits select the bank.
    MEM_2KB_ADDR <= MEM_O_addr and X"07FF";
    MEM_BANK_ID <= MEM_O_addr(15 downto 11);

Then the block RAM “ebram” entities are connected like so:
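As a worked example of that split, here is a small simulation-only sketch. The entity name `addr_split_sketch` and the example address 0xB123 are my own choices for illustration; only the masking and slicing mirror the signals in the post.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only entity illustrating the bank/offset split.
entity addr_split_sketch is
end entity;

architecture sim of addr_split_sketch is
  -- Example address (an assumption for illustration); it falls in the 0xB0 TRAM bank.
  constant EXAMPLE_ADDR : std_logic_vector(15 downto 0) := X"B123";
  signal MEM_2KB_ADDR   : std_logic_vector(15 downto 0);
  signal MEM_BANK_ID    : std_logic_vector(4 downto 0);
begin
  -- Low 11 bits: byte offset within the 2KB bank.
  MEM_2KB_ADDR <= EXAMPLE_ADDR and X"07FF";
  -- Top 5 bits: one of 32 possible bank IDs.
  MEM_BANK_ID  <= EXAMPLE_ADDR(15 downto 11);

  check: process
  begin
    wait for 1 ns;  -- let the concurrent assignments settle
    assert MEM_BANK_ID = "10110"  -- same value as X"B" & '0'
      report "unexpected bank ID" severity failure;
    assert MEM_2KB_ADDR = X"0123"
      report "unexpected in-bank offset" severity failure;
    wait;  -- stop the process; nothing else to simulate
  end process;
end architecture;
```

Because the bank ID is the top five bits, the "0xB0 bank" label in the chip select comments is simply the high byte of the first address in that bank.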
    ebram_1: ebram Port map (
      I_clk  => cEng_clk_core,
      I_cs   => MEM_CS_ERAM_1,
      I_we   => MEM_WE,
      I_addr => MEM_2KB_ADDR,
      I_data => MEM_O_data,
      I_size => MEM_REQ_SIZE,
      O_data => MEM_DATA_OUT_ERAM_1
    );

You’ll notice the data output from the ebram goes to its own signal, MEM_DATA_OUT_ERAM_1. The actual signal that gets selected for input into the TPU core is chosen via a big asynchronous conditional:
    -- select the correct data to send to tpu
    MEM_I_data <= INT_DATA when O_int_ack = '1'
      else MEM_DATA_OUT_ERAM_1 when MEM_CS_ERAM_1 = '1'
      else MEM_DATA_OUT_ERAM_2 when MEM_CS_ERAM_2 = '1'
      else MEM_DATA_OUT_ERAM_3 when MEM_CS_ERAM_3 = '1'
      else MEM_DATA_OUT_ERAM_4 when MEM_CS_ERAM_4 = '1'
      else MEM_DATA_OUT_ERAM_5 when MEM_CS_ERAM_5 = '1'
      else MEM_DATA_OUT_ERAM_6 when MEM_CS_ERAM_6 = '1'
      else MEM_DATA_OUT_ERAM_7 when MEM_CS_ERAM_7 = '1'
      else MEM_DATA_OUT_ERAM_8 when MEM_CS_ERAM_8 = '1'
      else MEM_DATA_OUT_FRAM_1 when MEM_CS_FRAM_1 = '1'
      else MEM_DATA_OUT_FRAM_2 when MEM_CS_FRAM_2 = '1'
      else MEM_DATA_OUT_TRAM_1 when MEM_CS_TRAM_1 = '1'
      else MEM_DATA_OUT_TRAM_2 when MEM_CS_TRAM_2 = '1'
      else MEM_DATA_OUT_VRAM_1 when MEM_CS_VRAM_1 = '1'
      else IO_DATA;
We could implement all of this with bidirectional/tristate signals, but maybe that’s a discussion for another post. I’ve intentionally kept bidirectional communication to the minimum, as it can easily cause confusing situations.
So you can see it’s fairly easy to move things around and see how to attach block rams into the TPU ‘memory map’. But we also have I/O!
Memory mapped I/O
Part 9 showed how memory mapped I/O was handled, and that does not change at all. There is a process in the top-level module which monitors for memory requests at certain addresses, and manipulates the IO_DATA signal in the case of any memory reads. You can see above that the data into TPU selects IO_DATA when no memory selects or interrupts are active. If we add another peripheral, we simply edit the process handling this part of memory to update the relevant signals for any given address.
    if MEM_O_addr = X"9000" and MEM_O_we = '1' then
      -- onboard leds
      IO_LEDS <= MEM_O_data(7 downto 0);
    end if;
    if MEM_O_addr = X"9001" and MEM_O_we = '0' then
      -- onboard switches
      IO_DATA <= X"000" & IO_SWITCH;
    end if;
You’ll notice that the chip selects above don’t map all banks to block rams just now. It would be useful to know about memory locations that are not currently mapped, and that should be a simple case of making sure a chip select line is active on a memory command. If a chip select is not active, we can assume the address is unmapped, and request an interrupt to the TPU core.
First, we need to OR together all those chip select lines into one all-seeing MEM_ANY_CS signal. Then, we check that signal in the I/O handler process. If it is ever inactive during a memory operation, we know that we’re accessing unmapped memory.
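The OR itself isn't shown in this post, so here is a minimal sketch of what it could look like. The wrapper entity `any_cs_sketch` is hypothetical, since in the real design this would just be a concurrent assignment in the top-level architecture, and it only covers the nine chip selects from the listing earlier. Any other chip selects in the design (the data mux also references ERAM_5 to ERAM_8 and VRAM_1) would need to be added to the expression in the same way.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper entity, used only to make the sketch self-contained.
entity any_cs_sketch is
  port (
    MEM_CS_ERAM_1 : in  std_logic;
    MEM_CS_ERAM_2 : in  std_logic;
    MEM_CS_ERAM_3 : in  std_logic;
    MEM_CS_ERAM_4 : in  std_logic;
    MEM_CS_SYSTEM : in  std_logic;
    MEM_CS_FRAM_1 : in  std_logic;
    MEM_CS_FRAM_2 : in  std_logic;
    MEM_CS_TRAM_1 : in  std_logic;
    MEM_CS_TRAM_2 : in  std_logic;
    MEM_ANY_CS    : out std_logic
  );
end entity;

architecture rtl of any_cs_sketch is
begin
  -- '1' whenever at least one bank is selected, '0' for an unmapped address.
  -- MEM_CS_SYSTEM is included so that memory mapped I/O is not flagged.
  MEM_ANY_CS <= MEM_CS_ERAM_1 or MEM_CS_ERAM_2 or MEM_CS_ERAM_3 or MEM_CS_ERAM_4
             or MEM_CS_SYSTEM
             or MEM_CS_FRAM_1 or MEM_CS_FRAM_2
             or MEM_CS_TRAM_1 or MEM_CS_TRAM_2;
end architecture;
```

Forgetting to add a new bank's chip select here would make every access to that bank raise a memory violation, which is at least an easy mistake to spot.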
    MEM_proc: process(cEng_clk_core)
    begin
      if rising_edge(cEng_clk_core) then
        if MEM_readyState = 0 then
          if MEM_O_cmd = '1' then
            if MEM_ANY_CS = '0' then
              -- a memory command with unmapped memory
              -- throw interrupt
              MEM_Access_error <= '1';
              MEM_Access_error_bank <= MEM_O_addr(15 downto 8);
            end if;
            ...snip...

What the snippet above does not show is that we de-assert MEM_Access_error at the end of the memory command. The signal is therefore only active while a faulting command is in flight, so another process that sees it active can use it to request an interrupt.
Memory Interrupt Process
This is what the memory interrupt process checks for, and acts upon.
    exception_notifier: process (cEng_clk_core, MEM_Access_error)
    begin
      if rising_edge(cEng_clk_core) then
        -- the state check sits on the first branch only, so that the
        -- later states remain reachable
        if MEM_access_int_state = 0 and MEM_Access_error = '1' then
          -- request the interrupt and latch the event field
          I_int <= '1';
          MEM_access_int_state <= 1;
          INT_DATA <= X"80" & MEM_Access_error_bank;
        elsif MEM_access_int_state = 1 and I_int = '1' and O_int_ack = '1' then
          -- interrupt acknowledged, drop the request
          I_int <= '0';
          MEM_access_int_state <= 2;
        elsif MEM_access_int_state = 2 then
          MEM_access_int_state <= 3;
        elsif MEM_access_int_state = 3 then
          MEM_access_int_state <= 0;
        end if;
      end if;
    end process;
This process checks each clock cycle for a memory access error, and if it notices one, it requests an interrupt and saves the high byte of the faulting address, which identifies the bank, into the low byte of the INT_DATA signal. This signal is what becomes the Interrupt Event Field, which is accessible in user code. We set the high byte of this to 0x80 to identify the interrupt type to the interrupt handler. The rest of the code in this process simply follows the exception workflow: it waits for the interrupt acknowledge, and then waits a couple of cycles of latency before returning to idle.
With these changes, if an unmapped memory access occurs, our interrupt vector code is called, and when we issue the gief instruction to obtain the Event Field into a register, we’ll be able to tell that it’s a memory violation, and which bank the attempted access targeted.
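To make the Event Field layout concrete, here is a small simulation-only sketch. The entity name `event_field_sketch` and the faulting address 0x4804 are assumptions for illustration (I'm assuming nothing is mapped at that address); the two assignments are the same expressions used in the processes above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only entity showing how the Event Field is built.
entity event_field_sketch is
end entity;

architecture sim of event_field_sketch is
  -- Example faulting address, assumed here to be unmapped.
  constant FAULT_ADDR          : std_logic_vector(15 downto 0) := X"4804";
  signal MEM_Access_error_bank : std_logic_vector(7 downto 0);
  signal INT_DATA              : std_logic_vector(15 downto 0);
begin
  -- High byte of the faulting address identifies the bank.
  MEM_Access_error_bank <= FAULT_ADDR(15 downto 8);
  -- 0x80 in the high byte marks the event as a memory violation.
  INT_DATA <= X"80" & MEM_Access_error_bank;

  check: process
  begin
    wait for 1 ns;  -- let the concurrent assignments settle
    assert INT_DATA = X"8048"
      report "unexpected event field" severity failure;
    wait;  -- stop the process; nothing else to simulate
  end process;
end architecture;
```

An interrupt handler reading this value could test the high byte against 0x80 to recognise the violation, then use the low byte (0x48 here) to report where the access went.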
The beginnings of a BIOS
So, now that we have our text output, our UART, and a decent memory system, it’s time to start implementing a BIOS which we can leverage when building real TPU programs. So far, my bootloader contains several functions:
- Reset vector function (which just jumps to main)
- Interrupt vector function handler
- Main function
- mul16 – 16 bit multiply
- div16 – 16 bit divide with remainder
- putc – put character
- puts – put string
- setcursor – set the current cursor location in the textual screen
- setcolour – set the current colour of any characters (sets the attribute byte)
- cls – clear screen
- uitoa – unsigned integer to ascii
- getc – get char from UART
At the moment, including the printing of the BIOS header (which uses a custom glyph set for a TPU logo), the size of this code amounts to around 1.1KB, which is pretty massive when you think about it.
The BIOS header also displays a memory figure, and this is checked at runtime. The test iterates through memory, reading every byte location, until an unmapped memory violation occurs. It assumes that the first contiguous block of RAM is the usable memory. With eight 2KB block RAMs connected to those first bank addresses, we have 16KB to play with.
This memory test really brought back old memories of how long you sometimes had to wait for the memory test to complete. The uitoa function relies heavily on divide/mod operations, and with software-only divide, things are slow. It takes a few seconds to work through that 16KB window. But I quite like the fact that it is slow enough for you to see the search happen in real time.
So tempted to hook up the DAC for a POST beep… pic.twitter.com/l1tdBxm70u
— Colin Riley (@domipheus) May 16, 2016
With that, I was tempted to integrate a startup beep like old times. And, well, I’m going to do what @mmalex tells me to do in this instance!
@domipheus do it!
— alex (@mmalex) May 16, 2016
The audio is a simple square wave through the headphone jack, which I’d forgotten existed on the miniSpartan6+ board. I now have a memory-mapped register which controls its operation, allowing you to activate the left or right output channels. I’ll add the ability to control the tone later. For now, it’s just a cool BIOS beep!
That pretty much explains the current memory subsystem in a bit more detail, and hopefully shows that TPU is now starting to really behave like a vintage computer. I aim to develop the BIOS more, and have another post up my sleeve talking about a new instruction, and further BIOS progress.
Thanks for reading. As always, let me know your thoughts via Twitter @domipheus.
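For anyone curious what such a beeper might look like, here is a minimal sketch. To be clear, this is not the actual TPU implementation: the entity name, the port names, the 50MHz clock, the 1kHz tone and the two-bit enable layout are all assumptions made for illustration. It only shows the general idea of a clock divider toggling a bit, gated per channel by a memory-mapped enable register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical beeper; every name and number here is an assumption.
entity beep_sketch is
  generic (
    CLK_HZ  : positive := 50_000_000;  -- assumed core clock frequency
    TONE_HZ : positive := 1_000        -- assumed fixed beep tone
  );
  port (
    I_clk     : in  std_logic;
    I_enable  : in  std_logic_vector(1 downto 0);  -- bit 1 = left, bit 0 = right
    O_audio_l : out std_logic;
    O_audio_r : out std_logic
  );
end entity;

architecture rtl of beep_sketch is
  -- The output toggles every half period of the tone.
  constant HALF_PERIOD : positive := CLK_HZ / (2 * TONE_HZ);
  signal count  : natural range 0 to HALF_PERIOD - 1 := 0;
  signal square : std_logic := '0';
begin
  divider: process (I_clk)
  begin
    if rising_edge(I_clk) then
      if count = HALF_PERIOD - 1 then
        count  <= 0;
        square <= not square;
      else
        count <= count + 1;
      end if;
    end if;
  end process;

  -- Gate the square wave per channel using the enable bits.
  O_audio_l <= square and I_enable(1);
  O_audio_r <= square and I_enable(0);
end architecture;
```

In a setup like this, the I/O handler process would write the low two bits of MEM_O_data into the enable register when the register's address is written, in the same way the LED register is handled. Making the tone controllable later would then be a matter of replacing the TONE_HZ generic with a second register.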