Designing a CPU in VHDL, Part 1: Rationale, tools, method

Why design my own CPU, with associated ISA, assembler and other tools?

Because I can! Why not? I’ll learn a load of stuff!

The above is the fundamental reason for this series of posts. As a software developer, and in particular a compiler/debugger engineer, you are exposed to low-level architectural details, latencies, hazards and, of course, hardware bugs. In the past I’ve been part of teams who were able to feed back details of architectural quirks that, if modified, could improve throughput in certain workloads – sometimes completely new features have been added to hardware due to that feedback. However, as a software engineer, you have limited exposure to what that actually boils down to at the hardware level. It’s an area of computer science which fascinates me, and one I’d very much like to get more involved in. So, a few years ago I downloaded the Xilinx ISE WebPACK software and started to learn VHDL – a hardware description language (HDL).

VHDL, really, is simple. Going by what I’ve read, it’s the safer choice when it comes to HDLs. Verilog is the C99 of the HDL world, and you can get into quite a mess as a beginner if you don’t understand it well enough. So, starting as a novice to HDL concepts, VHDL was the obvious choice.

Last year, I backed the miniSpartan6+ FPGA Kickstarter project. I now have the end product at home, based on the Xilinx LX25 Spartan6 FPGA. It’s a nice little board, and I’ve managed to flash a small hello world style LED blinker to it successfully. You can get many other types of board (even from Amazon), and entry level ones are pretty affordable.

Over the past two months, I’ve spent my train journey into work designing and implementing a very small 16-bit CPU. I’ve codenamed it TPU, for Test Processing Unit. In the next series of posts, I will explain how I’ve gone from an empty VHDL source file to a project which runs code processed through my C# assembler within the Xilinx ISim simulator. As I write this, the project is running code under simulation with basic arithmetic operations such as addition, plus branching and memory access. The end goal is to fix some issues identified during simulation, and get it running on the miniSpartan6+ hardware.

I hope to learn much along the way whilst writing these articles. If you are an experienced hardware engineer and see me doing something a) stupid, b) inefficient, c) unwise or d) stupid, please do tell me by way of Twitter at @domipheus. Efficiency isn’t a goal of this, but I’d still like to know!

In many ways, the TPU could be classed as the Terrible Processing Unit – but it needs to be this way. It is worth noting I’ve tried making a CPU design before, but always got into a spaghetti mess by trying to do too much, and by not knowing the underlying gotchas of how to link all the internal components together.

This is not a superscalar processor. This will be a CPU that takes multiple cycles to execute even the simplest of instructions. It’s aimed at a level that educates me, and hopefully at one others can gain knowledge from.

So, what we want from/in the TPU:

  • A 16-bit CPU core
  • At the start using synthesized RAM, but hopefully later using the SDRAM on the miniSpartan6+ board
  • An in-order CPU with no real pipelining as such
  • Basic arithmetic operations, shifts, additions
  • Branching, including conditional branches
  • A register file
  • A control unit to keep everything in lock-step
  • Design an ISA and create a small assembler
  • Ultimately, be an educational project to learn from
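To make the "no real pipelining" and "control unit" points a little more concrete, here is a rough software model in Python rather than VHDL. It is only a sketch: the four stage names and the one-stage-per-clock behaviour are placeholders picked for illustration, not the actual TPU design, which later parts will cover.

```python
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names, chosen only for illustration.
    FETCH = 0
    DECODE = 1
    EXECUTE = 2
    WRITEBACK = 3


class ControlUnit:
    """Steps through the stages one at a time, one stage per clock tick."""

    def __init__(self):
        self.stage = Stage.FETCH
        self.instructions_retired = 0

    def tick(self):
        """Advance one clock cycle and return the stage that was active."""
        active = self.stage
        if active is Stage.WRITEBACK:
            # Last stage done: the instruction is complete, start the next one.
            self.instructions_retired += 1
            self.stage = Stage.FETCH
        else:
            self.stage = Stage(active.value + 1)
        return active


def cycles_for(instruction_count):
    """Count the clock ticks needed to finish the given number of instructions."""
    cu = ControlUnit()
    cycles = 0
    while cu.instructions_retired < instruction_count:
        cu.tick()
        cycles += 1
    return cycles
```

Because only one stage is ever active, nothing overlaps: in this model every instruction costs as many clock cycles as there are stages, which is the trade-off accepted in exchange for a simpler design.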

The tools I have used are the free (yes, I find it quite insane too) Xilinx WebPACK tools. They come with a very nice IDE and associated toolchain – including a simulator. The other tools I will use are those for creating the assembler (Visual Studio C# – as with most of my hobby coding these days) and the tools to load files onto the miniSpartan6+ board.

As for a method, I’m just winging it. If problems appear, they will be solved.

The next part will be about implementing the RAM and register file, and testing them in the simulator. Then I’ll discuss the ISA, in preparation for looking at the decoder. For now, here is a spoiler of the TPU in some simulator action. Bonus points go to the people who realise there is an odd thing about the form of the baz (branch if Ra is zero) instruction.

[Image: mul_sim, the TPU running in the simulator]

Thanks for reading, send all comments to @domipheus on Twitter!

Oh, and before I forget, this will be completely open source. I’ll throw it up on GitHub soon!

Part 2 in this series is now available.
