This is part of a series of posts detailing the steps and learning undertaken to design and implement a CPU in VHDL. Previous parts are available here, and I’d recommend they are read before continuing.
Interrupts and exceptions are important events that any CPU needs to handle. The usual definition is that interrupts happen outside of the CPU – timer events, for example. Exceptions occur within the CPU, like trying to execute an invalid instruction. These events are handled all the time within a system, and whilst some signify faults and error conditions, most are just part of normal system functionality.
I mentioned earlier in the series how my previous CPU (TPU) had interrupt handling. However, the way it was implemented needed significant modification to work in a RISC-V environment. RPU now supports many more types of exception/interrupt, and as such is more complex.
Before we go further, in the RPU code I use the term interrupt to refer to both interrupts and exceptions. Unless I explicitly mention exceptions, assume I mean both types.
The Local Interrupt Unit
RPU will implement the timer interrupts as external, similar to how TPU did it. It will also support invalid instruction, system call, breakpoint, invalid CSR access (and ALU), and misaligned jump/memory exceptions. These generally fit into 4 categories:
- Decoder Exceptions
- CSR/ALU unit exceptions
- Memory Exceptions, and
- External interrupts
There are more subcategories to these, defined by an additional 32-bit data value describing the cause further, but these 4 categories can fit nicely into 4 interrupt lines. The CPU can only handle one at a time, so with this in mind I created a Local Interrupt unit, the LINT unit, which will take all the various interrupt requests and associated data lines, and decide which one actually makes its way into the control unit for handling. Internally, it is implemented as a simple conditional check of the different input categories. It forwards the data for the selected request to the control unit, then waits for an acknowledge/reset signal before going on to the next interrupt, if multiple were being requested at once. The LINT also handles ack/reset forwarding to the original input units.
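As a rough behavioural sketch of that arbitration, here is a small Python model (not the VHDL that RPU is written in). The class name, the category names and the full priority order are assumptions made for illustration; the only ordering taken from this series is that decoder exceptions outrank external interrupts.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical priority order, highest first. Only "decoder above external"
# comes from the post; the rest is an assumption for illustration.
PRIORITY = ("decoder", "csr_alu", "memory", "external")


@dataclass
class Lint:
    """Toy model of a local interrupt unit: one request forwarded at a time."""
    pending: Dict[str, int] = field(default_factory=dict)  # category -> data
    active: Optional[str] = None  # category currently forwarded

    def request(self, category: str, data: int) -> None:
        """A source unit raises its INT line with a 32-bit data value."""
        self.pending[category] = data & 0xFFFFFFFF

    def step(self) -> Optional[Tuple[str, int]]:
        """Select the highest-priority request unless one is already in flight."""
        if self.active is None:
            for category in PRIORITY:
                if category in self.pending:
                    self.active = category
                    break
        if self.active is None:
            return None
        return self.active, self.pending[self.active]

    def ack(self) -> str:
        """Control unit acknowledges; returns the unit the ack is forwarded to."""
        if self.active is None:
            raise RuntimeError("ack with no interrupt in flight")
        acked = self.active
        del self.pending[acked]
        self.active = None
        return acked


lint = Lint()
lint.request("external", 0x80000007)  # machine timer interrupt cause (RV32)
lint.request("decoder", 2)            # illegal instruction cause
first = lint.step()                   # decoder request is forwarded first
lint.ack()
second = lint.step()                  # external request follows after the ack
```

The model holds a selected request until it is acknowledged, so a higher-priority request arriving mid-handling does not displace it.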
With this unit complete, we can add an O_int and O_intData output, as well as an acknowledge input for reset, to our decoder unit. This will attempt to raise an exception and set the intData output to be the cause as defined by the RISC-V standard, which will let any interrupt handler know which kind of request – invalid instruction, ecall/system call, breakpoint – caused the exception.
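To make the cause values concrete, here is a minimal Python sketch of the decision the decoder has to make. The cause codes and the `ecall`/`ebreak` encodings are from the RISC-V specifications; the function name is hypothetical, and the validity check is deliberately simplified to the major opcode only (a real decoder also checks the funct3/funct7 fields).

```python
from typing import Optional

# Exception cause codes from the RISC-V privileged specification.
CAUSE_ILLEGAL_INSTRUCTION = 2
CAUSE_BREAKPOINT = 3
CAUSE_ECALL_FROM_M_MODE = 11

# Fixed encodings of the two SYSTEM instructions that always trap.
ECALL = 0x00000073
EBREAK = 0x00100073

# RV32I major opcodes (instruction bits 6:0): LUI, AUIPC, JAL, JALR, BRANCH,
# LOAD, STORE, OP-IMM, OP, MISC-MEM, SYSTEM.
RV32I_OPCODES = {0x37, 0x17, 0x6F, 0x67, 0x63, 0x03, 0x23, 0x13, 0x33, 0x0F, 0x73}


def decoder_cause(instruction: int) -> Optional[int]:
    """Return the value for O_intData, or None if no exception is raised.

    Simplification (assumption): only the major opcode is validated.
    """
    if instruction == ECALL:
        return CAUSE_ECALL_FROM_M_MODE
    if instruction == EBREAK:
        return CAUSE_BREAKPOINT
    if (instruction & 0x7F) not in RV32I_OPCODES:
        return CAUSE_ILLEGAL_INSTRUCTION
    return None
```

The sketch assumes the core only ever runs in machine mode, which is why `ecall` maps to cause 11.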
The CSR unit from the previous part already has a facility to raise an exception – it can check the CSR op and address to ensure the operation is valid. For instance, attempting to write a read-only CSR would raise an access exception. Whilst this is all implemented and connected up, the compliance suite of tests does not test access interrupts, so it’s not extensively tested. We will need to reinvestigate that once RPU is extended to fully support the different runtime privilege levels.
Memory exceptions from misaligned instruction fetch are found by testing the branch targets for having bit 1 set. Bit 0 is always cleared by the specification, and we don’t support the compressed instruction set, so a simple check is all we need for this.
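The check itself is tiny. A Python sketch of it, with a hypothetical function name, looks like this:

```python
# mcause value for this exception, from the RISC-V privileged specification.
CAUSE_INSTRUCTION_ADDRESS_MISALIGNED = 0


def fetch_target_misaligned(target: int) -> bool:
    """True if a jump/branch target is not 4-byte aligned.

    Bit 0 is masked off first, so it can never cause the exception;
    without the compressed extension only bit 1 is left to test.
    """
    masked = target & ~1 & 0xFFFFFFFF
    return (masked & 0b10) != 0
```

For example, a target of 0x1002 raises the exception, while 0x1001 does not, because its bit 0 is dropped before the test.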
Lastly, external interrupts have the signal lines directly routed outside of the CPU core, so the SoC implementation can handle those. In the ArtyS7-RPU-SoC, timer interrupts are implemented via a 12MHz clock timer and compare register manipulated via MMIO. We could also implement things like UART receive data waiting interrupts through this.
Now, we know what can trigger interrupts in the CPU, but we need to lay down exactly the steps and dataflow required both when we enter an interrupt handler, and exit from it. The control unit handles this.
As a reminder – here is how a traditional external interrupt was handled when ported simply through to RPU from my old TPU project. You can see the interrupt had to wait until a point in the pipeline which was suitable, which is okay in this instance. However, exceptions require significant changes to the control unit flow.
Interrupt entry / exit
On a decision being made to branch to the interrupt vector – the location of which is stored in a CSR – several other CSR contents need modified:
- The previous interrupt enable bit is set to the current interrupt enable value.
- The interrupt enable bit is set to 0.
- The previous privilege mode is set to the current privilege mode.
- The privilege mode is set to 11 (machine mode).
- The mcause CSR is set to the interrupt data value.
- The mepc CSR is set to the PC where the interrupt was encountered.
- The mtval CSR is set to any exception-specific data, like the address of a misaligned load.
On exit from an interrupt via mret, the previous enable and privilege values are restored. These CSR manipulations occur inside the CSR unit, using int_exit and int_entry signals provided to it by the control unit.
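The entry and exit steps can be sketched as a small Python model. The mstatus bit positions (MIE at bit 3, MPIE at bit 7, MPP at bits 12:11) are from the RISC-V privileged specification; the class layout is hypothetical, the method names simply mirror the int_entry and int_exit signals, and a direct-mode mtvec is assumed. Setting MPIE to 1 on mret follows the specification rather than anything stated above.

```python
from dataclasses import dataclass

MIE_BIT = 1 << 3            # mstatus.MIE: interrupt enable
MPIE_BIT = 1 << 7           # mstatus.MPIE: previous interrupt enable
MPP_SHIFT = 11              # mstatus.MPP: previous privilege, bits 12:11
MPP_MASK = 0b11 << MPP_SHIFT
MACHINE_MODE = 0b11


@dataclass
class CsrState:
    mstatus: int = 0
    mcause: int = 0
    mepc: int = 0
    mtval: int = 0
    mtvec: int = 0
    privilege: int = MACHINE_MODE

    def int_entry(self, pc: int, cause: int, tval: int = 0) -> int:
        """Apply the entry steps; returns the handler address to branch to."""
        was_enabled = bool(self.mstatus & MIE_BIT)
        self.mstatus &= ~(MIE_BIT | MPIE_BIT | MPP_MASK)  # MIE becomes 0
        if was_enabled:
            self.mstatus |= MPIE_BIT                      # MPIE = old MIE
        self.mstatus |= self.privilege << MPP_SHIFT       # MPP = old privilege
        self.privilege = MACHINE_MODE
        self.mcause = cause
        self.mepc = pc
        self.mtval = tval
        return self.mtvec & ~0b11  # assumption: direct (non-vectored) mode

    def int_exit(self) -> int:
        """Apply the mret steps; returns the address to resume at."""
        was_enabled = bool(self.mstatus & MPIE_BIT)
        self.privilege = (self.mstatus & MPP_MASK) >> MPP_SHIFT
        self.mstatus &= ~MIE_BIT
        if was_enabled:
            self.mstatus |= MIE_BIT                       # MIE = old MPIE
        self.mstatus |= MPIE_BIT                          # spec: MPIE set to 1
        return self.mepc


csr = CsrState(mstatus=MIE_BIT, mtvec=0x100)
handler = csr.int_entry(pc=0x2004, cause=4, tval=0x3001)  # misaligned load
status_in_handler = csr.mstatus
resume = csr.int_exit()
```

Inside the handler, interrupts are disabled and the saved state sits in MPIE/MPP, so mret can restore it without any extra storage.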
The control unit
The previous TPU work implemented interrupts by checking for assertion of the interrupt signal at the end of the CPU pipeline, just before the fetch stage. This works fine for external interrupts, and it keeps complexity low, due to not having to pause the pipeline mid-execution. However, we now have different classes of interrupt which need mid-pipeline handling:
- Decoder interrupts
- CSR/ALU interrupts
- Misalignment/Memory interrupts
For decoder interrupts, we can check for an INT signal from the decoder, and ensure any memory/register writes don’t occur a few cycles later. The misalignment interrupts can be triggered at fetch and memory pipeline stages and are more complex.
In the previous part of this series, where I added CPU trace support, I discussed some of the logic flow that a decoder interrupt takes. It followed from that discussion that some types of interrupt have higher priority than others, and that a priority system is needed. The LINT was supposed to handle this priority system, and in general it did: decoder exceptions are higher priority than external interrupts. However, the LINT has no concept of where execution is in the pipeline, or of how certain execution stages require certain exceptions to be handled immediately, regardless of how many cycles previously another interrupt was requested. This required another rather clunky set of conditions to be provided to the LINT unit as a set of enable bits for the various types. Some enables were only active during certain pipeline stages. My decision not to separate the handling of interrupts (like external timers) from exceptions (CPU-internal faults that must be handled immediately) has bitten me here by requiring specific enable overrides, and some more workarounds.
Memory Misalignment Exceptions
I thought the memory misalignment exceptions would be a fairly simple addition, however they presented an interesting challenge, due to the fixed timing that is inherent within the memory/fetch parts of the core.
Discovering whether a misalignment exception should assert is fairly simple: we can have a process which checks addresses and asserts the relevant INT line, with the associated mcause value indicating the type of exception.
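A minimal Python sketch of such a check for loads and stores follows. The two cause codes are from the RISC-V privileged specification; the function name and its signature are hypothetical, and no address ranges are exempted in this model.

```python
from typing import Optional

# Exception cause codes from the RISC-V privileged specification.
CAUSE_LOAD_ADDRESS_MISALIGNED = 4
CAUSE_STORE_ADDRESS_MISALIGNED = 6


def data_misalign_cause(address: int, size_bytes: int, is_store: bool) -> Optional[int]:
    """Return the mcause value for a misaligned access, or None if aligned."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("RV32I accesses are 1, 2 or 4 bytes wide")
    # An access is aligned when its address is a multiple of its size.
    if (address & (size_bytes - 1)) == 0:
        return None
    if is_store:
        return CAUSE_STORE_ADDRESS_MISALIGNED
    return CAUSE_LOAD_ADDRESS_MISALIGNED
```

A word load from 0x1002 yields cause 4, whereas a halfword store to the same address is aligned and raises nothing.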
The LINT unit has a cycle of latency, and when it’s a memory exception we are talking about, that cycle of latency means it’s too late – the memory operation or fetch will have already been issued. The latency is acceptable for a decoder interrupt, because the writeback phase is still 2 cycles away and the interrupt will be handled by that point, avoiding register file side-effects.
The side effects of an invalid memory operation are hard to track, so instead we forward a hint to the control unit that a misalignment exception is likely to occur shortly. It’s rather clumsy, but this hint gets around the LINT latency and allows the control unit to stall any memory operations. This stall is just long enough for the operations to not issue, and the exception be successfully handled once the LINT responds.
These stalls are implemented by a cycle counter in the control unit, counting down a fixed number of stall cycles if a misalignment hint is seen. During each of these cycles, the interrupt signal is checked in case we need to jump to the handler. The control unit is definitely complicated significantly by these checks. There is a lot of copy and paste nonsense going on here.
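The idea behind the stall counter can be modelled in a few lines of Python. This is only a sketch: the stall length, the function name and the "issue"/"handler" results are assumptions, as the real cycle count and state names are not given here.

```python
from typing import Iterable

STALL_CYCLES = 2  # hypothetical; the real number of stall cycles is not stated


def resolve_memory_stage(align_hint: bool, int_line: Iterable[bool],
                         stall_cycles: int = STALL_CYCLES) -> str:
    """Return "issue" if the memory op may proceed, "handler" if we must trap.

    int_line yields the interrupt signal seen by the control unit on each
    stall cycle; it is treated as de-asserted once exhausted.
    """
    if not align_hint:
        return "issue"          # no hint: no stall, operation issues as normal
    samples = iter(int_line)
    remaining = stall_cycles
    while remaining > 0:        # count down the fixed stall window
        if next(samples, False):
            return "handler"    # interrupt arrived before the op was issued
        remaining -= 1
    return "issue"
```

With a two-cycle window, an interrupt signal that arrives on the second stall cycle is caught, while one arriving on a third cycle would be too late, which is why the window has to cover the LINT latency.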
Lastly, to further complicate things, my memory-mapped IO address range (currently 0xF0000000 – 0xFFFFFFFF) contains various addresses which the bootloader firmware writes in an unaligned way. To work around this I’ve excluded this memory range from the misalignment checks. I’ll fix it another time.
So now we have this align_hint shortcut which bypasses the LINT and allows for correct handling of the various memory exceptions.
One issue was discovered during simulation of the decoder interrupts, specifically for ecall and ebreak. The method for acknowledging interrupt requests was that the LINT would assert an ACK signal for the unit it selected, and that unit would then de-assert its INT signal. The problem is that I’d placed this de-assertion inside the rest of the decoder handling process, which was only active when the pipeline decoder enable was active. By the time the LINT wanted to acknowledge the decoder interrupt, the enable was not active. This resulted in an infinite loop of interrupt requests from the decoder, which was not much use!
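The bug is easy to reproduce in a toy Python model of the decoder's INT line. The class and parameter names here are hypothetical; the flag simply switches between gating the acknowledge on the decoder enable (the bug) and handling it unconditionally (the fix).

```python
class DecoderInt:
    """Toy model of the decoder's INT line, clocked once per call to tick()."""

    def __init__(self, ack_needs_enable: bool) -> None:
        self.ack_needs_enable = ack_needs_enable  # True reproduces the bug
        self.int_line = False

    def tick(self, enable: bool, ack: bool, raises: bool = False) -> None:
        # The acknowledge path: in the buggy variant it only runs while the
        # pipeline's decoder enable is active.
        if ack and (enable or not self.ack_needs_enable):
            self.int_line = False
        elif enable and raises:
            self.int_line = True


def int_after_ack(ack_needs_enable: bool) -> bool:
    """Raise an interrupt during decode, ack it later, report the INT line."""
    dec = DecoderInt(ack_needs_enable)
    dec.tick(enable=True, ack=False, raises=True)  # decode stage sees ecall
    dec.tick(enable=False, ack=True)               # LINT acks after decode
    return dec.int_line


stuck = int_after_ack(ack_needs_enable=True)     # request never clears
cleared = int_after_ack(ack_needs_enable=False)  # request clears on ack
```

In the gated variant the INT line is still high after the acknowledge, so the LINT would see the same request again and again.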
Another bug was that I was latching the wrong PC value to the mepc register. This was not caught until I started debugging actual code, but would have shown up pretty obviously in the simulator. The fix was to not grab the current PC value for mepc but to latch the correct value at the time the interrupt was fired.
Lastly, as I was testing the riscv-compliance misalignment exception test, I realised an exception was being raised when it shouldn’t. It turns out I had missed a point in the ISA spec, whereby jump/branch targets always have bit 0 masked off. An easy thing to fix, but annoying that I had this bug for so long.
But RPU now passes RISC-V compliance 🙂
With the interrupt support now in the CPU design, I have realised just how many mistakes I have made in this area. Not separating out different types for interrupts and exceptions (which need handled immediately in the pipeline) meant the LINT needed these ugly overriding hints for the control unit in order to operate correctly. It all seems a bit messy.
However, it does work. I can fix its design later, now I have a working implementation to base changes off of.
Thanks for reading, as always – feel free to ask any follow up questions to me on twitter @domipheus.