1. Overview

1.1. System

The default version of the system is made up of a controller, two nodes, a memory controller and a NoC. To keep things simple, only one node is shown in the images. The controller, node and memory controller are also referred to as components: higher-level modules that contain no logic of their own and exist to keep the design and the documentation organized.

1.2. Licenses

Different licenses are used for the different parts. With the exception of the NoC, everything is released under a free-software-compatible license. The license of the NoC restricts its use to research.

1.3. Design decisions

  1. The whole system should be as easy to understand as possible. Code is often written in a very verbose way and clever tricks are avoided.
  2. It should be as reusable as possible. Every module of the system has a specific function and could be used in a different system. The additional communication this requires is accepted.
  3. The NoC bridges are not considered to be part of the NoC, but of the nodes. This should make it easier to swap the NoC for a different one.
  4. It was made with simulation in mind. There are parts of the system (like the defines_xxx.vh files) that might be cumbersome to use for synthesis especially if the block designer is used.
  5. For FPGAs the only proprietary software considered is Vivado because as the old saying goes: You either love Vivado or you have never used Quartus Prime before.

1.4. Control scheme

The control is done over the NoC and uses the control module to record which programs the nodes should execute. The control contains a simple 32-bit flag register in which each node is identified via its id and a 1 means that the node is currently busy. In addition to this flag register, an array holds the programs assigned to the nodes.

Once the system is started, the self-aware modules of the nodes start to read from the control to see if there are any programs their respective CPUs should execute. The controller can set programs for the nodes and once a program has been set the respective node is considered busy until the self-aware module of the node signals the completion.

The program is represented by its starting address in memory, which is returned to the self-aware module in the rdata field. If rdata is 0, there is no program set for the node. Address 0 is reserved, as it is the start of the controller program.
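The bookkeeping described above can be sketched as a small software model. This is only an illustration in C, not the HDL of the control module: the names control_t, control_write, control_read_program and control_read_busy are hypothetical, and it is an assumption of the sketch that a non-zero write marks the node busy while a write of 0 clears both the entry and the busy bit.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_NODES 32u /* one busy bit per node in the 32-bit flag register */

/* Hypothetical software model of the control module. */
typedef struct {
    uint32_t busy;               /* bit i set: node i is currently busy */
    uint32_t program[NUM_NODES]; /* start address per node, 0 = no program */
} control_t;

/* Write to the entry of one node. In this sketch a non-zero value assigns a
 * program and marks the node busy; 0 clears the entry and the busy bit. */
static void control_write(control_t *c, uint32_t id, uint32_t start_addr)
{
    assert(id < NUM_NODES);
    c->program[id] = start_addr;
    if (start_addr != 0)
        c->busy |= (UINT32_C(1) << id);
    else
        c->busy &= ~(UINT32_C(1) << id);
}

/* Value a self-aware module would see in rdata when polling its entry. */
static uint32_t control_read_program(const control_t *c, uint32_t id)
{
    assert(id < NUM_NODES);
    return c->program[id];
}

/* Value the controller would see when reading the state of the nodes. */
static uint32_t control_read_busy(const control_t *c)
{
    return c->busy;
}
```

With this model, assigning the (made-up) start address 0x4000 to node 1 sets bit 1 of the busy register, and a later write of 0 for node 1 clears it again.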

Once the self-aware module reads a valid address, it turns on the CPU and sets the AXI offset to the program address. This offset is needed because every CPU starts reading from address 0 and each program is compiled the same way. The offset essentially moves all the reads and writes to the right memory space. So instead of reading from 0, the CPU reads from 0+offset.
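The address translation itself is a plain addition. A minimal sketch in C, assuming 32-bit addresses; the function name axi_apply_offset and the example values are hypothetical and not taken from the HDL:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the AXI offset: every address issued by the CPU is shifted by the
 * start address of the program that was assigned to the node. */
static uint32_t axi_apply_offset(uint32_t cpu_addr, uint32_t offset)
{
    return cpu_addr + offset;
}
```

For a program placed at 0x4000, the first fetch of the CPU from address 0 would end up at 0x4000, and an access to 0x10 at 0x4010.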

The same signal that is used to turn on the CPU is also connected to the AXI_joiner and switches the AXI communication from the self-aware module to the CPU.

The CPU is now executing the program. Instead of reading from the control, the CPU is reading from the memory. The AXI_splitter takes care of sending certain requests to the control and some to the memory depending on the address.

While the CPU is running, the controller can read from the control to get the state of the nodes. This simply returns the busy flag register. The controller cannot interfere with the execution or terminate it. It is possible for the controller to set a program for a busy node, but this program will never be executed.

Once the CPU is finished, it writes a certain value to a certain address. This is detected by the aptly named detector, which signals the completion to the self-aware module. The self-aware module turns the CPU off and writes a 0 to the control, indicating that no program is set for this node anymore. Once this request has been received, the control also clears the flag of the node in the busy register, so the controller learns of the completion the next time it reads the busy flag register.
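The life cycle of a node described in this section can be condensed into a two-state machine. The following C model is a simplified sketch, not the actual self_awareness module: the type and field names are hypothetical, the AXI traffic is reduced to the rdata value of a poll and a completion pulse from the detector, and the write of 0 to the control is represented by a single request flag.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SA_POLL, SA_RUN } sa_state_t;

/* Hypothetical model of the self-aware module of one node. */
typedef struct {
    sa_state_t state;
    int        cpu_on;        /* also selects the CPU in the AXI_joiner */
    uint32_t   axi_offset;    /* start address of the running program */
    int        clear_request; /* 1 for one step: write 0 to the control */
} self_aware_t;

/* One step of the model. rdata is the last value read from the control,
 * fin is the completion pulse coming from the detector. */
static void sa_step(self_aware_t *sa, uint32_t rdata, int fin)
{
    sa->clear_request = 0;
    switch (sa->state) {
    case SA_POLL:                 /* CPU off, keep reading the control */
        if (rdata != 0) {         /* 0 means no program is set */
            sa->axi_offset = rdata;
            sa->cpu_on = 1;
            sa->state = SA_RUN;
        }
        break;
    case SA_RUN:                  /* CPU executes until the detector fires */
        if (fin) {
            sa->cpu_on = 0;
            sa->clear_request = 1;
            sa->state = SA_POLL;
        }
        break;
    }
}
```

The model ignores how long the individual AXI requests take; it only shows the order of events.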

1.5. Known problems

There are a few problems that are known and tolerated at the moment. If something does not work in a different setting (e.g. on an FPGA), they might be the cause. Some might be related to bugs in the tools and are going to be reported once the issues can be condensed into smaller examples.

1.5.1. HDL

  1. In the controller component one AXI Light interface requires a parameter.
    The line in question: if_axi_light #( .AXI_WSTRB_WIDTH(`AXI_WSTRB_WIDTH) ) if_axi_light_debugger();
    Without the parameter this causes the following error:
    %Error: Internal Error: ../../rtl/ ../V3LinkDot.cpp:1317: No symbol for interface alias rhs
    Solution: Just provide the parameter as this is only redundant information.
  2. The read_resp task in the AXI Light interface requires non-blocking assignments (<= instead of =)
    The lines in question: rresp <= t_rresp; rdata <= t_rdata;
    Solution: Ignore for the time being and hope for the best.
    Resolved: This problem apparently just disappeared. It was most likely a side effect caused by a different bug.

1.5.2. Software

  1. The debugging system causes a segmentation fault in the testbench.
    In the sim_main.cpp the chars from the debuggers are collected in a string and printed to the terminal once a newline has been received. On one computer this is not possible as the char array causes a segmentation fault.
    Solution: Print every char directly without collecting them in a string.
  2. The debug function print_dec() does not always work.
    Solution: Use print_hex instead.
  3. It is not possible to set the entry point during compilation. The compiler always defaults to calling the main function.
    Solution: The function in the assembly startup file has been renamed to main to make sure this one is called. The main in main.c has been renamed to my_main and called in the aforementioned assembly file.
  4. Stack pointer is used before it is set.
    When libraries are linked the stack pointer is used during some initialization before it can be set in the main function located in start.S.
    Solution: The stack pointer is set in the CPU and constant for every program.

2. Requirements

2.1. RISC-V GNU toolchain

Specifically the toolchain for RV32I.

The Makefiles expect the toolchain to be installed in /opt/riscv32i/. It is advised that the following guide is used for the installation:

2.2. Verilator

For simulating the system.

The newest version available is recommended:

2.3. GTKwave – optional

To display the tracefile and only used for debugging.

Any version your package manager offers should suffice.

3. Makefiles

3.1. Main

The main Makefile can be run from the project root.

make compiles the HDL
make run executes the simulation
make wave executes the simulation with a tracefile enabled
make clean removes the compiled simulation environment and any tracefile
make sw compiles all the programs and the controller software
make clean_sw removes all the compiler output of the programs and controller software
make programs compiles all the programs
make clean_programs removes all the compiler output of the programs
make controller compiles the controller software
make clean_controller removes all the compiler output of the controller software

3.2. Programs

Located in ./sw/programs

This Makefile is used by all the programs and should not be called from the ./sw/programs directory. Instead, each program directory contains its own Makefile where specific additions can be made, like the inclusion of an additional library.

make small compiles the code for a small node (rv32i)
make big compiles the code for a big node (rv32im)
make clean removes compiler output

This Makefile produces many different files for debugging purposes. The files rv32i_main.hex and rv32im_main.hex are the ones used by the system.

3.3. Controller

Located in ./sw/controller

Similar to the programs Makefile, but kept separate in case the two need to diverge further.

4. Configurations

Configurations are a more elaborate way to deal with different top files. In addition to the top files, they also contain the testbench and the GTKwave tcl script. Because of the way relative paths work in Verilator, the NoC .hex files should also be placed in this directory.

To change the configuration a variable in the Makefile has to be changed.

There are currently two different configurations.

4.1. mult

This is the default multiprocessor system consisting of one small and one big node.

4.2. single

A simple system using only one big CPU connected to a memory making it very suitable for debugging software.

5. Components

Components are made up of different modules to make them easier to handle and do not contain any logic themselves. They are listed here apart from the modules to make the documentation more organized.

5.1. controller

The controller is used to control the execution for the nodes. It reads instructions from the memory via the NoC like the nodes and sets the programs the nodes should execute. It contains a CPU as well as a slave bridge. Additionally a debugger can be added.

5.2. node

A node is used to execute the programs and comes in two versions: small and big.

The big node contains the mul and div additions or more specifically:
small = rv32i instruction set
big = rv32im instruction set

Additionally every node contains a self_awareness module as well as a slave bridge among other more minor modules.

A node has 2 ways to be identified.

  1. NoC interface id
  2. Control id

The NoC interface id is only used to determine the sender and receiver of the flits and is not used outside the NoC bridges. This is assigned by the NoC itself via a signal the node is connected to.

The Control id is used to assign programs to the node and is set via a parameter. If the controller sets a program for node 0, the node with this id will execute it.

5.3. memory_controller

The memory controller contains the memory as well as the master bridge.

In addition to the memory itself it contains the control that is used to record the programs the nodes should execute.

6. Interfaces

Located in ./rtl/interfaces/

Two interfaces are included to make connecting the different modules of the system easier.

6.1. AXI Light

The AXI Light interface does not only contain the port definitions, but also tasks that can be used to send and receive data easily. Tasks are made to be called repeatedly until the request or response has been completed. This is done to avoid timing constructs that might make a synthesis of the code difficult. Therefore each task returns a “done” signal indicating the completion of a request or response. This value is then used in the calling module to determine the next step e.g. to change a state.
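The calling convention can be illustrated with a small C model. This is a sketch of the idea only and does not reproduce the tasks of the interface: axi_aw_t, axi_write_addr_task and caller_step are hypothetical names, and only the address handshake of a write is modelled. The task is called once per clock cycle and returns 1 in the cycle in which the handshake completes; the caller uses that value to advance its state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the write address channel. */
typedef struct {
    int      awvalid;
    int      awready;
    uint32_t awaddr;
} axi_aw_t;

/* Called once per clock cycle. Returns 1 ("done") once the handshake has
 * been seen, 0 while the request is still pending. */
static int axi_write_addr_task(axi_aw_t *bus, uint32_t addr)
{
    if (bus->awvalid && bus->awready) {
        bus->awvalid = 0;   /* request accepted, release the channel */
        return 1;
    }
    bus->awvalid = 1;       /* keep the request asserted */
    bus->awaddr  = addr;
    return 0;
}

typedef enum { ST_SEND, ST_WAIT_RESP } caller_state_t;

/* The calling module only changes its state when the task reports done. */
static caller_state_t caller_step(caller_state_t s, axi_aw_t *bus, uint32_t addr)
{
    if (s == ST_SEND && axi_write_addr_task(bus, addr))
        return ST_WAIT_RESP;
    return s;
}
```

Because the task never waits internally, the same pattern can be evaluated every clock cycle without any timing constructs.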

6.2. NoC

In contrast to the AXI Light interface, the NoC one does not contain any tasks at the moment.

7. Modules

7.1. self_awareness

This module is used to make sure that the nodes execute the programs that the controller has assigned to them. Once the system starts, the CPUs in the nodes are deactivated and the self_awareness starts to read from the control to see if there is a program set for the CPU to execute.

It is also responsible for turning off the CPU of a node once it finishes.

7.2. detector

Once a certain WDATA and AWADDR combination is detected an output signal is asserted for 1 clk cycle.

This is used to notify the control of the completion of the program execution.

Please consult the test program in ./sw/mul/ to see how it is used.
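The matching itself can be modelled in a few lines of C. This is a hedged sketch: detector_t and detector_step are hypothetical names, and the address and data values used in any example are placeholders, not the values used by the system.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the detector configuration. */
typedef struct {
    uint32_t match_addr; /* AWADDR to look for */
    uint32_t match_data; /* WDATA to look for */
} detector_t;

/* Evaluated once per clock cycle. The result is 1 only in the cycle in which
 * a valid write with the expected address and data is observed, which gives
 * the one-cycle pulse. */
static int detector_step(const detector_t *d, int write_valid,
                         uint32_t awaddr, uint32_t wdata)
{
    return write_valid && awaddr == d->match_addr && wdata == d->match_data;
}
```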

7.3. AXI_offset

The AXI_offset is used to change the address of AXI requests. This changes the address from which the CPU reads and can therefore be used to select the program the CPU should execute.

Two versions of this exist:

  1. Offset is an external signal.
  2. Offset is set via a parameter.

7.4. AXI_joiner

This module selects one of two request sources to forward to one receiver. This allows two AXI masters to talk to one AXI slave.

The source is chosen by an input signal.

In the system this is used to change the module that is sending AXI requests to the NoC. In the beginning, the self-aware module is allowed to communicate and once a program has been set for the CPU the communication switches to the CPU.
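Functionally the joiner behaves like a multiplexer on the request path. A minimal C model, with a hypothetical and heavily reduced request type (axi_req_t) and a hypothetical function name (axi_join):

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical view of an AXI request. */
typedef struct {
    int      valid;
    uint32_t addr;
    uint32_t wdata;
} axi_req_t;

/* sel_cpu is the same signal that turns on the CPU: 0 forwards the requests
 * of the self-aware module, 1 forwards the requests of the CPU. */
static axi_req_t axi_join(int sel_cpu, axi_req_t self_aware, axi_req_t cpu)
{
    return sel_cpu ? cpu : self_aware;
}
```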

7.5. AXI_splitter

This module splits AXI requests between two receivers. This allows one AXI master to talk to two different AXI slaves.

The address after which the requests are sent to receiver 2 (m_axi_1) is set via the CUT_OFF parameter.

It is used to forward some requests to the control module. Requests below the CUT_OFF are forwarded to the memory, while requests above the CUT_OFF are sent to the control module, which is responsible for recording the state of the nodes as well as the programs that they execute.
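The routing decision can be sketched as a single comparison. The following C model uses hypothetical names (axi_target_t, axi_split), and it is an assumption of the sketch that an address exactly equal to the cut-off is routed to the control; the text above does not specify that case.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    TARGET_MEMORY  = 0, /* first receiver */
    TARGET_CONTROL = 1  /* second receiver (m_axi_1) */
} axi_target_t;

/* Addresses below cut_off go to the memory, everything else to the control.
 * Treating addr == cut_off as "control" is an assumption of this sketch. */
static axi_target_t axi_split(uint32_t addr, uint32_t cut_off)
{
    return (addr < cut_off) ? TARGET_MEMORY : TARGET_CONTROL;
}
```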

7.6. slave_bridge / master_bridge

The bridge translates an AXI request / response into the NoC protocol and back.

The slave / master labels refer to the AXI interface part they have. A module with an AXI master interface has to be connected to a slave bridge (instead of an AXI slave).

7.7. control

Explained under the control scheme.

7.8. debugger

The debugger is listening for a specific write that uses a predefined address.

If this address has been detected, the write data is sent to the top level together with a signal pulse indicating the arrival of a new char to be output.

In the simulation environment this is handled in the sim_main.cpp file where the chars of the respective debugger modules are collected until a newline is received indicating that the debug string is complete. At this time the debug string is printed to the terminal.
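The collection of chars can be sketched with a fixed-size line buffer. The sketch below is written in C and is not the code of sim_main.cpp: dbg_line_t, dbg_push_char and dbg_on_pulse are hypothetical names, and the buffer size is an arbitrary choice. Chars that do not fit into the buffer are dropped so that an overlong line cannot write past the end of the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DBG_LINE_MAX 128 /* arbitrary size chosen for this sketch */

typedef struct {
    char   buf[DBG_LINE_MAX];
    size_t len;
} dbg_line_t;

/* Add one char. Returns 1 when a newline completes the string; buf then
 * holds the terminated string until the next call. */
static int dbg_push_char(dbg_line_t *line, char c)
{
    if (c == '\n') {
        line->buf[line->len] = '\0';
        line->len = 0;
        return 1;
    }
    if (line->len < DBG_LINE_MAX - 1)   /* keep room for the terminator */
        line->buf[line->len++] = c;
    return 0;
}

/* Would be called by the testbench whenever a debugger signals a new char. */
static void dbg_on_pulse(dbg_line_t *line, char c)
{
    if (dbg_push_char(line, c))
        printf("%s\n", line->buf);
}
```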

7.9. flit_buffer

The flit buffer collects every flit that is received via the NoC. Once all the flits of one AXI request have been collected, the data of the flits is merged together and passed to the master bridge, where the AXI request is extracted and reproduced.

Once the master bridge has received a response the response is merged and sent to the flit buffer which sends it back over the NoC.
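The collect-and-merge step for incoming requests can be illustrated with a rough C model. Everything about the layout here is an assumption made for the sketch and not taken from the HDL: the 32-bit payload, the tail flag that marks the last flit, the maximum of four flits, and the order "address first, data second" of a write request.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FLITS 4u /* hypothetical maximum number of flits per request */

typedef struct {
    uint32_t payload;
    int      is_tail;  /* 1 on the last flit of a request (assumed) */
} flit_t;

typedef struct {
    uint32_t words[MAX_FLITS];
    unsigned count;
} flit_buffer_t;

typedef struct {
    uint32_t addr;
    uint32_t data;
} axi_write_req_t;

/* Store one flit. Returns 1 once the request is complete. */
static int flit_buffer_push(flit_buffer_t *fb, flit_t f)
{
    if (fb->count < MAX_FLITS)
        fb->words[fb->count++] = f.payload;
    return f.is_tail;
}

/* Merge the collected payloads into one request and empty the buffer. */
static axi_write_req_t flit_buffer_merge(flit_buffer_t *fb)
{
    axi_write_req_t req;
    assert(fb->count >= 2);
    req.addr = fb->words[0];
    req.data = fb->words[1];
    fb->count = 0;
    return req;
}
```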

7.10. NoC

A 2×2 NoC created with CONNECT (CONfigurable NEtwork Creation Tool)
by Michael Papamichael:

For “Network and Router Options” only the following is supported by the bridges:

  • Router Type: Simple Input Queued (IQ)
  • Flow Control Type: Peek Flow Control

7.11. PicoRV32

A small RISC-V CPU
by Clifford Wolf:

8. Software

8.1. Programs

There are a few things to keep in mind:

  1. Due to difficulties with setting the entry point, the main function has to be called my_main at the moment.
  2. There are various print functions in the util.h (found in _libs) that can be used with the debugger. They should be used as little as possible as they can greatly increase the size of the program.
  3. At the end, the function signal_fin() has to be called to signal to the self_awareness that the program execution is finished. An endless loop afterwards is recommended. This should be moved into the start.S in the future.
  4. The system does not support malloc. There is a library provided called memmgr (found in _libs) that can be used to replace the usual functions. Please have a look at dhrystone to see how it can be used.
  5. No optimization (-O0) is advised. -O1 optimizes the mul away and anything higher causes EBREAKs / ECALLs. Normally the latter gives control to an underlying system, but as nothing is there the CPU crashes.
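Points 1 and 3 of the list above lead to the following program skeleton. It is a sketch only: signal_fin() is replaced here by a hypothetical stand-in so that the example can be compiled on its own, whereas a real program would use the function provided in util.h. The return type of my_main and the helper run_program are assumptions as well.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for signal_fin() from util.h. It only records that
 * it was called. */
static volatile int fin_signalled = 0;
static void signal_fin(void) { fin_signalled = 1; }

static volatile uint32_t result;

/* The actual work of the program, kept separate from the endless loop. */
static void run_program(void)
{
    uint32_t acc = 0;
    for (uint32_t i = 1; i <= 10; i++)
        acc += i * i;          /* sum of the squares 1..10 */
    result = acc;
    signal_fin();              /* tell the self_awareness we are finished */
}

/* Entry point called from the assembly startup file instead of main. */
int my_main(void)
{
    run_program();
    for (;;) { }               /* recommended endless loop after signal_fin */
}
```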

8.2. Controller

The source code should tell you everything you need to know, especially the defines.h. If anything is unclear, please feel free to contact me.

9. FPGA (using Vivado)

An IP core can be created for each module or for each component. The latter requires less work and results in a clearer block design as fewer boxes have to be connected, but makes debugging with the Integrated Logic Analyzer more difficult.

To package an IP core please follow these steps:

  1. Create a wrapper that does not contain any interfaces as input or output as this is not allowed by Vivado. There is a wrapper for the AXI offset included you can use as a reference.
  2. Create a Vivado project and include the following files:
    1. The module and the wrapper
    2. Every interface used (found in /rtl/interfaces)
    3. The define files for the interfaces (found in /configurations/x like “”)
  3. Click on the files and make sure that they are recognized as the correct type under “Type” in “Source File Properties”. Should Vivado complain about assignments this might be the cause of the issue.
    1. The module, wrapper and interfaces should be “System Verilog”
    2. The define files should be “Verilog Header”
  4. Open all the System Verilog files and include the Verilog Headers at the beginning: e.g. `include “”. If the interfaces are not shown under “Sources” in the “Hierarchy” tab, select the “Library” tab to find them.
  5. Make sure that every parameter has a default value.
  6. Synthesize the code to make sure it is working. If the Verilog Headers have not been included correctly the synthesis might still work. However you will get an error during the next step.
  7. Package the IP core with the following recommendation:
    1. Remove all the memory mapped stuff under “Addressing and Memory”. Vivado likes to assign everything AXI related a memory space. This should not be needed most of the time.


The CONNECT NoC requires special attention.

  1. Rename the .hex files to .data.
  2. Open the mkNetworkSimple.v file where the .hex (now .data) files are read and update the path.
  3. Create a Vivado project and include all .v and .data files.
  4. Synthesize the code to make sure it is working.
  5. Package the IP core.