1. Overview
1.1. System
The default version of the system is made up of a controller, two nodes, a memory controller and a NoC. To keep things simple, only one node is shown in the images. The controller, node and memory controller are also referred to as components: higher-level modules that contain no logic of their own and mainly serve to keep the documentation organized.
1.2. Licenses
A number of different licenses are used for the different parts. With the exception of the NoC, everything is licensed under a free-software-compatible license. The license of the NoC restricts its use to research.
1.3. Design decisions
- The whole system should be as easy to understand as possible. Code is often written in a very verbose way and clever tricks are avoided.
- It should be as reusable as possible. Every module of the system has a specific function and could be used in a different system. The additional communication this requires is accepted.
- The NoC bridges are not considered to be part of the NoC, but of the nodes. This should make it easier to swap the NoC for a different one.
- It was made with simulation in mind. There are parts of the system (like the defines_xxx.vh files) that might be cumbersome to use for synthesis, especially if the block designer is used.
- For FPGAs, the only proprietary software considered is Vivado because, as the old saying goes: You either love Vivado or you have never used Quartus Prime before.
1.4. Control scheme
The control is done over the NoC and uses the control module to record which programs the nodes should execute. The control contains a simple 32-bit flag register in which each node is identified by its id; a 1 means that the node is currently busy. In addition to this flag register, an array holds the programs assigned to the nodes.
Once the system is started, the self-aware modules of the nodes start to read from the control to see if there are any programs their respective CPUs should execute. The controller can set programs for the nodes and once a program has been set the respective node is considered busy until the self-aware module of the node signals the completion.
The program is represented by its starting address in memory, which is returned to the self-aware module in the rdata field. If rdata is 0, there is no program set for the node. Address 0 is reserved because it is the start of the controller program.
Once the self-aware module reads a valid address, it turns on the CPU and sets the AXI offset to the program address. This offset is needed because every CPU starts reading from address 0 and each program is compiled the same way. The offset essentially moves all the reads and writes to the right memory space. So instead of reading from 0, the CPU reads from 0+offset.
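The address translation can be pictured with a small C model. This is only a sketch: the function name and the example offset in the usage note are made up for illustration and are not taken from the RTL.

```c
#include <stdint.h>

/* Hypothetical model of the AXI offset: every address issued by the CPU
 * is shifted by the start address of the program assigned to the node. */
uint32_t apply_axi_offset(uint32_t cpu_addr, uint32_t program_offset)
{
    return cpu_addr + program_offset;
}
```

With an assumed program start address of 0x00010000, a CPU read from address 0x24 would end up at 0x00010024 in memory.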
The same signal that is used to turn on the CPU is also connected to the AXI_joiner and switches the AXI communication from the self-aware module to the CPU.
The CPU is now executing the program. Instead of reading from the control, the CPU is reading from the memory. The AXI_splitter takes care of sending certain requests to the control and some to the memory depending on the address.
While the CPU is running, the controller can read from the control to get the state of the nodes. This simply returns the busy flag register. There is no way for the controller to interfere with the execution in any way or terminate it. It is possible for the controller to set a program for a busy node, but this will never be executed.
Once the CPU is finished, it writes a certain value to a certain address. This is detected by the aptly named detector, which signals the completion to the self-aware module. The self-aware module turns the CPU off and writes a 0 to the control, representing that there is now no program set for this node. Once this request has been received, the control also clears the node's flag in the busy register, so the controller will learn of the completion the next time it reads the busy flag register.
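The bookkeeping done by the control can be summarized in a short C model. It is a sketch under assumptions: the type and function names are invented, and sizing the program array to 32 entries simply mirrors the 32-bit flag register. The actual control module may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_NODES 32u  /* assumption: one array entry per bit of the flag register */

typedef struct {
    uint32_t busy_flags;          /* bit i == 1: node i is busy */
    uint32_t program[NUM_NODES];  /* start address per node, 0 = no program set */
} control_t;

/* Controller side: assign a program to a node and mark the node busy. */
void control_set_program(control_t *c, unsigned id, uint32_t start_addr)
{
    assert(id < NUM_NODES && start_addr != 0);  /* 0 is reserved */
    c->program[id] = start_addr;
    c->busy_flags |= (UINT32_C(1) << id);
}

/* Node side: the self-aware module polls for a program (the rdata value). */
uint32_t control_node_read(const control_t *c, unsigned id)
{
    assert(id < NUM_NODES);
    return c->program[id];
}

/* Node side: writing 0 signals completion and clears the busy flag. */
void control_node_write_done(control_t *c, unsigned id)
{
    assert(id < NUM_NODES);
    c->program[id] = 0;
    c->busy_flags &= ~(UINT32_C(1) << id);
}

/* Controller side: read the state of all nodes at once. */
uint32_t control_read_busy(const control_t *c)
{
    return c->busy_flags;
}
```

In this model a program set for a busy node is overwritten by the 0 that the node writes on completion, which is one possible reading of why such a program is never executed.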
1.5. Known problems
There are a few problems that are known and tolerated at the moment. If something does not work in a different situation (e.g. on an FPGA), they might be the cause. Some might be related to bugs in the tools and will be reported once the issues can be condensed into smaller examples.
1.5.1. HDL
- In the controller component one AXI Light interface requires a parameter.
The line in question: if_axi_light #( .AXI_WSTRB_WIDTH(`AXI_WSTRB_WIDTH) ) if_axi_light_debugger();
Without the parameter this causes the following error:
%Error: Internal Error: ../../rtl/controller.sv:40: ../V3LinkDot.cpp:1317: No symbol for interface alias rhs
Solution: Just provide the parameter as this is only redundant information.
- The read_resp task in the AXI Light interface requires non-blocking assignments (<= instead of =).
The lines in question: rresp <= t_rresp; rdata <= t_rdata;
Solution: Ignore for the time being and hope for the best.
Resolved: This problem apparently just disappeared. It was most likely a side effect caused by a different bug.
1.5.2. Software
- The debugging system causes a segmentation fault in the testbench.
In sim_main.cpp the chars from the debuggers are collected in a string and printed to the terminal once a newline has been received. On one computer this is not possible as the char array causes a segmentation fault.
Solution: Print every char directly without collecting them in a string.
- The debug function print_dec() does not always work.
Solution: Use print_hex instead.
- It is not possible to set the entry point during compilation. The compiler always defaults to calling the main function.
Solution: The function in the assembly startup file has been renamed to main to make sure this one is called. The main in main.c has been renamed to my_main and is called in the aforementioned assembly file.
- Stack pointer is used before it is set.
When libraries are linked, the stack pointer is used during some initialization before it can be set in the main function located in start.S.
Solution: The stack pointer is set in the CPU and constant for every program.
2. Requirements
2.1. RISC-V GNU toolchain
https://github.com/riscv/riscv-gnu-toolchain
Specifically the toolchain for RV32I.
The Makefiles expect the toolchain to be installed in /opt/riscv32i/. It is advised that the following guide is used for the installation:
https://github.com/cliffordwolf/picorv32
2.2. Verilator
https://www.veripool.org/wiki/verilator
For simulating the system.
The newest version available is recommended:
https://www.veripool.org/projects/verilator/wiki/Installing
2.3. GTKwave – optional
http://gtkwave.sourceforge.net/
Used to display the tracefile; only needed for debugging.
Any version your package manager offers should suffice.
3. Makefiles
3.1. Main
The main Makefile can be run from the project root.
make | compiles the HDL |
make run | executes the simulation |
make wave | executes the simulation with a tracefile enabled |
make clean | removes the compiled simulation environment and any tracefile |
make sw | compiles all the programs and the controller software |
make clean_sw | removes all the compiler output of the programs and controller software |
make programs | compiles all the programs |
make clean_programs | removes all the compiler output of the programs |
make controller | compiles the controller software |
make clean_controller | removes all the compiler output of the controller software |
3.2. Programs
Located in ./sw/programs
This Makefile is used by all the programs and should not be called from the ./sw/programs directory. Instead, each program directory contains a Makefile where specific additions can be made, like the inclusion of an additional library.
make small | compiles the code for a small node (rv32i) |
make big | compiles the code for a big node (rv32im) |
make clean | removes compiler output |
This Makefile produces many different files for debugging purposes. The files rv32i_main.hex and rv32im_main.hex are the ones used by the system.
3.3. Controller
Located in ./sw/controller
Similar to the programs Makefile, but kept separate in case a greater difference becomes necessary.
4. Configurations
Configurations are more elaborate ways to deal with different top files. In addition to the top files they also contain the testbench and the GTKwave tcl script. The way relative paths work in Verilator also means that the NoC .hex files should be placed in this directory.
To change the configuration, a variable in the Makefile has to be changed.
There are currently two different configurations.
4.1. mult
This is the default multiprocessor system consisting of one small and one big node.
4.2. single
A simple system using only one big CPU connected to a memory, which makes it very suitable for debugging software.
5. Components
Components are made up of different modules to make them easier to handle and do not contain any logic themselves. They are listed here apart from the modules to make the documentation more organized.
5.1. controller
The controller is used to control the execution for the nodes. It reads instructions from the memory via the NoC like the nodes and sets the programs the nodes should execute. It contains a CPU as well as a slave bridge. Additionally a debugger can be added.
5.2. node
A node is used to execute the programs and comes in two versions: small and big.
The big node contains the mul and div additions or more specifically:
small = rv32i instruction set
big = rv32im instruction set
Additionally every node contains a self_awareness module as well as a slave bridge among other more minor modules.
A node has 2 ways to be identified.
- NoC interface id
- Control id
The NoC interface id is only used to determine the sender and receiver of the flits and is not used outside the NoC bridges. This is assigned by the NoC itself via a signal the node is connected to.
The Control id is used to assign programs to the node and is set via a parameter. If the controller sets a program for node 0, the node with this id will execute it.
5.3. memory_controller
The memory controller contains the memory as well as the master bridge.
In addition to the memory itself it contains the control that is used to record the programs the nodes should execute.
6. Interfaces
Located in ./rtl/interfaces/
Two interfaces are included to make connecting the different modules of the system easier.
6.1. AXI Light
The AXI Light interface does not only contain the port definitions, but also tasks that can be used to send and receive data easily. Tasks are made to be called repeatedly until the request or response has been completed. This is done to avoid timing constructs that might make a synthesis of the code difficult. Therefore each task returns a “done” signal indicating the completion of a request or response. This value is then used in the calling module to determine the next step e.g. to change a state.
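The "call repeatedly until done" pattern can be illustrated with a C model of a read request on the address channel. This is a sketch, not the interface code: the task name read_req, the struct and the two states are assumptions, and only the arvalid / arready / araddr signal names follow the usual AXI naming.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read address channel signals (standard AXI names). */
typedef struct {
    bool     arvalid;
    bool     arready;
    uint32_t araddr;
} axi_ar_channel_t;

/* Hypothetical task: called once per clock cycle, returns true ("done")
 * in the cycle in which the valid/ready handshake is observed. */
bool read_req(axi_ar_channel_t *ch, uint32_t addr)
{
    if (ch->arvalid && ch->arready) {
        ch->arvalid = false;   /* handshake complete, release the channel */
        return true;
    }
    ch->araddr  = addr;        /* keep driving the request until accepted */
    ch->arvalid = true;
    return false;
}

typedef enum { ST_REQUEST, ST_WAIT_RESPONSE } state_t;

/* Calling module: the done value decides when to change the state. */
state_t step(state_t s, axi_ar_channel_t *ch, uint32_t addr)
{
    if (s == ST_REQUEST && read_req(ch, addr))
        return ST_WAIT_RESPONSE;
    return s;
}
```

Because the task never waits internally, the caller stays in ST_REQUEST for as many cycles as the receiver needs to raise arready.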
6.2. NoC
In contrast to the AXI Light interface, the NoC one does not contain any tasks at the moment.
7. Modules
7.1. self_awareness
This module is used to make sure that the nodes execute the programs that the controller has assigned to them. Once the system starts, the CPUs in the nodes are deactivated and the self_awareness module starts to read from the control to see if there is a program set for the CPU to execute.
It is also responsible for turning off the CPU of a node once it finishes.
7.2. detector
Once a certain WDATA and AWADDR combination is detected an output signal is asserted for 1 clk cycle.
This is used to notify the control of the completion of the program execution.
Please consult the test program in ./sw/mul/ to see how it is used.
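A cycle-based C model of the detector might look like the following. The address and data constants are placeholders chosen for this sketch, not the values used by the system, and the edge detection is one assumed way of keeping the output high for a single cycle.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIN_ADDR 0x0000FFF0u  /* placeholder, not the real address */
#define FIN_DATA 0xDEADBEEFu  /* placeholder, not the real value */

typedef struct {
    bool prev_match;  /* match result of the previous clock cycle */
} detector_t;

/* Evaluated once per clock cycle. Returns true for exactly one cycle
 * when a write with the expected AWADDR / WDATA combination appears. */
bool detector_step(detector_t *d, bool write_valid,
                   uint32_t awaddr, uint32_t wdata)
{
    bool match = write_valid && awaddr == FIN_ADDR && wdata == FIN_DATA;
    bool pulse = match && !d->prev_match;  /* rising edge only */
    d->prev_match = match;
    return pulse;
}
```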
7.3. AXI_offset
The AXI_offset is used to change the address of AXI requests. This is used to change the address from which the CPU reads and can be used to define the program the CPU should execute.
Two versions of this exist:
- Offset is an external signal.
- Offset is set via a parameter.
7.4. AXI_joiner
This module selects one of two request sources to forward to one receiver. This allows two AXI masters to talk to one AXI slave.
The source is chosen by an input signal.
In the system this is used to change the module that is sending AXI requests to the NoC. In the beginning, the self-aware module is allowed to communicate and once a program has been set for the CPU the communication switches to the CPU.
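Conceptually the joiner behaves like a two-input multiplexer. The C sketch below uses an invented, heavily simplified request struct and assumes that the select input is the same signal that enables the CPU, as described in the control scheme.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified request, only for illustration. */
typedef struct {
    bool     valid;
    uint32_t addr;
    uint32_t data;
} axi_req_t;

/* Forward the CPU's request while the CPU is enabled,
 * otherwise forward the request of the self-aware module. */
axi_req_t axi_joiner(bool cpu_enabled,
                     axi_req_t from_self_aware, axi_req_t from_cpu)
{
    return cpu_enabled ? from_cpu : from_self_aware;
}
```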
7.5. AXI_splitter
This module splits AXI requests between two receivers. This allows one AXI master to talk to two different AXI slaves.
The address after which the requests are sent to receiver 2 (m_axi_1) is set via the CUT_OFF parameter.
It is used to forward some requests to the control module. Requests below the CUT_OFF are forwarded to the memory, while requests above CUT_OFF are sent to the control module that is responsible for recording the state of the nodes as well as the programs that they execute.
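The routing decision can be written down as a one-line address comparison. In this sketch the CUT_OFF value is a placeholder, and sending an address exactly equal to CUT_OFF to the control is an assumption, since the text only speaks of "below" and "above".

```c
#include <stdint.h>

#define CUT_OFF 0x00100000u  /* placeholder value for this sketch */

typedef enum { TARGET_MEMORY, TARGET_CONTROL } target_t;

/* Decide which receiver gets a request with the given address. */
target_t axi_splitter_route(uint32_t addr)
{
    /* assumption: addr == CUT_OFF already belongs to the control */
    return (addr < CUT_OFF) ? TARGET_MEMORY : TARGET_CONTROL;
}
```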
7.6. slave_bridge / master_bridge
The bridge translates an AXI request / response into the NoC protocol and back.
The slave / master labels refer to the AXI interface part they have. A module with an AXI master interface has to be connected to a slave bridge (instead of an AXI slave).
7.7. control
Explained under the control scheme.
7.8. debugger
The debugger is listening for a specific write that uses a predefined address.
If this address has been detected, the write data is sent to the top level together with a signal pulse indicating the arrival of a new char to be output.
In the simulation environment this is handled in the sim_main.cpp file where the chars of the respective debugger modules are collected until a newline is received indicating that the debug string is complete. At this time the debug string is printed to the terminal.
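The collecting of chars until a newline arrives could be done along these lines. This is a plain C sketch and not the code from sim_main.cpp; the buffer capacity and the forced flush of over-long lines are assumptions added to keep the buffer bounded.

```c
#include <stddef.h>
#include <stdio.h>

#define DEBUG_LINE_CAP 256  /* assumed capacity, including the terminator */

typedef struct {
    char   buf[DEBUG_LINE_CAP];
    size_t len;
} debug_line_t;

/* Feed one char from a debugger module. Prints the collected string once
 * a newline is received (or the buffer is full) and returns 1 in that
 * case, otherwise 0. */
int debug_putc(debug_line_t *l, char c, FILE *out)
{
    int flushed = 0;

    if (c != '\n')
        l->buf[l->len++] = c;

    if (c == '\n' || l->len == DEBUG_LINE_CAP - 1) {
        l->buf[l->len] = '\0';
        fprintf(out, "%s\n", l->buf);
        l->len = 0;
        flushed = 1;
    }
    return flushed;
}
```

One debug_line_t per debugger module keeps the output of different nodes from being interleaved within a line.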
7.9. flit_buffer
The flit buffer collects every flit that is received via the NoC. Once all the flits of one AXI request have been collected, the data of all flits is merged together and passed to the master bridge, where the AXI request is extracted and reproduced.
Once the master bridge has received a response, the response is merged and sent to the flit buffer, which sends it back over the NoC.
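The collect-and-merge step can be sketched in C as follows. All sizes are assumptions made for this example (4 flits per request, 16 data bits per flit, first flit in the most significant position); the real flit layout depends on the NoC configuration and the bridges.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLITS_PER_REQUEST 4u   /* assumed */
#define FLIT_DATA_BITS    16u  /* assumed */

typedef struct {
    uint64_t merged;  /* data of the flits collected so far */
    unsigned count;   /* number of flits collected so far */
} flit_buffer_t;

/* Add one received flit. Returns true and writes the merged request to
 * *request_out once all flits of the request have been collected. */
bool flit_buffer_push(flit_buffer_t *fb, uint16_t flit_data,
                      uint64_t *request_out)
{
    fb->merged = (fb->merged << FLIT_DATA_BITS) | flit_data;
    fb->count++;

    if (fb->count < FLITS_PER_REQUEST)
        return false;

    *request_out = fb->merged;
    fb->merged = 0;
    fb->count = 0;
    return true;
}
```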
7.10. NoC
A 2×2 NoC created with CONNECT ( CONfigurable NEtwork Creation Tool )
by Michael Papamichael: http://users.ece.cmu.edu/~mpapamic/connect/
For “Network and Router Options” only the following is supported by the bridges:
- Router Type: Simple Input Queued (IQ)
- Flow Control Type: Peek Flow Control
7.11. PicoRV32
A small RISC-V CPU
by Clifford Wolf: https://github.com/cliffordwolf/picorv32/
8. Software
8.1. Programs
There are a few things to keep in mind:
- Due to difficulties with setting the entry point, the main function has to be called my_main at the moment.
- There are various print functions in the util.h (found in _libs) that can be used with the debugger. They should be used as little as possible as they can greatly increase the size of the program.
- At the end, the function signal_fin() has to be called to signal to the self_awareness module that the program execution is finished. An endless loop afterwards is recommended. This should be moved into the start.S in the future.
- The system does not support malloc. There is a library provided called memmgr (found in _libs) that can be used to replace the usual functions. Please have a look at dhrystone to see how it can be used.
- No optimization (-O0) is advised. -O1 optimizes the mul away and anything higher causes EBREAKs / ECALLs. Normally the latter gives control to an underlying system, but as nothing is there the CPU crashes.
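Put together, the points above suggest a program skeleton like the following. It is a host-compilable sketch: signal_fin() is replaced by a stand-in because the real one comes from util.h, the signature of my_main and the ON_TARGET macro are assumptions, and the summation is just filler work.

```c
#include <stdint.h>

/* Stand-in for signal_fin() from util.h so the sketch compiles on a host.
 * On the target, include util.h instead of defining this. */
int fin_signalled = 0;
void signal_fin(void) { fin_signalled = 1; }

uint32_t result;  /* filler output of the example program */

/* The entry point has to be named my_main (return type assumed). */
void my_main(void)
{
    uint32_t sum = 0;
    for (uint32_t i = 1; i <= 10; i++)
        sum += i;
    result = sum;

    signal_fin();  /* tell the self_awareness module that we are done */

#ifdef ON_TARGET   /* hypothetical macro: loop forever only on the CPU */
    for (;;) { }
#endif
}
```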
8.2. Controller
The source code should tell you everything you need to know, especially the defines.h. If anything is unclear, please feel free to contact me.
9. FPGA (using Vivado)
An IP core can be created for each module or for each component. The latter requires less work and results in a clearer block design as fewer boxes have to be connected, but makes debugging using the Integrated Logic Analyzer more difficult.
To package an IP core please follow these steps:
- Create a wrapper that does not contain any interfaces as input or output as this is not allowed by Vivado. There is a wrapper for the AXI offset included that you can use as a reference.
- Create a Vivado project and include the following files:
  - The module and the wrapper
  - Every interface used (found in /rtl/interfaces)
  - The define files for the interfaces (found in /configurations/x like “defines_axi.sv”)
- Click on the files and make sure that they are recognized as the correct type under “Type” in “Source File Properties”. Should Vivado complain about assignments, this might be the cause of the issue.
  - The module, wrapper and interfaces should be “System Verilog”
  - The define files should be “Verilog Header”
- Open all the System Verilog files and include the Verilog Headers at the beginning: e.g. `include "defines_axi.sv". If the interfaces are not shown under “Sources” in the “Hierarchy” tab, select the “Library” tab to find them.
- Make sure that every parameter has a default value.
- Synthesize the code to make sure it is working. If the Verilog Headers have not been included correctly, the synthesis might still work. However, you will get an error during the next step.
- Package the IP core with the following recommendation:
  - Remove all the memory mapped stuff under “Addressing and Memory”. Vivado likes to assign everything AXI related a memory space. This should not be needed most of the time.
9.1. CONNECT NoC
The CONNECT NoC requires special attention.
- Rename the .hex files to .data.
- Open the mkNetworkSimple.v file where the .hex (now .data) files are read and update the path.
- Create a Vivado project and include all .v and .data files.
- Synthesize the code to make sure it is working.
- Package the IP core.