Practical 3b - Vitis HLS Problem

You will find the Vitis HLS Knowledge Base useful as it contains tips and reminders. Remember to always use the HLS component template from the previous practical.

Game of Life

Conway's Game of Life (henceforth Life) is a well-known cellular automaton. If you are not familiar with it, Life is played on a 2-dimensional grid of cells in which each cell is in one of two states, either 'alive' or 'dead'. On each time step, some cells die, some come to life, and some stay the same. Fixed rules determine how each cell changes from one time step to the next. You can see an example of the game on this page. The rules are as follows.

For every cell on the grid:
- Count the number of neighbouring cells that are alive. A cell's neighbours are the 8 cells surrounding it (Northwest, North, Northeast, West, East, Southwest, South, Southeast).
- Any live cell with fewer than two live neighbours dies.
- Any live cell with two or three live neighbours lives on to the next generation.
- Any live cell with more than three live neighbours dies.
- Any dead cell with exactly three live neighbours becomes a live cell.
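
The four rules above collapse into a single decision per cell. The following is a minimal sketch in plain C++; the function name `next_cell_state` and its parameters are illustrative and not part of the practical's template.

```cpp
// Next state of one cell.
// `alive` is the cell's current state; `neighbours` is the number of
// live cells among its 8 neighbours (0 to 8).
bool next_cell_state(bool alive, int neighbours) {
    if (alive) {
        // A live cell survives with two or three live neighbours,
        // and dies with fewer than two or more than three.
        return neighbours == 2 || neighbours == 3;
    }
    // A dead cell becomes alive with exactly three live neighbours.
    return neighbours == 3;
}
```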

You should create an HLS component which, when fed a Life grid state, calculates the next grid state and returns it.

Use a grid of size 10 by 10. It is up to you how you represent this grid and what protocol the testbench will use to communicate with your hardware. It is also up to you how you treat the edges of the grid (wrap around or not).
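
To make the edge choice concrete, here is one possible way to count neighbours that supports both policies. It is only a sketch: the names `LIFE_N` and `count_neighbours`, the `int` grid representation, and the `wrap` flag are assumptions made for illustration, and in real hardware you would normally pick one policy and hard-code it.

```cpp
const int LIFE_N = 10;  // grid is LIFE_N x LIFE_N

// Count the live neighbours of cell (row, col).
// wrap == true : the grid wraps around, so the top row neighbours the bottom row.
// wrap == false: cells beyond the edge are treated as dead.
int count_neighbours(const int grid[LIFE_N][LIFE_N], int row, int col, bool wrap) {
    int count = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0) continue;  // skip the cell itself
            int r = row + dr;
            int c = col + dc;
            if (wrap) {
                // Adding LIFE_N first keeps the operand of % non-negative.
                r = (r + LIFE_N) % LIFE_N;
                c = (c + LIFE_N) % LIFE_N;
            } else if (r < 0 || r >= LIFE_N || c < 0 || c >= LIFE_N) {
                continue;  // off the grid: counts as dead
            }
            count += grid[r][c] ? 1 : 0;
        }
    }
    return count;
}
```

Whichever policy you choose, use the same one in the testbench's expected results and in the hardware, otherwise patterns near the edges will appear to be wrong.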

Subtask 1

Construct a testbench that prints a test grid, and then sends it to the hardware. One naïve way of representing the grid is shown below, but you can likely think of a better way.

int inputgrid[10][10] = {
        {0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,1,0,0,0,0,0},
        {0,0,0,1,1,1,0,0,0,0},
        {0,0,0,1,0,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0},
        {0,0,0,0,0,0,0,0,0,0}
};

The testbench should then receive the calculated grid from the hardware and display it. You will need to be able to calculate multiple states sequentially to show that they are being correctly calculated.
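
As a rough idea of the testbench side, the sketch below prints a grid and repeatedly feeds the result of one step back in as the input to the next. All names here (`TB_SIZE`, `grid_to_string`, `print_grid`, `StepFn`, `run_generations`) are hypothetical. The `step` argument stands in for "send the grid to the hardware and read the result back", whose details depend on the protocol you choose.

```cpp
#include <iostream>
#include <sstream>
#include <string>

const int TB_SIZE = 10;

// Render the grid as text, one row per line: '#' is alive, '.' is dead.
std::string grid_to_string(const int grid[TB_SIZE][TB_SIZE]) {
    std::ostringstream out;
    for (int r = 0; r < TB_SIZE; r++) {
        for (int c = 0; c < TB_SIZE; c++) {
            out << (grid[r][c] ? '#' : '.');
        }
        out << '\n';
    }
    return out.str();
}

// Print the grid with a heading showing which generation it is.
void print_grid(const int grid[TB_SIZE][TB_SIZE], int generation) {
    std::cout << "Generation " << generation << ":\n"
              << grid_to_string(grid) << std::endl;
}

// Placeholder type for one step: reads `in`, writes the next state to `out`.
typedef void (*StepFn)(const int in[TB_SIZE][TB_SIZE], int out[TB_SIZE][TB_SIZE]);

// Print the starting grid, then compute and print `steps` further generations.
void run_generations(int grid[TB_SIZE][TB_SIZE], int steps, StepFn step) {
    int next[TB_SIZE][TB_SIZE];
    print_grid(grid, 0);
    for (int g = 1; g <= steps; g++) {
        step(grid, next);
        // The output of this step becomes the input of the next one.
        for (int r = 0; r < TB_SIZE; r++)
            for (int c = 0; c < TB_SIZE; c++)
                grid[r][c] = next[r][c];
        print_grid(grid, g);
    }
}
```

Printing several generations in a row makes it easy to spot mistakes by eye, since well-known patterns such as gliders and blinkers have predictable behaviour.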

Remember the structure that was given to you in practical 3a. The toplevel function should look like this:

uint32 toplevel(uint32 *ram, uint32 *arg1, uint32 *arg2) {
    #pragma HLS INTERFACE m_axi port=ram offset=slave bundle=MAXI
    #pragma HLS INTERFACE s_axilite port=arg1 bundle=AXILiteS register
    #pragma HLS INTERFACE s_axilite port=arg2 bundle=AXILiteS register
    #pragma HLS INTERFACE s_axilite port=return bundle=AXILiteS register

    // Your hardware logic goes here.
    return 0;
}

You can add and rename additional arguments (arg3, arg4 etc.), but (at least at first) they should all be of type uint32, and don't forget the pragmas. This is because the ARM core is a 32-bit processor and the RAM is 32 bits wide, so that is how things are stored in the hardware. If we changed the type to uint1 then it would work fine, but we would only be using one bit of each 32-bit word and it would take 32 times longer to send data around.
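
One way to use every bit of each 32-bit word is to pack one cell per bit, so that the 100 cells of the grid fit in four words. The sketch below shows this; it is an optional representation, not a requirement of the practical. It uses `uint32_t` from `<cstdint>` as a stand-in for the template's `uint32` type, and the names `PACK_CELLS`, `PACK_WORDS`, `pack_grid` and `get_cell` are made up for illustration.

```cpp
#include <cstdint>

const int PACK_CELLS = 100;                     // 10 x 10 grid
const int PACK_WORDS = (PACK_CELLS + 31) / 32;  // 4 words hold 100 bits

// Pack the grid into words: cell (row, col) is bit (row*10 + col) % 32
// of word (row*10 + col) / 32.
void pack_grid(const int grid[10][10], uint32_t words[PACK_WORDS]) {
    for (int w = 0; w < PACK_WORDS; w++) {
        words[w] = 0;
    }
    for (int i = 0; i < PACK_CELLS; i++) {
        if (grid[i / 10][i % 10]) {
            words[i / 32] |= (uint32_t(1) << (i % 32));
        }
    }
}

// Read a single cell (0 or 1) back out of the packed words.
int get_cell(const uint32_t words[PACK_WORDS], int row, int col) {
    int i = row * 10 + col;
    return static_cast<int>((words[i / 32] >> (i % 32)) & 1u);
}
```

The trade-off is that the hardware has to unpack the bits before it can use them. Storing one cell per word is simpler and is a reasonable starting point.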

You should also have a look at Sharing data between the ARM and HLS on the Software API page.

Remember what we talked about in the lecture. You are not trying to pass in your grid through the AXI slave interface (arg1, arg2 etc.). The grid is already in RAM.
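
In the HLS C simulation, the testbench can model the shared RAM as an ordinary array and copy the grid into it before calling toplevel. The sketch below shows the idea with one cell per 32-bit word. The helper names `grid_to_ram` and `ram_to_grid` and the `base` offset parameter are assumptions for illustration, and `uint32_t` stands in for the template's `uint32`. How the real ARM code reaches the shared RAM is covered on the Software API page and is not reproduced here.

```cpp
#include <cstdint>

// Copy a grid into RAM, one cell per word, starting at word index `base`.
void grid_to_ram(const int grid[10][10], uint32_t *ram, int base) {
    for (int r = 0; r < 10; r++) {
        for (int c = 0; c < 10; c++) {
            ram[base + r * 10 + c] = grid[r][c] ? 1u : 0u;
        }
    }
}

// Copy a grid back out of RAM, starting at word index `base`.
void ram_to_grid(const uint32_t *ram, int base, int grid[10][10]) {
    for (int r = 0; r < 10; r++) {
        for (int c = 0; c < 10; c++) {
            grid[r][c] = ram[base + r * 10 + c] ? 1 : 0;
        }
    }
}
```

With this arrangement, the AXI slave arguments only need to carry small values, such as where in RAM the grid starts.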

Subtask 2

Once your hardware is working, use the directives in HLS to optimise the design. You must aim for a design with a latency below 900 clock cycles (it is possible to get down to around 200-300 with some work). Don't worry about how much hardware you are using. Go straight for speed!
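
Directives can be written in the source as pragmas placed inside the loop they apply to. The sketch below shows two common ones on small helper loops. The function names `load_cells` and `window_sum` are illustrative only, `uint32_t` stands in for the template's `uint32`, and the effect on latency depends on the rest of your design, so always check the synthesis report.

```cpp
#include <cstdint>

// Copy 100 words from RAM into a local array.
// PIPELINE II=1 asks the tool to start a new loop iteration every clock cycle.
void load_cells(const uint32_t *ram, uint32_t cells[100]) {
    for (int i = 0; i < 100; i++) {
#pragma HLS PIPELINE II=1
        cells[i] = ram[i];
    }
}

// Sum three consecutive cells starting at `start`.
// UNROLL asks the tool to replicate the loop body instead of iterating.
int window_sum(const uint32_t cells[100], int start) {
    int sum = 0;
    for (int k = 0; k < 3; k++) {
#pragma HLS UNROLL
        sum += static_cast<int>(cells[start + k]);
    }
    return sum;
}
```

An ordinary C++ compiler ignores `#pragma HLS` lines, so the same code still runs unchanged in the testbench.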

When you have completed this task, show a demonstrator, who will mark it as completed in your logbook.

Important Tips:

- Despite the text above, many of you will have changed the definition of toplevel. It is very tempting to write your toplevel as something like this:
    uint100 toplevel(uint100 *ram)
  The problem with this is two-fold:
  - One, we don't need to return the new array because it is already stored in RAM. You can have the hardware component modify it in place, or you could write it to somewhere else in RAM and then read it from that place in your ARM code. For example, you could say that the input array is at locations 0 to 99 (i.e. ram[0] to ram[99]), and that your HLS component will write the output array to locations 100 to 199. You have plenty of RAM, so use it.
  - Two, the physical connection between the CPU and RAM on the development board is 32 wires. This is why we used uint32 to tell HLS to make 32 wires. If you do the above, HLS will happily create an IP core with 100 wires, which you will then be unable to wire up in the next practical.
- You will want to use the ARRAY_PARTITION directive to split up your arrays, because if they are implemented using BRAMs then they can only support 2 accesses in parallel. If you apply this to a multidimensional array you will need to set the dimension to 0 for the tools to partition all dimensions.
- Your hardware should use the contents of its input grid to determine the state of a completely new grid, and return that new grid. If you try to keep it all in one grid and start editing it 'live' you'll get incorrect results.
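
The tips above can be combined into one step function, sketched below under several assumptions: one cell per 32-bit word, input at ram[0] to ram[99], output at ram[100] to ram[199], and cells beyond the edge treated as dead. The names `word_t`, `LS_DIM`, `LS_IN_BASE`, `LS_OUT_BASE` and `life_step` are illustrative, and `word_t` stands in for the template's `uint32`. This is one possible structure rather than a reference solution, and it has not been tuned for latency.

```cpp
#include <cstdint>

typedef uint32_t word_t;      // stand-in for the template's uint32 (assumption)

const int LS_DIM = 10;
const int LS_IN_BASE = 0;     // input grid occupies ram[0..99]
const int LS_OUT_BASE = 100;  // output grid occupies ram[100..199]

void life_step(word_t *ram) {
    // Two separate local grids: `current` is only read, `next` is only
    // written, so no cell ever sees a half-updated neighbour.
    word_t current[LS_DIM][LS_DIM];
    word_t next[LS_DIM][LS_DIM];
#pragma HLS ARRAY_PARTITION variable=current complete dim=0
#pragma HLS ARRAY_PARTITION variable=next complete dim=0

    // Copy the input grid from RAM into the local array.
    for (int i = 0; i < LS_DIM * LS_DIM; i++) {
        current[i / LS_DIM][i % LS_DIM] = ram[LS_IN_BASE + i];
    }

    // Compute every cell of the new grid from the old grid.
    for (int r = 0; r < LS_DIM; r++) {
        for (int c = 0; c < LS_DIM; c++) {
            int n = 0;
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    int rr = r + dr;
                    int cc = c + dc;
                    bool self = (dr == 0 && dc == 0);
                    bool inside = rr >= 0 && rr < LS_DIM && cc >= 0 && cc < LS_DIM;
                    if (!self && inside && current[rr][cc] != 0) {
                        n++;
                    }
                }
            }
            bool alive = current[r][c] != 0;
            // Alive next step with 3 neighbours, or with 2 if already alive.
            next[r][c] = (n == 3 || (alive && n == 2)) ? 1u : 0u;
        }
    }

    // Write the new grid to a different region of RAM.
    for (int i = 0; i < LS_DIM * LS_DIM; i++) {
        ram[LS_OUT_BASE + i] = next[i / LS_DIM][i % LS_DIM];
    }
}
```

Because the input region is left untouched, the testbench can compare the old and new grids directly. To compute the next generation, it can copy the output region back over the input region before calling the step again.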
