You will find the Vitis HLS Knowledge Base useful as it contains tips and reminders. Remember to always use the HLS component template from the previous practical.
Game of Life
Conway's Game of Life (henceforth Life) is a well-known cellular automaton. If you are not familiar with Life, it is a 2-dimensional grid of cells in which each cell may be in one of two states, either 'alive' or 'dead'. On each time step, some cells die, some come to life, and some stay the same. Fixed rules determine how each cell changes from one time step to the next. You can see an example of the game on this page. The rules are as follows.
For every cell on the grid:
Count the number of neighbouring cells that are alive. A cell's neighbours are the 8 cells around it (Northwest, North, Northeast, West, East, Southwest, South, Southeast).
Any live cell with fewer than two live neighbours dies.
Any live cell with two or three live neighbours lives on to the next generation.
Any live cell with more than three live neighbours dies.
Any dead cell with exactly three live neighbours becomes a live cell.
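The rules above can be sketched as a plain software reference model, which is handy for checking the hardware's output in the testbench. This is a minimal sketch, not the required implementation: the names `count_neighbours`, `next_state` and `life_step` are made up for illustration, and treating cells beyond the edge of the grid as dead is an assumption, since the rules above do not say what happens at the edges.

```cpp
const int N = 10;  // grid size, matching the 10x10 example grid

// Count the live neighbours of cell (r, c).
// Assumption: cells outside the grid are treated as dead.
int count_neighbours(const int grid[N][N], int r, int c) {
    int count = 0;
    for (int dr = -1; dr <= 1; dr++) {
        for (int dc = -1; dc <= 1; dc++) {
            if (dr == 0 && dc == 0) continue;  // skip the cell itself
            int rr = r + dr;
            int cc = c + dc;
            if (rr >= 0 && rr < N && cc >= 0 && cc < N) {
                count += grid[rr][cc];
            }
        }
    }
    return count;
}

// Apply the four rules to a single cell.
int next_state(int alive, int neighbours) {
    if (alive) {
        // Fewer than two or more than three neighbours: the cell dies.
        return (neighbours == 2 || neighbours == 3) ? 1 : 0;
    }
    // A dead cell comes to life with exactly three neighbours.
    return (neighbours == 3) ? 1 : 0;
}

// Compute one generation, reading from `in` and writing to a separate `out`.
void life_step(const int in[N][N], int out[N][N]) {
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            out[r][c] = next_state(in[r][c], count_neighbours(in, r, c));
        }
    }
}
```

Calling `life_step` repeatedly, swapping the roles of the two grids each time, gives the sequence of generations to compare against.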
...
Construct a testbench that prints a test grid, and then sends it to the hardware. One naïve way of representing the grid is shown below, but you can likely think of a better way.
...
Code Block |
---|
int inputgrid[10][10] = {
    {0,0,0,0,0,0,0,0,0,0},
    {0,0,0,0,0,0,0,0,0,0},
    {0,0,0,0,0,0,0,0,0,0},
    {0,0,0,0,0,0,0,0,0,0},
    {0,0,0,0,1,0,0,0,0,0},
    {0,0,0,1,1,1,0,0,0,0},
    {0,0,0,1,0,0,0,0,0,0},
    {0,0,0,0,0,0,0,0,0,0},
    {0,0,0,0,0,0,0,0,0,0},
    {0,0,0,0,0,0,0,0,0,0}
}; |
It should then receive the calculated grid from the hardware and display it. You will need to be able to calculate multiple states sequentially to show that they are being correctly calculated.
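As a rough sketch of the testbench side, the helpers below print a grid and copy it to and from a flat buffer of 32-bit words, one word per cell. The helper names and the row-by-row layout are assumptions made for illustration, and the `uint32` typedef stands in for whatever the component template defines.

```cpp
#include <cstdint>
#include <cstdio>

typedef uint32_t uint32;  // assumption: stands in for the template's uint32 type
const int N = 10;         // grid size, matching the 10x10 example grid

// Print the grid, using '#' for a live cell and '.' for a dead one.
void print_grid(const int grid[N][N]) {
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            putchar(grid[r][c] ? '#' : '.');
        }
        putchar('\n');
    }
    putchar('\n');
}

// Copy the grid into a flat buffer, one 32-bit word per cell, row by row.
void grid_to_ram(const int grid[N][N], uint32 *ram) {
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            ram[r * N + c] = grid[r][c] ? 1u : 0u;
        }
    }
}

// Read a grid back out of the flat buffer.
void ram_to_grid(const uint32 *ram, int grid[N][N]) {
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            grid[r][c] = (ram[r * N + c] != 0) ? 1 : 0;
        }
    }
}
```

A testbench built on these would print the input grid, call `grid_to_ram`, invoke the hardware function on the buffer, call `ram_to_grid`, print the result, and then repeat for as many generations as it wants to check.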
Remember the structure that was given to you in practical 4. The toplevel function should look like this:
Code Block | ||
---|---|---|
| ||
uint32 toplevel(uint32 *ram, uint32 *arg1, uint32 *arg2) {
#pragma HLS INTERFACE m_axi port=ram offset=slave bundle=MAXI
#pragma HLS INTERFACE s_axilite port=arg1 bundle=AXILiteS register
#pragma HLS INTERFACE s_axilite port=arg2 bundle=AXILiteS register
#pragma HLS INTERFACE s_axilite port=return bundle=AXILiteS register
    // Your hardware goes here.
    return 0;
} |
You can add and rename additional arguments (arg3, arg4, etc.), but (at least at first) they should all be of type uint32, and don't forget the pragmas. This is because the ARM core is a 32-bit processor and the RAM is 32 bits wide, so that is how things are stored in the hardware. If we changed the type to uint1 it would still work, but we would only be using one bit of each 32-bit word and it would take 32 times longer to send data around.
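One way to keep everything as uint32 while still using more of each word is to pack a whole row of cells into a single word. The sketch below is only an illustration of that idea: `pack_row` and `get_cell` are hypothetical helpers, and the choice of putting cell c in bit c is an arbitrary assumption. With this layout a 10x10 grid fits in 10 words rather than 100.

```cpp
#include <cstdint>

typedef uint32_t uint32;  // assumption: stands in for the template's uint32 type
const int N = 10;         // cells per row, matching the 10x10 example grid

// Pack one row of N cells into the low N bits of a word (cell c goes in bit c).
uint32 pack_row(const int row[N]) {
    uint32 word = 0;
    for (int c = 0; c < N; c++) {
        if (row[c]) {
            word |= (1u << c);
        }
    }
    return word;
}

// Read cell c back out of a packed row.
int get_cell(uint32 word, int c) {
    return (int)((word >> c) & 1u);
}
```

The hardware then needs the matching unpacking logic, so it is worth getting the one-word-per-cell version working first.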
You should also have a look at Sharing data between the ARM and HLS on the Software API page.
...
Once your hardware is working, use the directives in HLS to optimise the design. You must aim for a design with a latency below 900 clock cycles (it is possible to get down to around 200-300 with some work). Don't worry about how much hardware you are using. Go straight for speed!
When you have completed this task, show a demonstrator, who will mark it as completed in your logbook.
Tips:
You will want to use the ARRAY_PARTITION directive to split up your arrays, because if they are implemented using BRAMs they can only support 2 accesses in parallel. If you apply this to a multidimensional array you will need to set the dimension to 0 for the tools to partition all dimensions.
Your hardware should use the contents of its input grid to determine the state of a completely new grid, and return that new grid. If you try to keep it all in one grid and start editing it 'live' you'll get incorrect results, because later cells will count neighbours that have already been updated.
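Both tips can be sketched together as follows. This is a hedged illustration rather than a model answer: `life_step_hw` is a hypothetical function (not the practical's toplevel), it assumes one word per cell in `ram`, and it assumes cells beyond the grid edge are dead. The pragma spelling with `type=complete` follows recent Vitis HLS releases; older tool versions write `complete` without `type=`, so check it against your version. A normal C++ compiler ignores the HLS pragmas, so the same code runs in the testbench.

```cpp
#include <cstdint>

typedef uint32_t uint32;  // assumption: stands in for the template's uint32 type
const int N = 10;         // grid size, matching the 10x10 example grid

// Hypothetical single-generation step over a flat buffer, one word per cell.
void life_step_hw(uint32 *ram) {
    uint32 cur[N][N];   // input grid: only read during the update
    uint32 next[N][N];  // separate output grid: only written during the update
#pragma HLS ARRAY_PARTITION variable=cur type=complete dim=0
#pragma HLS ARRAY_PARTITION variable=next type=complete dim=0

    // Load the input grid from RAM into the local array.
    for (int i = 0; i < N * N; i++) {
        cur[i / N][i % N] = ram[i] & 1u;
    }

    // Compute every cell of the new grid from the old grid only.
    for (int r = 0; r < N; r++) {
        for (int c = 0; c < N; c++) {
            uint32 n = 0;
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    int rr = r + dr;
                    int cc = c + dc;
                    bool self = (dr == 0 && dc == 0);
                    bool inside = rr >= 0 && rr < N && cc >= 0 && cc < N;
                    if (!self && inside) {
                        n += cur[rr][cc];
                    }
                }
            }
            // Alive next step with 3 neighbours, or with 2 if already alive.
            next[r][c] = (n == 3 || (cur[r][c] && n == 2)) ? 1u : 0u;
        }
    }

    // Write the new grid back to RAM.
    for (int i = 0; i < N * N; i++) {
        ram[i] = next[i / N][i % N];
    }
}
```

Keeping `cur` and `next` separate is what makes the result independent of the order in which cells are visited, which in turn leaves the tools free to compute many cells in parallel.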