Practical 5 - Bringing it all together

In the final two weeks, we are going to bring it all together in an embedded development case study. We will use parts of the previous practicals as well as some content from the lectures to create a system which:

  • Uses FreeRTOS as its operating system to give us multitasking and easier programming.

  • Uses lwIP and ethernet networking to read messages from the network.

  • Passes those messages into a custom IP core which calculates their MD5 hash.

  • Coordinates all of this over a serial interface to the user.

Part 1 - FreeRTOS

Setup

First, let’s start playing with FreeRTOS. As you will remember from the lectures, FreeRTOS is a lightweight real-time operating system that provides tasking and priority-based scheduling.

  • In Vitis, create a new application project and when it asks which Platform to use, select your hardware project from the previous practical in which we deployed your Collatz Conjecture evaluation IP.

  • Call the project something sensible like freertos_practical and select "Create new..." to create a new System.

  • When it asks you to select a Domain (where you want your code to run), select “Create new...” and set the operating system to freertos10_xilinx and the CPU to ps7_cortexa9_0.

  • Select “Empty application” (or the C++ version if you want) and Finish.

Now we will need to include the lwIP library. It might already have been added by the tools but we can check by doing:

  • Double click on freertos_practical.prj and click Navigate to BSP settings.

  • Ensure the Board Support Package for your Domain freertos10_xilinx_ps7_cortexa9_0 is selected and click Modify BSP settings.

  • Ensure that lwip211 is checked.

  • Important: You must also go to the lwIP settings and make the following two changes:

    • api_mode set to SOCKET API

    • dhcp_options.lwip_dhcp set to true


Now we can add our code. In the Explorer pane, under the freertos_practical/src directory, right click and select New File and create two files, main.c and network.c. Copy in the contents from here:

main.c

#include <stdio.h>
#include <string.h>

#include "xparameters.h"
#include "netif/xadapter.h"
#include "xuartps_hw.h"
#include "xil_printf.h"
#include "FreeRTOS.h"
#include "task.h"
#include "lwip/sockets.h"
#include "lwipopts.h"
#include <lwip/ip_addr.h>
#include <lwip/tcp.h>
#include <lwip/udp.h>

#define THREAD_STACKSIZE 1024
#define PORT 51000

//Put your MAC address here!
unsigned char mac_ethernet_address[] = { 0xxx, 0xxx, 0xxx, 0xxx, 0xxx, 0xxx};

//Function Prototypes
void network_init(unsigned char* mac_address, lwip_thread_fn app);
void application_task(void *);
void udp_get_handler(void *arg, struct udp_pcb *pcb, struct pbuf *p, const ip_addr_t *addr, u16_t port);

//------

int main() {
    //Initialise the network with our MAC address, and the function that should be started as a FreeRTOS task
    network_init(mac_ethernet_address, application_task);

    vTaskStartScheduler(); //Start the scheduler

    //we will only get to here if someone calls vTaskEndScheduler()
    return 0;
}

void application_task(void *p) {
    //This task will set things up and then remove itself once that is done
    xil_printf("application_task started\n\r");

    //Bind a network receiver as we did before
    struct udp_pcb *recv_pcb = udp_new();
    udp_bind(recv_pcb, IP_ADDR_ANY, PORT);
    udp_recv(recv_pcb, udp_get_handler, NULL);

    //Create any other tasks you might need here
    //...
    //...

    vTaskDelete(NULL); //Set up complete so delete ourselves
}

void udp_get_handler(void *arg, struct udp_pcb *pcb, struct pbuf *p, const ip_addr_t *addr, u16_t port) {
    if (p) {
        char msg[p->len + 1];
        memcpy(msg, p->payload, p->len);
        msg[p->len] = '\0';
        printf("Message: %s\n", msg);
        pbuf_free(p);
    }
}

network.c

#include <stdio.h>

#include "xparameters.h"
#include "netif/xadapter.h"
#include "xil_printf.h"
#include "lwip/dhcp.h"

void lwip_init();

#define THREAD_STACKSIZE 1024

//Function prototypes
void print_ip(const char *msg, const ip_addr_t *ip);
void network_thread(void *p);
int network_startup_task();

//Structures and globals
static struct netif server_netif;
struct netif *echo_netif;
unsigned char* mac_addr;
lwip_thread_fn application_task;
TaskHandle_t startuptask, nettask, apptask, rcv_task;

//---------------

//Initialise network task
//This is called by user code to provide the mac address we should use,
//and the code that we should run once the network is ready.
void network_init(unsigned char* mac_address, lwip_thread_fn app) {
    mac_addr = mac_address;
    application_task = app;
    xTaskCreate((lwip_thread_fn) network_startup_task, "startup_task", THREAD_STACKSIZE, NULL, DEFAULT_THREAD_PRIO, &startuptask);
}

//Created by network_init(). Initialises lwIP, runs DHCP, and once connected, starts the application_task that the user provided
//when they called network_init
int network_startup_task() {
    lwip_init();

    //Create lwIP's network handling thread (as described by lwIP documentation)
    xTaskCreate(network_thread, "nw_task", THREAD_STACKSIZE, NULL, DEFAULT_THREAD_PRIO, &nettask);

    //This task just waits until we get an IP address via DHCP, then creates our application task
    while (1) {
        vTaskDelay(DHCP_FINE_TIMER_MSECS / portTICK_RATE_MS); //wait 500ms
        if (server_netif.ip_addr.addr) { //Do we have an IP address?
            xil_printf("DHCP request success\r\n");
            print_ip("Board IP: ", &server_netif.ip_addr);
            print_ip("Netmask : ", &server_netif.netmask);
            print_ip("Gateway : ", &server_netif.gw);
            xil_printf("\r\n");
            xTaskCreate(application_task, "app_task", THREAD_STACKSIZE, NULL, DEFAULT_THREAD_PRIO, &apptask);
            break;
        }
    }

    //DHCP is connected and we've created the main task so delete ourselves
    vTaskDelete(NULL); //Passing NULL says to delete this task
    return 0;
}

//lwIP's network handling thread (as described by lwIP documentation)
//Started from the network_startup_task
void network_thread(void *p) {
    struct netif *netif = &server_netif;
    ip_addr_t ipaddr, netmask, gw;
    int mscnt = 0;

    ipaddr.addr = 0;
    gw.addr = 0;
    netmask.addr = 0;

    // Add our network interface to lwIP and set it as default
    if (!xemac_add(netif, &ipaddr, &netmask, &gw, mac_addr, XPAR_XEMACPS_0_BASEADDR)) {
        xil_printf("Error adding network interface\r\n");
        return;
    }
    netif_set_default(netif);
    netif_set_up(netif);

    // Start packet receive thread, this is part of lwIP
    xTaskCreate((void(*)(void*))xemacif_input_thread, "xemacif_input_thread", THREAD_STACKSIZE, netif, DEFAULT_THREAD_PRIO, &rcv_task);

    // Start DHCP. This task will now loop forever calling dhcp_fine_tmr and dhcp_coarse_tmr every so often
    xil_printf("\r\nStart DHCP lookup...\r\n");
    dhcp_start(netif);
    while (1) {
        vTaskDelay(DHCP_FINE_TIMER_MSECS / portTICK_RATE_MS);
        dhcp_fine_tmr();
        mscnt += DHCP_FINE_TIMER_MSECS;
        if (mscnt >= DHCP_COARSE_TIMER_SECS*1000) {
            dhcp_coarse_tmr();
            mscnt = 0;
        }
    }
    return;
}

void print_ip(const char *msg, const ip_addr_t *ip) {
    xil_printf(msg);
    xil_printf("%d.%d.%d.%d\n\r", ip4_addr1(ip), ip4_addr2(ip), ip4_addr3(ip), ip4_addr4(ip));
}

 

Remember to change the MAC address at the top of main.c!

Most of this code is the same as what we used in https://uoy.atlassian.net/wiki/spaces/RTS/pages/35688834. However, you will note that it is now wrapped in FreeRTOS tasks instead of all being in the main() function.

Once you have put your MAC address in the code, this should listen for Cat Facts as before and will print the facts out on the screen.

Extending the tasking

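Now extend the application to add two additional tasks. Later in this practical these are referred to as ui_task (handling user serial input) and leds_task (handling buttons and LEDs).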

You can look inside network.c to see how FreeRTOS tasks are created, or look in the FreeRTOS reference manual. Instead of creating fiddly state machines, keeping track of lots of global state variables, and remembering to always call handle_ethernet(), we can finally handle multiple jobs cleanly in separate tasks.

Hint: Remember that FreeRTOS will run whichever task is the highest priority and is runnable. If a task is always runnable, it will always run. Use delays to avoid starving other tasks.
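The hint above can be made concrete with a toy model that runs on a PC. This is only a sketch of the rule "the highest-priority runnable task runs", not FreeRTOS code: the names toy_task, pick_next and simulate are hypothetical, one loop iteration stands in for one tick, and delay_after_run plays the role of a vTaskDelay() call at the end of each pass through a task's loop. It only models tasks with distinct priorities.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a task; this is NOT a FreeRTOS structure. */
typedef struct {
    int priority;        /* larger number = higher priority, as in FreeRTOS */
    int delay_after_run; /* ticks to block after each run; 0 = never blocks */
    int blocked_for;     /* ticks left until the task is runnable again */
    int run_count;       /* how many ticks this task was given */
} toy_task;

/* Pick the highest-priority task that is currently runnable, or NULL. */
toy_task *pick_next(toy_task *tasks, size_t n) {
    toy_task *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (tasks[i].blocked_for == 0 &&
            (best == NULL || tasks[i].priority > best->priority)) {
            best = &tasks[i];
        }
    }
    return best;
}

/* Advance the model by `ticks` ticks. */
void simulate(toy_task *tasks, size_t n, int ticks) {
    assert(tasks != NULL);
    for (int t = 0; t < ticks; t++) {
        toy_task *running = pick_next(tasks, n);

        /* Time passes for every task that is waiting on a delay. */
        for (size_t i = 0; i < n; i++) {
            if (tasks[i].blocked_for > 0) {
                tasks[i].blocked_for--;
            }
        }

        /* The chosen task does its work, then blocks itself (or not). */
        if (running != NULL) {
            running->run_count++;
            running->blocked_for = running->delay_after_run;
        }
    }
}
```

In this model, a high-priority task with delay_after_run set to 0 takes every tick and the lower-priority task never runs. Giving the same task a delay of 3 lets the lower-priority task run in the gaps.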

Part 2 - Custom hardware for MD5 hashing

Now that our OS is working, we are going to create a new custom IP core which we can use to get the MD5 hash of a string of text. Once this is working, we will then operate this IP core from another FreeRTOS task. First, the IP core.

One of the strengths of HLS is that it can work with almost any C code. In this practical we are going to take a software algorithm from the internet and turn it into a hardware co-processor that can calculate MD5 hashes. Do not worry about trying to understand the implementation details of MD5; it is enough to know that it takes input data and hashes it to a 16-byte hash value. To start with, copy the following example C implementation of MD5 (adapted from example code previously on Wikipedia). Your task is to get this code compiling through Vitis HLS to create an AXI peripheral that can be used by the ARM to hash some example data. You can check that your output is correct with an online MD5 calculator.

/*
 * Simple MD5 implementation
 *
 * Compile with: gcc -o md5 md5.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// Constants are the integer part of the sines of integers (in radians) * 2^32.
const uint32_t k[64] = {
    0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
    0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
    0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
    0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
    0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
    0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
    0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
    0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
    0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
    0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
    0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
    0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
    0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
    0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
    0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
    0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391
};

// r specifies the per-round shift amounts
const uint32_t r[] = {
    7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22, 7, 12, 17, 22,
    5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20, 5,  9, 14, 20,
    4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23, 4, 11, 16, 23,
    6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21, 6, 10, 15, 21
};

uint32_t left_rotate(uint32_t x, uint32_t c) {
    return (x << c) | (x >> (32 - c));
}

void to_bytes(uint32_t val, uint8_t *bytes) {
    bytes[0] = (uint8_t) val;
    bytes[1] = (uint8_t) (val >> 8);
    bytes[2] = (uint8_t) (val >> 16);
    bytes[3] = (uint8_t) (val >> 24);
}

uint32_t to_int32(const uint8_t *bytes) {
    return (uint32_t) bytes[0]
        | ((uint32_t) bytes[1] << 8)
        | ((uint32_t) bytes[2] << 16)
        | ((uint32_t) bytes[3] << 24);
}

void md5(const uint8_t *initial_msg, size_t initial_len, uint8_t *digest) {
    // These vars will contain the hash
    uint32_t h0, h1, h2, h3;

    // Message (to prepare)
    uint8_t *msg = NULL;

    size_t new_len, offset;
    uint32_t w[16];
    uint32_t a, b, c, d, i, f, g, temp;

    // Initialize variables - simple count in nibbles:
    h0 = 0x67452301;
    h1 = 0xefcdab89;
    h2 = 0x98badcfe;
    h3 = 0x10325476;

    //Pre-processing:
    //append "1" bit to message
    //append "0" bits until message length in bits ≡ 448 (mod 512)
    //append length mod (2^64) to message

    for (new_len = initial_len + 1; new_len % (512/8) != 448/8; new_len++)
        ;

    msg = malloc(new_len + 8);
    memcpy(msg, initial_msg, initial_len);
    msg[initial_len] = 0x80; // append the "1" bit; most significant bit is "first"
    for (offset = initial_len + 1; offset < new_len; offset++)
        msg[offset] = 0; // append "0" bits

    // append the len in bits at the end of the buffer.
    to_bytes(initial_len*8, msg + new_len);
    // initial_len>>29 == initial_len*8>>32, but avoids overflow.
    to_bytes(initial_len>>29, msg + new_len + 4);

    // Process the message in successive 512-bit chunks:
    //for each 512-bit chunk of message:
    for(offset=0; offset<new_len; offset += (512/8)) {

        // break chunk into sixteen 32-bit words w[j], 0 ≤ j ≤ 15
        for (i = 0; i < 16; i++)
            w[i] = to_int32(msg + offset + i*4);

        // Initialize hash value for this chunk:
        a = h0;
        b = h1;
        c = h2;
        d = h3;

        // Main loop:
        for(i = 0; i<64; i++) {
            if (i < 16) {
                f = (b & c) | ((~b) & d);
                g = i;
            } else if (i < 32) {
                f = (d & b) | ((~d) & c);
                g = (5*i + 1) % 16;
            } else if (i < 48) {
                f = b ^ c ^ d;
                g = (3*i + 5) % 16;
            } else {
                f = c ^ (b | (~d));
                g = (7*i) % 16;
            }

            temp = d;
            d = c;
            c = b;
            b = b + left_rotate((a + f + k[i] + w[g]), r[i]);
            a = temp;
        }

        // Add this chunk's hash to result so far:
        h0 += a;
        h1 += b;
        h2 += c;
        h3 += d;
    }

    // cleanup
    free(msg);

    //var char digest[16] := h0 append h1 append h2 append h3
    //(Output is in little-endian)
    to_bytes(h0, digest);
    to_bytes(h1, digest + 4);
    to_bytes(h2, digest + 8);
    to_bytes(h3, digest + 12);
}

int main(int argc, char **argv) {
    char *msg = argv[1];
    size_t len;
    int i;
    uint8_t result[16];

    if (argc < 2) {
        printf("usage: %s 'string'\n", argv[0]);
        return 1;
    }

    len = strlen(msg);

    md5((uint8_t*)msg, len, result);

    // display result
    for (i = 0; i < 16; i++)
        printf("%2.2x", result[i]);
    puts("");

    return 0;
}

Build a design in Vitis HLS which synthesises and can, when in a testbench, be passed data and show the correct MD5 output.
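For the testbench, it can help to compare against fixed known answers instead of pasting hashes into a website each time. The sketch below is one possible set of helpers, not a required structure: digest_to_hex, digest_matches and md5_vectors are hypothetical names, and it assumes your testbench has already collected the 16 digest bytes from your design into an array. The three expected values are taken from the test suite in RFC 1321.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 16-byte digest as 32 lowercase hex characters plus a terminator. */
void digest_to_hex(const uint8_t digest[16], char hex[33]) {
    for (int i = 0; i < 16; i++) {
        /* Each call writes two hex digits and a NUL that the next call overwrites. */
        snprintf(&hex[2 * i], 3, "%02x", (unsigned) digest[i]);
    }
}

/* Returns 1 when the digest matches the expected hex string, 0 otherwise. */
int digest_matches(const uint8_t digest[16], const char *expected_hex) {
    char hex[33];
    assert(strlen(expected_hex) == 32);
    digest_to_hex(digest, hex);
    return strcmp(hex, expected_hex) == 0;
}

/* A message and the MD5 hash it should produce. */
typedef struct {
    const char *message;
    const char *md5_hex;
} md5_vector;

/* Reference answers from the RFC 1321 test suite. */
const md5_vector md5_vectors[] = {
    {"",    "d41d8cd98f00b204e9800998ecf8427e"},
    {"a",   "0cc175b9c0f1b6a831c399e269772661"},
    {"abc", "900150983cd24fb0d6963f7d28e17f72"},
};
```

A testbench could loop over md5_vectors, place each message into the memory your toplevel reads from, run the design, and report any vector for which digest_matches returns 0.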

Tips:

  • The first problem you'll see is that the example implementation uses some types that HLS does not know about, like uint32_t, size_t, etc. Either change these to appropriate types, or create appropriate typedefs.

  • The next is that this implementation uses malloc() and free() to create buffers on the heap. This doesn't work in hardware because any buffers you use have to be declared statically as arrays and cannot be allocated with malloc(). This version of the md5 function takes a pointer to an initial_msg, which it copies into a newly allocated buffer, msg. Instead, read the msg from shared memory. (Assume a maximum size of 256 bytes.)

  • Depending on how you do the above, you may find the to_bytes function becomes unimplementable and also needs correcting. In general, try to avoid using pointers. Instead of a char* pointer, you can always just pass an index.

  • Remember that your toplevel function should keep the same format:

    • uint32 toplevel(uint32 *ram, uint32 *arg1, uint32 *arg2, uint32 *arg3, uint32 *arg4);

    • It is tempting to change these to bytes or chars or other 8-bit types because MD5 expects bytes, but do not do this. The physical memory bus on the system is 32 wires wide, and so the ram pointer must remain a uint32 or HLS will create some hardware with the wrong number of wires! Pack and unpack the bytes manually.
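The tips above about typedefs, static buffers, indices instead of pointers, and manual byte packing can be combined into a sketch like the following. It is one possible approach under stated assumptions, not the expected solution: get_byte, set_byte, pad_message and the two size constants are hypothetical names, the uint32 typedef is assumed to match the one used in your toplevel prototype, and bytes are assumed to be packed little-endian within each word (byte 0 in the lowest 8 bits). The assert is there for the C testbench.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t uint32; /* assumed to match the type in the toplevel prototype */
typedef uint8_t  uint8;

#define MAX_MSG_BYTES    256
/* Worst case: 256 message bytes + 0x80 + zero fill + 8 length bytes = 320 bytes. */
#define MAX_PADDED_WORDS 80

/* Read byte `index` from word-addressed memory (little-endian packing). */
uint8 get_byte(const uint32 *words, uint32 index) {
    return (uint8) (words[index / 4] >> (8 * (index % 4)));
}

/* Write byte `index` into word-addressed memory, leaving its neighbours alone. */
void set_byte(uint32 *words, uint32 index, uint8 value) {
    uint32 shift = 8 * (index % 4);
    uint32 cleared = words[index / 4] & ~((uint32) 0xFF << shift);
    words[index / 4] = cleared | ((uint32) value << shift);
}

/* Copy `len` bytes from ram into the fixed-size buffer msg and apply the MD5
 * padding. Returns the padded length in bytes (always a multiple of 64). */
uint32 pad_message(const uint32 *ram, uint32 len, uint32 msg[MAX_PADDED_WORDS]) {
    assert(len <= MAX_MSG_BYTES);

    /* Room for the 0x80 byte and the 8 length bytes, rounded up to 64. */
    uint32 padded = ((len + 8) / 64 + 1) * 64;

    /* Zero the words first; this also provides the "0" padding bits. */
    for (uint32 w = 0; w < padded / 4; w++) {
        msg[w] = 0;
    }

    for (uint32 i = 0; i < len; i++) {
        set_byte(msg, i, get_byte(ram, i));
    }
    set_byte(msg, len, 0x80); /* the single "1" bit */

    /* Message length in bits, little-endian, in the last 8 bytes. The upper
     * 4 bytes stay zero because len is at most 256 here. */
    uint32 bits = len * 8;
    for (uint32 i = 0; i < 4; i++) {
        set_byte(msg, padded - 8 + i, (uint8) (bits >> (8 * i)));
    }
    return padded;
}
```

With this packing, word j of msg holds the same value that to_int32 would compute from bytes 4j to 4j+3, so each group of sixteen consecutive words can be used directly as w[0] to w[15] for one chunk.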

Once your testbench is giving you the right answer, show it to Ian.

Now that we have functional correctness, we can compile it to hardware as you did in the previous practical, integrate it into your hardware design, and call it from FreeRTOS.

  • You can now create a third task in the system. You will then have the following tasks running at any one time:

    • ui_task - Handling user serial input

    • leds_task - Handling buttons and LEDs

    • hw_task - Responsible for sending data to, and reading data from, your IP core

Your final goal is to use our tasking and custom hardware to calculate the MD5 hash of each Cat Fact as it comes in. You will need to devise a way of communicating between udp_get_handler and hw_task. As udp_get_handler is effectively an interrupt handler, it shouldn’t block or sleep. Create a global data structure that it can write incoming messages into and that hw_task can check periodically. FreeRTOS provides a number of data structures for task coordination, for example queues.
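As an illustration of the "global data structure" idea, here is a minimal sketch of a fixed-size message ring written in plain C so that it can be tried on a PC. The names msg_ring, ring_put and ring_get and the slot count are hypothetical. It assumes exactly one writer (udp_get_handler) and one reader (hw_task), and it does nothing about compiler or CPU memory ordering, so treat it as a model of the behaviour you want and not as a drop-in replacement for a FreeRTOS queue, which handles that coordination for you.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSG_SLOTS   4   /* hypothetical depth; a power of two */
#define MSG_MAX_LEN 256 /* matches the 256-byte limit assumed for the hash core */

typedef struct {
    uint8_t  data[MSG_SLOTS][MSG_MAX_LEN];
    uint16_t len[MSG_SLOTS];
    volatile uint32_t head; /* count of messages written (writer only) */
    volatile uint32_t tail; /* count of messages read (reader only) */
} msg_ring;

/* Writer side: never waits. Returns 1 if the message was stored, or 0 if it
 * was dropped because the ring is full or the message is too long. */
int ring_put(msg_ring *r, const void *payload, uint16_t len) {
    assert(r != NULL && payload != NULL);
    if (len > MSG_MAX_LEN || r->head - r->tail == MSG_SLOTS) {
        return 0;
    }
    uint32_t slot = r->head % MSG_SLOTS;
    memcpy(r->data[slot], payload, len);
    r->len[slot] = len;
    r->head = r->head + 1; /* publish only after the slot is filled */
    return 1;
}

/* Reader side: copies the oldest message into out and returns its length,
 * or returns -1 if nothing is waiting. */
int ring_get(msg_ring *r, uint8_t out[MSG_MAX_LEN]) {
    assert(r != NULL && out != NULL);
    if (r->head == r->tail) {
        return -1;
    }
    uint32_t slot = r->tail % MSG_SLOTS;
    int len = r->len[slot];
    memcpy(out, r->data[slot], (size_t) len);
    r->tail = r->tail + 1; /* free the slot only after copying out */
    return len;
}
```

In this model the handler would call ring_put with p->payload and p->len and carry on regardless of the result, while hw_task would call ring_get in its loop and delay for a short time whenever it returns -1.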

Once all of this works, congratulations, you have built a multitasking, networked, embedded system with hardware accelerated MD5 hashing!