
GEMM Accelerator FPGA Emulation

To help measure the real execution time on the board with the GEMM accelerator implemented in the FPGA, we have implemented a cycle-accurate model of the GEMM ACC in the FPGA. In terms of timing, it behaves identically to the GEMM ACC described in lab 3. However, the emulated GEMM ACC does not compute any output: when it is done, it simply sets back the LSB of the CSR. Whatever is written into the scratchpad remains there. Note also that although the GEMM ACC has a 512 KByte SRAM address range, it is internally implemented as a 64 KByte SRAM whose content is mirrored every 64 KByte. Since functionality is not modeled anyway, this should not matter.
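The mirroring can be pictured as a modulo mapping. The following is a minimal sketch, assuming that "mirrored every 64 KByte" means an offset inside the 512 KByte window aliases to that offset modulo 64 KByte. The macro and function names are hypothetical and not part of the lab code.

```c
#include <assert.h>
#include <stdint.h>

#define GEMM_SRAM_WINDOW_BYTES   (512u * 1024u) /* address range seen by software */
#define GEMM_SRAM_PHYSICAL_BYTES (64u * 1024u)  /* backing SRAM in the emulation */

/* Map an offset inside the 512 KByte window to the 64 KByte location it
 * aliases (assumption: plain modulo mirroring). */
static inline uint32_t gemm_sram_physical_offset(uint32_t window_offset)
{
    assert(window_offset < GEMM_SRAM_WINDOW_BYTES);
    return window_offset % GEMM_SRAM_PHYSICAL_BYTES;
}

/* Two window offsets share storage when they map to the same physical offset. */
static inline int gemm_sram_offsets_alias(uint32_t a, uint32_t b)
{
    return gemm_sram_physical_offset(a) == gemm_sram_physical_offset(b);
}
```

Under this assumption, a write to window offset 0x10004 would overwrite whatever was stored at offset 0x00004, which is only harmless because no results are computed.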

Memory Map

Due to system constraints, the address map differs slightly from the modeled one: the base address is different, and the SRAM comes first, followed by the MMRs. This is due to restrictions on the base address of the SRAM scratchpad.

Base address: 0x41780000

Address Space Offset   Name               Description
0x00 - 0x7FFFF         DATA               GemmProc ScratchPad (512 KByte)
0x80000                CSR                The control and status register
0x80004                PC                 Current descriptor / instruction (read only), starting with 1
0x80008                01_MA_START_ADDR   Start address of MA
0x8000C                01_MB_START_ADDR   Start address of MB
0x80010                01_MC_START_ADDR   Start address of MC
0x80014                01_MA_NR_ROW       Number of rows in MA
0x80018                01_MB_NR_ROW       Number of rows in MB and number of columns in MA
0x8001C                01_MB_NR_COL       Number of columns in MB (note that MC's dimensions will be [01_MA_NR_ROW][01_MB_NR_COL])
0x80020                01_INST_NEXT       Reference to next instruction (set to 0)
0x80024 …              02_*               Second descriptor (not implemented)
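As a rough illustration, the table can be transcribed into C constants. This is only a sketch: the base address and offsets come from the table, but the macro names (with a `GEMM_D01_` prefix, since a C identifier cannot start with a digit) and the helper `gemm_fpga_addr` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define GEMM_FPGA_BASE                0x41780000u

/* Offsets relative to GEMM_FPGA_BASE, copied from the table above. */
#define GEMM_DATA_OFFSET              0x00000u /* scratchpad, 512 KByte */
#define GEMM_CSR_OFFSET               0x80000u
#define GEMM_PC_OFFSET                0x80004u /* read only */
#define GEMM_D01_MA_START_ADDR_OFFSET 0x80008u
#define GEMM_D01_MB_START_ADDR_OFFSET 0x8000Cu
#define GEMM_D01_MC_START_ADDR_OFFSET 0x80010u
#define GEMM_D01_MA_NR_ROW_OFFSET     0x80014u
#define GEMM_D01_MB_NR_ROW_OFFSET     0x80018u
#define GEMM_D01_MB_NR_COL_OFFSET     0x8001Cu
#define GEMM_D01_INST_NEXT_OFFSET     0x80020u /* set to 0 */

/* Absolute address of a location in the FPGA emulation's address map.
 * The second descriptor is not implemented, so offsets past the first
 * descriptor's last register are rejected. */
static inline uint32_t gemm_fpga_addr(uint32_t offset)
{
    assert(offset <= GEMM_D01_INST_NEXT_OFFSET);
    return GEMM_FPGA_BASE + offset;
}
```

On the target, such an address would typically be accessed through a `volatile uint32_t *`; on a host build the helper only computes the number.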

Data Panel

Address: 4178_0000h base + 0h offset = 4178_0000h


Control Panel

Address: 4178_0000h base + 8_0000h offset = 4180_0000h


How to Adjust for the difference in Memory Map

Suggested Method 1

We recommend keeping your main branch stable, so create a new branch for the address modification. Test both your HW and SW implementations in your development environment before running the job on Jenkins, since memory operations are much easier to observe in QEMU and HW_testbench.

Method 2

To account for the different memory map, check for the GEMM_ACC_FPGA preprocessor macro.

#ifdef GEMM_ACC_FPGA
/* define base and offsets based on FPGA emulation */
#else
/* define base and offsets based on SystemC */
#endif

#ifdef checks whether the GEMM_ACC_FPGA token has been #defined earlier (either in the file or on the command line). If so, the preprocessor includes everything between the #ifdef and the matching #else or, if no #else is present, the matching #endif (see 1 and 2).

To compile with the macro defined, pass -DGEMM_ACC_FPGA as an additional argument to gcc.

gcc -DGEMM_ACC_FPGA ...
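A filled-in version of the conditional might look like the sketch below. The FPGA values are taken from the memory map in this document. The values in the #else branch are hypothetical placeholders, since the SystemC map is not listed here, and must be replaced with the values from your lab 3 model. The macro and function names are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define GEMM_SRAM_BYTES (512u * 1024u) /* scratchpad address range */

#ifdef GEMM_ACC_FPGA
/* FPGA emulation: SRAM comes first, MMRs follow. */
#define GEMM_BASE        0x41780000u
#define GEMM_SRAM_OFFSET 0x00000u
#define GEMM_MMR_OFFSET  0x80000u
#else
/* SystemC model: HYPOTHETICAL placeholders, NOT the real lab 3 values. */
#define GEMM_BASE        0x00000000u /* placeholder */
#define GEMM_MMR_OFFSET  0x00000u    /* placeholder */
#define GEMM_SRAM_OFFSET 0x00000u    /* placeholder */
#endif

/* Address of the CSR, the first MMR in the FPGA map. */
static inline uint32_t gemm_csr_addr(void)
{
    return GEMM_BASE + GEMM_MMR_OFFSET;
}

/* Address of a byte inside the scratchpad. */
static inline uint32_t gemm_sram_addr(uint32_t byte_offset)
{
    assert(byte_offset < GEMM_SRAM_BYTES);
    return GEMM_BASE + GEMM_SRAM_OFFSET + byte_offset;
}
```

Keeping the conditional in one header means the rest of the driver only uses `gemm_csr_addr()` and `gemm_sram_addr()` and does not need its own #ifdef blocks.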

Running the Code on Jenkins

Introducing the new job.

A new job on marble-jenkins called project_target_hwacc has already been created for each team. It provides the basic structure for testing your SW implementation on a real ZedBoard with the timing-accurate HW GEMM Accelerator attached.

Check the configuration before launching the job.

The most important items are the Branch Specifier and Execute shell - Command. The comments inside Execute shell are helpful if you need to modify what Jenkins does (e.g. profiling, copying files).

* All your changes to the job configuration will be archived in the job history.

Multiple configurations.

If you are not satisfied with the provided job, or you need to run multiple configurations, duplicate it and create your own job.

  1. Create a New Item under your TeamN folder.
  2. Enter a new job name, and copy the configuration from project_target_hwacc.

Other suggestion

Try running and profiling gemm_tb first, since almost nothing needs to change other than the register addresses. It should be easy to get timing results and calculate the speedup. The darknet application we modified, on the other hand, needs to be further optimized (especially the output layer and picture rendering) because the HW GEMM Accelerator returns only pseudo data.
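For the speedup calculation, a minimal sketch follows, assuming the usual definition of speedup as the software-only execution time divided by the execution time with the accelerator. The function name is hypothetical, and both times must be measured in the same unit.

```c
#include <assert.h>

/* Speedup of the accelerated run over the software-only baseline.
 * Both arguments are execution times in the same unit (e.g. seconds). */
static inline double gemm_speedup(double sw_time, double hw_time)
{
    assert(sw_time > 0.0 && hw_time > 0.0);
    return sw_time / hw_time;
}
```

A result above 1.0 means the run with the accelerator was faster than the baseline.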