GEMM Accelerator FPGA Emulation
To help measure real execution time on the real board with the GEMM implemented in the FPGA, we have implemented a cycle-accurate model of the GEMM accelerator (GEMM ACC) in the FPGA. Its timing is identical to that of the GEMM ACC described in lab 3. However, the emulated GEMM ACC does not compute any output; when done, it simply sets the LSB of the CSR back. Whatever is written into the scratchpad remains there. Note also that although the GEMM ACC has a 512 KByte SRAM address range, it is internally implemented as a 64 KByte SRAM whose content is mirrored every 64 KByte. Since functionality is not modeled anyway, this should not matter.
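The mirroring described above can be pictured with a small sketch. The helper name `gemm_spad_phys_offset` and the macro names are hypothetical; only the 512 KByte and 64 KByte sizes come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the description above. */
#define GEMM_SPAD_WINDOW_BYTES (512u * 1024u) /* addressable scratchpad range */
#define GEMM_SPAD_PHYS_BYTES   (64u * 1024u)  /* SRAM actually present in the emulation */

/* Hypothetical helper: the offset inside the 64 KByte SRAM that a given
 * offset inside the 512 KByte window aliases to. */
static inline uint32_t gemm_spad_phys_offset(uint32_t window_offset)
{
    assert(window_offset < GEMM_SPAD_WINDOW_BYTES);
    return window_offset % GEMM_SPAD_PHYS_BYTES;
}
```

Under this reading, offsets 0x00004 and 0x10004 refer to the same word, so matrices placed more than 64 KByte apart in the window can overwrite each other. That is harmless here only because the results are not meaningful in the first place.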
Memory Map
Due to system constraints, the address map differs slightly from the modeled one: the base address is different, and the SRAM comes first, followed by the MMRs. This ordering is due to restrictions on the base address of the SRAM scratchpad.
Base address: 0x41780000
Address Space Offset | Name | Description |
---|---|---|
0x00 - 0x7FFFF | DATA | GemmProc ScratchPad (512KByte) |
0x80000 | CSR | The control and status register |
0x80004 | PC | Current Descriptor / Instruction (Read Only) starting with 1 |
0x80008 | 01_MA_START_ADDR | Start address of MA |
0x8000C | 01_MB_START_ADDR | Start address of MB |
0x80010 | 01_MC_START_ADDR | Start address of MC |
0x80014 | 01_MA_NR_ROW | Number of rows in MA |
0x80018 | 01_MB_NR_ROW | Number of rows in MB and number of columns in MA |
0x8001C | 01_MB_NR_COL | Number of columns in MB (Note that MC’s dimensions will be [01_MA_NR_ROW][01_MB_NR_COL]) |
0x80020 | 01_INST_NEXT | reference to next instruction (set to 0) |
0x80024 … | 02_* | second descriptor (not implemented) |
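One way to mirror the register rows of the table in C is a struct overlay. This is only a sketch: it assumes every register is 32 bits wide (suggested by the 4-byte spacing of the offsets), and the type name `gemm_mmr_t` and the helper `gemm_fill_descriptor` are hypothetical. How the start addresses are encoded is defined by lab 3, so the values are passed through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: one 32-bit register per table row, in table order.
 * Comments give the offset from the base address listed in the table. */
typedef struct {
    uint32_t csr;           /* 0x80000 control and status register */
    uint32_t pc;            /* 0x80004 current descriptor (read only) */
    uint32_t ma_start_addr; /* 0x80008 */
    uint32_t mb_start_addr; /* 0x8000C */
    uint32_t mc_start_addr; /* 0x80010 */
    uint32_t ma_nr_row;     /* 0x80014 */
    uint32_t mb_nr_row;     /* 0x80018 rows of MB = columns of MA */
    uint32_t mb_nr_col;     /* 0x8001C */
    uint32_t inst_next;     /* 0x80020 */
} gemm_mmr_t;

/* Compile-time check that the struct matches the offsets in the table
 * (relative to the CSR at 0x80000). */
static_assert(offsetof(gemm_mmr_t, ma_start_addr) == 0x08, "MA_START_ADDR offset");
static_assert(offsetof(gemm_mmr_t, inst_next) == 0x20, "INST_NEXT offset");
static_assert(sizeof(gemm_mmr_t) == 0x24, "first descriptor ends at 0x80024");

/* Hypothetical helper: fill descriptor 01 for
 * MC[ma_rows][mb_cols] = MA[ma_rows][mb_rows] * MB[mb_rows][mb_cols]. */
static inline void gemm_fill_descriptor(volatile gemm_mmr_t *mmr,
                                        uint32_t ma, uint32_t mb, uint32_t mc,
                                        uint32_t ma_rows, uint32_t mb_rows,
                                        uint32_t mb_cols)
{
    mmr->ma_start_addr = ma;
    mmr->mb_start_addr = mb;
    mmr->mc_start_addr = mc;
    mmr->ma_nr_row = ma_rows;
    mmr->mb_nr_row = mb_rows;
    mmr->mb_nr_col = mb_cols;
    mmr->inst_next = 0; /* the table says to set this to 0 */
}
```

Because the helper takes a pointer, the same code can be exercised against an ordinary struct in a host-side test and against the mapped registers on the board.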
Data Panel
Base address: 4178_0000h
Offset: 0h
Address = base + offset = 4178_0000h
Control Panel
Base address: 4178_0000h
Offset: 8_0000h
Address = base + offset = 4180_0000h
How to Adjust for the difference in Memory Map
Suggested Method 1
We recommend keeping everything stable on your main branch, so create a new branch for the address modification. Test both your HW and SW implementations in your development environment before running the job on Jenkins, since memory operations are much easier to observe in QEMU and the HW_testbench.
Method 2
To account for this, check for the GEMM_ACC_FPGA preprocessor macro.
#ifdef GEMM_ACC_FPGA
/* define base and offsets based on FPGA emulation */
#else
/* define base and offsets based on SystemC */
#endif
#ifdef checks whether the GEMM_ACC_FPGA token has been #defined earlier (either in the file or on the command line). If so, it includes everything between it and the matching #else or, if no #else is present, the matching #endif (see 1 and 2).
To compile with the flag, pass an additional argument to gcc:
gcc -DGEMM_ACC_FPGA ...
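As a sketch of how the two branches might be filled in: the FPGA values below come from the memory map in this document, while the SystemC values are placeholders only and must be replaced with the ones from your lab 3 model. The macro names and the helper `gemm_mmr_addr` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#ifdef GEMM_ACC_FPGA
/* FPGA emulation: values from the memory map in this document. */
#define GEMM_BASE_ADDR   0x41780000u
#define GEMM_DATA_OFFSET 0x00000u /* scratchpad comes first */
#define GEMM_MMR_OFFSET  0x80000u /* CSR and descriptors follow */
#else
/* SystemC model: PLACEHOLDERS ONLY, replace with your lab 3 values. */
#define GEMM_BASE_ADDR   0x0u
#define GEMM_DATA_OFFSET 0x0u
#define GEMM_MMR_OFFSET  0x0u
#endif

#define GEMM_DATA_ADDR (GEMM_BASE_ADDR + GEMM_DATA_OFFSET)
#define GEMM_CSR_ADDR  (GEMM_BASE_ADDR + GEMM_MMR_OFFSET)

#ifdef GEMM_ACC_FPGA
/* Sanity check against the Control Panel address given above. */
static_assert(GEMM_CSR_ADDR == 0x41800000u, "unexpected CSR address");
#endif

/* Hypothetical helper: absolute address of the register located
 * reg_offset bytes after the CSR. */
static inline uintptr_t gemm_mmr_addr(uint32_t reg_offset)
{
    return (uintptr_t)GEMM_CSR_ADDR + reg_offset;
}
```

Keeping the selection in a single header means the rest of the driver only uses GEMM_DATA_ADDR and GEMM_CSR_ADDR, so the same sources build for both targets. On the board, the resulting address would typically be cast to a volatile uint32_t pointer before it is accessed.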
Running the Code on Jenkins
Introducing the new job.
A new job called project_target_hwacc has already been created on marble-jenkins for each team. It provides the basic structure for testing your SW implementation on a real ZedBoard with the time-accurate HW GEMM Accelerator attached.
Check the configuration before emitting the task.
The most important items are the Branch Specifier and the Execute shell - Command. The comments inside Execute shell are helpful if you need to modify the instructions that Jenkins runs (e.g. profiling, copying files).
* All your changes to the job configuration will be archived in the job history.
Multiple configurations.
If you are not satisfied with the provided job, or you need to run multiple configurations, duplicate it and create your own job.
- Create a New Item under yourTeamN folder.
- Enter a new job name, and copy the configuration from project_target_hwacc.
Other suggestion
Try to run and profile gemm_tb first, as almost nothing needs to be changed other than the register addresses. It should be easy to get the timing result and calculate the speedup etc. The darknet application we modified, on the other hand, needs to be further optimized (especially the output layer and picture rendering) because the HW GEMM Accelerator returns only pseudo data.
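For the speedup mentioned above, a minimal sketch, assuming you have measured one run time without the accelerator and one with it. The function name `gemm_speedup` is hypothetical.

```c
#include <assert.h>

/* Speedup of the accelerated run relative to the software-only baseline.
 * Both times must be in the same unit (e.g. seconds or cycles). */
static inline double gemm_speedup(double t_sw, double t_hwacc)
{
    assert(t_hwacc > 0.0);
    return t_sw / t_hwacc;
}
```

A value above 1 means the accelerated run is faster, for example a baseline of 10 s against an accelerated run of 2 s gives a speedup of 5.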