SPRADM6 December   2024 AM62D-Q1

 


C7x DRU Performance: Block Copy with DMA

The Data Routing Unit (DRU) within the C7x transfers data between DDR and the L2SRAM of the C7x, effectively providing DMA. The Texas Instruments Signal Processing (TISP) middleware library provides several examples of how to wrap kernels from the C7x DSPLIB and FFTLIB with DMA. TISP is included in the FREERTOS-SDK of AM62D, with documentation on how to build and run the examples.

The TISP_blockCopy example within TISP reports performance results for moving data between DDR and the L2SRAM of the C7x. The example reads data from DDR into L2SRAM while simultaneously writing data from L2SRAM to DDR. A block copy kernel copies the data that the DRU read into L2SRAM from one L2SRAM location to another. The kernel uses the streaming engine (SE) to read the data from L2SRAM and the C7x write path to write the same data back into L2SRAM, with the write address offsets generated by the Streaming Address (SA) generator. A few notes on this example are listed below:

  • DDR spec: 3200 MT/s, 32 bits per transaction, which results in a peak theoretical DDR bandwidth of 12.8GB/s (4B x 3200MT/s).
  • The DRU transfer property was set up to be 4D.
  • One channel reads data from DDR into L2SRAM while another channel simultaneously writes data from L2SRAM into DDR. Figure 3-1 shows the details of the TISP_blockCopy example. The example uses DMA to move 16 MB of data from DDR and 16 MB of data to DDR simultaneously, for a total data movement of 32 MB.
  • Note that there is no computation involved.
Figure 3-1 DRU, SE, and SA Data Movement Example
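
The data flow described above can be modeled in a few lines of Python. This is only an illustrative sketch, not the TISP_blockCopy implementation: byte-slice assignments stand in for the two DRU channels and for the SE/SA-based copy kernel, the three stages run one after another instead of concurrently, and the function name, the ping-pong buffer layout, and the block size are all hypothetical.

```python
# Illustrative model only: slices stand in for DRU transfers and the SE/SA copy kernel.
BLOCK_BYTES = 4096  # hypothetical block size, not taken from TISP_blockCopy


def block_copy_pipeline(ddr_src: bytes, block_bytes: int = BLOCK_BYTES) -> bytes:
    """Copy ddr_src to a new DDR buffer through ping-pong L2SRAM staging buffers."""
    if len(ddr_src) % block_bytes != 0:
        raise ValueError("source length must be a multiple of the block size")

    num_blocks = len(ddr_src) // block_bytes
    ddr_dst = bytearray(len(ddr_src))
    # Two input and two output staging buffers model assumed L2SRAM ping-pong pairs.
    l2_in = [bytearray(block_bytes), bytearray(block_bytes)]
    l2_out = [bytearray(block_bytes), bytearray(block_bytes)]

    # Two extra steps let the pipeline drain after the last block is fetched.
    for step in range(num_blocks + 2):
        # Write channel: drain the block the kernel finished in the previous step.
        out_block = step - 2
        if 0 <= out_block < num_blocks:
            start = out_block * block_bytes
            ddr_dst[start:start + block_bytes] = l2_out[out_block % 2]

        # Kernel: L2SRAM-to-L2SRAM copy of the block fetched in the previous step.
        kernel_block = step - 1
        if 0 <= kernel_block < num_blocks:
            l2_out[kernel_block % 2][:] = l2_in[kernel_block % 2]

        # Read channel: fetch the next block from DDR into L2SRAM.
        if step < num_blocks:
            start = step * block_bytes
            l2_in[step % 2][:] = ddr_src[start:start + block_bytes]

    return bytes(ddr_dst)
```

In any given step the three stages touch different staging buffers, which is what would allow the read channel, the kernel, and the write channel to overlap on real hardware.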

Table 3-4 shows the performance measured when moving 16 MB of data in each direction. The combined read and write bandwidth is 10.4 GB/s (5.2 GB/s per direction), which is 81% of the peak theoretical DDR bandwidth.

Table 3-4 DRU Performance: Data Movement from DDR to C7x and Back to DDR

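| Data Type | Data Size | EVM Cycles | Data Transfer | Efficiency |
|-----------|-----------|------------|---------------|------------|
| Float | 2048 x 2048 x 4 = 16 MB | 3185174 | 5.2 x 2 = 10.4 GB/s | 10.4 / 12.8 = 81% |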
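
The arithmetic behind these figures can be reproduced with a short Python sketch. The peak bandwidth and efficiency follow directly from the numbers given in this section. The conversion from cycle count to bandwidth additionally needs a clock frequency that this section does not state, so the 1 GHz value below is an assumption, and the function names are hypothetical.

```python
ASSUMED_CLOCK_HZ = 1.0e9  # assumption: cycle counter runs at 1 GHz (not stated in this section)


def peak_ddr_bandwidth_gbps(transfer_rate_mtps: float, bus_width_bits: int) -> float:
    """Peak theoretical DDR bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mtps * 1e6 * bytes_per_transfer / 1e9


def ddr_efficiency(achieved_gbps: float, peak_gbps: float) -> float:
    """Fraction of the peak DDR bandwidth that was actually achieved."""
    return achieved_gbps / peak_gbps


def bandwidth_from_cycles(num_bytes: int, cycles: int, clock_hz: float) -> float:
    """One-direction bandwidth in GB/s for num_bytes moved in the given cycle count."""
    seconds = cycles / clock_hz
    return num_bytes / seconds / 1e9


peak = peak_ddr_bandwidth_gbps(3200, 32)   # 4 B x 3200 MT/s
efficiency = ddr_efficiency(10.4, peak)    # combined read + write bandwidth from Table 3-4
per_direction = bandwidth_from_cycles(2048 * 2048 * 4, 3185174, ASSUMED_CLOCK_HZ)

print(f"peak: {peak:.1f} GB/s, efficiency: {efficiency:.1%}")
print(f"per direction (assuming 1 GHz): {per_direction:.2f} GB/s")
```

Under the 1 GHz assumption, the per-direction result lands close to the 5.2 GB/s reported in Table 3-4; a different clock frequency would scale it proportionally.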