SPRADO9 March 2025 AM62L

 


Critical Memory Access Latency

This section provides round-trip read latency measurements from the processors in AM62Lx to various memory destinations in the system. The measurements were made on the AM62Lx platform using bare-metal silicon verification tests. The tests execute on the A53 processor out of LPDDR4. Each test includes a loop of 8192 iterations to read a total of 32 KiB of data. The number of cycles for each access was counted and divided by the respective processor clock frequency to obtain the latency time.
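The cycle-to-time conversion described above can be sketched in C as follows. This is a minimal illustration, not the actual verification test: the names `timed_read_loop`, `average_latency_ns`, and the `cycle_counter_fn` hook are hypothetical, the 4-byte read width is inferred from 32 KiB / 8192 reads, and the source of the cycle count is left to the caller because the text does not say how cycles were counted. The sketch also does not address cache or memory-attribute configuration, which a real test would need in order to make each read reach the target memory.

```c
#include <assert.h>
#include <stdint.h>

#define ITERATIONS 8192u   /* loop count stated in the text */

/* Inferred, not stated: 32 KiB / 8192 reads = 4 bytes per read. */
_Static_assert(ITERATIONS * sizeof(uint32_t) == 32u * 1024u,
               "8192 four-byte reads cover 32 KiB");

/* Hypothetical hook that returns the current CPU cycle count.
 * How the real tests read the cycle count is not given in the text. */
typedef uint64_t (*cycle_counter_fn)(void);

/* Read ITERATIONS 32-bit words starting at base and return the total
 * number of cycles elapsed, as reported by the supplied counter hook. */
uint64_t timed_read_loop(const volatile uint32_t *base, cycle_counter_fn now)
{
    uint64_t start = now();
    for (uint32_t i = 0; i < ITERATIONS; i++) {
        uint32_t sink = base[i];   /* volatile access forces the read */
        (void)sink;
    }
    return now() - start;
}

/* Average latency per access in nanoseconds:
 * (total cycles / number of accesses) / clock frequency. */
double average_latency_ns(uint64_t total_cycles, uint32_t accesses,
                          double freq_hz)
{
    assert(accesses != 0 && freq_hz > 0.0);
    double cycles_per_access = (double)total_cycles / (double)accesses;
    return cycles_per_access * 1e9 / freq_hz;
}
```

As a worked example, an average of 100 cycles per access at a 1.25 GHz core clock corresponds to 80 ns.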

For latency measurements of this kind, disable all automatic clock gating. By default, automatic clock gating (nogate control) is enabled to save power on the interconnect, at a minor cost in latency (around a dozen nanoseconds). Automatic clock gating is implemented in a distributed fashion, with each IP having one or more nogate controls. These controls are typically configured by device management at initialization time and upon entry to and exit from a low-power mode. Table 3-4 shows the average latency results.

Tests were done on 1.25 GHz A53 cores with 1600 MT/s LPDDR4. The Arm architecture provides a local, internal low-latency path and also allows external access to memory through the SoC bus infrastructure.

Table 3-4. Critical Memory Access Latency of A53

Memory        Arm Cortex-A53 (Avg ns)   SoC Address
LPDDR4        155                       0x80000000
OCSRAM MAIN   42                        0x70800000
OCSRAM WKUP   108                       0x707f0000
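To relate the latencies in Table 3-4 back to core cycles, the values can be multiplied by the 1.25 GHz A53 clock. The C sketch below does this for the three table rows. The struct, array, and function names are hypothetical, and the resulting cycle counts are estimates derived from the rounded averages in the table, not measured values from the source.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define A53_FREQ_HZ 1.25e9   /* A53 core clock stated in the text */

/* One row of Table 3-4. */
struct latency_row {
    const char *memory;
    double      avg_ns;        /* average read latency from the table */
    uint32_t    soc_address;   /* SoC address from the table */
};

const struct latency_row table_3_4[] = {
    { "LPDDR4",      155.0, 0x80000000u },
    { "OCSRAM MAIN",  42.0, 0x70800000u },
    { "OCSRAM WKUP", 108.0, 0x707f0000u },
};

/* Convert a latency in nanoseconds to an equivalent cycle count. */
double ns_to_cycles(double ns, double freq_hz)
{
    assert(freq_hz > 0.0);
    return ns * freq_hz / 1e9;
}

/* Print each table row with its estimated A53 cycle count. */
void print_cycle_estimates(void)
{
    for (size_t i = 0; i < sizeof table_3_4 / sizeof table_3_4[0]; i++) {
        printf("%-12s 0x%08" PRIx32 "  %6.1f ns  ~%.2f cycles\n",
               table_3_4[i].memory,
               table_3_4[i].soc_address,
               table_3_4[i].avg_ns,
               ns_to_cycles(table_3_4[i].avg_ns, A53_FREQ_HZ));
    }
}
```

Under these assumptions, the LPDDR4 average of 155 ns corresponds to roughly 194 core cycles, OCSRAM MAIN to roughly 53, and OCSRAM WKUP to 135.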