SPRACV1B February   2022  – January 2024 AM2434 , AM6411 , AM6412 , AM6421 , AM6441 , AM6442

 

  1.   Abstract
  2.   2
  3.   Trademarks
  4. 1Introduction
  5. 2Processor Core Benchmarks
    1. 2.1 Dhrystone
    2. 2.2 Trigonometric Functions
  6. 3Compute and Memory System Benchmarks
    1. 3.1 Memory Bandwidth and Latency
      1. 3.1.1 LMBench
      2. 3.1.2 STREAM
      3. 3.1.3 Cortex-R5 Memory Access Latency
    2. 3.2 CoreMark®-Pro
    3. 3.3 Fast Fourier Transform
    4. 3.4 Cryptographic Benchmarks
  7. 4Application Benchmarks
    1. 4.1 Machine Learning Inference
    2. 4.2 Field Oriented Control (FOC) Loop
    3. 4.3 PCIE to DDR Performance Using BCDMA
      1. 4.3.1 Test Setup
      2. 4.3.2 Result and Observation
    4. 4.4 DDR to DDR Performance Using BCDMA
      1. 4.4.1 Test Setup
      2. 4.4.2 Result and Observation
  8. 5References
  9. 6Revision History

STREAM

STREAM is a microbenchmark for measuring data memory system performance without any data reuse. It is designed to miss on caches and exercise data prefetcher and speculative accesses. it uses double precision floating point (64 bit) but in most modern processors the memory access will be the bottleneck. The four individual scores are copy, scale as in multiply by constant, add two numbers, and triad for multiply accumulate. For bandwidth, a byte read counts as one and a byte written counts as one resulting in a score that is double the bandwidth LMBench will show. The table below shows the measured bandwidth and the efficiency compared to theoretical wire rate. The wire rate used is the DDR MT/s rate times the width. To get overall maximum achieved throughput the command used is stream -M 16M -P 2 -N 10, which means two parallel threads and 10 iterations.

DDR4-1600MT/s-16-Bit Bandwidth DDR4-1600MT/s-16-Bit Efficiency LPDDR4-1600MT/s-16-Bit Bandwidth LPDDR4-1600MT/s-16-Bit Efficiency
copy 2482MB/s 78% 2221MB/s 69%
scale 2516MB/s 79% 2268MB/s 71%
add 2350MB/s 73% 2130MB/s 67%
triad 2355MB/s 74% 2139MB/s 67%