SPRADO9 Application note

SPRADO9 March 2025 AM62L

3.1.2 STREAM

STREAM is a microbenchmark that measures the performance of the data memory system without any data reuse. It is designed to miss in the caches and to exercise the data prefetcher and speculative accesses. STREAM operates on double-precision (64-bit) floating-point data, but on most modern processors memory access, not arithmetic, is the bottleneck. The four individual scores are copy, scale (multiply by a constant), add (sum of two arrays), and triad (multiply-accumulate).

Copy: measures the memory transfer rate without any arithmetic operation, a[i] = b[i]
Scale: adds a simple arithmetic operation, a[i] = k × b[i]
Add: involves three memory accesses in addition to the arithmetic operation, a[i] = b[i] + c[i]
Triad: combines scale and add in one operation, a[i] = b[i] + k × c[i]
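
The four kernels can be illustrated with a minimal Python sketch. This is not the benchmark source: the names stream_kernels, ACCESSES_PER_ELEMENT, and bytes_per_element are hypothetical, and plain Python lists stand in for the large double-precision arrays that the real benchmark streams through memory. The access counts follow directly from the expressions above (one write to a plus one or two reads).

```python
# Illustrative sketch of the four STREAM kernels on small Python lists.
# All names here are hypothetical; the real benchmark is compiled code
# operating on arrays far larger than the caches.

BYTES_PER_DOUBLE = 8  # STREAM operates on 64-bit floating-point values

# Array accesses per element: reads of b and c plus the write to a.
ACCESSES_PER_ELEMENT = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def stream_kernels(b, c, k):
    """Return the output array a[] of each kernel for inputs b, c and scalar k."""
    n = len(b)
    return {
        "copy": [b[i] for i in range(n)],              # a[i] = b[i]
        "scale": [k * b[i] for i in range(n)],         # a[i] = k * b[i]
        "add": [b[i] + c[i] for i in range(n)],        # a[i] = b[i] + c[i]
        "triad": [b[i] + k * c[i] for i in range(n)],  # a[i] = b[i] + k * c[i]
    }


def bytes_per_element(kernel):
    """Bytes moved per loop iteration when every read and write is counted."""
    return ACCESSES_PER_ELEMENT[kernel] * BYTES_PER_DOUBLE


if __name__ == "__main__":
    results = stream_kernels([1.0, 2.0, 3.0], [10.0, 20.0, 30.0], 3.0)
    for name, values in results.items():
        print(name, values, bytes_per_element(name), "bytes/element")
```

Copy and scale each touch two arrays per element, while add and triad touch three, which is why the latter two show a higher per-element latency.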

For bandwidth, each byte read counts as one and each byte written counts as one, so the resulting score is double the bandwidth that LMBench reports. Table 3-3 shows the measured bandwidth and the efficiency compared to the theoretical wire rate. The wire rate is the LPDDR4 transfer rate (MT/s) multiplied by the bus width (1600 MT/s × 16 bits = 3200 MB/s for the configuration in Table 3-3). To obtain the maximum achieved throughput, the command used is stream -M 16M -P 2 -N 10, which runs two parallel threads for 10 iterations. The Arm Cortex-A53 clock frequency is set to 1.25 GHz in this test.

root@am62lxx-evm:~# stream -M 16M -P 2 -N 10
STREAM copy latency: 13.64 nanoseconds
STREAM copy bandwidth: 2346.27 MB/sec
STREAM scale latency: 13.59 nanoseconds
STREAM scale bandwidth: 2354.55 MB/sec
STREAM add latency: 21.72 nanoseconds
STREAM add bandwidth: 2209.49 MB/sec
STREAM triad latency: 22.20 nanoseconds
STREAM triad bandwidth: 2162.58 MB/sec
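
The latency and bandwidth lines in this output can be cross-checked against each other. The sketch below assumes that the reported latency is the time per array element and that bandwidth is approximately threads × bytes per element ÷ latency, with 1 MB taken as 10^6 bytes. This relation is inferred from the numbers above and is not a documented formula; the names REPORTED, estimated_bandwidth_mb_s, and relative_error are hypothetical.

```python
# Cross-check of the reported figures, assuming:
#   bandwidth ~= threads * bytes_per_element / latency
# This relation is an assumption inferred from the output, not taken
# from the benchmark documentation.

THREADS = 2           # from the -P 2 option
BYTES_PER_DOUBLE = 8  # 64-bit floating-point elements

# kernel: (latency in ns, reported bandwidth in MB/s, array accesses per element)
REPORTED = {
    "copy": (13.64, 2346.27, 2),
    "scale": (13.59, 2354.55, 2),
    "add": (21.72, 2209.49, 3),
    "triad": (22.20, 2162.58, 3),
}


def estimated_bandwidth_mb_s(latency_ns, accesses):
    """Estimate bandwidth in MB/s from per-element latency in nanoseconds."""
    bytes_per_ns = THREADS * accesses * BYTES_PER_DOUBLE / latency_ns
    return bytes_per_ns * 1000.0  # 1 byte/ns equals 1000 MB/s (MB = 10^6 bytes)


def relative_error(kernel):
    """Relative difference between the estimate and the reported bandwidth."""
    latency_ns, reported, accesses = REPORTED[kernel]
    estimate = estimated_bandwidth_mb_s(latency_ns, accesses)
    return abs(estimate - reported) / reported


if __name__ == "__main__":
    for name in REPORTED:
        print(f"{name}: relative error {relative_error(name):.5f}")
```

Under these assumptions the estimates agree with the reported bandwidths to within 0.1%, with the residual attributable to the two-decimal rounding of the latency values.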

Table 3-3 Stream Benchmarks

Kernel	LPDDR4-1600MT/s-16-Bit Latency [ns]	LPDDR4-1600MT/s-16-Bit Bandwidth [MB/s]	LPDDR4-1600MT/s-16-Bit Efficiency [%]
copy	13.64	2,346	73
scale	13.59	2,354	73
add	21.72	2,209	69
triad	22.20	2,162	67
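
The efficiency column can be reproduced with a short calculation. The sketch below assumes the wire rate is the transfer rate times the bus width in bytes (1600 MT/s × 16 bits ÷ 8 = 3200 MB/s) and that efficiency is the measured bandwidth divided by that rate. The function names are hypothetical. The table values appear to be truncated rather than rounded, so the sketch uses math.floor.

```python
import math

# Measured bandwidth in MB/s, taken from the stream output.
MEASURED_MB_S = {
    "copy": 2346.27,
    "scale": 2354.55,
    "add": 2209.49,
    "triad": 2162.58,
}


def wire_rate_mb_s(transfer_rate_mt_s, bus_width_bits):
    """Theoretical wire rate: transfers per second times bytes per transfer."""
    return transfer_rate_mt_s * bus_width_bits / 8


def efficiency_percent(bandwidth_mb_s, wire_rate):
    """Measured bandwidth as a percentage of the theoretical wire rate."""
    return 100.0 * bandwidth_mb_s / wire_rate


if __name__ == "__main__":
    wire = wire_rate_mb_s(1600, 16)  # LPDDR4-1600, 16-bit bus
    for name, bandwidth in MEASURED_MB_S.items():
        # Truncation (an assumption) matches the percentages in the table.
        print(name, math.floor(efficiency_percent(bandwidth, wire)))
```

With rounding instead of truncation, scale and triad would come out one point higher (74 and 68), so the choice only matters at the last digit.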