Core Benchmarks

The benchmarks in the table below are for a single core. See device benchmarks for multicore performance.

All benchmarks were measured with data located in L2 SRAM, except as noted below.
(1) The C66x FFT code benchmarked is an optimized version of the FFT kernel code from FFTLIB, using L2 memory.
(2) A15 benchmarks were run with data in OCMC RAM and with the data and program caches enabled. The compiler flags used for ARM NEON optimizations were -mfpu=vfpv4 -mfloat-abi=hard -O3. The A15 outputs were not verified for accuracy or precision. No handwritten intrinsics were used in the code.

| Processor core | C66x DSP core | C674x DSP core | ARM® Cortex®-A15 |
| --- | --- | --- | --- |
| Hardware platform used | C6657 EVM | C6748 LCDK | AM5728 EVM |
| Devices featuring benchmarked core | C66x DSPs; 66AK2x DSPs; Sitara AM57x SoCs | OMAP-L138; C6748 | 66AK2x DSPs; Sitara AM57x SoCs |
 
| Function benchmarked | C66x cycles | C66x µs @ 1 GHz | C674x cycles | C674x µs @ 456 MHz | Cortex-A15 cycles (2) | Cortex-A15 µs @ 1 GHz (2) | Associated TI library |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Complex FFT (256 pts) - SP floating point (1) | 1782 | 1.78 | 2401 | 5.27 | 8644 | 8.64 | FFTLIB for C66x; DSPLIB for C674x |
| Complex FFT (1k pts) - SP floating point (1) | 6269 | 6.27 | 10950 | 24.01 | 43916 | 43.92 | FFTLIB for C66x; DSPLIB for C674x |
| Real block FIR - fixed point, 128 samples, 16 coeff | 262 | 0.26 | 386 | 0.85 | 2152 | 2.15 | DSPLIB |
| Real block FIR - SP floating point, 128 samples, 16 coeff | 1345 | 1.35 | 1406 | 3.08 | 6971 | 6.97 | DSPLIB |
| Real block FIR - SP floating point, 256 samples, 16 coeff | 2625 | 2.63 | 2735 | 6.00 | 13879 | 13.88 | DSPLIB |
| Complex block FIR - SP floating point, 64 samples, 16 coeff | 1334 | 1.33 | 2221 | 4.87 | 13039 | 13.04 | DSPLIB |
| Complex block FIR - SP floating point, 128 samples, 16 coeff | 2646 | 2.65 | 4397 | 9.64 | 26072 | 26.07 | DSPLIB |
| Real Matrix SGEMM 16x16 | 2405 | 2.41 | 3505 | 7.69 | 14662 | 14.66 | DSPLIB |
| Complex Matrix SGEMM 16x16 | 4113 | 4.11 | 10884 | 23.87 | 26388 | 26.39 | DSPLIB |
| Matrix Math DGEMM 16x16 | 5061 | 5.06 | -- | -- | 14669 | 14.67 | DSPLIB |
| Autocorrelation - fixed point, N=32, IMG_corr_3x3_i16s_c16s | 140 | 0.14 | 189 | 0.41 | 946 | 0.95 | IMGLIB |
| ArcTan2 - SP floating point | 24 | 0.02 | 31 | 0.07 | 49 | 0.05 | MATHLIB |
| Log10 - SP floating point | 14 | 0.01 | 18 | 0.04 | 56 | 0.06 | MATHLIB |
| Square Root - SP floating point | 6 | 0.01 | 6 | 0.01 | 5 | 0.01 | MATHLIB |
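
The microsecond columns follow from the cycle counts and the core clock: execution time in µs equals cycles divided by the clock frequency in MHz. A minimal Python sketch of that conversion is shown here; the function name `cycles_to_us` is illustrative and not part of any TI library.

```python
def cycles_to_us(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to execution time in microseconds."""
    # One MHz is one cycle per microsecond, so µs = cycles / MHz.
    return cycles / clock_mhz

# Complex FFT (1k pts) cycle counts from the table above.
c66x_us = cycles_to_us(6269, 1000.0)    # C66x at 1 GHz
c674x_us = cycles_to_us(10950, 456.0)   # C674x at 456 MHz
print(f"{c66x_us:.2f} {c674x_us:.2f}")
```

Rounded to two decimals, these values match the 6.27 µs and 24.01 µs entries in the table.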

Download the TI DSP Benchmarking application note to learn how to reproduce these benchmarks on TI hardware.

These charts show relative core performance on selected routines based on the benchmark information above.

The chart below compares the performance of the C66x DSP core with that of the C674x DSP core. C674x performance is normalized to 1, and C66x performance is shown relative to it. The comparison takes processor clock speed into account.

Performance comparison of C66x DSP core to C674x DSP core

* Complex FFT, 1k points, single precision, floating point.
** Complex block FIR, single precision, floating point, 128 samples, 16 coefficients.
*** Complex matrix SGEMM 16x16.
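
One plausible way to reproduce the normalization for the C66x versus C674x comparison is to divide the C674x execution time by the C66x execution time, so that the C674x sits at 1 and larger values mean a faster C66x. The sketch below assumes the chart is derived from the microsecond values in the benchmark table; the exact method behind the chart is not stated here, and the names are illustrative.

```python
# Execution times in µs from the benchmark table: (C66x @ 1 GHz, C674x @ 456 MHz).
TIMES_US = {
    "Complex FFT, 1k points": (6.27, 24.01),
    "Complex block FIR, 128 samples, 16 coeff": (2.65, 9.64),
    "Complex matrix SGEMM 16x16": (4.11, 23.87),
}

def relative_performance(t_core_us: float, t_baseline_us: float) -> float:
    """Performance of a core relative to a baseline that is normalized to 1."""
    # A shorter execution time means higher performance, so the ratio is inverted.
    return t_baseline_us / t_core_us

for name, (t_c66x, t_c674x) in TIMES_US.items():
    score = relative_performance(t_c66x, t_c674x)
    print(f"{name}: C66x = {score:.2f}, C674x = 1.00")
```

Because the inputs are wall-clock times rather than cycle counts, the difference between the 1 GHz and 456 MHz clocks is already folded into the ratio.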

The chart below compares the performance of the C66x DSP core, the C674x DSP core, and the ARM Cortex-A15 core. Cortex-A15 performance is normalized to 1, and the C66x and C674x core performance is shown relative to it. The comparison takes processor clock speed into account.

Performance comparison of C66x DSP, C674x DSP and ARM Cortex-A15 core

* Complex FFT, 1k points, single precision, floating point.
** Complex block FIR, single precision, floating point, 128 samples, 16 coefficients.
*** Complex matrix SGEMM 16x16.
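
For a comparison across all three cores, the same idea can start from raw cycle counts and clock rates and normalize to whichever core is chosen as the baseline. This is a sketch under the assumption that relative performance is the ratio of execution times; the `normalize` helper and its data layout are hypothetical.

```python
# Complex FFT (1k pts): (cycle count, clock in MHz) per core, from the benchmark table.
FFT_1K = {
    "C66x": (6269, 1000.0),
    "C674x": (10950, 456.0),
    "Cortex-A15": (43916, 1000.0),
}

def normalize(results: dict, baseline: str) -> dict:
    """Return each core's performance relative to `baseline` (baseline = 1.0)."""
    # Execution time in µs is cycles / MHz, which folds clock speed into the comparison.
    times_us = {core: cycles / mhz for core, (cycles, mhz) in results.items()}
    return {core: times_us[baseline] / t for core, t in times_us.items()}

for core, score in normalize(FFT_1K, "Cortex-A15").items():
    print(f"{core}: {score:.2f}")
```

Passing `"C674x"` as the baseline instead yields the two-core view with the C674x at 1.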