Device Benchmarks
The benchmarks in this section illustrate device-level performance. See the core benchmarks for the performance of a single core.
TMS320C6678 FFT performance vs. number of cores used
The following graph shows the relative increase in performance as the number of cores used for FFT processing is increased on the C6678 multicore DSP. Note that as the number of cores increases to 8, the system performance levels off at approximately 6x the single-core performance. At this point the DDR bandwidth limit has been reached, and processing in the DSP is limited by the data throughput into and out of external memory. For other types of data processing, this limit may or may not be reached, depending on the specific algorithm's data and processing requirements.
FFT computation time in ms as a function of the number of cores used
FFT size | 1x C66x | 2x C66x | 4x C66x | 8x C66x |
---|---|---|---|---|
16k | 0.473 | 0.261 | 0.159 | 0.131 |
32k | 0.915 | 0.478 | 0.278 | 0.198 |
64k | 1.857 | 0.922 | 0.508 | 0.315 |
128k | 4.1 | 2.004 | 1.06 | 0.641 |
256k | 8.795 | 4.323 | 2.228 | 1.186 |
512k | 18.669 | 9.291 | 4.704 | 3.103 |
1024k | 38.557 | 19.328 | 9.605 | 6.403 |
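The scaling described above can be checked directly from the table. The following is a minimal sketch, not part of the original benchmark code: it copies the measured times into a dictionary and derives the speedup over a single core and the parallel efficiency (speedup divided by core count). The names `FFT_TIME_MS`, `speedups`, and `efficiencies` are illustrative choices made here.

```python
# FFT computation times in ms, copied from the table above.
# Each tuple holds the times for 1, 2, 4 and 8 C66x cores.
CORES = (1, 2, 4, 8)
FFT_TIME_MS = {
    "16k": (0.473, 0.261, 0.159, 0.131),
    "32k": (0.915, 0.478, 0.278, 0.198),
    "64k": (1.857, 0.922, 0.508, 0.315),
    "128k": (4.1, 2.004, 1.06, 0.641),
    "256k": (8.795, 4.323, 2.228, 1.186),
    "512k": (18.669, 9.291, 4.704, 3.103),
    "1024k": (38.557, 19.328, 9.605, 6.403),
}


def speedups(times):
    """Return the speedup relative to the single-core time for each core count."""
    base = times[0]
    return [base / t for t in times]


def efficiencies(times, cores=CORES):
    """Return the parallel efficiency (speedup / core count) for each core count."""
    return [s / n for s, n in zip(speedups(times), cores)]


if __name__ == "__main__":
    header = "  ".join(f"{n}x core".rjust(8) for n in CORES)
    print(f"{'size':>6}  {header}")
    for size, times in FFT_TIME_MS.items():
        row = "  ".join(f"{s:7.2f}x" for s in speedups(times))
        print(f"{size:>6}  {row}")
```

With these figures, the 8-core speedup is about 6.0x for the 512k and 1024k FFTs and about 3.6x for the 16k FFT, so the roughly 6x plateau mentioned above describes the larger transform sizes.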
TMS320C6678: Multicore Fixed and Floating-Point Digital Signal Processor