TMS320C6000™ Highest Performance DSP Platform

C62x™ DSP Benchmarks

         Filters
         Vector
         FFTs
         Search
         Math
         Imaging
         Telecom

FILTERS
Benchmark: FIR - coefficients a multiple of 4
Description: This FIR assumes the number of filter coefficients is a multiple of 4 and the number of output samples is a multiple of 2. It operates on 16-bit data with a 32-bit accumulate. This routine has no memory hits regardless of where the x, h, and y arrays are located in memory. The filter has M output samples and N coefficients.
Formula: M*(N+8)/2 + 6
Example: For N=32 and M=100, 2006 cycles or 10.03 µsec

Benchmark: FIR - coefficients a multiple of 8
Description: This FIR assumes the number of filter coefficients is a multiple of 8 and the number of output samples is a multiple of 2. It operates on 16-bit data with a 32-bit accumulate. This routine has no memory hits regardless of where the x, h, and y arrays are located in memory. The filter has M output samples and N coefficients.
Formula: M*N/2 + 13
Example: For N=32 and M=100, 1613 cycles or 8.06 µsec

Benchmark: Complex FIR
Description: This FIR operates on complex 16-bit data with a complex 32-bit accumulate. This routine has no memory hits regardless of where the x, h, and y arrays are located in memory. The filter has M output samples and N coefficients.
Formula: 2*M*N + 10
Example: For M=100 and N=32, 6410 cycles or 32 µsec

Benchmark: LMS FIR - coefficients a multiple of 2
Description: Least Mean Square Adaptive Filter. Computes an update of all N coefficients by adding the weighted error times the inputs to the original coefficients. This assumes a single sample input followed by the last N-1 inputs and N coefficients.
Formula: 1.5*N + 16
Example: For N=30, 61 cycles or 305 nsec

Benchmark: LMS FIR - coefficients a multiple of 8
Description: Least Mean Square Adaptive Filter. Computes an update of all N coefficients by adding the weighted error times the inputs to the original coefficients, followed by an FIR with N coefficients and M output samples and an error calculation. This assumes that N is a multiple of 8. (N=number of data samples, multiple of 8 >=8)
Formula: M*(9/8*N+15) + 5

Benchmark: IIR filter
Description: Performs an auto-regressive moving-average (ARMA) filter with 4 auto-regressive filter coefficients and 5 moving-average filter coefficients for M output samples. The output vector is stored to two locations. This routine is used as a high-pass filter in the VSELP vocoder.
Formula: M*5 + 16
Example: For M=160, 816 cycles or 4.08 µsec

Benchmark: FIR Circular
Description: Finite Impulse Response Filter. Uses circular addressing with initial index. Performs filtering 2 samples at a time. (N=number of data samples, even >=2) (M=number of filter coefficients, multiple of 4 >=4)
Formula: M*(N+11)/2 + 13
Example: For N=32 and M=32, 701 cycles or 3.505 µsec

Benchmark: Lattice Analysis
Description: Lattice Filter - Inverse - Analysis. (N=number of coefficients)
Formula: 1.5*N + 10
Example: For N=10, 25 cycles or 125 nsec

Benchmark: Lattice Synthesis
Description: Lattice Filter - Forward - Synthesis. (N=number of data samples, even >=6)
Formula: 2N + 18
Example: For N=10, 38 cycles or 190 nsec

Benchmark: IIR with 4 biquads cascaded
Description: Infinite Impulse Response Filter. Direct Form II - 4 Multiplies. Processes 2 samples at a time. (N=number of cascaded biquads)
Formula: 4N + 16
Example: For N=10, 56 cycles or 280 nsec

Benchmark: Autocorrelation
Description: Performs autocorrelation of a 16-bit vector. Nested loop with M inner-loop multiply-accumulates and outer loops.
Formula: (N/2)*M + 16 + M/4
Example: For N=160 and M=10, 816 cycles or 4.08 µsec
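
As a rough illustration of what the real FIR entries above compute, here is a minimal Python sketch of an FIR with 16-bit data and a 32-bit accumulate. The names `fir` and `wrap32`, the indexing convention (output i is the sum of h[j]*x[i+j]), and the wraparound behaviour of the accumulator are assumptions made for illustration. The page does not show the benchmarked source, and this sketch says nothing about how the optimized C62x routine is scheduled.

```python
def wrap32(value):
    """Wrap an integer to signed 32-bit two's complement (assumed accumulator behaviour)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def fir(x, h, m):
    """Reference FIR: m output samples from input x using the coefficients in h.

    x must hold at least m + len(h) - 1 samples. Samples and coefficients are
    expected to fit in signed 16 bits; each output is a 32-bit accumulation.
    """
    n = len(h)
    if len(x) < m + n - 1:
        raise ValueError("x needs at least m + len(h) - 1 samples")
    y = []
    for i in range(m):
        acc = 0
        for j in range(n):
            # One multiply-accumulate per coefficient, wrapped to 32 bits.
            acc = wrap32(acc + x[i + j] * h[j])
        y.append(acc)
    return y
```

For example, `fir([1, 2, 3, 4, 5], [1, 1], 4)` returns `[3, 5, 7, 9]`.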

VECTOR
Benchmark: Dot product
Description: Dot product of two vectors of length N.
Formula: N/2 + 8
Example: For N=100, 58 cycles or 290 nsec

Benchmark: Weighted vector sum
Description: Performs an N element vector sum of two vectors with one vector weighted by a constant. The result is stored in a third vector.
Formula: N + 10
Example: For N=40, 49 cycles or 245 nsec

Benchmark: Vector dot product and square
Description: Performs an N element dot product and each of the N elements of one of the vectors is squared and accumulated. This is used to compute G in the VSELP coder.
Formula: N + 8
Example: For N=40, 48 cycles or 240 nsec

Benchmark: Block move
Description: Move N 16-bit elements from one memory location to another.
Formula: N/2 + 5
Example: For N=40, 25 cycles or 125 nsec

Benchmark: Sum of squares
Description: Each of N elements in a vector is squared and accumulated. This particular loop is used to compute Gl in the VSELP vocoder codebook search.
Formula: (N-1)/2 + 9
Example: For N=21, 19 cycles
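
The dot product entries above can be sketched in a few lines of Python. The function names are hypothetical, and which of the two vectors is squared in the "dot product and square" routine is not stated on this page, so the choice of the second vector here is an assumption. The N/2 term in the dot product formula is consistent with two multiply-accumulates per cycle plus a fixed overhead of 8 cycles, though that reading is only an inference from the formula.

```python
def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def dot_and_square(a, b):
    """Return (sum of a[i]*b[i], sum of b[i]*b[i]); squaring b is an assumption."""
    return dot_product(a, b), dot_product(b, b)


def dot_product_cycles(n):
    """Cycle estimate N/2 + 8 from the table, assuming even N."""
    return n // 2 + 8
```

With N=100, `dot_product_cycles(100)` gives the 58 cycles listed in the table.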

FFTs
Benchmark: Two-level-cache efficient Complex Radix 4 FFT
Description: Complex Radix 4 FFT of size N. This FFT uses a redundant sequence of twiddle factors (N twiddles for an N-point FFT) to allow linear access through the data. This linear access consumes the entire contents of a cache line before it uses another one, resulting in efficient cache usage.
Formula: 10 * log4(N) * (0.25 * N + 3) + 22
Example: For N=1024, 12972 cycles

Benchmark: Complex Radix 4 FFT
Description: Complex Radix 4 FFT of size N.
Formula: log4(N) * (10 * N/4 + 33) + 7 + N/4
Example: For N=1024, 13228 cycles or 66 µsec

Benchmark: Complex Radix 2 FFT
Description: Complex Radix 2 FFT of size N.
Formula: log2(N) * (4 * N/2 + 7) + 9 + N/4
Example: For N=1024, 20815 cycles or 104 µsec

Benchmark: Bit Reverse
Description: The Bit-Reverse routine performs bit-reversal on an array of 16-bit complex data of length N.
Formula: 7*(N/4 + 2) + 14
Example: For N=1024, 1820 cycles or 9.1 µsec
Lookup Table Size: 32 halfwords (64 bytes)
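
The FFT formulas above are easy to compare by evaluating them directly. The Python sketch below does that for N=1024. The function names are hypothetical, and the 5 nsec cycle time is inferred from the listed figures (1820 cycles is given as 9.1 µsec); the page itself does not state a clock rate. Integer division is used for N/4 and N/2, which is exact when N is a multiple of 4.

```python
NS_PER_CYCLE = 5  # inferred from the table (1820 cycles = 9.1 usec), not stated on the page


def log2_exact(n):
    """Return log2(n) for a power of two, else raise ValueError."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return n.bit_length() - 1


def log4_exact(n):
    """Return log4(n) for a power of four, else raise ValueError."""
    k = log2_exact(n)
    if k % 2:
        raise ValueError("n must be a power of four")
    return k // 2


def radix4_cached_cycles(n):
    return 10 * log4_exact(n) * (n // 4 + 3) + 22


def radix4_cycles(n):
    return log4_exact(n) * (10 * n // 4 + 33) + 7 + n // 4


def radix2_cycles(n):
    return log2_exact(n) * (4 * n // 2 + 7) + 9 + n // 4


def bit_reverse_cycles(n):
    return 7 * (n // 4 + 2) + 14


def cycles_to_usec(cycles):
    """Convert a cycle count to microseconds at the inferred cycle time."""
    return cycles * NS_PER_CYCLE / 1000
```

For N=1024 these return 12972, 13228, 20815 and 1820 cycles respectively, matching the table.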

SEARCH
Benchmark: Minimum energy error search
Description: Performs a dot product on 256 pairs of 9 element vectors and searches for the pair of vectors which produces the maximum dot product result. This is a large part of the VSELP vocoder codebook search.
Formula: (256/2)*9 + 14
Example: 1166 cycles or 5.83 µsec

Benchmark: Vector Max
Description: Finds the maximum value in a vector of length N.
Formula: N/2 + 13
Example: For N=100, 64 cycles or 320 nsec

Benchmark: Vector Max Index
Description: Finds the maximum value in a vector of length N and stores the index of that location.
Formula: 2N/3 + 12
Example: For N=100, 79 cycles or 395 nsec

Benchmark: Codebook search for VSELP
Description: Performs VSELP vocoder codebook search. The C source code for this was written by Motorola Systems Research Laboratories and is authorized by Motorola for the use of development of North American digital cellular standards. As such, the C code cannot be shown here. This routine performs the entire v_srch.c function as written by Motorola. It involves calculating correlations between weighted basis vectors and weighted speech vector (Rm's), C0, and 0.25 * sum of Djj for G0. It then calculates all Dmj and finishes calculating G0. It then initializes the best vector to be code vector zero and performs the search by finding the vector that produces the highest C^2/G value.
Formula: Loop1 + Loop2 + Loop3 = 342 + 639 + 2087 = 3068 cycles
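
The two simpler searches above (vector max index, and the maximum dot product over pairs of vectors) can be sketched as follows. This is an illustrative Python reference only: the function names are hypothetical, and returning the first index on ties is an assumption, since the page does not describe tie-breaking.

```python
def max_index(values):
    """Return the index of the largest element (first occurrence on ties)."""
    if not values:
        raise ValueError("values must not be empty")
    best = 0
    for i in range(1, len(values)):
        if values[i] > values[best]:
            best = i
    return best


def best_pair(xs, ys):
    """Given two equal-length lists of vectors, return (index, dot product)
    of the pair xs[k], ys[k] with the largest dot product."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must hold the same number of vectors")
    dots = [sum(a * b for a, b in zip(x, y)) for x, y in zip(xs, ys)]
    k = max_index(dots)
    return k, dots[k]
```

In the benchmark's terms, `xs` and `ys` would each hold 256 vectors of 9 elements.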

MATH
Benchmark: ADD40
Description: Adds two 40-bit values to produce a 40-bit result. This code sample is NOT a complete function!
Formula: N/A

Benchmark: ADD64
Description: Adds two 64-bit values to produce a 64-bit result. This code sample is NOT a complete function!
Formula: N/A

Benchmark: SUB40
Description: Subtracts one 40-bit value from another 40-bit value to produce a 40-bit result. This code sample is NOT a complete function!
Formula: N/A

Benchmark: SUB64
Description: Subtracts one 64-bit value from another 64-bit value to produce a 64-bit result. This code sample is NOT a complete function!
Formula: N/A

Benchmark: DIVMOD32
Description: This routine divides two 32-bit values and returns their quotient and remainder. The inputs are 32-bit numbers, and the result is a 32-bit number. This code sample is NOT a complete function!
Cycles: minimum execution 16 cycles, maximum execution 41 cycles
Formula: N/A

Benchmark: DIVMODU32
Description: This routine divides two unsigned 32-bit values and returns their quotient and remainder. The inputs are unsigned 32-bit numbers, and the result is an unsigned 32-bit number. This code sample is NOT a complete function!
Cycles: minimum execution 18 cycles, maximum execution 42 cycles
Formula: N/A

Benchmark: MPY32
Description: This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 32-bit integer. This code sample is NOT a complete function!
Cycles: see routine
Formula: N/A

Benchmark: MPY3240
Description: This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 40-bit integer. This code sample is NOT a complete function!
Cycles: see routine
Formula: N/A

Benchmark: MPYU3240
Description: This routine takes two 32-bit unsigned integer values and calculates their product. The inputs are 32-bit unsigned integers, and the result is a 40-bit unsigned integer. This code sample is NOT a complete function!
Cycles: see routine
Formula: N/A

Benchmark: MPY40
Description: This routine takes two 40-bit integer values and calculates their product. The inputs are 40-bit integers, and the result is a 40-bit integer. This code sample is NOT a complete function!
Cycles: see routine
Formula: N/A

Benchmark: MPY3264
Description: This routine takes two 32-bit integer values and calculates their product. The inputs are 32-bit integers, and the result is a 64-bit integer.
Cycles: see routine
Formula: N/A

Benchmark: MPYU3264
Description: This routine takes two 32-bit unsigned integer values and calculates their product. The inputs are 32-bit unsigned integers, and the result is a 64-bit unsigned integer.
Cycles: see routine
Formula: N/A
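
To give a feel for what the extended-precision routines above involve, the following Python sketch builds a 64-bit add from 32-bit halves and an unsigned 32x32 multiply with a 64-bit result from four 16x16 partial products. It assumes a machine with 32-bit registers and 16x16-bit multiplies. The names `add64` and `mpyu3264` mirror the benchmark names, but the code is a hypothetical illustration, not the TI routines, which are not shown on this page.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values given as (hi, lo) 32-bit halves; return (hi, lo).

    The carry out of the low word is added into the high word, and the
    result wraps modulo 2**64.
    """
    lo = a_lo + b_lo
    carry = lo >> 32
    hi = (a_hi + b_hi + carry) & MASK32
    return hi, lo & MASK32


def mpyu3264(a, b):
    """Unsigned 32x32 -> 64-bit multiply from four 16x16 partial products."""
    a_hi, a_lo = a >> 16, a & MASK16
    b_hi, b_lo = b >> 16, b & MASK16
    low = a_lo * b_lo                    # weight 2**0
    cross = a_lo * b_hi + a_hi * b_lo    # weight 2**16
    high = a_hi * b_hi                   # weight 2**32
    total = low + (cross << 16) + (high << 32)
    return (total >> 32) & MASK32, total & MASK32
```

Python integers are unbounded, so the masks stand in for the register widths a real implementation would have.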

GRAPHICS
Benchmark: 8x8 Block IDCT - IEEE-1180 Compliant
Description: The idct_8x8 algorithm performs an IEEE-1180 compliant IDCT, complete with rounding and saturation to signed 9-bit quantities. The array should be aligned to a 32-bit boundary, and be laid out equivalently to the C array idct_data[num_idcts+1][8][8]. The input coefficients are assumed to be signed 12-bit cosine terms.
Formula: Cycles = 62 + 168 * num_idcts for num_idcts >= 1
Example: 230 cycles or 1.15 µsec for one 8x8 block of data

Benchmark: 8x8 Block FDCT With Rounding
Description: The fdct routine accepts a list of 8x8 pixel blocks and performs FDCTs on each. The array should be laid out equivalently to the C array dct_data[num_fdcts+1][8][8]. All operations in this array are performed entirely in-place. Input values are stored in shorts, and may be in the range [-512,511]. Input terms are expected to be signed 11Q0 values, producing signed 15Q0 results.
Formula: Cycles = 48 + 160 * num_fdcts for num_fdcts >= 1
Example: 208 cycles or 1.04 µsec for one 8x8 block of data

Benchmark: Gouraud
Description: Gouraud Shading of a scanline of pixels. Four pixels of a line are processed at a time. (N=pixels >=4, multiple of 4 pixels)
Formula: 2N + 7
Example: For 1024 pixels taken 4 pixels at a time, 2055 cycles or 10.275 µsec
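
The IDCT and FDCT formulas above each have a fixed overhead plus a per-block cost, so the cost per block falls toward 168 (IDCT) or 160 (FDCT) cycles as more blocks are processed in one call. A small Python sketch, with hypothetical function names, evaluates the formulas and also shows what saturation to a signed 9-bit quantity means (clamping to the range -256 to 255).

```python
def idct_cycles(num_idcts):
    """Cycle count from the table: 62 + 168 * num_idcts, for num_idcts >= 1."""
    if num_idcts < 1:
        raise ValueError("num_idcts must be >= 1")
    return 62 + 168 * num_idcts


def fdct_cycles(num_fdcts):
    """Cycle count from the table: 48 + 160 * num_fdcts, for num_fdcts >= 1."""
    if num_fdcts < 1:
        raise ValueError("num_fdcts must be >= 1")
    return 48 + 160 * num_fdcts


def saturate9(value):
    """Clamp an integer to the signed 9-bit range [-256, 255]."""
    return max(-256, min(255, value))
```

For instance, `idct_cycles(10)` is 1742 cycles, or about 174 cycles per block, compared with 230 cycles for a single block.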

TELECOM
Benchmark: Viterbi Equalization
Description: Viterbi Equalizer - GSM (N=number of data points)
Formula: 43N + 2
Example: For N=120, 5162 cycles or 25.810 µsec

Benchmark: Viterbi GSM
Description: Viterbi Channel Decoder (GSM) (N=number of data points)
Formula: 38N + 12 + N/4
Example: For N=189, 7242 cycles or 36.21 µsec

Benchmark: Viterbi IS54
Description: Viterbi Channel Decoder (IS54) (N=number of data points)
Formula: 66.5*N + 16
Example: For N=189, 5934 cycles or 29.67 µsec

Benchmark: Viterbi V.32
Description: Viterbi V.32 PSTN Trellis Decoder. (N=number of data points)
Formula: 64 cycles or 320 nsec
