TMS320C6000™ Highest Performance DSP Platform

C67x™ Floating-Point Benchmarks

         Filters
         Vector
         FFTs
         Search
         Math
         3D-Graphics and Imaging

FILTERS
Benchmark Description Formula
Block FIR: The FIR assumes that the number of filter coefficients (numH) is a multiple of 2 and greater than or equal to 4 and the number of outputs (numY) is a multiple of 4 and greater than or equal to 4. The input, output, and coefficient arrays must start on the same double-word boundary to avoid memory bank hits.
Formula: ((2*numH)+10)*(numY/4)+8
For numH=64 and numY=64 
2216 cycles or 13.296 µsec
Block IIR: The IIR assumes that the order is a multiple of 2 and greater than or equal to 4, and the number of outputs (numY) is a multiple of 2 and greater than or equal to order+2. To avoid bank hits, the input and output arrays must be aligned on opposite double-word boundaries, and the a and b coefficient arrays must be aligned on opposite double-word boundaries.
Formula: (order+10)*(numY-order)+15
For order=16 and numY=64 
1263 cycles or 7.578 µsec 
Cascaded IIR Biquads: The Biquad assumes that the number of biquads (numB) is a multiple of 2 and greater than or equal to 2, and it processes one input and produces one output. There are no memory bank hits regardless of where the arguments are placed in memory.
Formula: 4*(numB)+29
For numB=8 
61 cycles or 366 nsec
Circular Block FIR: The circular FIR assumes that the number of filter coefficients (hsize) is a multiple of 2 and greater than or equal to 4 and the number of outputs (ysize) is a multiple of 4 and greater than or equal to 4. The input, output, and coefficient arrays must start on the same double-word boundary to avoid memory bank hits. Circular addressing is used for the input array (x) with a circular buffer size 2^(size+1) and the routine uses "index" to define the initial offset into the buffer.
Formula: ((2*hsize)+10)*(ysize/4)+9
For hsize=64 and ysize=64 
2217 cycles or 13.302 µsec
Convolution: The convolution assumes that the output array length (nr) is a multiple of 4 and greater than or equal to 4, and the second input array length (nb) is a multiple of 2 and greater than or equal to 4. The first input array length should be (nr+nb-1) where the first nb-1 and last nb-1 values are zero. If all three arrays are aligned on the same double-word boundary and nb is not a multiple of 4 there will be no memory bank hits (if it is a multiple of 4 there will be nr/4 bank hits).
Formula: (nb/2)*nr+(nr/2)*5+8
For nb=8 and nr=20 
138 cycles or 828 nsec
Cross Correlation: The correlation assumes that the output array length (nr) is a multiple of 4 and greater than or equal to 4, and the second input array length (nb) is a multiple of 2 and greater than or equal to 4. The first input array length should be (nr+nb-1) where the first nb-1 and last nb-1 values are zero. If all three arrays are aligned on the same double-word boundary and nb is not a multiple of 4 there will be no memory bank hits (if it is a multiple of 4 there will be nr/4 bank hits).
Formula: (nb/2)*nr+(nr/2)*5+8
For nb=8 and nr=20 
138 cycles or 828 nsec
Autocorrelation: Autocorrelation assumes that the correlation is length M, the output array is length M and the input array is length (M+N) where the first M values are zero. The value of N should be a multiple of 2 and greater than or equal to 4. The value of M should be a multiple of 4 and greater than or equal to 4. To prevent memory bank hits, the input array should be aligned on an even double-word boundary (bank 0), and the output array should be aligned on the next word boundary (bank 2).
Formula: (N/2)*M+(M/2)*5+9
For M=8 and N=18 
101 cycles or 606 nsec
LMS FIR Filter: The Least Mean Squares adaptive FIR filter assumes that the number of coefficients (numH) is a multiple of 4 and at least 4. The number of inputs must be equal to numH+numY-1, where numY is the number of outputs.
Formula: ((5*numH)/4+27)*numY+17
For numH=64 and numY=64
6865 cycles or 41.19 µsec
Complex FIR Filter: The complex FIR filter assumes that the number of complex coefficients (numH) is a multiple of 2 and at least 4. The number of complex inputs must be equal to numH+numY-1, where numY is the number of complex outputs.
Formula: ((2*numH)+14)*numY+17+numY-1
For numH=64 and numY=64
9168 cycles or 55.008 µsec
Inverse Analysis Lattice Filter: This routine implements an inverse analysis lattice filter (FIR filter or IIR filter with no poles) and stores the result in f. The filter consists of n stages. The value of f is calculated by doing a multiply accumulate on the backward error coefficients, b, and filter gains, k. New backward error coefficients are also calculated.
Formula: 4*n+22
For n=8
54 cycles or 324 nsec
Forward Synthesis Lattice Filter: This routine implements a forward synthesis lattice filter (IIR filter with no zeros) and stores the result in f. The filter consists of n stages. The value of f is calculated by doing a multiply accumulate on the backward error coefficients, b, and filter gains, k. New backward error coefficients are also calculated. The value of n must be at least 4.
Formula: 4*n+24
For n=8
56 cycles or 336 nsec
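
The formulas in this table are closed-form cycle estimates, so they can be checked with a few lines of arithmetic. The sketch below evaluates the Block FIR and Block IIR formulas and converts cycles to time. The function names are hypothetical, and the 6 ns cycle time is not stated on this page: it is inferred from the listed pairs (for example, 13.296 µsec / 2216 cycles).

```python
# Cycle time implied by the table (13.296 usec / 2216 cycles = 6 ns).
# This is an inference from the listed figures, not a documented value.
NS_PER_CYCLE = 6.0


def block_fir_cycles(num_h: int, num_y: int) -> int:
    """Evaluate ((2*numH)+10)*(numY/4)+8 under the stated size constraints."""
    if num_h < 4 or num_h % 2:
        raise ValueError("numH must be a multiple of 2 and >= 4")
    if num_y < 4 or num_y % 4:
        raise ValueError("numY must be a multiple of 4 and >= 4")
    return (2 * num_h + 10) * (num_y // 4) + 8


def block_iir_cycles(order: int, num_y: int) -> int:
    """Evaluate (order+10)*(numY-order)+15 under the stated size constraints."""
    if order < 4 or order % 2:
        raise ValueError("order must be a multiple of 2 and >= 4")
    if num_y < order + 2 or num_y % 2:
        raise ValueError("numY must be a multiple of 2 and >= order+2")
    return (order + 10) * (num_y - order) + 15


def cycles_to_usec(cycles: int, ns_per_cycle: float = NS_PER_CYCLE) -> float:
    """Convert a cycle count to microseconds for the assumed cycle time."""
    return cycles * ns_per_cycle / 1000.0


if __name__ == "__main__":
    fir = block_fir_cycles(64, 64)
    print(fir, "cycles,", cycles_to_usec(fir), "usec")
```

The same pattern applies to the other rows: substitute the formula and the constraints given in the description.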

VECTOR
Benchmark Description Formula
Dot product: The function performs the dot product of two vectors of length N where N is a multiple of 2 and greater than or equal to 10. No memory bank hits occur if the arrays are aligned on opposite double-word boundaries.
Formula: N/2 + 24
For N=100 
74 cycles or 444 nsec
Matrix-Vector Multiply (any size): The function performs the multiplication of an n x m matrix by an m x 1 vector. The a and b arrays should be placed on opposite double-word boundaries to prevent memory bank hits.
Formula: (n+20)*m+1
For m=3 and n=3 
70 cycles or 420 nsec 
 
Matrix-Vector Multiply (with even number of columns): The function performs the multiplication of an n x m matrix by an m x 1 vector. The column dimension (m) must be greater than or equal to 2 and a multiple of 2. The a and b arrays should be placed on opposite double-word boundaries to prevent memory bank hits.
Formula: ((n/2)+24)*m+7
For m=3 and n=20 
109 cycles or 654 nsec 
 
Weighted vector sum: The function performs an N element vector sum of two vectors with one vector weighted by a constant. The result is stored in a third vector. The value of N must be a multiple of 2 and greater than or equal to 12. To prevent bank hits, the two input vectors should be aligned on opposite double-word boundaries.
Formula: N+12
For N=100 
112 cycles or 672 nsec
Vector Sum: The function calculates the sum of two vectors of length N where N is a multiple of 2 and greater than or equal to 6. To avoid memory bank hits, the vectors should be aligned on opposite double-word boundaries.
Formula: N+8
For N=100 
108 cycles or 648 nsec
Sum of squares: The function calculates the sum of the squares of the N elements of the vector. The value N must be a multiple of 2 and greater than or equal to 12. This function performs extraneous loads.
Formula: N/2 + 24
For N=100 
74 cycles or 444 nsec
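
To make the vector rows concrete, here is a plain reference model of two of them, plus the dot-product cycle formula. It is a sketch, not the benchmarked routines: the function names are hypothetical, and which of the two inputs is weighted in the weighted vector sum is an assumption, since the description only says "one vector". The reference functions do not enforce the length constraints, which belong to the optimized routines rather than to the arithmetic.

```python
def dot_product(a, b):
    """Reference dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def weighted_vector_sum(a, b, weight):
    """Return weight*a[i] + b[i] for each i (assumes the first vector is the weighted one)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [weight * x + y for x, y in zip(a, b)]


def dot_product_cycles(n: int) -> int:
    """Evaluate N/2 + 24 for N a multiple of 2 and >= 10."""
    if n < 10 or n % 2:
        raise ValueError("N must be a multiple of 2 and >= 10")
    return n // 2 + 24


if __name__ == "__main__":
    print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(weighted_vector_sum([1.0, 2.0], [10.0, 20.0], 0.5))
    print(dot_product_cycles(100))
```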

FFTs
Benchmark Description Formula
Complex Radix 4 FFT: The function calculates the complex Radix 4 DIF FFT of size N with digit-reversed output and normal order input.
Formula: (log4(N))*(14*N/4+23)+20
For N=1024 
18,055 cycles or 108.33 µsec 
Complex Radix 2 FFT: The function calculates the complex Radix 2 DIT FFT of size N with bit-reversed output and coefficients, and normal order input.
Formula: ((2*N)+23)*log2(N)+6
For N=1024 
20,716 cycles or 124.30 µsec 
Inverse Complex Radix 2 FFT: The function calculates the inverse complex Radix 2 DIF FFT of size N with bit-reversed input, normal order output, and bit-reversed coefficients.
Formula: ((2*N)+16)*log2(N)+25
For N=1024 
20,665 cycles or 124 µsec 
Complex Bit-Reverse: The function performs the bit-reversal for an array of N complex SP floating point numbers. N must be a power of 2.
Formula: (N/4)*11+9
For N=1024 
2,825 cycles or 16.95 µsec 
Two-level-cache efficient mixed-radix forward FFT: The function performs a mixed radix forward FFT for floating point input and coefficient data using a special sequence of coefficients. This FFT uses a redundant sequence of twiddle factors to allow a linear access through the data.
Formula: 3.25 * ceil(log4(N) - 1) * N + 3*N + 179
For N=1024
16,563 cycles
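
Several of the FFT rows produce or consume bit-reversed data, which the Complex Bit-Reverse routine reorders. The sketch below shows what that permutation does on a small array and evaluates two of the cycle formulas. The function names are hypothetical, and the lower bound of N=4 in the cycle helpers is an assumption made so that N/4 and log4(N) are whole numbers.

```python
def bit_reverse_permute(data):
    """Return a copy of data with each element moved to its bit-reversed index."""
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    bits = n.bit_length() - 1  # number of index bits
    out = [None] * n
    for i in range(n):
        rev, v = 0, i
        for _ in range(bits):  # reverse the low `bits` bits of i
            rev = (rev << 1) | (v & 1)
            v >>= 1
        out[rev] = data[i]
    return out


def bit_reverse_cycles(n: int) -> int:
    """Evaluate (N/4)*11+9; assumes N is a power of 2 and at least 4."""
    if n < 4 or n & (n - 1):
        raise ValueError("N must be a power of 2 and >= 4")
    return (n // 4) * 11 + 9


def radix4_fft_cycles(n: int) -> int:
    """Evaluate (log4(N))*(14*N/4+23)+20; assumes N is a power of 4."""
    bits = n.bit_length() - 1
    if n < 4 or n & (n - 1) or bits % 2:
        raise ValueError("N must be a power of 4")
    return (bits // 2) * (14 * n // 4 + 23) + 20


if __name__ == "__main__":
    print(bit_reverse_permute(list(range(8))))
    print(bit_reverse_cycles(1024), radix4_fft_cycles(1024))
```

For N=1024 the reorder costs 2,825 cycles on top of the 18,055-cycle radix-4 transform, which is worth accounting for when the output is needed in normal order.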

SEARCH
Benchmark Description Formula
Vector Max: The function finds the maximum value in a vector of length N where N is a multiple of 5 and greater than or equal to 10. No memory bank hits occur regardless of where arguments are in memory.
Formula: 3*N/5+14
For N=100 
74 cycles or 444 nsec
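
The Vector Max formula works out to 3 cycles per 5 elements plus a fixed 14-cycle overhead. A minimal sketch of the operation and its cycle estimate follows, with hypothetical function names. The reference function accepts any non-empty vector, while the cycle helper enforces the stated constraint on N.

```python
def vector_max(values):
    """Reference maximum of a non-empty vector."""
    if not values:
        raise ValueError("vector must not be empty")
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best


def vector_max_cycles(n: int) -> int:
    """Evaluate 3*N/5+14 for N a multiple of 5 and >= 10."""
    if n < 10 or n % 5:
        raise ValueError("N must be a multiple of 5 and >= 10")
    return 3 * n // 5 + 14


if __name__ == "__main__":
    print(vector_max([0.5, -2.0, 7.25, 3.0]))
    print(vector_max_cycles(100))
```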

MATH
Benchmark Description Formula
Single Precision Floating Point Reciprocal: The function performs the reciprocal using the RCPSP instruction and 2 iterations of the Newton-Raphson algorithm to produce 23 bits of accuracy. The RCPSP instruction by itself gives 8 bits of accuracy, and a single Newton-Raphson iteration gives 16 bits.
28 cycles
Double Precision Floating Point Reciprocal: The function performs the reciprocal using the RCPDP instruction and 2 iterations of the Newton-Raphson algorithm.
84 cycles
Single Precision Floating Point Reciprocal Square Root: The function performs the reciprocal square root using the RSQRSP instruction and 2 iterations of the Newton-Raphson algorithm to produce 23 bits of accuracy. The RSQRSP instruction by itself gives 8 bits of accuracy, and a single Newton-Raphson iteration gives 16 bits.
34 cycles
Double Precision Floating Point Reciprocal Square Root: This function performs the double-precision reciprocal square root using the RSQRDP instruction and 3 iterations of the Newton-Raphson algorithm.
113 cycles
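
The accuracy figures above (8, then 16, then 23 bits) follow from the fact that each Newton-Raphson step roughly doubles the number of correct bits. The sketch below models this in Python under stated assumptions: `seed_truncate` is a hypothetical stand-in for the hardware seed instructions that simply keeps 8 mantissa bits of the exact result, and Python floats are double precision, so the code shows the convergence pattern rather than single-precision rounding behavior.

```python
import math


def seed_truncate(value: float, bits: int = 8) -> float:
    """Keep only `bits` mantissa bits of value (stand-in for a hardware seed)."""
    m, e = math.frexp(value)  # value == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(math.floor(m * scale) / scale, e)


def newton_reciprocal(a: float, iterations: int = 2) -> float:
    """Approximate 1/a with x <- x*(2 - a*x), starting from an 8-bit seed."""
    x = seed_truncate(1.0 / a)
    for _ in range(iterations):
        x = x * (2.0 - a * x)
    return x


def newton_rsqrt(a: float, iterations: int = 2) -> float:
    """Approximate 1/sqrt(a) with x <- x*(1.5 - 0.5*a*x*x), from an 8-bit seed."""
    x = seed_truncate(1.0 / math.sqrt(a))
    for _ in range(iterations):
        x = x * (1.5 - 0.5 * a * x * x)
    return x


if __name__ == "__main__":
    for n in range(3):
        err = abs(3.0 * newton_reciprocal(3.0, n) - 1.0)
        print(n, "iterations, relative error", err)
```

The reciprocal update squares the relative error at each step, which is why two iterations are enough to take an 8-bit seed past the 23-bit mantissa of a single-precision result.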

3D GRAPHICS AND IMAGING
Benchmark Description Formula
3D Geometry Transformation: This function performs the "front end" of a 3D graphics transformation pipeline. It performs geometry transformation, clipping preprocessing, perspective projection, and viewpoint mapping.
Approx. 10.4M vertices/second
Collision Detection: This function takes a vector of 3D points and translates them in one dimension. The 1D distance from the translated point to the parameter "point" is calculated. If the distance is less than the parameter "distance", a collision is detected and the address of point is returned. There are no memory bank hits regardless of where the function parameters are placed in memory, but the function performs extraneous loads.
Formula: (N/2)*3+32 (worst case)
For N=10,000 
15,032 cycles or 90.192 µsec
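
The Collision Detection description leaves several details open, so the following is only one possible reading, sketched under explicit assumptions: points are (x, y, z) tuples, the translation is applied along x, the first colliding point ends the search, and a list index stands in for the returned address. The names `find_collision` and `offset` are hypothetical.

```python
def find_collision(points, offset, point, distance):
    """Return the index of the first point whose translated x lies within
    `distance` of `point`, or None if there is no collision.

    Assumes translation along the x axis; the source does not name the axis.
    """
    for i, (x, _y, _z) in enumerate(points):
        if abs((x + offset) - point) < distance:
            return i
    return None


def collision_worst_case_cycles(n: int) -> int:
    """Evaluate (N/2)*3+32, the worst case where every point is examined."""
    return (n // 2) * 3 + 32


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (5.0, 1.0, 1.0), (9.0, 2.0, 2.0)]
    print(find_collision(pts, 1.0, 10.0, 0.5))
    print(collision_worst_case_cycles(10000))
```

The formula is labeled worst case because a search that stops at the first hit finishes earlier whenever a collision occurs before the end of the vector.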
