SPRACN4 August 2019 66AK2G12, 66AK2H06, 66AK2H12, 66AK2H14, OMAP-L132, OMAP-L138, TMS320C6452, TMS320C6454, TMS320C6455, TMS320C6457, TMS320C6652, TMS320C6654, TMS320C6655, TMS320C6657, TMS320C6672, TMS320C6674, TMS320C6678, TMS320C6742, TMS320C6743, TMS320C6745, TMS320C6746, TMS320C6747, TMS320C6748

 

Using DSPLIB FFT Implementation for Real Input and Without Data Scaling
  Trademarks
  1 Real Input Introduction
    1.1 Prerequisites
    1.2 Computing a Length N/2 Complex FFT From a Length N Real Input Sequence
    1.3 Returning to a Length N Real Sequence Using a Length N/2 Complex IFFT
    1.4 Benchmark of the Efficient Compute of FFT
  2 Fixed Point FFT With No Data Scaling
    2.1 Suggested Change
    2.2 Example Application
  3 Summary
  4 References

Suggested Change

The change is similar for both routines (DSP_fft16x16 and DSP_ifft16x16). The description below applies to the serial assembly (SA) implementation of the kernels, which are located at:

  • [DSPLIB_INSTALLATION_DIR]\packages\ti\dsplib\src\DSP_fft16x16\c64P\DSP_fft16x16_sa.sa
  • [DSPLIB_INSTALLATION_DIR]\packages\ti\dsplib\src\DSP_ifft16x16\c64P\DSP_ifft16x16_sa.sa

Change 1: Locate the following code in the SA files:

    ;----------------------------------------------------------;
    ; Compute first set of outputs:                            ;
    ;                                                          ;
    ; x0[0]= xh0_0 + xh20_0 + 1 >> 1                           ;
    ; x0[1]= xh1_0 + xh21_0 + 1 >> 1                           ;
    ; x0[2]= xh0_1 + xh20_1 + 1 >> 1                           ;
    ; x0[3]= xh1_1 + xh21_1 + 1 >> 1                           ;
    ;----------------------------------------------------------;
    AVG2 .2 B_xh1_0_xh0_0, A_xh21_0_xh20_0, B_x_1o_x_0o
    AVG2 .2 B_xh1_1_xh0_1, A_xh21_1_xh20_1, B_x_3o_x_2o

Update the code to:

    ADD2 .2 B_xh1_0_xh0_0, A_xh21_0_xh20_0, B_x_1o_x_0o
    ADD2 .2 B_xh1_1_xh0_1, A_xh21_1_xh20_1, B_x_3o_x_2o

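Note that the AVG2 instruction has been replaced with ADD2. AVG2 halves each sum (the "+ 1 >> 1" in the comments of the original code), so replacing it with ADD2 removes that scaling from the butterfly outputs.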
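The effect of Change 1 can be illustrated with a small host-side model. The following is a minimal Python sketch, not DSPLIB code: the helper names (to_s16, split, pack, avg2, add2) are hypothetical, and the model assumes that both instructions operate independently on the two signed 16-bit halves of a 32-bit register, with AVG2 computing (a + b + 1) >> 1 as in the SA comments and ADD2 wrapping on overflow rather than saturating.

```python
def to_s16(v):
    """Interpret the low 16 bits of v as a signed two's-complement value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def split(word):
    """Unpack a 32-bit word into (upper, lower) signed 16-bit halves."""
    return to_s16(word >> 16), to_s16(word)


def pack(hi, lo):
    """Pack two 16-bit values into one 32-bit word, discarding overflow bits."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def avg2(a, b):
    """Model of AVG2: rounded average of each pair of halves (scales by 1/2)."""
    ah, al = split(a)
    bh, bl = split(b)
    return pack((ah + bh + 1) >> 1, (al + bl + 1) >> 1)


def add2(a, b):
    """Model of ADD2: plain sum of each pair of halves, wrapping on overflow."""
    ah, al = split(a)
    bh, bl = split(b)
    return pack(ah + bh, al + bl)


if __name__ == "__main__":
    a = pack(1000, -2000)
    b = pack(3000, 500)
    print(split(avg2(a, b)))  # halved sums: (2000, -750)
    print(split(add2(a, b)))  # unscaled sums: (4000, -1500)
    # Without the halving, a sum above 32767 wraps in this model:
    print(split(add2(pack(0, 30000), pack(0, 10000))))  # (0, -25536)
```

Under these assumptions, the last case shows why the input data needs enough headroom once the scaling is removed: the unscaled sum no longer fits in 16 bits and wraps silently.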

Change 2: Locate the following code in the SA files:

    ;---------------------------------------------------------;
    ; The following code computes intermediate results for:   ;
    ;                                                         ;
    ; si10' = -si10 twiddle table has -sin factors            ;
    ;                                                         ;
    ; x2[h2  ] = (co10 * xt0_0 + si10'* yt0_0 + 0x8000) >> 16 ;
    ; x2[h2+1] = (co10 * yt0_0 - si10'* xt0_0 + 0x8000) >> 16 ;
    ; x2[h2+2] = (co11 * xt0_1 + si11'* yt0_1 + 0x8000) >> 16 ;
    ; x2[h2+3] = (co11 * yt0_1 - si11'* xt0_1 + 0x8000) >> 16 ;
    ;---------------------------------------------------------;
    CMPYR .M1 A_co10_si10, B_yt1_0_xt1_0, A_xh2_1_0;
    CMPYR .M1 A_co11_si11, B_yt1_1_xt1_1, A_xh2_3_2;

    ;---------------------------------------------------------;
    ;                                                         ;
    ; x2[l1  ] = (co20 * xt1_0 + si20'* yt1_0 + 0x8000) >> 16 ;
    ; x2[l1+1] = (co20 * yt1_0 - si20'* xt1_0 + 0x8000) >> 16 ;
    ; x2[l1+2] = (co21 * xt1_1 + si21'* yt1_1 + 0x8000) >> 16 ;
    ; x2[l1+3] = (co21 * yt1_1 - si21'* xt1_1 + 0x8000) >> 16 ;
    ;                                                         ;
    ; These four results are retained in registers and a      ;
    ; double word is formed so that it can be stored with     ;
    ; one STDW.                                               ;
    ;---------------------------------------------------------;
    ; This equation ONLY has minus sign for x, y components
    CMPYR .M1 A_co20_si20, A_myt0_0_mxt0_0, A_xl1_1_0;
    CMPYR .M1 A_co21_si21, A_myt0_1_mxt0_1, A_xl1_3_2;

    ;---------------------------------------------------------;
    ; The following code computes intermediate results for:   ;
    ;                                                         ;
    ; x2[l2  ] = (co30 * xt2_0 + si30'* yt2_0 + 0x8000) >> 16 ;
    ; x2[l2+1] = (co30 * yt2_0 - si30'* xt2_0 + 0x8000) >> 16 ;
    ; x2[l2+2] = (co31 * xt2_1 + si31'* yt2_1 + 0x8000) >> 16 ;
    ; x2[l2+3] = (co31 * yt2_1 - si31'* xt2_1 + 0x8000) >> 16 ;
    ;---------------------------------------------------------;
    CMPYR .M2 B_co30_si30, B_yt2_0_xt2_0, B_xl2_1_0
    CMPYR .M2 B_co31_si31, B_yt2_1_xt2_1, B_xl2_3_2

Update the code to:

    ;---------------------------------------------------------;
    ; The following code computes intermediate results for:   ;
    ;                                                         ;
    ; si10' = -si10 twiddle table has -sin factors            ;
    ;                                                         ;
    ; x2[h2  ] = (co10 * xt0_0 + si10'* yt0_0 + 0x8000) >> 16 ;
    ; x2[h2+1] = (co10 * yt0_0 - si10'* xt0_0 + 0x8000) >> 16 ;
    ; x2[h2+2] = (co11 * xt0_1 + si11'* yt0_1 + 0x8000) >> 16 ;
    ; x2[h2+3] = (co11 * yt0_1 - si11'* xt0_1 + 0x8000) >> 16 ;
    ;---------------------------------------------------------;
    CMPYR1 .M1 A_co10_si10, B_yt1_0_xt1_0, A_xh2_1_0;
    CMPYR1 .M1 A_co11_si11, B_yt1_1_xt1_1, A_xh2_3_2;

    ;---------------------------------------------------------;
    ;                                                         ;
    ; x2[l1  ] = (co20 * xt1_0 + si20'* yt1_0 + 0x8000) >> 16 ;
    ; x2[l1+1] = (co20 * yt1_0 - si20'* xt1_0 + 0x8000) >> 16 ;
    ; x2[l1+2] = (co21 * xt1_1 + si21'* yt1_1 + 0x8000) >> 16 ;
    ; x2[l1+3] = (co21 * yt1_1 - si21'* xt1_1 + 0x8000) >> 16 ;
    ;                                                         ;
    ; These four results are retained in registers and a      ;
    ; double word is formed so that it can be stored with     ;
    ; one STDW.                                               ;
    ;---------------------------------------------------------;
    ; This equation ONLY has minus sign for x, y components
    CMPYR1 .M1 A_co20_si20, A_myt0_0_mxt0_0, A_xl1_1_0;
    CMPYR1 .M1 A_co21_si21, A_myt0_1_mxt0_1, A_xl1_3_2;

    ;---------------------------------------------------------;
    ; The following code computes intermediate results for:   ;
    ;                                                         ;
    ; x2[l2  ] = (co30 * xt2_0 + si30'* yt2_0 + 0x8000) >> 16 ;
    ; x2[l2+1] = (co30 * yt2_0 - si30'* xt2_0 + 0x8000) >> 16 ;
    ; x2[l2+2] = (co31 * xt2_1 + si31'* yt2_1 + 0x8000) >> 16 ;
    ; x2[l2+3] = (co31 * yt2_1 - si31'* xt2_1 + 0x8000) >> 16 ;
    ;---------------------------------------------------------;
    CMPYR1 .M2 B_co30_si30, B_yt2_0_xt2_0, B_xl2_1_0
    CMPYR1 .M2 B_co31_si31, B_yt2_1_xt2_1, B_xl2_3_2
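The difference between the two complex-multiply variants in Change 2 can be sketched per output lane. The following Python model is a simplified illustration, not a bit-exact description of the hardware. The CMPYR rounding, (sum + 0x8000) >> 16, is taken from the SA comments. The CMPYR1 rounding is an assumption: the sketch supposes that it adds 0x4000, shifts left by one bit, saturates to 32 bits, and keeps the upper 16 bits, which should be checked against the C64x+ instruction set reference. The function names are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def sat32(v):
    """Clamp v to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, v))


def twiddle_product(co, si, x, y):
    """Sum of products feeding one output lane: co * x + si * y."""
    return co * x + si * y


def cmpyr_lane(acc):
    """CMPYR rounding as written in the SA comments: (acc + 0x8000) >> 16."""
    return sat32(acc + 0x8000) >> 16


def cmpyr1_lane(acc):
    """Assumed CMPYR1 rounding: one bit less right shift, with saturation."""
    return sat32((acc + 0x4000) << 1) >> 16


if __name__ == "__main__":
    # Twiddle factor close to 1.0 in Q15 (0x7FFF), data sample 1000.
    acc = twiddle_product(0x7FFF, 0, 1000, 0)
    print(cmpyr_lane(acc))   # 500: the product is scaled by 1/2
    print(cmpyr1_lane(acc))  # 1000: the product keeps its magnitude
    # Largest possible single product, (-32768) * (-32768) = 2**30:
    print(cmpyr_lane(1 << 30))   # 16384
    print(cmpyr1_lane(1 << 30))  # 32767, clamped by the saturation
```

If the assumption holds, each CMPYR1 result is about twice the corresponding CMPYR result, so the twiddle multiplication no longer scales the data by 1/2. In that case the "+ 0x8000) >> 16" comments retained in the updated listing describe the original CMPYR rounding rather than the CMPYR1 rounding.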

Note that the CMPYR instruction has been replaced with the CMPYR1 instruction.

The updates to the FFT routines can be incorporated into the application in two ways:

  • Recompile the DSPLIB software so that the generated library includes the updated kernels. To do this, recompile the library project [DSPLIB_INSTALLATION_DIR]\dsplib_v210\dsplib64plus.pjt. The updated library is generated at [DSPLIB_INSTALLATION_DIR]\Release\dsplib64plus_rebuild.lib.
  • Include the updated kernels directly in the application project. The kernels included this way override the kernels in the DSPLIB library.