SPRUJG0B December 2024 – November 2025 F29H850TU, F29H859TU-Q1

 


Coding Style and Impact on Performance

How C code is written can affect performance. This section illustrates specific scenarios where this occurs.

  • With loops, performance can vary depending on whether the loop count is a fixed value or a variable. With a fixed count, the compiler has complete knowledge of the loop and can choose the approach that maximizes performance, such as unrolling or software pipelining the loop. For example, with matrix multiplication, performance is significantly better when the matrix row and column sizes are specified as constants in the loops than when they are passed in as function arguments.
  • In some cases, merging independent loops into a single loop can improve performance. The first code block below uses three separate loops and generates sub-optimal code. The second code block merges them into a single loop and generates more efficient code.
/* Sub-optimal: three separate loops over the same arrays */
uint8_T Bit_Manipulation_Test_Case(void)
{
    uint32_T result;
    uint32_T i;
    uint8_T valid;

    result = 0u;
    valid = TC_OK;
    i = 0u;

    /* Or Test Case */
    for(i=0; i<BIT_MANIPULATION_ARRAY_SIZE; i++)
    {
        result = (Swc1_Bit_Manipulation.Operand_A[i] | Swc1_Bit_Manipulation.Operand_B[i]);
        if(result != Swc1_Bit_Manipulation.Result_Or[i])
        {
            valid = TC_NOK;
        }
    }

    /* And Test Case */
    for(i=0; i<BIT_MANIPULATION_ARRAY_SIZE; i++)
    {
        result = (Swc1_Bit_Manipulation.Operand_A[i] & Swc1_Bit_Manipulation.Operand_B[i]);
        if(result != Swc1_Bit_Manipulation.Result_And[i])
        {
            valid = TC_NOK;
        }
    }

    /* Xor Test Case */
    for(i=0; i<BIT_MANIPULATION_ARRAY_SIZE; i++)
    {
        result = (Swc1_Bit_Manipulation.Operand_A[i] ^ Swc1_Bit_Manipulation.Operand_B[i]);
        if(result != Swc1_Bit_Manipulation.Result_Xor[i])
        {
            valid = TC_NOK;
        }
    }

    return valid;
}
/* More optimized: the three checks are merged into a single loop */
uint8_T Bit_Manipulation_Test_Case(void)
{
    uint32_T result_or, result_and, result_xor;
    uint32_T i;
    uint8_T valid;

    result_or = 0u;
    result_and = 0u;
    result_xor = 0u;
    valid = TC_OK;
    i = 0u;

    /* Or, And, Xor Test Case */
    for(i=0; i<BIT_MANIPULATION_ARRAY_SIZE; i++)
    {
        result_or = (Swc1_Bit_Manipulation.Operand_A[i] | Swc1_Bit_Manipulation.Operand_B[i]);
        if(result_or != Swc1_Bit_Manipulation.Result_Or[i])
        {
            valid = TC_NOK;
        }
        result_and = (Swc1_Bit_Manipulation.Operand_A[i] & Swc1_Bit_Manipulation.Operand_B[i]);
        if(result_and != Swc1_Bit_Manipulation.Result_And[i])
        {
            valid = TC_NOK;
        }
        result_xor = (Swc1_Bit_Manipulation.Operand_A[i] ^ Swc1_Bit_Manipulation.Operand_B[i]);
        if(result_xor != Swc1_Bit_Manipulation.Result_Xor[i])
        {
            valid = TC_NOK;
        }
    }

    return valid;
}
  • If conditional statements involve loads from memory or accesses to global variables, it can help to pre-load those values into local variables where possible. This allows for increased use of the wider register set on the C29 CPU. It also avoids the pipeline stalls that occur when a value is loaded from memory and immediately used in a conditional check. The first code block below generates sub-optimal code. The second code block is more optimized.

// Variables are globals
if(xx == FALSE)
{
    A = b * c + d;
    E = f * c + d;
    if(dd > high)
    {
        D = high;
    } else if (dd < low) {
        if(kk == RUN)
        {
            D = low;
        } else {
            D = dd;
        }
    } else {
        D = dd;
    }
}
// Local copies of globals
float b_temp=b, c_temp=c, d_temp=d, f_temp=f, high_temp=high, low_temp=low, dd_temp=dd, kk_temp=kk, D_temp=D;
if(xx == FALSE)
{
    A = b_temp * c_temp + d_temp;
    E = f_temp * c_temp + d_temp;
    if(dd_temp > high_temp)
    {
        D_temp = high_temp;
    } else if (dd_temp < low_temp) {
        if(kk_temp == RUN)
        {
            D_temp = low_temp;
        } else {
            D_temp = dd_temp;
        }
    } else {
        D_temp = dd_temp;
    }
    // Write the result back to the global once, after the conditional logic
    D = D_temp;
}
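
Returning to the first point in this section, the difference between fixed and variable loop counts can be illustrated with a minimal sketch. The function names `matmul_fixed` and `matmul_variable` and the dimension macros `ROWS_A`, `COLS_A`, and `COLS_B` are hypothetical, chosen only for this illustration, and are not taken from the SDK. Both functions compute the same product on row-major arrays. The first exposes compile-time loop bounds to the compiler, while the second receives them as arguments. The actual difference in generated code depends on the compiler version and optimization settings.

```c
#include <stddef.h>

/* Hypothetical matrix dimensions, chosen only for illustration. */
#define ROWS_A 3
#define COLS_A 4
#define COLS_B 2

/* Loop bounds are compile-time constants, so the compiler knows every
 * trip count and is free to unroll or software-pipeline the loops. */
void matmul_fixed(const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < ROWS_A; i++) {
        for (size_t j = 0; j < COLS_B; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < COLS_A; k++) {
                sum += a[i * COLS_A + k] * b[k * COLS_B + j];
            }
            c[i * COLS_B + j] = sum;
        }
    }
}

/* Loop bounds arrive as function arguments, so the trip counts are
 * unknown at compile time and the compiler must generate general code. */
void matmul_variable(const float *a, const float *b, float *c,
                     size_t rows_a, size_t cols_a, size_t cols_b)
{
    for (size_t i = 0; i < rows_a; i++) {
        for (size_t j = 0; j < cols_b; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < cols_a; k++) {
                sum += a[i * cols_a + k] * b[k * cols_b + j];
            }
            c[i * cols_b + j] = sum;
        }
    }
}
```

If the dimensions are known when the application is built, the fixed-bound form is the one to prefer. The variable form remains useful when one routine has to serve several matrix sizes.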