SPRUJG0B December 2024 – November 2025 F29H850TU, F29H859TU-Q1

 

  1. Abstract
  2. Trademarks
  3. 1 Introduction
  4. 2 Performance Optimization
    1. 2.1 Compiler Settings
      1. 2.1.1 Enabling Debug and Source Inter-Listing
      2. 2.1.2 Optimization Control
      3. 2.1.3 Floating-Point Math
      4. 2.1.4 Fixed-Point Division
      5. 2.1.5 Single vs Double Precision Floating-Point
      6. 2.1.6 Link-Time Optimization (LTO)
    2. 2.2 Memory Settings
      1. 2.2.1 Executing Code From RAM
      2. 2.2.2 Executing Code From Flash
      3. 2.2.3 Data Placement
    3. 2.3 Code Construction and Configuration
      1. 2.3.1 Inlining
      2. 2.3.2 Intrinsics
      3. 2.3.3 Volatile Variables
      4. 2.3.4 Function Arguments
      5. 2.3.5 Enabling Wider Data Accesses
      6. 2.3.6 Auto Code-Generation Tools
      7. 2.3.7 Accurately Profiling Code
    4. 2.4 Application Code Optimization
      1. 2.4.1 Optimized SDK Libraries
      2. 2.4.2 Optimizing Code-Size With Libraries
      3. 2.4.3 C29 Special Instructions
      4. 2.4.4 C29 Parallelism
      5. 2.4.5 32-Bit Variables and Writes Preferred
      6. 2.4.6 Coding Style and Impact on Performance
  5. 3 References
  6. 4 Revision History

2.4.4 C29 Parallelism

  • The C29 compiler can exploit the parallelism of the C29 architecture by executing multiple instructions in parallel, particularly when independent operations appear one after another in the source. For example, the code block below performs two identical PID computations in sequence. The restrict qualifiers on p1 and p2 tell the compiler that the two controller structures do not overlap, so the computations are independent. If DCL_runPID_C3 is defined as a static (inline) function in a header file, the compiler can inline both calls and then interleave the two PID computations so that they execute in parallel.
    Note: To realize a performance gain from the parallelized operations, it may also be necessary to place the data objects of each independent instance (for example, each PID controller structure) in a different RAM block, so that simultaneous accesses to them do not cause memory stalls.
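    // Assumes the DCL header that defines DCL_PID and DCL_runPID_C3 is included.
    float32_t run_dualPID(DCL_PID *restrict p1, DCL_PID *restrict p2,
                          float32_t rk1, float32_t yk1, float32_t lk1,
                          float32_t rk2, float32_t yk2, float32_t lk2)
    {
       // The two controllers share no data, so once both calls are inlined
       // the compiler can schedule their instructions in parallel.
       float32_t x = DCL_runPID_C3(p1, rk1, yk1, lk1);
       float32_t y = DCL_runPID_C3(p2, rk2, yk2, lk2);
       return x + y;
    }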
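  • Binary LUT search: binary look-up table (LUT) searches are common in motor control applications. They can be optimized by replacing the conditional loop, whose trip count is decided by a run-time test, with a loop that always runs a fixed number of iterations (about log2(N) for a table of N entries). A fixed trip count allows the compiler to fully unroll the loop and schedule its operations in parallel. The F29-SDK provides an example in examples/rtlibs/fastmath/binary_lut_search.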
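The dual-PID pattern from the first item above can be sketched in portable C, for example to experiment on a host before moving to the target. This is a minimal sketch under stated assumptions: `pi_ctrl_t`, `pi_run`, and `run_dual_pi` are hypothetical stand-ins (a simple PI controller with a symmetric output limit), not the DCL API, and `float32_t` is assumed to be a typedef for `float`. What it is meant to show is the structure: the controller body is `static inline` so it is visible at every call site, and both state pointers are `restrict`-qualified so the compiler may assume the two instances do not alias.

```c
typedef float float32_t;   /* assumption: 32-bit float typedef */

/* Hypothetical, simplified controller state; NOT the real DCL_PID layout. */
typedef struct {
    float32_t Kp;   /* proportional gain */
    float32_t Ki;   /* integral gain */
    float32_t i1;   /* integrator state */
} pi_ctrl_t;

/* Hypothetical PI step. Defined static inline (as it would be in a header)
 * so the compiler can see the body and inline it at each call site. */
static inline float32_t pi_run(pi_ctrl_t *restrict p, float32_t rk,
                               float32_t yk, float32_t lim)
{
    float32_t ek = rk - yk;             /* servo error */
    float32_t i  = p->i1 + p->Ki * ek;  /* candidate integrator value */
    float32_t u  = p->Kp * ek + i;      /* unclamped output */

    if (u > lim) {
        u = lim;                        /* saturate high, hold integrator */
    } else if (u < -lim) {
        u = -lim;                       /* saturate low, hold integrator */
    } else {
        p->i1 = i;                      /* integrate only when not saturated */
    }
    return u;
}

/* Two independent controllers: no shared data, so after inlining the
 * compiler is free to interleave the two instruction streams. */
float32_t run_dual_pi(pi_ctrl_t *restrict p1, pi_ctrl_t *restrict p2,
                      float32_t rk1, float32_t yk1, float32_t lim1,
                      float32_t rk2, float32_t yk2, float32_t lim2)
{
    float32_t x = pi_run(p1, rk1, yk1, lim1);
    float32_t y = pi_run(p2, rk2, yk2, lim2);
    return x + y;
}
```

For example, with `a = {2.0f, 0.5f, 0.0f}` and `b = {1.0f, 1.0f, 0.0f}`, `run_dual_pi(&a, &b, 1.0f, 0.5f, 10.0f, 5.0f, 0.0f, 2.0f)` returns 3.25f: the first controller outputs 1.25f and the second saturates at its limit of 2.0f. Whether the compiler actually interleaves the two inlined bodies is best confirmed in the generated assembly listing.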
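The conditional-to-fixed-iteration change described in the binary LUT search item can be sketched as follows. This is an illustration under assumptions, not the F29-SDK example: the table is assumed to be sorted ascending with a power-of-two size (16 entries here), both searches return the index of the last entry not greater than the input (or 0 if the input is below the first entry), and the function names are hypothetical.

```c
#define LUT_LOG2 4                  /* assumed table size: 2^4 = 16 entries */
#define LUT_SIZE (1 << LUT_LOG2)

/* Conditional loop: the trip count is decided at run time by the lo < hi test. */
static int lut_search_cond(const float tbl[LUT_SIZE], float x)
{
    int lo = 0;
    int hi = LUT_SIZE - 1;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;   /* round up so the range always shrinks */
        if (tbl[mid] <= x)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;   /* last index with tbl[i] <= x, or 0 if x < tbl[0] */
}

/* Fixed-iteration loop: always exactly LUT_LOG2 passes, whatever x is.
 * With a compile-time trip count the compiler can fully unroll the loop
 * and may replace each branch with a compare-and-select. */
static int lut_search_fixed(const float tbl[LUT_SIZE], float x)
{
    int idx = 0;
    for (int step = LUT_SIZE / 2; step > 0; step >>= 1) {
        /* idx + step never exceeds LUT_SIZE - 1 (steps sum to LUT_SIZE - 1) */
        idx = (tbl[idx + step] <= x) ? idx + step : idx;
    }
    return idx;  /* same convention as lut_search_cond */
}
```

Both functions return the same index for every input. The fixed-iteration version also takes the same number of steps for every input, which keeps its execution time deterministic.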