SPRUIV4C May   2020  â€“ December 2023

 

  1.   1
  2.   Read This First
    1.     About This Manual
    2.     Related Documentation
    3.     Trademarks
  3. 2Introduction
    1. 2.1 C7000 Digital Signal Processor CPU Architecture Overview
    2. 2.2 C7000 Split Datapath and Functional Units
  4. 3C7000 C/C++ Compiler Options
    1. 3.1 Overview
    2. 3.2 Selecting Compiler Options for Performance
    3. 3.3 Understanding Compiler Optimization
      1. 3.3.1 Software Pipelining
      2. 3.3.2 Vectorization and Vector Predication
      3. 3.3.3 Automatic Use of Streaming Engine and Streaming Address Generator
      4. 3.3.4 Loop Collapsing and Loop Coalescing
      5. 3.3.5 Automatic Inlining
      6. 3.3.6 If Conversion
  5. 4Basic Code Optimization
    1. 4.1  Signed Types for Iteration Counters and Limits
    2. 4.2  Floating-Point Division
    3. 4.3  Loop-Carried Dependencies and the Restrict Keyword
      1. 4.3.1 Loop-Carried Dependencies
      2. 4.3.2 The Restrict Keyword
      3. 4.3.3 Run-Time Alias Disambiguation
    4. 4.4  Function Calls and Inlining
    5. 4.5  MUST_ITERATE and PROB_ITERATE Pragmas and Attributes
    6. 4.6  If Statements and Nested If Statements
    7. 4.7  Intrinsics
    8. 4.8  Vector Types
    9. 4.9  C++ Features to Use and Avoid
    10. 4.10 Streaming Engine
    11. 4.11 Streaming Address Generator
    12. 4.12 Optimized Libraries
    13. 4.13 Memory Optimizations
  6. 5Understanding the Assembly Comment Blocks
    1. 5.1 Software Pipelining Processing Stages
    2. 5.2 Software Pipeline Information Comment Block
      1. 5.2.1 Loop and Iteration Count Information
      2. 5.2.2 Dependency and Resource Bounds
      3. 5.2.3 Initiation Interval (ii) and Iterations
      4. 5.2.4 Constant Extensions
      5. 5.2.5 Resources Used and Register Tables
      6. 5.2.6 Stage Collapsing
      7. 5.2.7 Memory Bank Conflicts
      8. 5.2.8 Loop Duration Formula
    3. 5.3 Single Scheduled Iteration Comment Block
    4. 5.4 Identifying Pipeline Failures and Performance Issues
      1. 5.4.1 Issues that Prevent a Loop from Being Software Pipelined
      2. 5.4.2 Software Pipeline Failure Messages
      3. 5.4.3 Performance Issues
  7. 6Revision History

Performance Issues

You can find the following issues by examining the assembly source and the Software Pipeline Information comment block. Potential solutions are given for each condition.

  • Two Loops are Generated, One Not Software Pipelined / Duplicate Loop Generated. If you see the message "Duplicate Loop Generated" in the Software Pipeline Information comment block, or you notice that there is a second version of the loop that isn't software pipelined, it may mean that when the iteration count (iteration count) of the loop is too low, it is illegal to execute the software pipelined version of the loop that the compiler has created. In order to generate only the software pipelined version of the loop, the compiler needs to prove that the minimum iteration count of the loop would be high enough to always safe execute the pipelined version. If the minimum number of iterations of the loop is known, using the MUST_ITERATE pragma to tell the compiler this information may help eliminate the duplicate loop.
  • Loop Carried Dependency Bound is Larger than the Partitioned Resource Bound. If you see a loop carried dependency bound that is higher than the partitioned resource bound, you likely have one of two problems. First, the compiler may think there is a memory dependence from a store to a subsequent load. See the "Memory Dependencies" section of the TMS320C6000 Programmer’s Guide (SPRU198) for more information. Second, a computation in one iteration of the loop may be used in the next iteration of the loop. In this case, the only option is to try to eliminate the flow of information from one iteration to the next, thereby making the iterations more independent of each other.
  • Large Outer Loop Overhead in Nested Loop. If the inner loop count of a nested loop is relatively small, the time to execute the outer loop can become a large percentage of the total execution time. For cases where this seems to degrade the overall loop nest performance, two approaches can be tried. First, if there are not too many instructions in the outer loop, you may want to give a hint to the compiler that it should coalesce the loop nest. Try using the COALESCE_LOOP pragma and check the relative performance of the entire loop nest. If the COALESCE_LOOP pragma does not work, and the number of iterations of the inner loop is small and do not vary, fully unrolling the inner loop by hand may improve performance of the nested loop because the outer loop may be able to be software pipelined.
  • There are Memory Bank Conflicts If the compiler generates two memory accesses in one cycle and those accesses reside within the same memory block in the cache hierarchy, a memory bank stall can occur. To avoid this degradation, memory bank conflicts can be avoided by placing the two accesses in different memory blocks through use of the DATA_ALIGN pragma.

See the C7000 Optimizing C/C++ Compiler User's Guide (SPRUIG8) for information about pragmas.