SPRUIV4C May 2020 – December 2023

 


Resources Used and Register Tables

The Resource Partition table summarizes how the instructions have been assigned to various machine resources and how they have been partitioned between the A and B side. Examples are shown below.

An asterisk (*) marks entries that determine the resource bound value (that is, the maximum mii). Because many C7000 instructions can execute on more than one functional unit, the table breaks the functional units into categories by possible resource combinations.

  • Individual Functional Units (.L, .S, .D, .M, .C units, etc.) show the total number of instructions that specifically require that unit. Instructions that can operate on multiple functional units are not included in these counts.
    ;*      .S units                                     0        0     
    ;*      .M units                                     4       12*    
    . . .
  • Grouped Functional Units (.M/.N, .L/.S, .L/.S/.C, etc.) show the total number of instructions that can execute on any of the listed functional units. For example, if the .L/.S line shows an A-side value of 14 and a B-side value of 12, there are 14 instructions that will execute on either .L1 or .S1 and 12 instructions that will execute on either .L2 or .S2.
    ;*      .L/.S units                                  1        8     
    ;*      .L/.S/.C units                               0        0     
    . . .
  • .X cross paths shows the number of cross path buses needed to move data from one datapath to another (A-to-B or B-to-A).
    ;*      .X cross paths                              13*       0     
  • Bound: shows the minimum ii at which the loop can software pipeline when only the instructions that can operate on the functional units listed on that line are considered. For example, if the .L .S .LS line shows an A-side value of 3 and a B-side value of 2, the instructions that need an .L or .S unit occupy .L1 and .S1 for three cycles of the software pipeline schedule and .L2 and .S2 for two cycles. The .L .S .LS notation means that the count includes instructions that can go only on .L, only on .S, or on either .L or .S.
    ;*      Bound(.L .S .LS)                             1        4     
  • Register Usage Tables: The compiler shows which CPU registers are used on each cycle of the software-pipelined kernel. This information is difficult to use directly to improve the performance of the loop, but it gives you an idea of how many registers are active throughout the loop.
    ;*                   Regs Live Always   :  6/ 1/ 4/
    ;*                   Max Regs Live      : 56/26/29/
    ;*                   Max Cond Regs Live :  0/ 0/ 0/
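
The Bound(.L .S .LS) arithmetic described above can be sketched in C. This is a minimal illustration, not compiler code: the helper names ceil_div and bound_l_s_ls are hypothetical, and the sketch assumes that each datapath side has exactly one .L unit and one .S unit, so the three instruction counts share two issue slots per cycle.

```c
#include <assert.h>

/* Ceiling of n / units, for n >= 0 and units > 0. */
static int ceil_div(int n, int units)
{
    assert(n >= 0 && units > 0);
    return (n + units - 1) / units;
}

/*
 * Hypothetical helper (not a compiler API): estimates the
 * Bound(.L .S .LS) value for one datapath side.
 *
 *   only_l    - instructions that specifically require the .L unit
 *   only_s    - instructions that specifically require the .S unit
 *   either_ls - instructions that can go on either .L or .S
 *
 * Assumption: one .L and one .S unit per side, so at most two of
 * these instructions can issue per cycle. Instructions tied to a
 * single unit can impose a tighter bound of their own, which this
 * sketch does not model.
 */
static int bound_l_s_ls(int only_l, int only_s, int either_ls)
{
    const int units_per_side = 2; /* .L and .S (assumed) */
    return ceil_div(only_l + only_s + either_ls, units_per_side);
}
```

For instance, bound_l_s_ls(0, 0, 8) returns 4. This agrees with the B-side values in the excerpts above (.S units 0, .L/.S units 8, Bound 4) only under the assumption that no instruction requires the .L unit specifically, because the .L line is not shown in the excerpt.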