SPRUIV4C May 2020 – December 2023

 


Automatic Inlining

The compiler can inline a function call, that is, replace the call with the body of the called function at the call site. Functions defined in header files are typical candidates. Inlining a call that sits inside a loop allows the enclosing loop to be software pipelined, which improves performance. The compiler may also inline a call simply to eliminate the cost of calling and returning from the function.

In the following example, the add_and_saturate_to_255() function sums two values and caps the sum at 255 if the sum is over 255. This function is called from a function in inlining.cpp, which includes the inlining.h file via a preprocessor #include directive.

// inlining.cpp
// Compile with "cl7x -mv7100 --opt_level=3
//   --debug_software_pipeline --src_interlist"
#include "inlining.h"

void saturated_vector_sum(int * restrict a, int * restrict b,
                          int * restrict out, int n)
{
#pragma MUST_ITERATE(1024,,)
#pragma UNROLL(1)
    for (int i = 0; i < n; i++)
    {
        out[i] = add_and_saturate_to_255(a[i], b[i]);
    }
}

// inlining.h
int add_and_saturate_to_255(int a, int b)
{
    int sum = a + b;
    if (sum > 255) sum = 255;

    return sum;
}

In this case, the compiler inlines the call to add_and_saturate_to_255() so that the loop can be software pipelined. To confirm that inlining was performed, look at the bottom of the generated assembly file, where the compiler places a comment listing add_and_saturate_to_255() as an inlined function. Note that the function's identifier differs from its name in the source because of C++ name mangling.

;; Inlined function references:
;; [0] _Z23add_and_saturate_to_255ii
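The mangled identifier can be read back to the source name. The `_Z` prefix suggests the Itanium C++ ABI mangling scheme, in which a global function is encoded as `_Z`, the length of its name, the name itself, and one code per parameter type (`i` for int, or a single `v` when there are no parameters). The following is a minimal sketch of that rule for functions that take only int parameters. The helper name mangle_int_function() is hypothetical and is not part of any TI tool or library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (illustration only): builds the Itanium C++ ABI
// mangled name for a global-namespace function whose parameters are all
// of type int. Other parameter types, namespaces, and templates are
// not handled.
std::string mangle_int_function(const std::string &name, int num_int_params)
{
    // "_Z" prefix, then the decimal length of the name, then the name.
    std::string mangled = "_Z" + std::to_string(name.size()) + name;

    if (num_int_params == 0)
    {
        // An empty parameter list is encoded as a single 'v' (void).
        mangled += "v";
    }
    else
    {
        // Each int parameter contributes one 'i'.
        mangled.append(static_cast<std::size_t>(num_int_params), 'i');
    }
    return mangled;
}
```

For add_and_saturate_to_255(int, int), the name is 23 characters long and there are two int parameters, which matches the `23` and the trailing `ii` in the comment above.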

The inlining is also visible in the generated assembly code: the loop contains no CALL instruction. Because inlining eliminated the call, the loop can be software pipelined. A loop that contains a call to another function cannot be software pipelined. Note that because of code size concerns, the compiler does not automatically inline every call that could be inlined. See the C7000 Optimizing Compiler User's Guide for more information on inlining.

;*----------------------------------------------------------------------------* 
;*        SINGLE SCHEDULED ITERATION
;*
;*        ||$C$C44||:
;*   0              TICK                               ; [A_U] 
;*   1              SLDW    .D1     *D1++(4),BL0      ; [A_D1] |5| 
;*   2              SLDW    .D2     *D2++(4),BL1      ; [A_D2] |5| 
;*   3              NOP     0x5     ; [A_B] 
;*   8              ADDW    .L2     BL1,BL0,BL1       ; [B_L2] |5| 
;*   9              VMINW   .L2     BL2,BL1,B0        ; [B_L2] |5| 
;*  10              STW     .D1X    B0,*D0++(4)       ; [A_D1] |5| 
;*     ||           BNL     .B1     ||$C$C44||        ; [A_B] |11| 
;*  11              ; BRANCHCC OCCURS {||$C$C44||}    ; [] |11| 
;*----------------------------------------------------------------------------*
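The effect of inlining on the source loop can be sketched in portable C++. This is an illustration of the transformation, not compiler output. The function name saturated_vector_sum_inlined() is hypothetical, and the TI-specific restrict qualifiers and pragmas from inlining.cpp are omitted so that the sketch builds with any C++ compiler. Once the call is replaced by the function body, each iteration is an add followed by a minimum against 255, which appears to correspond to the ADDW and VMINW instructions in the scheduled iteration above.

```cpp
#include <algorithm>

// Same behavior as add_and_saturate_to_255() in inlining.h.
int add_and_saturate_to_255(int a, int b)
{
    int sum = a + b;
    if (sum > 255) sum = 255;

    return sum;
}

// Hypothetical name (illustration only): the loop from
// saturated_vector_sum() with the call replaced by the callee's body.
// No function call remains inside the loop.
void saturated_vector_sum_inlined(const int *a, const int *b,
                                  int *out, int n)
{
    for (int i = 0; i < n; i++)
    {
        // "if (sum > 255) sum = 255" is equivalent to min(sum, 255).
        out[i] = std::min(a[i] + b[i], 255);
    }
}
```

Both forms compute the same results. The difference is only that the second loop body contains no call, which is the property the software pipeliner requires.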