SPRUIV4C May 2020 – December 2023

 


The Restrict Keyword

To correct the problem in the previous example caused by a loop-carried dependency, we need to tell the compiler that these arrays do not overlap in memory, and thus that there is no memory dependence from one iteration to the next.

Many common digital signal processing loops contain one or more load operations, some computation, and then a store operation. Typically, the loads read from one array and the stores write to another. If the compiler does not know that the arrays are separate (that is, that they do not overlap), it must be conservative and assume that a value stored in iteration i may be read by a load in iteration i+1, i+2, and so on. Therefore, it is important to tell the compiler when the load and store arrays occupy entirely different memory areas (that is, when the pointed-to objects or arrays do not overlap).
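
To illustrate why the compiler must be conservative, here is a minimal sketch of a load-compute-store loop without restrict. The function name add_one and its parameters are hypothetical and do not come from this manual.

```c
/* Hypothetical example (not from the manual): a typical
 * load-compute-store loop with no restrict qualifiers.
 * The compiler cannot tell whether 'in' and 'out' overlap. */
void add_one(const int *in, int *out, int n)
{
    for (int i = 0; i < n; i++) {
        /* If out == in + 1, the value stored here in iteration i
         * is the value loaded in iteration i+1. */
        out[i] = in[i] + 1;
    }
}
```

If a caller passes overlapping buffers, for example add_one(buf, buf + 1, n), each iteration reads the value written by the previous one. Because such a call is legal when restrict is absent, the compiler has to keep the store of one iteration ahead of the load of the next, which limits how far iterations can be overlapped.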

We can do this with the restrict keyword. This keyword tells the compiler that, throughout the scope of the variable (the array name or pointer name used to access the array), accesses to that object or array are made only through that array name or pointer name.

Note: This description of the restrict keyword is not precisely accurate, but it is good enough for most purposes. If you wish to learn more about the restrict keyword, see the C standard or Demystifying The Restrict Keyword.

Using the restrict keyword effectively tells the compiler that a store to memory will not write to a location that a later iteration's loads will read from. As a result, successive iterations can be overlapped when the compiler performs software pipelining, which allows the generated code to run faster.

The following C function uses the restrict keyword. The resulting Software Pipeline Information comment block shows that when the restrict keyword is used, the loop-carried dependence bound is zero, while the partitioned resource bound is two. This leads to a much-improved initiation interval (ii) of two cycles.

void weighted_sum(int * restrict a, int *restrict b, int *restrict out,
                  int weight_a, int weight_b, int n)
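
The listing above shows only the function signature. A minimal sketch of a complete function follows. The loop body is an assumption inferred from the function name (a per-element weighted sum) and is not taken from this section.

```c
/* Signature as shown above; the loop body is an assumed illustration. */
void weighted_sum(int *restrict a, int *restrict b, int *restrict out,
                  int weight_a, int weight_b, int n)
{
    /* restrict promises that a, b, and out refer to non-overlapping
     * arrays, so the store to out[i] cannot feed a later load. */
    for (int i = 0; i < n; i++) {
        out[i] = a[i] * weight_a + b[i] * weight_b;
    }
}
```

Callers must honor that promise by passing three arrays that do not overlap.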

The Texas Instruments C7000 C/C++ Compiler allows the restrict keyword to be used in both C and C++ modes, despite the restrict keyword not being part of the C++14 or C++17 standards.

Note: If you use the restrict keyword incorrectly (that is, if the restrict-qualified pointers do in fact access overlapping memory), the program has undefined behavior, and the code generated by the compiler may produce incorrect results.
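
One way to guard against such misuse during development is to check at the call site that two buffers do not overlap before passing them to a restrict-qualified function. The sketch below is a minimal illustration. The helper ranges_overlap is hypothetical and is not part of the TI compiler or its libraries. It also assumes a flat address space in which pointers converted to uintptr_t compare in address order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a TI API): returns 1 if the byte ranges
 * [p, p + p_bytes) and [q, q + q_bytes) share at least one byte,
 * and 0 otherwise. Assumes a flat address space. */
int ranges_overlap(const void *p, size_t p_bytes,
                   const void *q, size_t q_bytes)
{
    uintptr_t p_lo = (uintptr_t)p;
    uintptr_t q_lo = (uintptr_t)q;

    /* Two half-open ranges overlap when each starts before the other ends. */
    return (p_lo < q_lo + q_bytes) && (q_lo < p_lo + p_bytes);
}
```

A caller could, for example, assert that ranges_overlap(in, n * sizeof(int), out, n * sizeof(int)) returns 0 in debug builds before calling a function whose parameters are restrict-qualified.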