SPRUIV4C May   2020  – December 2023

 

  1.   1
  2.   Read This First
    1.     About This Manual
    2.     Related Documentation
    3.     Trademarks
  3. 2Introduction
    1. 2.1 C7000 Digital Signal Processor CPU Architecture Overview
    2. 2.2 C7000 Split Datapath and Functional Units
  4. 3C7000 C/C++ Compiler Options
    1. 3.1 Overview
    2. 3.2 Selecting Compiler Options for Performance
    3. 3.3 Understanding Compiler Optimization
      1. 3.3.1 Software Pipelining
      2. 3.3.2 Vectorization and Vector Predication
      3. 3.3.3 Automatic Use of Streaming Engine and Streaming Address Generator
      4. 3.3.4 Loop Collapsing and Loop Coalescing
      5. 3.3.5 Automatic Inlining
      6. 3.3.6 If Conversion
  5. 4Basic Code Optimization
    1. 4.1  Signed Types for Iteration Counters and Limits
    2. 4.2  Floating-Point Division
    3. 4.3  Loop-Carried Dependencies and the Restrict Keyword
      1. 4.3.1 Loop-Carried Dependencies
      2. 4.3.2 The Restrict Keyword
      3. 4.3.3 Run-Time Alias Disambiguation
    4. 4.4  Function Calls and Inlining
    5. 4.5  MUST_ITERATE and PROB_ITERATE Pragmas and Attributes
    6. 4.6  If Statements and Nested If Statements
    7. 4.7  Intrinsics
    8. 4.8  Vector Types
    9. 4.9  C++ Features to Use and Avoid
    10. 4.10 Streaming Engine
    11. 4.11 Streaming Address Generator
    12. 4.12 Optimized Libraries
    13. 4.13 Memory Optimizations
  6. 5Understanding the Assembly Comment Blocks
    1. 5.1 Software Pipelining Processing Stages
    2. 5.2 Software Pipeline Information Comment Block
      1. 5.2.1 Loop and Iteration Count Information
      2. 5.2.2 Dependency and Resource Bounds
      3. 5.2.3 Initiation Interval (ii) and Iterations
      4. 5.2.4 Constant Extensions
      5. 5.2.5 Resources Used and Register Tables
      6. 5.2.6 Stage Collapsing
      7. 5.2.7 Memory Bank Conflicts
      8. 5.2.8 Loop Duration Formula
    3. 5.3 Single Scheduled Iteration Comment Block
    4. 5.4 Identifying Pipeline Failures and Performance Issues
      1. 5.4.1 Issues that Prevent a Loop from Being Software Pipelined
      2. 5.4.2 Software Pipeline Failure Messages
      3. 5.4.3 Performance Issues
  7. 6Revision History

C7000 Digital Signal Processor CPU Architecture Overview

The C7000 CPU DSP architecture is the latest high-performance digital signal processor (DSP) from Texas Instruments. It is featured in some Texas Instruments Keystone 3 devices. This Very-Long-Instruction-Word (VLIW) DSP has significant mathematical processing capabilities, due to its wide vector instructions and multiple functional units. This optimization guide can help developers get the most performance of the C7000 DSPs.

When integrated into a larger TI device, such as some Keystone 3 devices, the C7000 is often paired with a Matrix Multiply Accelerator (MMA), which can significantly improve the performance of certain machine learning networks. We recommend use of the TI Deep Learning library, which has been optimized to use the Matrix Multiply Accelerator. The TI Deep Learning library is part of the Processor SDK.

The C7000 DSP has vector (SIMD) instructions that are capable of performing up to 64 operations in a single instruction, depending on the data type and version of the C7000 CPU. Nearly all computational instructions on C7000 DSP cores are fully pipelined, which means independent instructions can be started on every clock cycle. This combination of vector instructions and pipelined behavior allows you to perform a large number of computations per cycle. The C7000 DSP cores feature both fixed-point and floating-point vector instructions.

Each C7000 DSP core has several functional units. On each clock cycle, each functional unit can be executing an independent instruction. In this guide, we focus on the first generation of C7000 DSP cores, the C7100 and C7120. Because the C7100 and C7120 DSP cores have 13 functional units, there are 13 instructions that can execute every clock cycle. In reality, some of the functional units are specialized for certain kinds of instructions, so for this and other reasons, it is common that not all 13 functional units execute an instruction every cycle.

For more information on the C7000 instruction set, please see the C71x DSP CPU, Instruction Set, and Matrix Multiply Accelerator Technical Reference Manual (SPRUIP0).