SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1

 

Read This First
  About This Manual
  Related Documentation
  Trademarks
1 Overview and Scope
  1.1 Comparing VCOP and C7000
  1.2 About this Document
    1.2.1 Documentation Conventions
  1.3 Output Format
  1.4 Data Types
    1.4.1 40-bit Incompatibilities
    1.4.2 40-Bit Detection in Host Emulation Mode
  1.5 SIMD Width
  1.6 VCOP Virtual Machine
2 Kernel API
  2.1 Overview
  2.2 Parameter Block
    2.2.1 Tvals Structure
    2.2.2 Pblock Manipulation
3 Loop Control
  3.1 Overview
  3.2 Loop Control and Nested Loops
  3.3 Repeat Loops
  3.4 Compound Conditions
  3.5 Early Exit
4 Addressing
  4.1 Overview
  4.2 Streaming Engines
  4.3 Streaming Address Generators
  4.4 Indexed Addressing
  4.5 Circular Addressing
5 Operations
  5.1 Load Operations
  5.2 Store Operations
    5.2.1 Predicated Stores
    5.2.2 Scatter and Transposing Stores
    5.2.3 Optimization of OFFSET_NP1-Based Transpose
    5.2.4 Rounding Stores
    5.2.5 Saturating Stores
  5.3 Arithmetic Operations
    5.3.1 Vector Compares
    5.3.2 Multiplication with Rounding, Truncation, or Left Shift
  5.4 Lookup and Histogram Table Operations
    5.4.1 Determination of Table Size
    5.4.2 Table Configuration
    5.4.3 Copy-in Operation
    5.4.4 Copy-out Operation
    5.4.5 Index Adjustment from Non-zero Agen
    5.4.6 Lookup Operation
    5.4.7 Histogram Update Operation
    5.4.8 16-Way Lookup and Histogram
6 Performance
  6.1 Overview
  6.2 Compiler Requirements
  6.3 Automatic Performance Profiling
  6.4 Performance Options
A Warnings and Notes
  A.1 Compatibility Warnings
  A.2 Efficiency Warnings

Indexed Addressing

Any loads or stores that remain after the streaming engine (SE) and streaming address generator (SA) resources are exhausted are generated using normal indexed addressing. For each such Agen, the migration tool defines a variable of type __agen, which is a typedef for int. The agen variable holds the address offset in bytes.

An access with base address b and agen a is generated as the following C expression:

*(<type>*)((char*)b + a)

This usually adds no overhead to the access itself. The compiler typically keeps both b and a in registers for the duration of the loop nest, so the expression compiles to a simple indirect operand such as *Rega[Regb].
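The byte-offset access form can be tried out in ordinary C. The sketch below is not migration-tool output. The helpers `load_s16` and `store_s16`, the type name `agen_t`, and the choice of `int16_t` as `<type>` are all hypothetical. `agen_t` stands in for `__agen` only because identifiers beginning with `__` are reserved in C. The sketch shows that an agen holding a byte offset selects the same element as array indexing, provided the offset is a multiple of the element size.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the tool's __agen (a typedef for int); renamed here only
   because identifiers starting with "__" are reserved in C. */
typedef int agen_t;

/* Hypothetical helper: load an int16_t at byte offset a from base b, using
   the access form *(<type>*)((char*)b + a) with <type> = int16_t. */
static inline int16_t load_s16(const void *b, agen_t a)
{
    /* A byte offset must stay aligned to the element size. */
    assert(a % (agen_t)sizeof(int16_t) == 0);
    return *(const int16_t *)((const char *)b + a);
}

/* Hypothetical helper: the matching store through a byte-offset agen. */
static inline void store_s16(void *b, agen_t a, int16_t v)
{
    assert(a % (agen_t)sizeof(int16_t) == 0);
    *(int16_t *)((char *)b + a) = v;
}
```

Take `int16_t data[4] = {10, 11, 12, 13}` as an example. `load_s16(data, 4)` reads `data[2]`, which is 12, because 4 bytes span two 16-bit elements.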

The overhead comes from updating the agen as the loops iterate. These updates generally add a constant at the end of each loop level, corresponding to the stride at that level. For levels outside the innermost one, the constant is adjusted so that it also rewinds the offset accumulated by the inner level. The init() function computes these agen adjustment values from the coefficients in the Agen expressions and the trip counts, and stores them in the tvals structure.
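The rewind arithmetic can be stated as a formula. Assume an Agen expression of the form sum over k of coef[k] * I[k], with no constant term, coefficients in bytes, and every trip count at least 1. Under those assumptions, the innermost level adds its plain stride. Every outer level k adds coef[k] - trip[k+1] * coef[k+1], which steps forward one stride and removes the trip[k+1] * coef[k+1] bytes the next-inner loop added. The sketch below is hypothetical and is not the tool's init() code. `compute_agen_adjust` derives the adjustments, and `agen_matches_direct` walks a 3-level nest the way the generated code does, checking the running agen against the direct expression.

```c
#include <assert.h>

#define MAX_LEVELS 4

/* Hypothetical: derive per-level agen adjustments. Level 0 is outermost,
   level nlev-1 innermost. Assumes coef[] in bytes and trip[k] >= 1. */
static void compute_agen_adjust(int nlev, const int coef[], const int trip[],
                                int adjust[])
{
    assert(nlev >= 1 && nlev <= MAX_LEVELS);
    for (int k = 0; k < nlev; k++) {
        if (k == nlev - 1)
            adjust[k] = coef[k];                       /* innermost: plain stride */
        else
            adjust[k] = coef[k] - trip[k + 1] * coef[k + 1]; /* stride minus rewind */
    }
}

/* Walk a 3-level nest, adding each adjustment at the end of its level's
   body, and compare the agen with i1*c0 + i2*c1 + i3*c2 at every innermost
   iteration. Returns 1 if they always agree. */
static int agen_matches_direct(const int coef[3], const int trip[3])
{
    int adjust[3];
    compute_agen_adjust(3, coef, trip, adjust);

    int a = 0; /* agen starts at byte offset 0 */
    for (int i1 = 0; i1 < trip[0]; i1++) {
        for (int i2 = 0; i2 < trip[1]; i2++) {
            for (int i3 = 0; i3 < trip[2]; i3++) {
                if (a != i1 * coef[0] + i2 * coef[1] + i3 * coef[2])
                    return 0;
                a += adjust[2]; /* innermost update */
            }
            a += adjust[1]; /* middle-level update */
        }
        a += adjust[0]; /* outermost update */
    }
    return 1;
}
```

As an illustration, take coefficients {256, 32, 2} and trip counts {4, 3, 8}. The adjustments come out to {160, 16, 2}: 256 - 3 * 32, then 32 - 8 * 2, then 2.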

For example, the following code shows the addressing operations generated for a loop translated using indexed addressing:

__agen A0, A1, A2;     // typedef int __agen; offsets in bytes
for (I1 ...)
{
   for (I2 ...)
   {
      for (I3 ...)
      {
         for (I4 ...)
         {
            // Shorthand for *(<type>*)((char*)tvals->p4 + A0), etc.
            Vreg0 = *(tvals->p4 + A0);   // load using A0
            Vreg1 = *(tvals->p5 + A1);   // load using A1

            A0 += 2;                     // innermost stride for A0
            A1 += 2;                     // innermost stride for A1
         }
         A0 += tvals->p8;                // level-3 updates (stride minus
         A1 += tvals->p11;               // rewind of the I4 loop)
      }
      *(tvals->p7 + A2) = Vdst;          // store using A2

      A0 += tvals->p9;                   // level-2 updates
      A1 += tvals->p12;
      A2 += 16;                          // stride for A2
   }
   A0 += tvals->p10;                     // level-1 updates
   A2 += tvals->p13;
}

The agen updates are generated as statements of the form a += tvals->pN, or a += constant when the adjustment is a known constant (as for the innermost loop above). Each is placed at the end of the loop body at the appropriate level. Each update generally compiles to a single ADD instruction, but because the updates appear in outer loops, they can hamper loop collapsing.

For loops collapsed with the NLC, the NLC-supplied predicate can be used to predicate any agen adjustments in the outer loop. In this case the total overhead of the agen is generally two instructions: one to fetch the predicate from the NLC, and the predicated ADD that updates the agen. For loops not collapsed with the NLC, the overhead is higher. The agen update in the outer loop usually prevents collapsing, so that loop level needs explicit loop control.
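The effect of predicating the outer-level adjustment can be emulated in plain C, assuming a 2-level nest collapsed into one loop. This sketch uses no NLC hardware or intrinsics. The flag `last_inner` is a hypothetical stand-in for the NLC-supplied predicate: it is true on the final inner iteration of each outer iteration. The function `collapsed_agen_ok` is also hypothetical. Every iteration adds the inner stride, and only predicated iterations add the outer adjustment, mirroring the "fetch predicate, then predicated ADD" pattern.

```c
#include <assert.h>

/* Hypothetical emulation: an n_outer x n_inner nest collapsed into one loop.
   Returns 1 if the agen at every collapsed iteration equals
   i_outer * s_outer + i_inner * s_inner. Assumes n_outer, n_inner >= 1. */
static int collapsed_agen_ok(int n_outer, int n_inner, int s_outer, int s_inner)
{
    assert(n_outer >= 1 && n_inner >= 1);
    /* Outer adjustment: outer stride minus rewind of the inner level. */
    const int adjust_outer = s_outer - n_inner * s_inner;
    int a = 0;
    int i_inner = 0, i_outer = 0; /* tracked only to check the result */

    for (int t = 0; t < n_outer * n_inner; t++) {
        if (a != i_outer * s_outer + i_inner * s_inner)
            return 0;

        /* Stand-in for the NLC predicate: last inner iteration. */
        const int last_inner = (i_inner == n_inner - 1);

        a += s_inner;                       /* unconditional inner stride */
        a += last_inner ? adjust_outer : 0; /* "predicated" outer adjustment */

        /* Bookkeeping for the check only. */
        if (last_inner) {
            i_inner = 0;
            i_outer++;
        } else {
            i_inner++;
        }
    }
    return 1;
}
```

The conditional expression stands in for predication: the add is always issued, but only takes effect when the predicate is set. The collapsed loop therefore needs no separate outer-level control.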