SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1

 


Pblock Manipulation

Some VCOP applications have kernels that manipulate the parameter blocks (pblocks) of other kernels. This allows kernel parameters to be updated dynamically without the overhead of returning to the ARP32 and calling the init() function again. However, the technique is fragile: the updating kernel must have direct knowledge of the other kernel's pblock layout, which takes the form of hard-coded pblock offsets.

There is no direct way to support this for C7x translation since the pblock size and layout differ. Kernels that perform pblock manipulation will not work if directly translated for C7x.

Compatibility Warning: Pblock Manipulation
The size and layout of the pblock differ between C7x and VCOP. A kernel that tries to modify the pblock or otherwise depends on its contents will not execute correctly on C7x.
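The following host-side C sketch illustrates why a hard-coded offset stops working when the layout changes. Everything in it is hypothetical: the two layouts, the word indices, and the function names are invented for illustration and do not correspond to any real VCOP or C7x pblock layout.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pblock size and field positions, for illustration only. */
enum { PBLOCK_WORDS = 8 };
enum { LAYOUT_A_COUNT_WORD = 2 };   /* layout A keeps a loop count in word 2 */
enum { LAYOUT_B_COUNT_WORD = 5 };   /* layout B keeps the same field in word 5 */

/* An "updating kernel" written against layout A: it pokes a fixed offset. */
static void poke_count_hardcoded(unsigned short pblock[], unsigned short count)
{
    pblock[2] = count;   /* magic number, only correct for layout A */
}

/* Reads the count from wherever the given layout stores it. */
static unsigned short read_count(const unsigned short pblock[], int count_word)
{
    return pblock[count_word];
}

/* Returns 1 if the hard-coded poke reached the count field of the layout. */
static int poke_reaches_field(int count_word)
{
    unsigned short pblock[PBLOCK_WORDS];
    memset(pblock, 0, sizeof pblock);
    poke_count_hardcoded(pblock, 64);
    return read_count(pblock, count_word) == 64;
}

/* The same poke works for layout A and silently misses for layout B. */
static void pblock_offset_demo(void)
{
    assert(poke_reaches_field(LAYOUT_A_COUNT_WORD) == 1);
    assert(poke_reaches_field(LAYOUT_B_COUNT_WORD) == 0);
}
```

Note that the failure in the second case is silent: the write lands in some other word of the pblock, so the kernel runs with a stale value rather than reporting an error.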

As an alternative way to support this functionality, the C7x migration tool adds a function to the dispatch API that allows kernel parameters to be updated dynamically.

Recall that the init() function captures kernel parameters and stores them in the tvals structure. It also computes additional expressions used by the vloops() function and stores them in the tvals structure.

A new keyword, __update, identifies kernel parameters that may be dynamically updated. If any kernel parameters are declared using __update, the migration tool generates an additional function with the following signature:

void kernel_update(<update args>, unsigned short pblock[])

Apart from the pblock itself, the arguments to the update() function consist only of the kernel parameters declared as __update. The function updates the pblock by re-capturing these parameters into the tvals structure and re-evaluating any tvals that depend on them. The updated pblock can then be passed to a subsequent invocation of the vloops() function.

Table 2-2 provides an example of a kernel that declares __update parameters, along with the generated code and the kernel dispatch code that calls it.

Table 2-2 Updating Kernel Parameters via update() API
Kernel-C Program
kernel(         __vptr_int32 parm1,      // non-updatable parameter
       __update __vptr_int32 parm2,      // updatable
                int          N)          // non-updatable
{
    ...
}
Generated Update Function
kernel_update(__vptr_int32 parm2,      // sig. includes only __update parameters
              ushort *pblock)
{
   kernel_tvals_t *tvals = (kernel_tvals_t *)pblock;
   // recapture __update parameters
   tvals->parm2 = parm2;
   // re-compute tvals that depend on __update parameters
   tvals->loop0.tvals[0].p3 = ... parm2 ... ; 
   ... 
}
Client’s Kernel Dispatch Code
kernel_init(buffer1, buffer2);   // (using built-in default pblock)
kernel_vloops();                 // first call
kernel_update(buffer3);          // updates pblock with parm2=buffer3
kernel_vloops();                 // second call
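The init/update/vloops sequence in Table 2-2 can be mimicked on a host in plain C. The sketch below is not generated code and does not use the real tvals layout: the demo_tvals_t structure, the derived parm2_end field, the demo_* function names, and the summing loop body are all assumptions made for illustration. It uses memcpy instead of the pointer cast shown in the table so that it stays well-defined in portable C regardless of the alignment of the pblock array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tvals layout; the real one is produced by the migration tool. */
typedef struct {
    const int32_t *parm1;      /* captured by init(), not updatable */
    const int32_t *parm2;      /* captured by init(), re-captured by update() */
    int            n;          /* captured by init(), not updatable */
    const int32_t *parm2_end;  /* derived value that depends on parm2 */
} demo_tvals_t;

/* Number of unsigned short words needed to hold the hypothetical tvals. */
enum { DEMO_PBLOCK_WORDS =
       (sizeof(demo_tvals_t) + sizeof(unsigned short) - 1) / sizeof(unsigned short) };

/* Captures all parameters and computes the derived values. */
static void demo_init(const int32_t *parm1, const int32_t *parm2, int n,
                      unsigned short pblock[])
{
    demo_tvals_t t;
    assert(n >= 0);
    memset(&t, 0, sizeof t);
    t.parm1 = parm1;
    t.parm2 = parm2;
    t.n = n;
    t.parm2_end = parm2 + n;
    memcpy(pblock, &t, sizeof t);
}

/* Re-captures only the updatable parameter and recomputes what depends on it. */
static void demo_update(const int32_t *parm2, unsigned short pblock[])
{
    demo_tvals_t t;
    memcpy(&t, pblock, sizeof t);
    t.parm2 = parm2;
    t.parm2_end = parm2 + t.n;
    memcpy(pblock, &t, sizeof t);
}

/* Stand-in for the kernel body: sums parm1[i] + parm2[i] over n elements. */
static int32_t demo_vloops(const unsigned short pblock[])
{
    demo_tvals_t t;
    int32_t acc = 0;
    memcpy(&t, pblock, sizeof t);
    const int32_t *p1 = t.parm1;
    for (const int32_t *p2 = t.parm2; p2 != t.parm2_end; ++p2, ++p1)
        acc += *p1 + *p2;
    return acc;
}

/* Mirrors the dispatch sequence: init, run, update parm2, run again. */
static void demo_dispatch(int32_t results[2])
{
    static const int32_t buffer1[4] = {1, 2, 3, 4};
    static const int32_t buffer2[4] = {10, 20, 30, 40};
    static const int32_t buffer3[4] = {100, 200, 300, 400};
    unsigned short pblock[DEMO_PBLOCK_WORDS];

    demo_init(buffer1, buffer2, 4, pblock);
    results[0] = demo_vloops(pblock);   /* first call uses buffer2 */
    demo_update(buffer3, pblock);       /* only parm2 and its dependents change */
    results[1] = demo_vloops(pblock);   /* second call uses buffer3 */
}
```

The point of the sketch is that demo_update() never touches parm1 or n, and the caller never needs to know where parm2 lives inside the pblock, which is what distinguishes this approach from hard-coded pblock offsets.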