SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1

 


Output Format

On EVE, the vcc-arp32 tool translates a kernel to a C source file that contains the five functions described in Chapter 2, which is then compiled by the ARP32 compiler. The vloops() function contains VCOP intrinsics to implement the vector loop command(s).

On C7x, the vcc7x tool translates a kernel to a C++ source file that contains the same five functions, which is then compiled by the C7x compiler.

It is feasible to use this translated output as the basis for additional development or performance optimization by directly modifying the generated code. Under this scenario, translation is a one-time step and the generated code becomes the source code. Whether to work this way is the user's decision, but it is important to be aware of the limitations described in this section.

Both versions of VCC invoke the C preprocessor on the Kernel-C input, which removes comments, handles preprocessor directives like #if and #include, and expands macros. As a result, the C/C++ output of the migration tool has these elements removed, with the exception of VCOP_SIMD_WIDTH. (Additionally, VCOP_SIMD_WIDTH is not expanded in #if directives.) The translated output is therefore less readable and less configurable than the source.
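The effect of preprocessing on configurability can be illustrated with a small sketch in plain C++ (not Kernel-C). The macro TILE_WIDTH and both function names are hypothetical and chosen only for illustration. The first function is what an author might write; the second is what the same function looks like once the preprocessor has removed the comment and replaced the macro with its value.

```cpp
#include <cassert>

// Hypothetical configuration macro, standing in for one defined in a
// Kernel-C source file or in a header it includes.
#define TILE_WIDTH 8

// As written by the author: changing TILE_WIDTH reconfigures the loop.
int sum_tile_source(const int* data) {
    assert(data != nullptr);
    int sum = 0;
    for (int i = 0; i < TILE_WIDTH; ++i) {  // trip count comes from the macro
        sum += data[i];
    }
    return sum;
}

// The same function after preprocessing: the comment is gone and the
// macro has been replaced by the literal 8, so the link to TILE_WIDTH is lost.
int sum_tile_translated(const int* data) {
    assert(data != nullptr);
    int sum = 0;
    for (int i = 0; i < 8; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Both functions compute the same result, but only the first can be retargeted by editing a single definition. This is the sense in which translated output is less configurable than the Kernel-C source.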

The migration tool makes some effort to make the translated output resemble the original Kernel-C source code. In particular:

  • Variable names (vectors, agens, and loop counters) are preserved.
  • Original indentation and whitespace are not preserved, but the migration tool applies its own output formatting to make the output readable.
  • VCOP statements that map to simple C expressions are generated that way. Other statements generally map to function calls in the VCOP virtual machine so that the kernel function remains uncluttered.

VCOP vectors are translated using “native vector types” on C7x, which are built-in types in the C7x C Compiler that are declared using OpenCL™ syntax. The typedef __vector maps to an OpenCL type corresponding to a VCOP vector.

The virtual machine is implemented in C++ using template functions and classes, allowing for a relatively economical yet efficient implementation. Since the virtual machine API is a C++ API, the generated kernel is a C++ source file and must be compiled as C++.
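As a rough illustration of why templates keep such a library economical, the following sketch defines one function template that serves every element type and vector width. It is not the actual virtual machine API: the namespace vm_sketch, the Vec alias, and the functions load and add_sat are all hypothetical, and std::array is used as a portable stand-in for a native vector type so the sketch compiles on any C++11 compiler.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

namespace vm_sketch {  // hypothetical namespace, not the real virtual machine

// Portable stand-in for a native vector type with N lanes of type T.
template <typename T, std::size_t N>
using Vec = std::array<T, N>;

// Load N consecutive elements from memory into a vector.
template <typename T, std::size_t N>
Vec<T, N> load(const T* src) {
    assert(src != nullptr);
    Vec<T, N> v{};
    for (std::size_t i = 0; i < N; ++i) {
        v[i] = src[i];
    }
    return v;
}

// Lane-wise add that clamps each result to the range of T.
// One definition covers every integer lane type and every width.
template <typename T, std::size_t N>
Vec<T, N> add_sat(const Vec<T, N>& a, const Vec<T, N>& b) {
    static_assert(std::numeric_limits<T>::is_integer &&
                      sizeof(T) < sizeof(std::int64_t),
                  "sketch supports integer lanes narrower than 64 bits");
    const std::int64_t lo = std::numeric_limits<T>::min();
    const std::int64_t hi = std::numeric_limits<T>::max();
    Vec<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) {
        // Compute in a wider type so the sum cannot overflow.
        std::int64_t wide = static_cast<std::int64_t>(a[i]) +
                            static_cast<std::int64_t>(b[i]);
        if (wide < lo) wide = lo;
        if (wide > hi) wide = hi;
        r[i] = static_cast<T>(wide);
    }
    return r;
}

}  // namespace vm_sketch
```

A call such as vm_sketch::add_sat(a, b) deduces the element type and width from its arguments, so a generated kernel could stay short while the compiler instantiates only the variants that are actually used.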

The virtual machine uses extensions to the C++ language, particularly native vector types and C7x intrinsics, and will therefore not be portable to other targets. In particular, the code will not compile and run on a PC or other GPP host.

Although the translated code is C++, the client code that calls it can be either C or C++. By default, the client code is assumed to be C, and the migration tool declares the generated kernel functions (which are C++) as extern "C" so that they are callable from C. If the --cpp_out option is passed to the migration tool, the client code is assumed to be C++ and the extern "C" declarations are omitted.
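The linkage arrangement can be sketched in standard C++ as follows. The function names example_kernel_add and add_arrays are hypothetical and do not correspond to the functions the migration tool generates; the sketch only shows how a C-callable entry point can wrap templated C++ code.

```cpp
#include <cassert>
#include <cstddef>

namespace {

// Hypothetical C++ implementation detail. Templates cannot have C linkage,
// so this helper keeps ordinary C++ linkage and stays internal to the file.
template <typename T>
void add_arrays(const T* a, const T* b, T* out, std::size_t n) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<T>(a[i] + b[i]);
    }
}

}  // namespace

// Hypothetical kernel entry point. extern "C" suppresses C++ name mangling,
// so a C translation unit can declare and call this function by its plain name.
extern "C" void example_kernel_add(const short* a, const short* b,
                                   short* out, unsigned n) {
    add_arrays<short>(a, b, out, n);
}
```

A C client would declare the entry point as void example_kernel_add(const short *a, const short *b, short *out, unsigned n); and link against the object file produced by the C++ compiler. Without the extern "C" specifier, a C++ client can still call the function, but a C client would fail to link because the symbol name would be mangled.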