SPRUIG5E January 2018 – March 2023 TDA4VM, TDA4VM-Q1

 

  Abstract
  1 About This Document
    1.1 Related Documents
    1.2 Trademarks
  2 Migrating C Source from C6000 to C7000
    2.1 Compiler Options
    2.2 Native Vector Data Types
    2.3 Type Qualifiers: near and far
    2.4 64-bit long Type
    2.5 References to Control Registers
    2.6 Memory-Mapped Peripherals
    2.7 Run-Time Support
    2.8 Contents of Migration Header File c6x_migration.h
      2.8.1 Supported Macros
      2.8.2 Non-Supported Macros
      2.8.3 Legacy Data Types
      2.8.4 Legacy Intrinsics
    2.9 Galois Field Multiply Instructions
    2.10 Performance Considerations for Migrated Code
      2.10.1 UNROLL Pragma
      2.10.2 Subvector Access
      2.10.3 16x16 and 16x32 Bit Multiplies
      2.10.4 __x128_t Type
      2.10.5 Unsigned Array Offsets
      2.10.6 Streaming Engine and Streaming Address Generator
      2.10.7 Additional Optimization Guidance
  3 Host Emulation
  4 Revision History

Subvector Access

Accessing a portion of a vector type may be “free” on C6000 devices, but requires an extra instruction on C7000 devices.

For example, accessing one element of an int4 is likely to be free on C6000, because an int4 on C6000 occupies four 32-bit registers and the compiler can simply use the register that holds the element. On C7000 devices, however, an entire int4 is held in a single vector register, so accessing one element requires the compiler to use an instruction (such as VGETW) to extract that data.
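The difference can be illustrated with a minimal sketch. It uses a hypothetical stand-in type, `int4_model`, and hypothetical helpers, `elem2()` and `elem2_demo()`, so that it builds with any C99 compiler; none of these names come from the TI headers. With the TI compilers, the built-in int4 type and its element accessors would take their place. The comments describe the code the compiler is likely to generate, not guaranteed output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TI int4 vector type (four 32-bit elements). */
typedef struct { int32_t s[4]; } int4_model;

/* Read the third element of the vector.
 * C6000: the element already occupies its own 32-bit register, so the
 *        compiler can simply use that register.
 * C7000: all four elements share one vector register, so the compiler
 *        needs an extract instruction (such as VGETW). */
int32_t elem2(int4_model v)
{
    return v.s[2];
}

/* Usage of the sketch above. */
void elem2_demo(void)
{
    int4_model v = { { 10, 20, 30, 40 } };
    assert(elem2(v) == 30);
}
```

Code that reads several elements of the same vector pays the extraction cost once per element read on C7000.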

Similarly, packing values into an int4 vector is likely free or almost free on C6000 devices, while on C7000 devices it may require a sequence of instructions (such as several VPUTW instructions).
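The packing direction can be sketched the same way. The type `int4_pack_model` and the helpers `pack4()` and `pack4_demo()` are hypothetical names chosen for this illustration and assume nothing beyond standard C99; with the TI compilers the built-in int4 type would be used instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the TI int4 vector type (four 32-bit elements). */
typedef struct { int32_t s[4]; } int4_pack_model;

/* Build a vector from four scalar values.
 * C6000: likely free or almost free, because the vector is simply four
 *        32-bit registers.
 * C7000: may need a sequence of insert instructions (such as VPUTW),
 *        one per element placed into the vector register. */
int4_pack_model pack4(int32_t a, int32_t b, int32_t c, int32_t d)
{
    int4_pack_model v = { { a, b, c, d } };
    return v;
}

/* Usage of the sketch above. */
void pack4_demo(void)
{
    int4_pack_model v = pack4(1, 2, 3, 4);
    assert(v.s[0] == 1 && v.s[1] == 2 && v.s[2] == 3 && v.s[3] == 4);
}
```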

If the C7000 compiler is able to vectorize the code further, the performance penalty may be mitigated in some cases. For example, accessing the low 32 bits of a 64-bit value with _loll() may be vectorized into VDEAL2W.
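A loop of the following shape is the kind of code where such vectorization could apply. This is a sketch under stated assumptions: `loll_model()` is a hypothetical portable stand-in for the legacy _loll() intrinsic, and `low_words()` and `low_words_demo()` are hypothetical helper names. Whether the C7000 compiler actually vectorizes a given loop depends on the surrounding code and the compiler options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the legacy _loll() intrinsic:
 * returns the low 32 bits of a 64-bit value. */
uint32_t loll_model(int64_t src)
{
    return (uint32_t)((uint64_t)src & 0xFFFFFFFFu);
}

/* Extract the low word of every 64-bit input in a simple counted loop.
 * Done one value at a time, each extraction is a subvector access.
 * If the C7000 compiler vectorizes the loop, it may instead process
 * several values per instruction (for example with VDEAL2W). */
void low_words(const int64_t *in, uint32_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = loll_model(in[i]);
}

/* Usage of the sketch above. */
void low_words_demo(void)
{
    const int64_t in[2] = { 0x1111111122222222LL, 0x00000001FFFFFFFFLL };
    uint32_t out[2] = { 0, 0 };
    low_words(in, out, 2);
    assert(out[0] == 0x22222222u);
    assert(out[1] == 0xFFFFFFFFu);
}
```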