SPRUIG3C January 2018 – August 2019 TDA4VM, TDA4VM-Q1

 


Copy-in Operation

The copy-in operation is responsible for copying table data from its “permanent” location in L2 into L1D so that an LHT operation can be performed. The source table in L2 is in VCOP layout; the destination table in L1D is in C7x layout.

This operation is performed by the LHT_copy_in::copy_table_in() method of the virtual machine. A pointer to the table in L2 and its size in bytes are passed as parameters. The size is rounded up to a multiple of 128 bytes (1024 bits), which is the line size of banked tables on C7x.
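The rounding step can be sketched as follows. This is only an illustration of the arithmetic described above; the names `kLineBytes` and `round_up_to_line` are hypothetical and are not taken from the virtual machine sources.

```cpp
#include <cassert>
#include <cstddef>

// Line size of banked tables on C7x, as stated above: 128 bytes (1024 bits).
// The constant name is hypothetical.
constexpr std::size_t kLineBytes = 128;

// Hypothetical helper: round a table size in bytes up to the next multiple
// of the 128-byte line size. Sizes that are already a multiple are unchanged.
inline std::size_t round_up_to_line(std::size_t size_bytes) {
    // Guard against wrap-around when adding the rounding term.
    assert(size_bytes <= static_cast<std::size_t>(-1) - (kLineBytes - 1));
    return (size_bytes + kLineBytes - 1) / kLineBytes * kLineBytes;
}
```

For example, under these assumptions a 200-byte table would be copied as 256 bytes, that is, two full lines.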

The table is read and written one 1024-bit line at a time. Each line is read from L2 as a pair of 512-bit vectors using the Streaming Engine. The pair of vectors, which together contain four VCOP lines, is rearranged using two VPERM instructions and then written into L1D using two LUTINIT instructions.

Thus each 1024-bit chunk requires two SE-based vector loads, two VPERMs, and two LUTINITs. The resulting loop pipelines at an initiation interval (ii) of 2, so one 1024-bit line completes every two cycles. The throughput is therefore 512 bits per cycle.
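The throughput figure follows directly from the line size and the initiation interval. The sketch below works through that arithmetic and derives a rough steady-state cycle estimate for a whole table. The function and constant names are hypothetical, and the estimate ignores pipeline prologue and epilogue as well as any setup cost, so it should be read as a lower bound rather than a measured figure.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kLineBits = 1024;         // one banked-table line
constexpr std::size_t kLineBytes = kLineBits / 8;
constexpr std::size_t kInitiationInterval = 2;  // cycles per loop iteration (ii)

// Bits copied per cycle once the pipelined loop reaches steady state.
inline std::size_t throughput_bits_per_cycle() {
    return kLineBits / kInitiationInterval;
}

// Hypothetical steady-state estimate: one loop iteration per 1024-bit line,
// each iteration issuing every kInitiationInterval cycles.
// The size must already be rounded up to a multiple of 128 bytes.
inline std::size_t estimated_copy_cycles(std::size_t padded_size_bytes) {
    assert(padded_size_bytes % kLineBytes == 0);
    const std::size_t lines = padded_size_bytes / kLineBytes;
    return lines * kInitiationInterval;
}
```

Under these assumptions, a 1024-byte table occupies 8 lines and takes roughly 16 cycles in steady state.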

To use LUTINIT to populate a table in L1D, the table must be configured as one parallel table, which allows the lanes of the payload vectors to be written into the table in linear fashion. This requirement holds regardless of how the table is configured for the LHT operation itself, so an independent LTCR configuration applies only to the copy-in operation. This configuration is computed during the init() function by a call to the copy_in_config() method and is stored in the tvals structure.

Similarly, the SE configuration used for the copy-in operation is computed during init() by the copy_in_SE_config() method and stored in the tvals structure.