SPRUIG8 User guide

SPRUIG8K January 2018 – March 2025


5.14.2.1 Semantics of Vector Pointers

Vector data object: If you use the unary & operator to take the address of a vector data object, the result is a pointer. Unlike taking the address of an array, taking the address of a vector gives you a pointer representing the whole vector, not a pointer to an individual element.

For example, given an object vec of type int4, the expression &vec has type int4 *.

int4 vec;
randomize(&vec); /* OK */

To access the elements of a vector object through a pointer, use swizzle operators instead of casting the pointer to some other pointer type. See Section 5.14.4.

void randomize(int4 *vecp)
{
    for([..]) (*vecp).s[i] = rand();
}

Complex data object: Similarly, if you use the unary & operator to take the address of a complex scalar object, the result is a pointer. This pointer represents the entire complex scalar object, not an individual component.

cfloat cplx;
foo(&cplx); /* OK */

Vector data element: You can use the unary & operator to take the address of an indexing-style swizzle (for example, &x.s[1] or &x.s[j]). The result is a pointer to that element, with the expected semantics. However, you cannot use the unary & operator to take the address of a non-indexing-style swizzle, such as s1, s1(), or r().

Types with the const qualifier: You can declare vector types and complex types with the const type qualifier. The semantics are the same as for non-vector types. Using the unary & operator to take the address of a const vector object gives a pointer to a const-qualified vector type.

Types with the volatile qualifier: You can declare vector types and complex types with the volatile type qualifier. The semantics are the same as for non-vector types. Using the unary & operator to take the address of a volatile vector object gives a pointer to a volatile-qualified vector type.

When optimizing access to any volatile object, the compiler must preserve the number and relative order in which accesses occur at runtime. If possible, the compiler accesses a volatile vector object whole-vector-at-a-time, not element-by-element. If that is not possible (for instance, because a vector is larger than a single vector register), the compiler can split the volatile access into smaller chunks, and the order in which these smaller chunks are accessed is unspecified.

Consider these loops. The first loop is the original, and the second loop is the vectorized version:


volatile int *p = array;                    /* original loop */
for ([..])
    *p++ = [..]                             /* process one int at a time in sequence */

volatile int8 *p8 = (volatile int8 *)array; /* vectorized loop */
for ([..])
    *p8++ = [..]                            /* process 8 ints at a time in parallel */

The first loop in this example accesses one array element at a time in sequence. The second loop accesses 8 ints in parallel. Because the second loop changes the relative order of accesses, the compiler is not allowed to automatically convert the first loop to the second. You can manually make a change from one style of access to the other if that change is warranted.

Pointers with the restrict qualifier: You can add the restrict qualifier to a pointer to a vector type if the pointer follows the usual rules for the restrict keyword. Simply stated, the pointer must be the only means by which the vector it points to is accessed.
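As a rough host-side illustration of this rule, the sketch below uses a hypothetical struct, host_int4, as a stand-in for the C7000 int4 type so that it compiles with any C99 compiler. The function name add_int4 is also hypothetical. It only shows the promise that the qualifier expresses, not how the C7000 compiler uses it.

```c
/* Hypothetical host-side stand-in for int4; NOT the C7000 vector type. */
typedef struct { int s[4]; } host_int4;

/* Each restrict-qualified pointer promises that, within this function,
   the object it points to is accessed only through that pointer.
   In particular, dst must not refer to the same object as a or b. */
void add_int4(host_int4 *restrict dst,
              const host_int4 *restrict a,
              const host_int4 *restrict b)
{
    for (int i = 0; i < 4; i++)
        dst->s[i] = a->s[i] + b->s[i];
}
```

A call such as add_int4(&v, &v, &w) would break the promise, because the object written through dst is also read through a.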

Correspondence between vectors and arrays: The most effective use of vectors is to process a large array of data with vectors that are as large as possible. Because vectors and arrays are so similar, you can easily use vector pointers to access large arrays. For example:

int array[N];
int8 vec = *(int8 *)array; /* vector to access first 8 ints in array */

One important use of vectors is to access an array by casting the address of the array (or a pointer) as a pointer-to-vector type. That pointer can then be used to load or store multiple elements of the array in parallel. Suppose you have the following example:

void dotp(int x[N], int y[N], int z[restrict])
{
    /* original */
    for (int i=0;i<N;i++)
    {
        /* process the data one int at a time */
        *z++ = *x++ + *y++;
    }
}

The compiler can automatically vectorize this simple example because the restrict keyword tells the compiler that the z parameter does not overlap with either x or y. The transformed code is equivalent to the following:

void dotp(int x[N], int y[N], int z[restrict])
{
    int8 *xp = (int8 *)x;
    int8 *yp = (int8 *)y;
    int8 *zp = (int8 *)z;
    for (int i=0;i<N/8;i++)
    {
        /* process the data 8 ints at a time */
        *zp++ = *xp++ + *yp++;
    }
}

Note: When using a pointer to a vector to read from a non-complex array, match the element type of the vector to the element type of the non-complex array. For example, if an array type is const int array[N], make the vector a const with int elements, like const int8.

Using a pointer-to-vector to access an array is safe because the elements of arrays and vectors are stored and aligned the same way in memory. A vector's first element (s0) is stored at the lowest address, regardless of endian mode, and the individual bytes of each element are stored according to the endian mode. The first element of the array therefore corresponds to the vector's first element (s0).
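The lane-to-address correspondence described above can be modeled on a host machine. The following is a minimal sketch, assuming a hypothetical struct host_int8 as a stand-in for int8 and a hypothetical helper load_int8. It uses memcpy rather than a pointer cast so that it stays within standard C on any compiler.

```c
#include <string.h>

#define LANES 8

/* Hypothetical host-side stand-in for int8; NOT the C7000 vector type. */
typedef struct { int s[LANES]; } host_int8;

/* Copy LANES consecutive ints starting at src into a vector-shaped object.
   Lane 0 (s0) receives the element at the lowest address, lane 1 the next
   element, and so on, independent of the byte order within each int. */
host_int8 load_int8(const int *src)
{
    host_int8 v;
    memcpy(v.s, src, sizeof v.s);
    return v;
}
```

For example, load_int8(array + 8) yields an object whose lane 0 holds array[8] and whose lane 7 holds array[15].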

When using C++, you can use similar code to process multiple elements of a std::vector or std::array in parallel using the data() member:

/* Pass by reference so that the results written through z are visible to the caller */
void dotp(std::vector<int> &x, std::vector<int> &y, std::vector<int> &z)
{
    int8 *xp = (int8 *)x.data();
    int8 *yp = (int8 *)y.data();
    int8 *zp = (int8 *)z.data();
    for (int i=0;i<N/8;i++)
    {
        /* process the data 8 ints at a time */
        *zp++ = *xp++ + *yp++;
    }
}

/* std::array requires the element count as a template argument */
void dotp(std::array<int, N> &x, std::array<int, N> &y, std::array<int, N> &z)
{
    int8 *xp = (int8 *)x.data();
    int8 *yp = (int8 *)y.data();
    int8 *zp = (int8 *)z.data();
    for (int i=0;i<N/8;i++)
    {
        /* process the data 8 ints at a time */
        *zp++ = *xp++ + *yp++;
    }
}

Accessing a complex type as a vector: The imaginary and real components of complex vector objects and complex scalar objects are stored differently depending on whether the object is in memory or in registers.

Complex scalar objects are stored in memory with the real component at the lowest address. Complex vector objects are stored in memory as a sequence of complex scalar values, with the real component of s0 at the lowest address, followed by the imaginary component of s0, and so on.

Complex scalar objects are stored in registers with the imaginary component in the least significant bits, followed by the real component. Complex vector objects are stored as a sequence of complex scalar values, with the imaginary component of s0 in the least significant bits, followed by the real component of s0, and so on. This register layout is required by CPU instructions that operate on complex values (for example, VCMPYSP).

As a result of the reversal in the order in which the components are stored, the compiler must swap the complex real and imaginary components when loading from memory to a register and vice versa.
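The difference between the two layouts can be modeled on a host machine. The sketch below is only an illustration under stated assumptions: mem_cfloat, to_register_image, and from_register_image are hypothetical names, float is assumed to be a 32-bit IEEE 754 type, and a 64-bit integer stands in for a register holding one complex float. It is not how the compiler implements the swap.

```c
#include <stdint.h>
#include <string.h>

/* Memory order of one complex float: real component at the lowest address. */
typedef struct { float re; float im; } mem_cfloat;

/* Build a 64-bit model of the register image: imaginary bits occupy
   bits 31..0 (least significant), real bits occupy bits 63..32. */
uint64_t to_register_image(mem_cfloat m)
{
    uint32_t re_bits, im_bits;
    memcpy(&re_bits, &m.re, sizeof re_bits);
    memcpy(&im_bits, &m.im, sizeof im_bits);
    return ((uint64_t)re_bits << 32) | im_bits;
}

/* Inverse operation: recover the memory-order pair from the register image. */
mem_cfloat from_register_image(uint64_t r)
{
    uint32_t re_bits = (uint32_t)(r >> 32);
    uint32_t im_bits = (uint32_t)(r & 0xFFFFFFFFu);
    mem_cfloat m;
    memcpy(&m.re, &re_bits, sizeof m.re);
    memcpy(&m.im, &im_bits, sizeof m.im);
    return m;
}
```

In this model, a load corresponds to to_register_image and a store to from_register_image, which is the component swap that the compiler performs on your behalf.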

Figure 5-1 summarizes both layouts: in memory, complex scalar objects are stored with the real component at the lowest address; in registers, the imaginary component is stored starting at the least significant bit.

Figure 5-1 Memory and Register Layout of Complex Components

The compiler normally handles swapping the order of complex components. When you access a complex scalar or complex vector object through a pointer, the pointer must have the correct type.

You can access a non-complex array as a complex scalar:

float value[2];
cfloat x = *(cfloat *)&value;

You can access a complex scalar as a vector of length two with the same element type as the complex components:

cchar value;
char2 x = *(char2 *)&value;

You can access an array of non-complex scalars as a complex vector as long as the complex values are stored in the non-complex scalar array with the real component first, then the imaginary component. You can use either a 1- or 2-dimensional array, as convenient.

float value[][2] = { { r, i }, { r, i } ... };
cfloat4 x = *(cfloat4 *)&value;

float value[] = { r, i, r, i, ... };
cfloat4 x = *(cfloat4 *)&value;

You can access any complex vector as a vector of twice the length with the same element type as the complex components:

cchar4 value;
char8 x = *(char8 *)&value;

When loading a complex scalar or complex vector from a non-complex type, the compiler handles the necessary component swapping for you.

Note: When using a pointer to complex to read from a non-complex array, make the element type of the complex object match the element type of the non-complex array. For example, if your array is of type const int coefficients[N][2], make the vector a const with int elements (for example, const cint8). If the element types do not match, undesirable behavior can occur due to automatic swapping of complex components.

A frequent use case for complex values in C code is to represent them as a two-dimensional array of the component values, like so:

float coefficients[4][2] =    /* 4 complex values, each stored as { real, imag } */
{ { real, imag }, { real, imag },
  { real, imag }, { real, imag }, };

/* access part of the coefficient array as a vector */
output = input * *(cfloat4 *)coefficients;

You can also initialize a complex vector from an array of C99 complex values, because the layouts of a C99 complex object and a TI complex object are the same:

float _Complex coefficients[4] =
{ real + imag * _Complex_I, real + imag * _Complex_I,
  real + imag * _Complex_I, real + imag * _Complex_I, };

/* access part of the coefficient array as a vector */
output = input * *(cfloat4 *)coefficients;
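The C99 half of this layout claim can be checked with any C99 compiler, because the standard requires a complex type to have the same representation as a two-element array of the corresponding real type, with the real part first. The sketch below uses a hypothetical helper name, complex_memory_order, and says nothing about the TI types themselves.

```c
#include <complex.h>
#include <string.h>

/* Copy the two components of a C99 complex float out in memory order.
   out[0] receives the real part, out[1] the imaginary part. */
void complex_memory_order(float _Complex c, float out[2])
{
    memcpy(out, &c, sizeof(float[2]));
}
```

Calling complex_memory_order(1.5f + 2.5f * I, parts) leaves the real component in parts[0] and the imaginary component in parts[1], which is the same real-first order described above for complex objects in memory.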

Mismatched vector/array sizes: You can use vector operations even if the number of array elements is not an exact multiple of the vector element count. To do so, use predicated loads and stores for the last iteration of the loop. As a result, the last iteration loads and stores only part of a vector, and thus only the remaining part of the array.

#include <stddef.h>
#include <c7x.h>

void dotp(int x[], int y[], int z[restrict], size_t n)
{
    int8 *xp = (int8 *)x;
    int8 *yp = (int8 *)y;
    int8 *zp = (int8 *)z;
    for (size_t i=0;i<n/8;i++)   /* size_t index matches the type of n */
        *zp++ = *xp++ * *yp++;
    unsigned leftover = n%8;
    if (leftover)
    {
        vpred mask = __mask_char(leftover);
        __vstore_pred(mask, zp, __vload_pred(mask, xp) * __vload_pred(mask, yp));
    }
}
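To see which elements each stage touches, the same strategy can be modeled in portable scalar C. This is a sketch for reasoning about index ranges only: mul_with_tail is a hypothetical name, the inner loops stand in for single vector operations, and the final loop stands in for the predicated load, multiply, and store. It does not use or describe the C7000 predicate intrinsics.

```c
#include <stddef.h>

#define LANES 8

/* Scalar model: process whole groups of LANES elements, then a partial
   final group that covers only the leftover elements. */
void mul_with_tail(const int x[], const int y[], int z[restrict], size_t n)
{
    size_t full = n / LANES;       /* number of complete groups */
    size_t leftover = n % LANES;   /* elements in the partial group */

    for (size_t g = 0; g < full; g++)
        for (size_t lane = 0; lane < LANES; lane++)   /* one vector operation */
            z[g * LANES + lane] = x[g * LANES + lane] * y[g * LANES + lane];

    /* Only lanes below leftover are active, so no element at index n
       or beyond is read or written. */
    for (size_t lane = 0; lane < leftover; lane++)
        z[full * LANES + lane] = x[full * LANES + lane] * y[full * LANES + lane];
}
```

With n equal to 11, the first stage handles indices 0 through 7 and the second stage handles indices 8 through 10.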