SPRUI04 User guide

SPRUI04G June 2015 – August 2025

7.15.2.1 Using Address Operator on a Vector

Vector data object: If you use the unary & operator to take the address of a vector data object, the result is a pointer. Unlike taking the address of an array, taking the address of a vector gives you a pointer representing the whole vector, not a pointer to an individual element.

For example, given an object vec of type int4, the expression &vec has type int4 *.

int4 vec;
randomize(&vec); /* OK */

To access the elements of a vector object through a pointer, use swizzle operators instead of casting the pointer to some other pointer type. See Section 7.15.3.

void randomize(int4 *vecp)
{
    for([..]) (*vecp).s[i] = rand();
}
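The same pattern can be tried on a host compiler. The following is a minimal sketch, assuming the GCC/Clang generic vector extension as a stand-in for the TI int4 type. The names host_int4 and fill_sequence are hypothetical, and the subscript (*vecp)[i] plays the role of the indexing-style swizzle (*vecp).s[i].

```c
#include <assert.h>

/* Host stand-in for TI's int4 (assumption: GCC/Clang vector extension). */
typedef int host_int4 __attribute__((vector_size(4 * sizeof(int))));

/* Hypothetical helper: writes start, start+1, ... into the four elements
 * of the vector that vecp points to. */
void fill_sequence(host_int4 *vecp, int start)
{
    assert(vecp != 0);
    for (int i = 0; i < 4; i++) {
        /* With the TI compiler this would be (*vecp).s[i]. */
        (*vecp)[i] = start + i;
    }
}
```

The pointer is passed as a pointer to the whole vector, and element selection happens only at the point of use, which mirrors the randomize() example.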

Complex data object: Similarly, if you use the unary & operator to take the address of a complex scalar object, the result is a pointer. This pointer represents the entire complex scalar object, not an individual component.

cfloat cplx;
foo(&cplx); /* OK */

Vector data element: You can use the unary & operator to take the address of an indexing-style swizzle (for example, &x.s[1] or &x.s[j]). The address is a pointer to that element, with the expected semantics. However, you cannot use the unary & operator to take the address of a non-indexing-style swizzle, such as s1, s1(), or r().

Types with the const qualifier: You can declare vector types and complex types with the const type qualifier. The semantics are the same as for non-vector types. Using the unary & operator to take the address of a const vector object gives a pointer to a const-qualified vector type.

Types with the volatile qualifier: You can declare vector types and complex types with the volatile type qualifier. The semantics are the same as for non-vector types. Using the unary & operator to take the address of a volatile vector object gives a pointer to a volatile-qualified vector type.

When optimizing access to any volatile object, the compiler must preserve both the number of accesses and the relative order in which they occur at runtime. If possible, the compiler accesses a volatile vector object as a whole vector at a time, not element by element. If that is not possible (for instance, because a vector is larger than a single vector register), the compiler can split the volatile access into smaller chunks, and the order in which these smaller chunks are accessed is unspecified.

Consider these loops. The first loop is the original, and the second loop is the vectorized version:


volatile int *p = array;                    /* original loop */
for ([..])
    *p++ = [..]                             /* process one int at a time in sequence */

volatile int8 *p8 = (volatile int8 *)array; /* vectorized loop */
for ([..])
    *p8++ = [..]                            /* process 8 int at a time in parallel */

The first loop in this example accesses one array element at a time in sequence. The second loop accesses 8 ints in parallel. Because the second loop changes the relative order of accesses, the compiler is not allowed to automatically convert the first loop to the second. You can manually make a change from one style of access to the other if that change is warranted.

Pointers with the restrict qualifier: You can add the restrict qualifier to a pointer to a vector type if the pointer follows the usual rules for the restrict keyword. Simply stated, the restrict-qualified pointer must be the only means by which the vector it points to is accessed.

Correspondence between vectors and arrays: The most effective use of vectors is to process a large array of data with vectors that are as large as possible. Because vectors and arrays are so similar, you can easily use vector pointers to access large arrays. For example:

int array[N];
int8 vec = *(int8 *)array; /* vector to access first 8 ints in array */

One important use of vectors is to access an array by casting the address of the array (or a pointer) as a pointer-to-vector type. That pointer can then be used to load or store multiple elements of the array in parallel. Suppose you have the following example:

void dotp(int x[N], int y[N], int z[restrict])
{
    /* original */
    for (int i=0;i<N;i++)
    {
        /* process the data one int at a time */
        *z++ = *x++ + *y++;
    }
}

The compiler can automatically transform this simple example because the restrict keyword tells the compiler that the z input parameter does not overlap with either x or y. The vectorized form corresponds to the following:

void dotp(int x[N], int y[N], int z[restrict])
{
    int8 *xp = (int8 *)x;
    int8 *yp = (int8 *)y;
    int8 *zp = (int8 *)z;
    for (int i=0;i<N/8;i++)
    {
        /* process the data 8 ints at a time */
        *zp++ = *xp++ + *yp++;
    }
}
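The loop above runs N/8 iterations, so when N is not a multiple of 8 the elements beyond the last full group of 8 are left unprocessed. One common way to handle this is a scalar loop for the remainder. The following is a minimal host-side sketch, assuming the GCC/Clang generic vector extension as a stand-in for the TI int8 type. The names host_int8 and add_arrays are hypothetical, and memcpy is used for loads and stores only so that the sketch makes no alignment assumptions on the host.

```c
#include <assert.h>
#include <string.h>

/* Host stand-in for TI's int8 (assumption: GCC/Clang vector extension). */
typedef int host_int8 __attribute__((vector_size(8 * sizeof(int))));

/* Hypothetical helper: z[i] = x[i] + y[i] for n elements, 8 at a time,
 * with a scalar loop for the last n % 8 elements. */
void add_arrays(const int *x, const int *y, int *restrict z, int n)
{
    assert(n >= 0);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        host_int8 xv, yv, zv;
        /* memcpy keeps the host sketch free of alignment assumptions. */
        memcpy(&xv, x + i, sizeof xv);
        memcpy(&yv, y + i, sizeof yv);
        zv = xv + yv;               /* element-wise add of 8 ints */
        memcpy(z + i, &zv, sizeof zv);
    }
    for (; i < n; i++) {
        z[i] = x[i] + y[i];         /* scalar tail */
    }
}
```

If N is known to be a multiple of 8, the tail loop never executes and the function reduces to the vector loop shown above.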

Note: When using a pointer to a vector to read from a non-complex array, match the element type of the vector to the element type of the non-complex array. For example, if an array type is const int array[N], make the vector a const with int elements, like const int8.

Using a pointer-to-vector to access an array is safe because elements of both arrays and vectors are stored and aligned the same way when stored in memory. In memory, a vector's first element (s0) is stored at the lowest address in memory, regardless of endian mode. The individual bytes of each element are stored according to the endianness mode. The first element of the array is then assigned to the vector's first element (s0).
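The layout correspondence can be checked with a small host-side experiment. This is a sketch only, assuming the GCC/Clang generic vector extension as a stand-in for the TI int8 type. The names host_int8 and layout_matches are hypothetical, and vec[k] stands in for the TI swizzle vec.s[k].

```c
#include <assert.h>
#include <string.h>

/* Host stand-in for TI's int8 (assumption: GCC/Clang vector extension). */
typedef int host_int8 __attribute__((vector_size(8 * sizeof(int))));

/* Hypothetical helper: returns 1 if vector element k holds the same value
 * as array element k for every k after a byte-for-byte copy, 0 otherwise. */
int layout_matches(const int array[8])
{
    host_int8 vec;
    assert(sizeof vec == 8 * sizeof(int));
    memcpy(&vec, array, sizeof vec);   /* same bytes a vector load would read */
    for (int k = 0; k < 8; k++) {
        if (vec[k] != array[k]) {
            return 0;
        }
    }
    return 1;
}
```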

When using C++, you can use similar code to process multiple elements of a std::vector or std::array in parallel using the data() member:

void dotp(const std::vector<int> &x, const std::vector<int> &y, std::vector<int> &z)
{
    /* pass by reference so results written through zp reach the caller */
    const int8 *xp = (const int8 *)x.data();
    const int8 *yp = (const int8 *)y.data();
    int8 *zp = (int8 *)z.data();
    for (int i=0;i<N/8;i++)
    {
        /* process the data 8 ints at a time */
        *zp++ = *xp++ + *yp++;
    }
}

void dotp(const std::array<int, N> &x, const std::array<int, N> &y, std::array<int, N> &z)
{
    /* std::array requires its size as a template argument */
    const int8 *xp = (const int8 *)x.data();
    const int8 *yp = (const int8 *)y.data();
    int8 *zp = (int8 *)z.data();
    for (int i=0;i<N/8;i++)
    {
        /* process the data 8 ints at a time */
        *zp++ = *xp++ + *yp++;
    }
}

Accessing a complex type as a vector: Complex scalar objects are stored in memory with the real component at the lowest address. Complex vector objects are stored as a sequence of complex scalar values, with the real component of s0 at the lowest address, followed by the imaginary component of s0, and so on.

You can access a non-complex array as a complex scalar:

float value[2];
cfloat x = *(cfloat *)&value;

You can access a complex scalar as a vector of length two with the same element type as the complex components:

cchar value;
char2 x = *(char2 *)&value;

You can access an array of non-complex scalars as a complex vector as long as the complex values are stored in the non-complex scalar array with the real component first, then the imaginary component. You can use either a 1- or 2-dimensional array, as convenient.

float value[][2] = { { r, i }, { r, i } ... };
cfloat4 x = *(cfloat4 *)&value;

float value[] = { r, i, r, i, ... };
cfloat4 x = *(cfloat4 *)&value;

You can access any complex vector as a vector of twice the length with the same element type as the complex components:

cchar4 value;
char8 x = *(char8 *)&value;
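The interleaved layout can be illustrated on a host compiler with C99 complex types, which store the real part first and the imaginary part second, as an array of two elements of the corresponding real type. The following is a minimal sketch under that assumption, not TI-specific code. The function name flatten_complex is hypothetical.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Hypothetical helper: copies n complex values into 2*n floats laid out
 * as r, i, r, i, ... */
void flatten_complex(const float _Complex *src, float *dst, int n)
{
    assert(n >= 0);
    /* C99 gives each complex object the layout of float[2], real part
     * first, so a byte copy yields the interleaved components. */
    memcpy(dst, src, (size_t)n * sizeof *src);
}
```

Viewing four complex values this way yields eight floats, which corresponds to reading a cchar4 as a char8 in the example above.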

Note: When using a pointer to complex to read from a non-complex array, make the element type of the complex object match the element type of the non-complex array. For example, if your array is of type const int coefficients[N][2], make the complex vector const with int elements (for example, const cint8).

A frequent use case for complex values in C code is to represent them as a two-dimensional array of the component values, like so:

float coefficients[4][2] =
{ { real, imag }, { real, imag },
  { real, imag }, { real, imag }, };

/* access part of the coefficient array as a vector */
output = input * *(cfloat4 *)coefficients;

You can also initialize a complex vector from an array of C99 complex values, because a C99 complex object and a TI complex object have the same layout:

float _Complex coefficients[4] =
{ real + imag * _Complex_I, real + imag * _Complex_I,
  real + imag * _Complex_I, real + imag * _Complex_I, };

/* access part of the coefficient array as a vector */
output = input * *(cfloat4 *)coefficients;