SPRUIG8J January 2018 – March 2024
The Flag Status Register (FSR), which contains bits representing floating point status, can be accessed using the __get_FSR(type) API, which is defined in c7x.h. The API takes a type argument, which refers to a valid scalar or vector floating point type (except for "float3") that is being used with the floating point operation.
The API returns an "OR" of the data bits for all pertinent vector lanes. The result is an 8-bit value containing the following fields:
For example:
float4 a = ... ;
float4 b = ... ;
float4 c = a * b;
uint8_t fsr_val = __get_FSR(float4);
The __get_FSR(type) API is provided to make accessing the FSR easier. The actual hardware register is a 64-bit value that is divided into eight 8-bit chunks. Each 8-bit chunk corresponds to a 64-bit vector slice of data in either the input or output data, depending on the operation being performed. A 64-bit slice may consist of a 64-bit double or two 32-bit float values that are OR'd together by the hardware.
However, for vector operations, while this "OR" is done within every 64-bit slice, the results for the individual 64-bit slices are not OR'd together by the hardware. The reason is that when partial vectors are used (fewer than 512 bits), the upper lanes of a vector are considered invalid and are ignored, so they should not be reflected in the final FSR result. To ensure that only the information pertinent to the valid lanes of a vector is reflected, the API lets users specify the scalar or vector type of the data they are working with. The API then uses a sequence of instructions to OR together only the valid 64-bit vector slices, producing a final 8-bit result in which all invalid lanes are ignored.
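The slice reduction described above can be illustrated with a small host-side sketch in portable C. It is not the instruction sequence the compiler emits, and fsr_reduce is a hypothetical helper that does not exist in c7x.h. The sketch also assumes that chunk i of the raw register sits in bits [8*i+7 : 8*i] and describes the i-th 64-bit slice counting from the low end of the vector; the text above does not state that layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of c7x.h): emulates, on a host machine,
 * the reduction the text attributes to __get_FSR(type).
 *
 * Assumption: chunk i of the raw 64-bit register occupies bits
 * [8*i+7 : 8*i] and describes the 64-bit slice that holds vector
 * bits [64*i+63 : 64*i]. */
static uint8_t fsr_reduce(uint64_t raw_fsr, unsigned valid_bits)
{
    /* A valid type is between 1 bit and a full 512-bit vector wide. */
    assert(valid_bits >= 1 && valid_bits <= 512);

    /* Number of 64-bit slices touched by the valid data, rounded up. */
    unsigned valid_slices = (valid_bits + 63u) / 64u;

    uint8_t result = 0;
    for (unsigned i = 0; i < valid_slices; ++i) {
        /* OR in the 8-bit status chunk for slice i; chunks that belong
         * to invalid upper slices are never read. */
        result = (uint8_t)(result | (uint8_t)(raw_fsr >> (8u * i)));
    }
    return result;
}
```

Under these assumptions, a 128-bit type such as float4 covers two slices, so fsr_reduce(raw, 128) combines only the two lowest chunks, and any flag recorded in the six upper chunks does not appear in the result.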
Using the __get_FSR(type) API degrades performance, because the API inserts a sequence of instructions to ensure that only the valid vector lanes are reflected in the final result. The API also prevents loop vectorization throughout any function in which it is used, because vectorization would change the number of valid vector lanes in ways the user is not able to track.