DSP Libraries - Linear Algebra
The TI Linear Algebra library (LINALG) is an optimized library for dense linear algebra computations. It includes optimized BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) libraries.
The BLAS library provides routines for basic vector and matrix operations. Level 1 routines provide scalar-vector and vector-vector operations, Level 2 routines provide matrix-vector operations, and Level 3 routines provide matrix-matrix operations. The TI optimized BLAS library supports the CBLAS API and is based on BLIS (BLAS-like Library Instantiation Software) 0.1.6.
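To make the three BLAS levels concrete, the sketch below gives one plain-C reference loop per level. The `ref_` function names are hypothetical and chosen for illustration only; they are not part of LINALG, and the loops are unoptimized. All matrices are assumed to be dense and stored row-major with no padding. The comments name the standard CBLAS routine each loop mirrors.

```c
#include <assert.h>
#include <stddef.h>

/* Level 1 (vector-vector): y = alpha*x + y.
 * Mirrors the standard call cblas_daxpy(n, alpha, x, 1, y, 1). */
void ref_daxpy(int n, double alpha, const double *x, double *y)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y = alpha*A*x + beta*y, with A of size m x n.
 * Mirrors cblas_dgemv(CblasRowMajor, CblasNoTrans, m, n, alpha, A, n,
 *                     x, 1, beta, y, 1). */
void ref_dgemv(int m, int n, double alpha, const double *A,
               const double *x, double beta, double *y)
{
    assert(m >= 0 && n >= 0);
    for (int i = 0; i < m; ++i) {
        double acc = 0.0;
        for (int j = 0; j < n; ++j)
            acc += A[(size_t)i * n + j] * x[j];
        y[i] = alpha * acc + beta * y[i];
    }
}

/* Level 3 (matrix-matrix): C = alpha*A*B + beta*C,
 * with A of size m x k, B of size k x n, and C of size m x n.
 * Mirrors cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
 *                     m, n, k, alpha, A, k, B, n, beta, C, n). */
void ref_dgemm(int m, int n, int k, double alpha, const double *A,
               const double *B, double beta, double *C)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += A[(size_t)i * k + p] * B[(size_t)p * n + j];
            C[(size_t)i * n + j] = alpha * acc + beta * C[(size_t)i * n + j];
        }
    }
}
```

The work per routine grows from O(n) at Level 1 to O(mn) at Level 2 and O(mnk) at Level 3, which is why Level 3 routines such as xGEMM are the usual candidates for acceleration.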
The LAPACK library, built by running the f2c utility on the LAPACK Fortran sources, provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The LAPACK library relies on the TI optimized BLAS library for acceleration.
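As an illustration of the first problem class listed above, here is a minimal plain-C sketch that solves a dense system A*x = b by Gaussian elimination with partial pivoting. The function `ref_solve` and its helper are hypothetical names used for illustration, not LINALG or CLAPACK routines; the sketch assumes a square, row-major matrix and overwrites both inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value helper, kept local so the sketch needs no math library. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Solves A*x = b for an n x n row-major matrix A.
 * A is overwritten by its eliminated form and b by the solution x.
 * Returns 0 on success, or -1 if an exactly zero pivot is found
 * (the matrix is singular to working precision). */
int ref_solve(int n, double *A, double *b)
{
    assert(n > 0);
    for (int col = 0; col < n; ++col) {
        /* Partial pivoting: pick the row with the largest entry in this column. */
        int piv = col;
        for (int r = col + 1; r < n; ++r)
            if (abs_d(A[(size_t)r * n + col]) > abs_d(A[(size_t)piv * n + col]))
                piv = r;
        if (A[(size_t)piv * n + col] == 0.0)
            return -1;
        if (piv != col) {
            for (int c = 0; c < n; ++c) {
                double t = A[(size_t)col * n + c];
                A[(size_t)col * n + c] = A[(size_t)piv * n + c];
                A[(size_t)piv * n + c] = t;
            }
            double t = b[col];
            b[col] = b[piv];
            b[piv] = t;
        }
        /* Eliminate the entries below the pivot. */
        for (int r = col + 1; r < n; ++r) {
            double f = A[(size_t)r * n + col] / A[(size_t)col * n + col];
            for (int c = col; c < n; ++c)
                A[(size_t)r * n + c] -= f * A[(size_t)col * n + c];
            b[r] -= f * b[col];
        }
    }
    /* Back substitution on the upper-triangular system. */
    for (int r = n - 1; r >= 0; --r) {
        double s = b[r];
        for (int c = r + 1; c < n; ++c)
            s -= A[(size_t)r * n + c] * b[c];
        b[r] = s / A[(size_t)r * n + r];
    }
    return 0;
}
```

The inner elimination loops are vector and matrix updates, so a production solver can express most of this work as BLAS calls.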
For more detailed information on LINALG, see the LINALG User’s Guide.
Key library features
- Support for the standard CBLAS APIs and CLAPACK APIs.
- Support for single-core and multicore CBLAS computation.
- CBLAS can be configured to run on either ARM or DSP cores.
- CLAPACK only runs on ARM cores (though CBLAS functions used by CLAPACK are accelerated by DSP cores).
- Build and run examples.
LINALG performance
The following table shows example performance of LINALG routines as measured on multicore ARM Cortex-A15 cores and C66x DSP cores in two different TI SoCs.
Hardware platform used | AM572x EVM | 66AK2H EVM
---|---|---
Devices with core configuration benchmarked | AM5726, AM5728 | 66AK2H12, 66AK2H14

LINALG function | AM572x: 2x ARM A15 @1.5GHz, time in seconds^{1} | AM572x: 2x C66x DSPs @750MHz, time in seconds^{2} | AM572x: speedup measured when moving code from ARM to DSP | 66AK2H: 4x ARM A15 @1.2GHz, time in seconds^{1} | 66AK2H: 8x C66x DSPs @1.2GHz, time in seconds^{2} | 66AK2H: speedup measured when moving code from ARM to DSP
---|---|---|---|---|---|---
SGEMM (m=n=k=1000) | 0.27 | 0.157 | 1.7x | 0.121 | 0.027 | 4.5x
DGEMM (m=n=k=1000) | 0.786 | 0.55 | 1.4x | 0.294 | 0.084 | 3.5x
CGEMM (m=n=k=1000) | 6.98 | 0.635 | 11x | 2.36 | 0.09 | 26.2x
ZGEMM (m=n=k=1000) | 7.62 | 3.83 | 2x | 2.54 | 0.516 | 4.9x
CHERK (m=k=1000) | 3.53 | 0.392 | 9x | 1.2 | 0.053 | 22.6x
ZHERK (m=k=1000) | 3.85 | 2.27 | 1.7x | 1.3 | 0.33 | 3.9x

^{1} Time represents total seconds for processing across all ARM Cortex-A15 cores on the device running at the designated speed.
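The speedup figures in the table are the ARM time divided by the DSP time for the same routine. The sketch below reproduces that arithmetic and also converts a real-valued GEMM time into an approximate throughput. The function names are hypothetical, and the throughput estimate assumes the conventional count of 2*m*n*k floating-point operations for a real GEMM, which is not a figure stated in the table.

```c
#include <assert.h>

/* Speedup when moving a routine from ARM to DSP: ARM time / DSP time. */
double speedup(double arm_seconds, double dsp_seconds)
{
    assert(arm_seconds > 0.0 && dsp_seconds > 0.0);
    return arm_seconds / dsp_seconds;
}

/* Approximate GFLOPS for a real GEMM (SGEMM or DGEMM).
 * Assumption: 2*m*n*k floating-point operations per call. */
double real_gemm_gflops(int m, int n, int k, double seconds)
{
    assert(m > 0 && n > 0 && k > 0 && seconds > 0.0);
    return 2.0 * m * n * k / seconds / 1e9;
}
```

For the SGEMM row, `speedup(0.27, 0.157)` is roughly 1.72 and `speedup(0.121, 0.027)` is roughly 4.48, which round to the listed 1.7x and 4.5x. Under the stated operation-count assumption, `real_gemm_gflops(1000, 1000, 1000, 0.027)` comes to roughly 74 GFLOPS for the 8x C66x configuration.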
Included examples
The following examples show how to use LINALG through the CBLAS and CLAPACK APIs. They are located in /examples:
ARM+DSP examples
These are located in the arm+dsp folder. All examples run on the host (ARM) and may offload BLAS functions to the DSP, depending on the BLAS configuration.
- Matrix multiplication (dgemm)
- Symmetric rank k operation (dsyrk)
- Triangular matrix multiplication (dtrmm)
- Triangular matrix equation solver (dtrsm)
- Eigendecomposition and matrix inversion (eig)
- LU decomposition and matrix inversion (ludinv)
- xGEMM benchmarking (gemm_bench)
DSP-only example
This example is located in the dsp folder and runs on the DSP through CCS and JTAG.
- Matrix multiplication (dgemm)
Download LINALG
LINALG has been ported to the following devices and is included in TI’s free Processor SDK, which can be downloaded using the links below:
Within the Processor SDK, LINALG is found at the following locations:
- For Processor-SDK RTOS: <Processor-SDK-RTOS-installation-root>/linalg_<version>
- For Processor-SDK Linux: /linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi/usr/share/ti/ti-linalg-tree