DSP Libraries - Linear Algebra

The TI Linear Algebra library (LINALG) is an optimized library for dense linear algebra computations. It includes optimized BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) libraries.

The BLAS library provides routines that perform basic vector and matrix operations: level 1 routines provide scalar-vector and vector-vector operations, level 2 routines provide matrix-vector operations, and level 3 routines provide matrix-matrix operations. The TI optimized BLAS library supports the CBLAS API and is based on BLIS (BLAS-like Library Instantiation Software) version 0.1.6.
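To make the three BLAS levels concrete, here is a minimal sketch in C of what one routine from each level computes. The names ref_daxpy, ref_dgemv and ref_dgemm are hypothetical. They are naive loops that assume row-major storage and no transposition, not the TI or BLIS implementations, which are blocked and tuned for the target cores. In an application you would call the corresponding CBLAS functions (cblas_daxpy, cblas_dgemv, cblas_dgemm) instead.

```c
#include <assert.h>

/* Hypothetical reference routines (not part of LINALG): plain loops that give
 * the same results as cblas_daxpy, cblas_dgemv and cblas_dgemm for the simple
 * case of row-major storage with no transposition. */

/* Level 1 (vector-vector): y := alpha*x + y */
void ref_daxpy(int n, double alpha, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y := alpha*A*x + beta*y, A is m x n with row stride lda */
void ref_dgemv(int m, int n, double alpha, const double *A, int lda,
               const double *x, double beta, double *y)
{
    assert(lda >= n);
    for (int i = 0; i < m; i++) {
        double sum = 0.0;
        for (int j = 0; j < n; j++)
            sum += A[i * lda + j] * x[j];
        y[i] = alpha * sum + beta * y[i];
    }
}

/* Level 3 (matrix-matrix): C := alpha*A*B + beta*C, A is m x k, B is k x n */
void ref_dgemm(int m, int n, int k, double alpha, const double *A, int lda,
               const double *B, int ldb, double beta, double *C, int ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++)
                sum += A[i * lda + p] * B[p * ldb + j];
            C[i * ldc + j] = alpha * sum + beta * C[i * ldc + j];
        }
    }
}
```

For example, with A = [[1, 2], [3, 4]], B = [[5, 6], [7, 8]], alpha = 1 and beta = 0, ref_dgemm produces C = [[19, 22], [43, 50]]. The level 3 loop nest performs 2·m·n·k floating-point operations on only O(mk + kn + mn) data, which is what makes level 3 routines the best candidates for acceleration.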

The LAPACK library, built by translating LAPACK from Fortran to C with the f2c utility, provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. It relies on the TI optimized BLAS library for acceleration.
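As an illustration of the first class of problems, solving a system of linear equations, the sketch below shows LU factorization with partial pivoting followed by forward and back substitution. This is the algorithm that LAPACK's dgetrf (factor) and dgetrs (solve) implement, and that dgesv combines. The functions lu_factor and lu_solve are hypothetical, unblocked sketches, not the CLAPACK code. They use LAPACK's column-major layout but, unlike LAPACK, store 0-based pivot indices.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch (not CLAPACK): unblocked LU factorization with partial
 * pivoting. A is n x n, column-major, element (i,j) at A[i + j*lda].
 * On return A holds L (unit lower, below the diagonal) and U (upper), and
 * ipiv[j] is the 0-based row swapped with row j. Returns 0 on success, or
 * j+1 (1-based, like LAPACK's info) if a zero pivot is found in column j. */
int lu_factor(int n, double *A, int lda, int *ipiv)
{
    assert(lda >= n);
    for (int j = 0; j < n; j++) {
        /* Pick the row with the largest |A(i,j)| as the pivot. */
        int p = j;
        for (int i = j + 1; i < n; i++)
            if (fabs(A[i + j * lda]) > fabs(A[p + j * lda]))
                p = i;
        ipiv[j] = p;
        if (A[p + j * lda] == 0.0)
            return j + 1;
        /* Swap rows j and p across all columns. */
        if (p != j) {
            for (int c = 0; c < n; c++) {
                double t = A[j + c * lda];
                A[j + c * lda] = A[p + c * lda];
                A[p + c * lda] = t;
            }
        }
        /* Store the multipliers and update the trailing submatrix. */
        for (int i = j + 1; i < n; i++) {
            A[i + j * lda] /= A[j + j * lda];
            for (int c = j + 1; c < n; c++)
                A[i + c * lda] -= A[i + j * lda] * A[j + c * lda];
        }
    }
    return 0;
}

/* Solve A x = b with the factors from lu_factor; b is overwritten with x. */
void lu_solve(int n, const double *A, int lda, const int *ipiv, double *b)
{
    /* Apply the row interchanges to b, in the order they were made. */
    for (int j = 0; j < n; j++) {
        double t = b[j];
        b[j] = b[ipiv[j]];
        b[ipiv[j]] = t;
    }
    /* Forward substitution with the unit lower-triangular L. */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < i; j++)
            b[i] -= A[i + j * lda] * b[j];
    /* Back substitution with the upper-triangular U. */
    for (int i = n - 1; i >= 0; i--) {
        for (int j = i + 1; j < n; j++)
            b[i] -= A[i + j * lda] * b[j];
        b[i] /= A[i + i * lda];
    }
}
```

Pivoting matters even for well-conditioned systems: a matrix whose top-left entry is zero cannot be factored without a row swap. The trailing-submatrix update is the part that LAPACK's blocked routines recast as level 3 BLAS calls, which is how CLAPACK benefits from an accelerated BLAS.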

For more detailed information on LINALG, see the LINALG User’s Guide.

Key library features

  • Support for the standard CBLAS and CLAPACK APIs.
  • Support for single-core and multicore CBLAS computation.
  • CBLAS can be configured to run on either the ARM cores or the DSP cores.
  • CLAPACK runs only on the ARM cores, although the CBLAS functions it calls can be accelerated by the DSP cores.
  • Examples that can be built and run.

LINALG performance

The following table shows example performance of LINALG routines as measured on multicore ARM Cortex-A15 cores and C66x DSP cores in two different TI SoCs.

Benchmark configurations:

  • AM572x EVM (devices with this core configuration: AM5726, AM5728): 2x ARM Cortex-A15 @ 1.5 GHz vs. 2x C66x DSPs @ 750 MHz
  • 66AK2H EVM (devices with this core configuration: 66AK2H12, 66AK2H14): 4x ARM Cortex-A15 @ 1.2 GHz vs. 8x C66x DSPs @ 1.2 GHz

                        ----------- AM572x EVM -----------  ----------- 66AK2H EVM -----------
LINALG function         ARM (s)(1)  DSP (s)(2)  Speedup     ARM (s)(1)  DSP (s)(2)  Speedup
SGEMM (m=n=k=1000)      0.27        0.157       1.7x        0.121       0.027       4.5x
DGEMM (m=n=k=1000)      0.786       0.55        1.4x        0.294       0.084       3.5x
CGEMM (m=n=k=1000)      6.98        0.635       11x         2.36        0.09        26.2x
ZGEMM (m=n=k=1000)      7.62        3.83        2x          2.54        0.516       4.9x
CHERK (m=k=1000)        3.53        0.392       9x          1.2         0.053       22.6x
ZHERK (m=k=1000)        3.85        2.27        1.7x        1.3         0.33        3.9x

Speedup is measured when moving the code from ARM to DSP, i.e., ARM time divided by DSP time.
(1) Time represents total seconds for processing across all ARM Cortex-A15 cores on the device running at the designated speed.
(2) Time represents total seconds for processing across all C66x DSP cores on the device running at the designated speed. Time includes OpenCL overhead.

More performance information can be found in the Benchmarking section of the LINALG User's Guide.
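The speedup figures in the table are the ARM time divided by the DSP time, and the times can also be turned into an achieved floating-point rate. A real GEMM (SGEMM or DGEMM) performs 2·m·n·k floating-point operations, one multiply and one add per inner-product term. The helper names speedup and gemm_gflops below are hypothetical; this is a small sketch of that arithmetic, not part of LINALG.

```c
#include <assert.h>

/* Speedup as reported in the table: ARM time divided by DSP time. */
double speedup(double arm_seconds, double dsp_seconds)
{
    assert(dsp_seconds > 0.0);
    return arm_seconds / dsp_seconds;
}

/* Aggregate rate in GFLOPS for a real GEMM (SGEMM/DGEMM), which performs
 * 2*m*n*k floating-point operations. */
double gemm_gflops(double m, double n, double k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * m * n * k / seconds / 1e9;
}
```

For example, SGEMM with m=n=k=1000 is 2×10⁹ operations, so 0.027 s on the eight C66x DSPs of the 66AK2H EVM corresponds to about 74 GFLOPS, versus about 16.5 GFLOPS for 0.121 s on its four A15 cores. Complex GEMMs (CGEMM, ZGEMM) need four times as many real operations, 8·m·n·k, because each complex multiply-add takes four real multiplies and four real adds.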

Included examples

The following examples show how to use LINALG with the CBLAS and CLAPACK APIs. They are located in the /examples directory:

ARM+DSP examples

These are located in the arm+dsp folder. All of them run on the host (ARM) and may offload BLAS functions to the DSP, depending on the BLAS configuration.

  • Matrix multiplication (dgemm)
  • Symmetric rank k operation (dsyrk)
  • Triangular matrix multiplication (dtrmm)
  • Triangular matrix equation solver (dtrsm)
  • Eigen decomposition and matrix inversion (eig)
  • LU decomposition and matrix inversion (ludinv)
  • xGEMM benchmarking (gemm_bench)
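The dtrsm example exercises the BLAS triangular matrix equation solver. As a rough illustration of the computation, here is a naive forward-substitution sketch of one dtrsm case: solving L·X = alpha·B, where L is lower triangular with a non-zero diagonal, and overwriting B with X. The function name ref_dtrsm_lower_left is hypothetical and the sketch assumes row-major storage; cblas_dtrsm itself also handles right-side, upper-triangular, transposed, and unit-diagonal variants.

```c
#include <assert.h>

/* Hypothetical reference (not LINALG's dtrsm): solves L * X = alpha * B for X,
 * where L is an m x m lower-triangular matrix with a non-zero diagonal and B is
 * m x n. Row-major storage; X overwrites B. Corresponds to the cblas_dtrsm case
 * Side=Left, Uplo=Lower, TransA=NoTrans, Diag=NonUnit. */
void ref_dtrsm_lower_left(int m, int n, double alpha, const double *L, int ldl,
                          double *B, int ldb)
{
    assert(ldl >= m && ldb >= n);
    for (int j = 0; j < n; j++) {          /* each right-hand-side column */
        for (int i = 0; i < m; i++) {      /* forward substitution down the rows */
            double s = alpha * B[i * ldb + j];
            for (int p = 0; p < i; p++)    /* rows p < i already hold X */
                s -= L[i * ldl + p] * B[p * ldb + j];
            B[i * ldb + j] = s / L[i * ldl + i];
        }
    }
}
```

For example, with L = [[2, 0], [1, 4]] and B = [[2, 4], [13, 18]], the call with alpha = 1 leaves X = [[1, 2], [3, 4]] in B. Solving all right-hand-side columns in one call is what lets optimized implementations restructure most of the work as matrix-matrix products.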

DSP-only example

This example is located in the dsp folder. It runs directly on the DSP and is loaded and run through Code Composer Studio (CCS) over JTAG.

  • Matrix multiplication (dgemm)

Download LINALG

LINALG has been ported to the following devices and is included in TI’s free Processor SDK, which can be downloaded for each of them:

  • Sitara AM57x: single/dual Cortex®-A15 and C66x DSPs
  • C667x Multicore DSP: highest performance multicore DSP
  • C6000 DSP + ARM 66AK2Hx: high-performance multicore C66x DSPs + ARM Cortex®-A15s

Within the Processor SDK, LINALG is located at:

  • For Processor-SDK RTOS: <Processor-SDK-RTOS-installation-root>/linalg_<version>
  • For Processor-SDK Linux: /linux-devkit/sysroots/cortexa15hf-vfp-neon-linux-gnueabi/usr/share/ti/ti-linalg-tree