SPRADP7A February 2025 – March 2025 AM62A3, AM62A3-Q1, AM62A7, AM62A7-Q1, AM67A, TDA4AEN-Q1

 

  Abstract
  Trademarks
  1 Introduction
  2 Building Blocks of an RGB-IR Vision Pipeline
    2.1 CSI Receiver
    2.2 Image Signal Processor
    2.3 Video Processing Unit
    2.4 TI Deep Learning Acceleration
    2.5 GStreamer and TIOVX Frameworks
  3 Performance Considerations and Benchmarking Tools
  4 Reference Design
    4.1 Camera Module
    4.2 Sensor Driver
    4.3 CSI-2 Rx Driver
    4.4 Image Processing
    4.5 Deep Learning for Driver and Occupancy Monitoring
    4.6 Reference Code and Applications
  5 Application Examples and Benchmarking
    5.1 Application 1: Single-stream Capture and Visualization with GST
    5.2 Application 2: Dual-stream Capture and Visualization with GST and TIOVX Frameworks
    5.3 Application 3: Representative OMS-DMS + Video Telephony Pipeline in GStreamer
  6 Summary
  7 References
  8 Revision History

TI Deep Learning Acceleration

Deep learning and neural networks are an increasingly popular strategy to extract meaning and information from imagery and other data. TI's AM6xA and TDA4x SoCs use an in-house developed hardware IP, the C7xMMA, with TI Deep Learning (TIDL) software to accelerate neural network inference.

The C7xMMA is a tightly coupled combination of a C7x SIMD DSP and a matrix multiply accelerator (MMA). The architecture is highly effective for Convolutional Neural Networks (CNNs), a common type of neural network used for vision processing. In most CNNs, matrix multiplication and similar operations compose at least 98% of the total operations. MMAs therefore have a large impact on the computational efficiency of neural network acceleration for vision tasks such as object detection, pixel-level segmentation, and key-point detection.
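The 98% figure can be sanity-checked with a back-of-the-envelope operation count for a single convolution layer. The layer shape below is an arbitrary illustrative example, not taken from any TI model:

```python
# Toy op count for one stride-1, "same"-padded convolution layer:
# multiply-accumulate (matrix-multiply-like) work vs. everything else.
def conv2d_op_counts(h, w, c_in, c_out, k):
    """Return (MAC count, other-op count) for one conv layer."""
    macs = h * w * c_out * (k * k * c_in)  # MMA-friendly multiply-accumulates
    other = h * w * c_out * 2              # bias add + ReLU per output value
    return macs, other

macs, other = conv2d_op_counts(h=224, w=224, c_in=64, c_out=64, k=3)
mac_fraction = macs / (macs + other)       # ~0.997 for this layer shape
```

Even with only a 3x3 kernel and 64 input channels, multiply-accumulates make up well over 98% of the layer's operations, which is why offloading them to the MMA dominates overall inference efficiency.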

Figure 2-3 depicts a general development flow for TIDL on AM6xA and TDA4x processors. This development flow can be entered from multiple points. TI provides GUI-based and command line-based tools that enable users to:

  • Bring your own data (BYOD) and train a TI model
  • Bring your own pretrained model (BYOM) of a custom architecture
  • Evaluate a pretrained, pre-optimized model from TI's Model Zoo

Each of these development actions feeds into the next: developers compile a model for the target SoC and can test accuracy on a PC before deploying to the target. The compilation tools and accelerator are invoked through open-source runtimes such as TensorFlow Lite, ONNX Runtime, or TVM. These runtimes provide a familiar API and allow unaccelerated layers to run on the Arm® A cores, easing usability for a broad range of models. Each of these open-source runtimes (OSRTs) leverages the TIDL runtime (TIDL_RT) under the hood.
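As a minimal sketch of how an OSRT session is steered toward TIDL versus the Arm cores, the helper below builds the provider arguments for an ONNX Runtime session. The provider name "TIDLExecutionProvider" and the "artifacts_folder" option follow the pattern in TI's edgeai-tidl-tools examples; treat them as assumptions to verify against your Processor SDK version:

```python
# Hypothetical helper: assemble the (providers, provider_options) pair
# passed to onnxruntime.InferenceSession(). Provider/option names are
# assumptions based on TI's edgeai-tidl-tools examples.
def tidl_session_config(artifacts_dir, accelerate=True):
    if accelerate:
        # TIDL runs supported layers; unsupported layers fall back to CPU.
        return (["TIDLExecutionProvider", "CPUExecutionProvider"],
                [{"artifacts_folder": artifacts_dir}, {}])
    # Pure Arm A-core execution, useful for accuracy comparison on PC.
    return (["CPUExecutionProvider"], [{}])

providers, options = tidl_session_config("/opt/model-artifacts/mymodel")
# session = onnxruntime.InferenceSession("model.onnx",
#                                        providers=providers,
#                                        provider_options=options)
```

Keeping the CPU provider in the list is what lets layers the compiler could not offload still execute on the A cores, as described above.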

Figure 2-3. TI Deep Learning Development Flow