SPRADB4 June 2023 AM69A, TDA4VH-Q1

 

  Abstract
  Trademarks
  1 Introduction
  2 AM69 Processor
  3 Edge AI Use Cases on AM69A
    3.1 AI Box
    3.2 Machine Vision
    3.3 Multi-Camera AI
    3.4 Other Use Cases
  4 Software Tools and Support
  5 Conclusion
  6 References

AM69 Processor

The AM69A processor is the highest-performance device in the AM6xA scalable embedded processor family. Built around an octal-core Arm® Cortex®-A72 microprocessor, the AM69A provides the family's highest levels of processing power, image and video processing, and graphics capability. Compared with the AM62A(1) and the AM68A(2), which are excellent choices for applications with 1 to 2 cameras and 4 to 8 cameras, respectively, the AM69A enables real-time processing of 12 cameras with improved AI performance. As shown in Figure 2-1, the AM69A processor features the following subsystems, based on a heterogeneous architecture:

  • An octal-core Arm Cortex-A72 microprocessor at 2 GHz provides up to 100K Dhrystone Million Instructions Per Second (DMIPS).
  • Vision Processing Accelerator V3 (VPAC3) performs image processing in the Vision Image Sub-System (VISS) to support raw image sensors through de-mosaicing, defective pixel correction, auto exposure, auto white balance, chromatic aberration correction (CAC), and so forth. In addition, VPAC3 includes Lens Distortion Correction (LDC), Multi-Scaler (MSC), and Bilateral Noise Filter (BNF) hardware accelerators (HWAs) to accelerate correction of distorted images, downscaling of images into multiple resolutions, and noise filtering, respectively. The AM69A has two instances of VPAC3, which together can process 1,200 megapixels per second (MP/s), assuming 20% system overhead.
  • Digital Signal Processors (DSPs) and Matrix Multiplication Accelerators (MMAs) are integrated for deep learning (DL) acceleration as well as traditional computer vision tasks. The AM69A processor has four 512-bit C7x DSPs running at 1 GHz, each tightly coupled with one of four MMAs capable of 4K (64 × 64) 8-bit fixed-point multiply-accumulates per cycle. Running at 1 GHz, the four MMAs provide a maximum compute speed of 32 Trillion Operations per Second (TOPS).
  • The H.264/H.265 codec can encode and decode multiple channels simultaneously. It supports H.264 Baseline, Main, and High Profile at L5.2, and H.265 Main Profile at L5.1. With two instances of the video codec, the H.264/H.265 encoder and decoder can process 960 MP/s, for example, 16 channels of 2MP at 30 frames per second (fps).
  • The AM69A has three 4-lane Mobile Industry Processor Interface (MIPI) CSI-2 RX ports. Three high-resolution (for example, 12MP) cameras can be connected directly to the CSI-2 RX ports, then captured and pre-processed by the two VPAC3 instances. Capturing twelve 2MP cameras is possible via MIPI CSI-2 4-to-1 aggregators.
  • BXS-4-64 GPU offers up to 50 Giga Floating-point Operations Per Second (GFLOPS) to enable dynamic 2D and 3D rendering for enhanced viewing applications.
  • Display Sub-System (DSS) supports multiple displays with the flexibility to interface with different panel types such as eDP, DSI, and DPI.
  • An improved memory architecture and high-speed interfaces raise system throughput and energy efficiency by enabling high utilization of cores and HWAs. The AM69A supports up to 64 gigabytes per second (GBps) of DDR memory bandwidth.
Figure 2-1 AM69A Block Diagram With Subsystems
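The throughput figures quoted in the list above can be cross-checked with simple arithmetic. The following is a minimal sketch rather than a TI tool: the function and constant names are hypothetical, counting each multiply-accumulate as two operations is an assumption, and the 30 fps rate used for the 12MP cameras is an assumption that the text does not state.

```python
# Figures quoted in the subsystem list above.
MMA_COUNT = 4                  # four MMAs, one per C7x DSP
MACS_PER_CYCLE = 64 * 64       # 4K 8-bit multiply-accumulates per cycle per MMA
CLOCK_HZ = 1_000_000_000       # 1 GHz
OPS_PER_MAC = 2                # assumption: one multiply plus one add per MAC

CODEC_BUDGET_MPS = 960         # two codec instances, MP/s
VPAC_BUDGET_MPS = 1200         # two VPAC3 instances, MP/s, 20% overhead assumed
CSI2_RX_PORTS = 3              # 4-lane MIPI CSI-2 RX ports
AGGREGATOR_FAN_IN = 4          # 4-to-1 CSI-2 aggregator


def peak_tops():
    """Peak MMA throughput in trillions of operations per second."""
    return MMA_COUNT * MACS_PER_CYCLE * OPS_PER_MAC * CLOCK_HZ / 1e12


def pixel_rate_mps(cameras, megapixels, fps):
    """Aggregate pixel rate in megapixels per second."""
    return cameras * megapixels * fps


def max_aggregated_cameras():
    """Cameras reachable when every CSI-2 RX port sits behind one aggregator."""
    return CSI2_RX_PORTS * AGGREGATOR_FAN_IN


print("peak TOPS:", peak_tops())
print("16 x 2MP @ 30 fps:", pixel_rate_mps(16, 2, 30), "MP/s")
print("12 x 2MP @ 30 fps:", pixel_rate_mps(12, 2, 30), "MP/s")
print("3 x 12MP @ 30 fps (assumed rate):", pixel_rate_mps(3, 12, 30), "MP/s")
print("cameras via aggregators:", max_aggregated_cameras())
```

Under these assumptions the MMA product comes to 32.768 TOPS, consistent with the quoted 32 TOPS, and sixteen 2MP channels at 30 fps use exactly the 960 MP/s codec budget. Both camera configurations (twelve 2MP or three 12MP, at the assumed 30 fps) stay below the 1,200 MP/s VPAC3 figure.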

Deep learning inference efficiency is crucial for the performance of an edge AI system. As the Performance and efficiency benchmarking with TDA4 Edge AI processors application note shows, MMA-based deep learning inference is 60% more efficient than GPU-based inference in terms of FPS and TOPS. Network models optimized for C7xMMA are also provided by the TI Model Zoo(3), a large collection of DNN models covering various computer vision tasks. The models include popular image classification, 2D and 3D object detection, semantic segmentation, and 6D pose estimation models. For several models in the TI Model Zoo, the 8-bit fixed-point inference performance on TI embedded processors, including the AM69A, can be evaluated via TI's Edge AI Studio.

The multicore heterogeneous architecture of AM6xA provides the flexibility to optimize the performance of an edge AI system for various applications by assigning each task to a suitable programmable core or HWA. For example, on the AM69A, computationally intensive deep learning (DL) inference can run on the four MMA instances with optimized DL models, while vision processing and video encoding and decoding can be offloaded to the two VPAC3 instances and the hardware-accelerated video codec for the best performance. Other functional blocks can be programmed on the eight A72 cores or on available C7x cores. Section 3 describes how edge AI systems can be built on the AM69A for several industrial use cases.
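The task-to-engine mapping described above can be sketched as a small planning table. This is an illustration only, not TI's software stack or scheduler: the task names, the `plan` function, and the round-robin policy are hypothetical, and only the engine instance counts come from the text.

```python
from itertools import cycle

# Engine instance counts taken from the text above.
ENGINE_INSTANCES = {"MMA": 4, "VPAC3": 2, "CODEC": 2, "A72": 8}

# Hypothetical per-stream tasks and the engine type each is offloaded to.
TASK_ENGINE = {
    "vision_preprocess": "VPAC3",
    "dl_inference": "MMA",
    "video_encode": "CODEC",
    "post_process": "A72",
}


def plan(num_streams):
    """Map (stream, task) to an engine instance, round-robin per engine type."""
    next_index = {name: cycle(range(count))
                  for name, count in ENGINE_INSTANCES.items()}
    schedule = {}
    for stream in range(num_streams):
        for task, engine in TASK_ENGINE.items():
            schedule[(stream, task)] = f"{engine}[{next(next_index[engine])}]"
    return schedule


schedule = plan(12)  # twelve camera streams, as in the AM69A example

# Count how many streams land on each MMA instance.
load = {}
for (stream, task), instance in schedule.items():
    if task == "dl_inference":
        load[instance] = load.get(instance, 0) + 1
print(load)
```

With twelve streams, the round-robin policy places three inference streams on each of the four MMAs. A real system would size this against measured per-model inference time rather than stream count alone.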