
  1.   At a glance
  2.   Authors
  3.   Introduction
  4.   Defining AI at the edge
  5.   What is an efficient edge AI system?
    1.     Selecting an SoC architecture
    2.     Programmable core types and accelerators
  6.   Designing edge AI systems with TI vision processors
    1.     Deep learning accelerator
    2.     Imaging and computer vision hardware accelerators
    3.     Smart internal bus and memory architecture
    4.     Optimized system BOM
    5.     Easy-to-use software development environment
  7.   Conclusion

Programmable core types and accelerators

Let’s review the possible core types in edge AI systems:

CPUs

Central processing units (CPUs) are general-purpose processing units that handle sequential workloads well. They offer great programming flexibility and benefit from a large existing code base. Most edge AI systems have between two and eight CPU cores for platform management and feature-rich applications. CPU-only systems are not a good fit for highly specialized tasks such as pixel-level imaging, computer vision and convolutional neural network (CNN) processing, however. CPUs also have high power consumption yet the lowest throughput of the core types discussed here. A single-core CPU paired with dedicated hardware blocks, such as AI acceleration and image processing, can meet the power-budget requirements of low-cost applications.
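To make that division of labor concrete, here is a minimal Python sketch of the pattern: the CPU runs the sequential application logic and decision-making, while the CNN layers are dispatched to a dedicated accelerator through a TFLite delegate. The model file and delegate library names are placeholders, not actual product file names.

```python
import numpy as np
import tflite_runtime.interpreter as tflite

# The CPU loads the model and hands the CNN layers to an accelerator via a
# delegate; the library name below is a placeholder for a vendor-supplied one.
delegate = tflite.load_delegate("libvendor_ai_delegate.so")
interpreter = tflite.Interpreter(model_path="model.tflite",
                                 experimental_delegates=[delegate])
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])   # stand-in for a camera frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()                                  # CNN runs on the accelerator
scores = interpreter.get_tensor(out["index"])
print("top class:", int(np.argmax(scores)))           # decision logic stays on the CPU
```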

GPUs

Graphics processing units (GPUs) have hundreds to thousands of small cores that are well suited to parallel processing tasks. Originally designed to execute a sequence of graphics operations, GPUs are now common in deep learning applications and especially useful for training deep neural networks (DNNs). One of the main drawbacks is that, because of the high number of cores, GPUs consume a lot of power and have higher on-chip memory requirements.
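The reason so many small cores help is that deep learning and imaging workloads decompose into large numbers of independent calculations. In the NumPy sketch below (sizes are illustrative), every output pixel of a 2-D convolution depends only on its own small input window, so all of the loop iterations could in principle run at the same time; a GPU maps exactly this kind of loop onto thousands of threads.

```python
import numpy as np

# Each output pixel of a 2-D convolution depends only on a small input window,
# so every iteration of the loops below is independent of the others -- the
# massively parallel structure that a GPU's many small cores exploit.
H, W, K = 64, 64, 3                                   # illustrative sizes
image = np.random.rand(H + K - 1, W + K - 1).astype(np.float32)
kernel = np.random.rand(K, K).astype(np.float32)

output = np.empty((H, W), dtype=np.float32)
for y in range(H):                                    # on a GPU, these iterations become threads
    for x in range(W):
        output[y, x] = np.sum(image[y:y + K, x:x + K] * kernel)
```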

DSPs

Digital signal processors (DSPs) are power-efficient, specialized cores designed to solve complex, math-intensive problems. They process real-time data from real-world vision, audio, speech, radar and sonar sensors at low power, and their architecture helps maximize the work done per clock cycle. DSPs are not as easy to program, however; achieving the best performance requires familiarity with the DSP hardware, its programming environment and DSP software optimization.
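The workhorse operation behind that efficiency is the multiply-accumulate (MAC), which a DSP typically executes in a single cycle, often on several samples at once. The NumPy sketch below shows a fixed-point (Q15) FIR filter, a representative real-time signal-processing kernel; the tap count and input signal are illustrative only.

```python
import numpy as np

def fir_q15(samples: np.ndarray, coeffs: np.ndarray) -> np.ndarray:
    """Fixed-point (Q15) FIR filter: the inner multiply-accumulate is the
    operation a DSP issues every cycle, often on several samples in parallel."""
    n_taps = len(coeffs)
    out = np.zeros(len(samples) - n_taps + 1, dtype=np.int16)
    for i in range(len(out)):
        acc = np.int64(0)                             # wide accumulator, as in DSP hardware
        for k in range(n_taps):                       # one MAC per filter tap
            acc += np.int64(samples[i + k]) * np.int64(coeffs[k])
        out[i] = np.int16(np.clip(acc >> 15, -32768, 32767))   # rescale to Q15 and saturate
    return out

# Example: smooth a noisy tone with a 5-tap moving-average filter
t = np.arange(256)
noisy = ((np.sin(t / 8.0) * 0.5 + np.random.randn(256) * 0.05) * 32767).astype(np.int16)
taps = np.full(5, 32767 // 5, dtype=np.int16)
filtered = fir_q15(noisy, taps)
```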

ASICs

Application-specific integrated circuits (ASICs) and hardware accelerators deliver maximum performance at the lowest power for a given system function. They are popular choices when the core kernels of the function to be accelerated are known in advance. For example, the core computation for CNNs always involves matrix multiplications. For traditional computer vision tasks, dedicated hardware accelerators can compute operations such as image scaling, lens distortion correction and noise filtering.
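A minimal NumPy sketch of that observation: a convolution layer can be lowered (via im2col) to one large matrix multiplication, which is why a single hard-wired matrix-multiply datapath can accelerate every convolutional layer of a CNN. Shapes and data here are illustrative.

```python
import numpy as np

def conv2d_as_matmul(image, kernels):
    """image: (H, W, C_in); kernels: (K, K, C_in, C_out); stride 1, no padding."""
    H, W, C_in = image.shape
    K, _, _, C_out = kernels.shape
    out_h, out_w = H - K + 1, W - K + 1

    # im2col: unfold every K x K x C_in input patch into one row of a matrix
    patches = np.empty((out_h * out_w, K * K * C_in), dtype=image.dtype)
    for y in range(out_h):
        for x in range(out_w):
            patches[y * out_w + x] = image[y:y + K, x:x + K, :].ravel()

    weights = kernels.reshape(K * K * C_in, C_out)    # flatten each filter into a column
    return (patches @ weights).reshape(out_h, out_w, C_out)   # one large matrix multiply

# Illustrative check of the output shape
img = np.random.rand(8, 8, 3).astype(np.float32)
ker = np.random.rand(3, 3, 3, 4).astype(np.float32)
assert conv2d_as_matmul(img, ker).shape == (6, 6, 4)
```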

FPGAs

Field-programmable gate arrays (FPGAs) are a class of integrated circuits whose hardware blocks can be reprogrammed and targeted to specific applications. They have lower power consumption than GPUs and CPUs but use more power than ASICs. The hardware is difficult to program, however, requiring expertise in hardware description languages such as Verilog or the Very High Speed Integrated Circuit Hardware Description Language (VHDL).