SPRADC9 July 2023 AM62A3, AM62A7

 

  1. Abstract
  2. Trademarks
  3. 1 Introduction
    1. 1.1 Defect Detection Demo Summary
    2. 1.2 AM62A Processor
    3. 1.3 Defect Detection Systems
    4. 1.4 Conventional Machine Vision vs Deep Learning
  4. 2 Data Set Preparation
    1. 2.1 Test Samples
    2. 2.2 Data Collection
    3. 2.3 Data Annotation
    4. 2.4 Data Augmentation
  5. 3 Model Selection and Training
    1. 3.1 Model Selection
    2. 3.2 Model Training and Compilation
  6. 4 Application Development
    1. 4.1 System Flow
    2. 4.2 Object Tracker
    3. 4.3 Dashboard and Bounding Boxes Drawing
    4. 4.4 Physical Demo Setup
  7. 5 Performance Analysis
    1. 5.1 System Accuracy
    2. 5.2 Frame Rate
    3. 5.3 Cores Utilization
    4. 5.4 Power Consumption
  8. 6 Summary
  9. 7 References

Cores Utilization

The AM62A SoC consists of various processing cores and hardware accelerators. Monitoring the load on these components is important for understanding the capabilities of the whole system and the headroom available for expansion. The defect detection demo uses the tiperfoverlay GStreamer plugin to show the core loads as a bar graph at the bottom of the screen. Figure 5-2 shows a screenshot of the AM62A core load graph while running the defect detection demo. By default, the graph is updated every two seconds and shows each load as a utilization percentage. In addition to the tiperfoverlay GStreamer plugin, the perf_stats tool is a second option that shows core performance directly on the terminal, with an option to save the output to a file. This option is more accurate than tiperfoverlay, because the latter adds extra load on the Arm cores and the DDR to draw the graph and overlay it on the screen.

Figure 5-2 Core Load Bar Graph Shown at the Bottom of the Defect Detection Demo Using the tiperfoverlay GStreamer Plugin (the figure is edited to fit the page)

Figure 5-2 shows that the defect detection demo, together with all of the supporting Linux processes, uses only about 39% of the Arm core capacity (averaged across the four A53 cores). At the same time, the yolox-nano-lite model used in the application uses about 22% of the C7xMMA deep learning accelerator. It is important to note that in this experiment the C7xMMA is clocked at 850 MHz instead of 1000 MHz. In other words, if the C7xMMA accelerator were clocked at 1000 MHz, its utilization would be lower than the reported 22%. The DDR traffic is 1706 MB/s for read operations and 1118 MB/s for write operations, for a total of 2824 MB/s. The AM62A supports a total DDR bandwidth of 12.8 GB/s when using 32-bit DDR4 at 3200 MT/s. The total of 2824 MB/s therefore uses about 22% of the available DDR bandwidth.
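The bandwidth figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the demo code; all function and variable names are illustrative. The last step estimates the C7xMMA load at 1000 MHz by assuming that utilization scales inversely with clock frequency, which only holds if the workload is compute bound. That assumption is not stated in this report.

```python
# Measured DDR traffic reported for the defect detection demo (MB/s).
READ_MB_S = 1706
WRITE_MB_S = 1118


def peak_ddr_bandwidth_mb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mt_s


def utilization_pct(used: float, capacity: float) -> float:
    """Share of the capacity that is in use, as a percentage."""
    return 100.0 * used / capacity


total_mb_s = READ_MB_S + WRITE_MB_S                # 2824 MB/s
peak_mb_s = peak_ddr_bandwidth_mb_s(32, 3200)      # 12800 MB/s = 12.8 GB/s
ddr_utilization_pct = utilization_pct(total_mb_s, peak_mb_s)

# ASSUMPTION: C7xMMA load scales inversely with clock (compute-bound workload).
C7X_MEASURED_PCT = 22.0
c7x_scaled_pct = C7X_MEASURED_PCT * 850 / 1000

print(f"Total DDR traffic:   {total_mb_s} MB/s")
print(f"Peak DDR bandwidth:  {peak_mb_s:.0f} MB/s")
print(f"DDR utilization:     {ddr_utilization_pct:.1f}%")
print(f"C7xMMA at 1000 MHz:  about {c7x_scaled_pct:.1f}% (estimate)")
```

Under these assumptions the DDR utilization comes out at roughly 22%, matching the reported value, and the estimated C7xMMA load at 1000 MHz is a little under 19%.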

These low utilization values for the Arm cores, the accelerator, and the DDR bandwidth indicate that there is ample room on the AM62A to run additional applications or to expand the defect detection application itself, for example by increasing the frame rate with a faster camera. In addition, the low core utilization provides flexibility in selecting the right SoC variant of the AM62A. The core loads shown in Figure 5-2 are for the AM62A74 variant of the AM62A SoC family. This variant is equipped with four A53 Arm cores and a C7xMMA deep learning accelerator capable of two TOPS. The low utilization values suggest that the defect detection demo in its current form can be implemented on lower-end variants of the AM62A such as the AM62A3, which includes two Arm cores and a one-TOPS deep learning accelerator.
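As a rough illustration of this headroom argument, the measured loads can be projected onto the smaller variant. This is a back-of-the-envelope sketch with illustrative names, not a measurement. It assumes that load scales inversely with capacity (core count for the Arm cores, nominal TOPS for the accelerator), and it ignores clock differences, memory effects, and how well the Linux workload spreads across fewer cores.

```python
def project_load_pct(measured_pct: float,
                     measured_capacity: float,
                     target_capacity: float) -> float:
    """Project a utilization percentage onto a part with a different capacity.

    ASSUMPTION: the same absolute amount of work is needed, so the
    percentage scales with measured_capacity / target_capacity.
    """
    return measured_pct * measured_capacity / target_capacity


# Measured on AM62A74: 39% across four A53 cores, 22% of a 2 TOPS C7xMMA.
arm_on_am62a3_pct = project_load_pct(39.0, measured_capacity=4, target_capacity=2)
c7x_on_am62a3_pct = project_load_pct(22.0, measured_capacity=2, target_capacity=1)

for name, pct in (("Arm cores", arm_on_am62a3_pct), ("C7xMMA", c7x_on_am62a3_pct)):
    status = "fits" if pct < 100.0 else "exceeds capacity"
    print(f"{name}: about {pct:.0f}% projected on AM62A3 ({status})")
```

With these simplifications, both projected loads stay below 100%, which is consistent with the suggestion that the demo can run on the AM62A3. The projected Arm load leaves much less margin than on the AM62A74, so it is worth confirming with a measurement on the target device.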