3.1 MLPerf models

MLPerf Inference is a benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios. Quoting from the website mlperf.org, the benchmark aims to provide "fair and useful benchmarks for measuring training and inference performance of ML hardware, software, and services". A major contribution of MLPerf is the selection of representative models that permit reproducible measurements. Based on industry consensus, MLPerf Inference comprises models that are mature and have earned community support. The models are open source, and the software and models for the benchmarks are provided in GitHub repositories at https://github.com/mlcommons and https://github.com/mlperf. Benchmarks are defined for both training and inference, and the inference benchmarks cover both cloud and edge scenarios. In this application note, as discussed earlier, we focus on inference of models for the edge and mobile benchmarks.

Image classification: As described in the introduction, image classification is a commonly used deep learning function for applications that include photo searches, text extraction, and industrial automation such as object sorting and defect detection. MLPerf uses the ImageNet 2012 data set [10], crops the images to 224x224 during preprocessing, and measures Top-1 accuracy. MLPerf suggests two models: a computationally heavyweight model that is more accurate and a computationally lightweight model that is faster but less accurate. The heavyweight model, ResNet-50 v1.5 [16], is used in this benchmarking and comparison.
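
As an illustration of the preprocessing and metric described above, the following Python sketch resizes and center-crops an image to 224x224 and computes Top-1 accuracy over a labeled sample set. The resize-to-256/center-crop-to-224 recipe is the common ResNet-50 v1.5 convention and may differ in detail from a given model export (for example, mean subtraction or channel ordering); run_model is a hypothetical stand-in for the actual inference call.

import numpy as np
from PIL import Image

def preprocess(path):
    # Resize the shorter side to 256, then center-crop to 224x224.
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = 256 / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BILINEAR)
    left = (img.width - 224) // 2
    top = (img.height - 224) // 2
    img = img.crop((left, top, left + 224, top + 224))
    return np.asarray(img, dtype=np.float32) / 255.0  # HWC layout, values in [0, 1]

def top1_accuracy(run_model, samples):
    # samples: iterable of (image_path, ground_truth_label) pairs.
    correct = 0
    total = 0
    for path, label in samples:
        scores = run_model(preprocess(path))  # per-class score vector
        if int(np.argmax(scores)) == label:
            correct += 1
        total += 1
    return correct / total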

Object detection: Object detection is a vision task that determines the coordinates of bounding boxes around objects in an image and then classifies those boxes. Implementations typically use a pretrained image-classification network as a backbone or feature extractor and then perform regression for localization and bounding-box selection. Object detection is crucial for many tasks in automotive and robotics, such as detecting hazards and analyzing traffic, and for mobile-retail tasks, such as identifying items in a picture. MLPerf suggests two models that use the COCO data set [11]: a lightweight model operating on 300x300 images and a heavyweight model operating on 1200x1200 images.
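
The following sketch shows the output handling implied by this description for the lightweight 300x300 model: the input image is resized to 300x300, a hypothetical run_detector call (wrapping SSD MobileNet-V1) returns normalized boxes with class ids and scores, and detections above a confidence threshold are mapped back to pixel coordinates of the original image. The exact output layout depends on the model export; the (ymin, xmin, ymax, xmax) ordering assumed here is typical of SSD post-processed outputs and should be verified against the model actually used.

import numpy as np
from PIL import Image

def detect(run_detector, path, score_threshold=0.5):
    img = Image.open(path).convert("RGB")
    orig_w, orig_h = img.size
    inp = np.asarray(img.resize((300, 300), Image.BILINEAR), dtype=np.uint8)

    # Assumed outputs: [N, 4] normalized boxes, [N] class ids, [N] scores.
    boxes, classes, scores = run_detector(inp)

    results = []
    for (ymin, xmin, ymax, xmax), cls, score in zip(boxes, classes, scores):
        if score < score_threshold:
            continue
        results.append({
            "class": int(cls),
            "score": float(score),
            "box_px": (xmin * orig_w, ymin * orig_h, xmax * orig_w, ymax * orig_h),
        })
    return results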

Based on this, the two models used in this application note are shown in Table 3-1 below.

Note: TI has not yet officially submitted results to MLCommons.org. These models are used because they represent practical use cases and are also used by other edge AI SoC vendors.
Table 3-1. DL models used in this benchmarking
DL Model            Function               Image size   Data set   Compute requirements per input
ResNet-50           Image classification   224x224      ImageNet   8.2 GOPS, 25.6 million parameters
SSD MobileNet-V1    Object detection       300x300      COCO       2.47 GOPS, 6.91 million parameters
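
The compute-requirement column gives a quick way to put an idealized upper bound on throughput: divide the peak compute of the deep learning accelerator by the operations needed per inference. The short calculation below assumes the nominal 8 TOPS peak of the TDA4VM deep learning accelerator purely for illustration; achievable throughput is lower and depends on utilization, memory bandwidth, and layer mix.

PEAK_TOPS = 8.0  # assumed nominal peak compute, tera-operations per second

for model, gops_per_inference in [("ResNet-50", 8.2), ("SSD MobileNet-V1", 2.47)]:
    fps_bound = PEAK_TOPS * 1e12 / (gops_per_inference * 1e9)
    print(f"{model}: idealized upper bound of about {fps_bound:.0f} inferences/s")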

The MLPerf Inference standard also defines different scenarios for benchmarking: single-stream, multi-stream, server, and offline. For real-time embedded edge AI systems such as smart cameras, machine vision, and robotics, the most relevant scenarios are single-stream and multi-stream, which correspond to processing images and video from a single camera or from multiple cameras simultaneously. The single-stream scenario is used in this benchmarking.
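
A minimal sketch of a single-stream style measurement loop is shown below, assuming a hypothetical blocking run_inference call and a list of preprocessed samples. One query is in flight at a time, the next sample is issued only after the previous result returns, and a tail-latency percentile is reported; the official benchmark uses the MLPerf LoadGen harness from the repositories mentioned above rather than a hand-rolled loop like this.

import time
import numpy as np

def single_stream_latency(run_inference, samples, percentile=90):
    latencies_ms = []
    for sample in samples:
        start = time.perf_counter()
        run_inference(sample)  # blocking call, one query in flight at a time
        latencies_ms.append((time.perf_counter() - start) * 1e3)
    return np.percentile(latencies_ms, percentile)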