SPRAD90 February   2023 AM62A3 , AM62A3-Q1 , AM62A7 , AM62A7-Q1

 


Machine Learning Inference

Comprehensive machine learning performance numbers for the 2 TOPS deep learning accelerator (C7x 256v with MMA) on AM62Ax will be available on the Edge AI cloud: Edge AI (ti.com).

TensorFlow Lite is used to test the performance of the Arm Cortex-A53 processors in deep learning inference at the edge. The examples below are two runs of TensorFlow Lite image classification models (224 x 224 pixel input, 3 bytes per pixel for color) based on the ImageNet database with 1000 object classes. A quantized Mobilenetv1 and a floating-point Mobilenetv2 were chosen as common benchmarks that can be used to estimate the performance of an inference application. These models are not available in the SDK. The TensorFlow Lite classifier and models (1.15-r5.0) were downloaded from the official host website at tensorflow.org.

The example image of Rear Admiral Grace Hopper is installed in the file system (available at). The example label_image program crops and resizes the BMP image to 224 x 224 pixels before calling TensorFlow Lite. The code block below shows the terminal printout of inference with the Mobilenetv1 (mobilenet_v1_1.0_224_quant.tflite) and Mobilenetv2 (mobilenet_v2_1.0_224.tflite) models on the same image at the same input resolution (224 x 224 x 3).

root@am62axx-evm:/usr/share/tensorflow-lite/examples# ./label_image -i grace_hopper.bmp -l labels.txt -m mobilenet_v1_1.0_224_quant.tflite
Loaded model mobilenet_v1_1.0_224_quant.tflite
resolved reporter
invoked
average time: 56.945 ms
0.780392: 653 military uniform
0.105882: 907 Windsor tie
0.0156863: 458 bow tie
0.0117647: 466 bulletproof vest
0.00784314: 835 suit

root@am62axx-evm:/usr/share/tensorflow-lite/examples# ./label_image -i grace_hopper.bmp -l labels.txt -m mobilenet_v2_1.0_224.tflite
Loaded model mobilenet_v2_1.0_224.tflite
resolved reporter
invoked
average time: 178.05 ms
0.911345: 653 military uniform
0.014466: 835 suit
0.0062473: 440 bearskin
0.00296661: 907 Windsor tie
0.00269019: 753 racket
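
When collecting these numbers over many runs, it can help to parse the label_image printout instead of reading it by eye. The following is a minimal sketch, assuming the output keeps the line layout shown above ("average time: N ms" followed by "score: index label" lines). The function name parse_label_image_output and both regular expressions are illustrative and are not part of TensorFlow Lite or the SDK.

```python
import re

# Sample text copied from the Mobilenetv1 run above.
SAMPLE = """\
Loaded model mobilenet_v1_1.0_224_quant.tflite
resolved reporter
invoked
average time: 56.945 ms
0.780392: 653 military uniform
0.105882: 907 Windsor tie
0.0156863: 458 bow tie
0.0117647: 466 bulletproof vest
0.00784314: 835 suit
"""

# Assumed line formats, inferred only from the printout above.
TIME_RE = re.compile(r"^average time:\s*([0-9.]+)\s*ms$")
LABEL_RE = re.compile(r"^([0-9.]+):\s*(\d+)\s+(.+)$")


def parse_label_image_output(text):
    """Return (average_ms, [(score, class_index, label), ...])."""
    average_ms = None
    results = []
    for line in text.splitlines():
        line = line.strip()
        match = TIME_RE.match(line)
        if match:
            average_ms = float(match.group(1))
            continue
        match = LABEL_RE.match(line)
        if match:
            score, index, label = match.groups()
            results.append((float(score), int(index), label))
    return average_ms, results


average_ms, results = parse_label_image_output(SAMPLE)
score, index, label = results[0]
print(f"average time: {average_ms} ms")
print(f"top-1: {label} (class {index}, score {score:.3f})")
```

On the target, the same function could be fed the captured standard output of a label_image run. Lines that match neither pattern, such as "resolved reporter", are ignored.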

For a full performance evaluation of the Arm Cortex-A53 processors, the benchmarks were executed at both 1.25 GHz and 1.4 GHz. Table 4-1 shows the average time for all test executions.

Table 4-1 Average Time for All Test Executions

Model                                            Arm Cortex-A53 at 1.25 GHz   Arm Cortex-A53 at 1.4 GHz
Mobilenetv1 (mobilenet_v1_1.0_224_quant.tflite)  63.35 ms                     56.94 ms
Mobilenetv2 (mobilenet_v2_1.0_224.tflite)        192.10 ms                    178.05 ms
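
To put the two clock settings in perspective, the measured times can be compared against the ideal speedup implied by the clock ratio (1.4 / 1.25 = 1.12). The sketch below is a worked calculation using only the values in Table 4-1. The helper names and the "scaling efficiency" metric (measured speedup divided by the clock ratio) are illustrative assumptions and are not defined in this report.

```python
# Average inference times in ms, copied from Table 4-1:
# (time at 1.25 GHz, time at 1.4 GHz)
TIMES_MS = {
    "Mobilenetv1 (quantized)": (63.35, 56.94),
    "Mobilenetv2 (floating point)": (192.10, 178.05),
}

SLOW_GHZ = 1.25
FAST_GHZ = 1.4
CLOCK_RATIO = FAST_GHZ / SLOW_GHZ  # ideal speedup if time scaled with clock


def speedup(slow_ms, fast_ms):
    """Measured speedup when moving from the slow to the fast clock."""
    return slow_ms / fast_ms


def scaling_efficiency(slow_ms, fast_ms):
    """Measured speedup as a fraction of the clock ratio (illustrative metric)."""
    return speedup(slow_ms, fast_ms) / CLOCK_RATIO


def inferences_per_second(time_ms):
    """Throughput for back-to-back single-image inference."""
    return 1000.0 / time_ms


for name, (slow_ms, fast_ms) in TIMES_MS.items():
    print(
        f"{name}: speedup {speedup(slow_ms, fast_ms):.3f}x "
        f"(clock ratio {CLOCK_RATIO:.2f}), "
        f"efficiency {scaling_efficiency(slow_ms, fast_ms):.1%}, "
        f"{inferences_per_second(fast_ms):.1f} inferences/s at {FAST_GHZ} GHz"
    )
```

With these inputs, the quantized Mobilenetv1 speeds up by about 1.11x, close to the 1.12 clock ratio, while the floating-point Mobilenetv2 speeds up by about 1.08x. The corresponding rates at 1.4 GHz are roughly 17.6 and 5.6 inferences per second.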