SDAA429 Application note

SDAA429 June 2026 MSPM0G5187

4.2 NPU/CPU Performance Comparison

The Edge AI model can be deployed to hardware using the TI Neural Network Compiler, targeting either the dedicated hardware NPU or the host CPU. The TinyEngine NPU is a hardware accelerator designed specifically to execute neural network computations efficiently, delivering significantly lower inference latency and energy consumption per inference than a general-purpose CPU.

To facilitate performance evaluation, the MSPM0 SDK provides both NPU-based and CPU-based implementation examples for benchmarking. Refer to:

Table 4-3 summarizes the performance comparison between the NPU-based and CPU-based Edge AI designs for the digit recognition application.

Table 4-3 NPU/CPU Performance Comparison

Performance Metric	NPU-Based Design	CPU-Based Design
Accuracy	~99%	~99%
Flash Usage	73 KB	68 KB
RAM Usage	10.9 KB	8.1 KB
Inference Latency	6.05 ms	89.81 ms
Energy per Inference (Average)	424.65 µJ	6,265.32 µJ

The NPU-based design achieves nearly 15x lower inference latency than the CPU-based implementation (6.05 ms versus 89.81 ms) and reduces the energy consumed per inference by approximately 93% (424.65 µJ versus 6,265.32 µJ). These gains cost about 5 KB of additional flash and 2.8 KB of additional RAM, making the NPU a highly efficient choice for power-sensitive edge applications.