SDAA185 Application note

SDAA185 February 2026

Abstract
Trademarks
1 Introduction
2 Development Flow Overview
3 Example Model Creation (Python)
4 Quantization for Embedded Platform
4.1 Quantization Approaches: QAT versus PTQ
  4.1.1 Post-Training Quantization (PTQ)
  4.1.2 Quantization-Aware Training (QAT)
4.2 Quantization Frameworks and Wrapper Modules
  4.2.1 Generic Wrappers for CPU Quantization
  4.2.2 TINPU Wrappers for NPU Hardware Acceleration
5 Validating the Model
6 Testing the Model
7 Moving the Model to TI MCU (C2000 – F28P55x) [Beginner Level]
8 Moving the Model to TI MCU (C2000 – F28P55x) [Developer Level]
9 Setting up the MCU Project
9.1 Creating a CCS Project for NPU Applications
9.2 Understanding the NPU Interface
  9.2.1 Key Interface Components
  9.2.2 Basic Usage Pattern
10 Testing the Model in the Embedded Environment
10.1 Visual Performance Assessment
10.2 Quantitative Performance Metrics
11 NPU Integration in a Real-Time Signal Chain
12 Key Design Decisions and Impact
13 Benchmarks
14 Summary
15 References

1.3 Technical Limitations

While powerful, the F28P55x NPU operates under several constraints that influence application design:

Architectural Limitations: Neural network topologies such as CNNs and MLPs with ReLU activations are better supported than complex architectures such as LSTMs or Transformers.
Precision Tradeoffs: The quantization required for NPU execution introduces precision loss relative to floating-point implementations, so careful training approaches are needed to maintain accuracy.
Development Workflow Complexity: Specific toolchain requirements for model compilation and deployment add development steps compared to standard microcontroller programming.
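
The precision tradeoff can be illustrated with a minimal Python sketch. It assumes a generic symmetric 8-bit scheme with one shared scale per tensor, which may differ from the scheme the TI toolchain actually applies. The function names and the weight values are hypothetical and chosen only for illustration.

```python
# Illustrative only: generic symmetric 8-bit quantization with a single
# per-tensor scale. This is an assumption, not the TI toolchain's scheme.

def quantize_int8(values):
    """Map floats to signed 8-bit integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [x * scale for x in quantized]


# Hypothetical floating-point weights of a single layer.
weights = [0.8131, -0.4412, 0.0057, 0.2969, -1.2700]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Worst-case rounding error introduced by quantization.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("scale:", scale)
print("int8 values:", q)
print("max abs error:", max_error)
```

In this scheme the rounding error of each value is bounded by half the scale, so a tensor with a wide value range loses more absolute precision. Small weights such as 0.0057 suffer the largest relative error, which is one reason training approaches that account for quantization can help preserve accuracy.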

These capabilities and limitations frame the practical application space for the F28P55x NPU in automotive and industrial embedded systems, where balancing computational power with resource constraints is essential for successful implementation.