SDAA185
February 2026
Abstract
Trademarks
1 Introduction
1.1 NPU Definition and Purpose
1.2 Key Capabilities
1.3 Technical Limitations
2 Development Flow Overview
2.1 Model Development Phase
2.2 Model Compilation Phase
2.3 Application Integration Phase
3 Example Model Creation (Python)
3.1 Model Selection Rationale
3.2 Model Architecture Design
3.3 Training Details
3.3.1 Development Environment Setup
3.3.2 Dataset Generation
3.3.3 Model Training Configuration
3.3.4 Quantization-Aware Training Process
4 Quantization for Embedded Platform
4.1 Quantization Approaches: QAT versus PTQ
4.1.1 Post-Training Quantization (PTQ)
4.1.2 Quantization-Aware Training (QAT)
4.2 Quantization Frameworks and Wrapper Modules
4.2.1 Generic Wrappers for CPU Quantization
4.2.2 TINPU Wrappers for NPU Hardware Acceleration
5 Validating the Model
5.1 Two-Phase Training Strategy
5.1.1 Initial Training Phase
5.1.2 Fine-Tuning Phase
5.2 Training Phase Comparison
5.3 Validation Results and Metrics
6 Testing the Model
6.1 Inference Setup and Methodology
6.1.1 Generic User Testing Approach
6.2 Testing Results and Visual Analysis
6.2.1 Visual Performance Assessment
6.3 Quantitative Performance Metrics
7 Moving the Model to TI MCU (C2000 – F28P55x) [Beginner Level]
8 Moving the Model to TI MCU (C2000 – F28P55x) [Developer Level]
8.1 Compilation Prerequisites
8.1.1 Required TI Software Components
8.1.2 Environment Setup Process
8.2 Configuration File Setup
8.2.1 Configuration File Structure
8.2.1.1 Models Requiring Dequantization Flag
8.2.2 Special Configuration for Regression Models
8.2.2.1 Output Dequantization Flag
8.2.2.2 Compiler Constants Modification
8.2.2.3 Compilation Dictionary Update
8.3 Compilation Process Flow
8.3.1 Launching the Compilation
8.3.2 Compilation Phases
8.3.3 Common Issues to Watch For
9 Setting up the MCU Project
9.1 Creating a CCS Project for NPU Applications
9.2 Understanding the NPU Interface
9.2.1 Key Interface Components
9.2.2 Basic Usage Pattern
10 Testing the Model in the Embedded Environment
10.1 Visual Performance Assessment
10.2 Quantitative Performance Metrics
11 NPU Integration in a Real-Time Signal Chain
11.1 Application Block Diagram
11.2 Application Code Implementation
11.3 Hardware Components Utilized
11.4 Hardware Validation Results
11.4.1 Input Signal Characteristics
11.4.2 Neural Network Output Analysis
12 Key Design Decisions and Impact
12.1 NPU Handling of Numbers
12.1.1 Integer-Only Architecture
12.1.2 Working with Negative and Floating-Point Values
12.2 Supported Neural Network Layers and Constraints
12.2.1 Supported Layer Types
12.2.1.1 Convolution Layers
12.2.1.2 Other Core Layers
12.2.1.3 Flexibilities
12.3 Model Complexity and Size Limitations
12.3.1 Memory Constraints and Model Size
12.3.2 Optimization Process and Performance Trade-offs
13 Benchmarks
13.1 Model Performance Comparison
13.1.1 128-Neuron Model
13.1.2 64-Neuron Model
13.1.3 16-Neuron Model
13.1.4 Reference Benchmark
13.2 Performance Analysis
13.2.1 Model Selection Trade-offs
13.2.2 CPU versus NPU Performance
13.3 Pipeline Stage Timing Measurements
14 Summary
14.1 Key Capabilities and Constraints
14.2 Development Workflow
14.3 Model Design Considerations
14.4 Implementation Challenges and Solutions
14.5 Broader Applications
15 References
Application Note
Neural-Network Processing Unit (NPU) Guide