SDAA185 February 2026

 

  2.   Abstract
  3.   Trademarks
  4. Introduction
    1. 1.1 NPU Definition and Purpose
    2. 1.2 Key Capabilities
    3. 1.3 Technical Limitations
  5. Development Flow Overview
    1. 2.1 Model Development Phase
    2. 2.2 Model Compilation Phase
    3. 2.3 Application Integration Phase
  6. Example Model Creation (Python)
    1. 3.1 Model Selection Rationale
    2. 3.2 Model Architecture Design
    3. 3.3 Training Details
      1. 3.3.1 Development Environment Setup
      2. 3.3.2 Dataset Generation
      3. 3.3.3 Model Training Configuration
      4. 3.3.4 Quantization-Aware Training Process
  7. Quantization for Embedded Platform
    1. 4.1 Quantization Approaches: QAT versus PTQ
      1. 4.1.1 Post-Training Quantization (PTQ)
      2. 4.1.2 Quantization-Aware Training (QAT)
    2. 4.2 Quantization Frameworks and Wrapper Modules
      1. 4.2.1 Generic Wrappers for CPU Quantization
      2. 4.2.2 TINPU Wrappers for NPU Hardware Acceleration
  8. Validating the Model
    1. 5.1 Two-Phase Training Strategy
      1. 5.1.1 Initial Training Phase
      2. 5.1.2 Fine-Tuning Phase
    2. 5.2 Training Phase Comparison
    3. 5.3 Validation Results and Metrics
  9. Testing the Model
    1. 6.1 Inference Setup and Methodology
      1. 6.1.1 Generic User Testing Approach
    2. 6.2 Testing Results and Visual Analysis
      1. 6.2.1 Visual Performance Assessment
    3. 6.3 Quantitative Performance Metrics
  10. Moving the Model to TI MCU (C2000 – F28P55x) [Beginner Level]
  11. Moving the Model to TI MCU (C2000 – F28P55x) [Developer Level]
    1. 8.1 Compilation Prerequisites
      1. 8.1.1 Required TI Software Components
      2. 8.1.2 Environment Setup Process
    2. 8.2 Configuration File Setup
      1. 8.2.1 Configuration File Structure
        1. 8.2.1.1 Models Requiring Dequantization Flag
      2. 8.2.2 Special Configuration for Regression Models
        1. 8.2.2.1 Output Dequantization Flag
        2. 8.2.2.2 Compiler Constants Modification
        3. 8.2.2.3 Compilation Dictionary Update
    3. 8.3 Compilation Process Flow
      1. 8.3.1 Launching the Compilation
      2. 8.3.2 Compilation Phases
      3. 8.3.3 Common Issues to Watch For
  12. Setting up the MCU Project
    1. 9.1 Creating a CCS Project for NPU Applications
    2. 9.2 Understanding the NPU Interface
      1. 9.2.1 Key Interface Components
      2. 9.2.2 Basic Usage Pattern
  13. 10 Testing the Model in the Embedded Environment
    1. 10.1 Visual Performance Assessment
    2. 10.2 Quantitative Performance Metrics
  14. 11 NPU Integration in a Real-Time Signal Chain
    1. 11.1 Application Block Diagram
    2. 11.2 Application Code Implementation
    3. 11.3 Hardware Components Utilized
    4. 11.4 Hardware Validation Results
      1. 11.4.1 Input Signal Characteristics
      2. 11.4.2 Neural Network Output Analysis
  15. 12 Key Design Decisions and Impact
    1. 12.1 NPU Handling of Numbers
      1. 12.1.1 Integer-Only Architecture
      2. 12.1.2 Working with Negative and Floating-Point Values
    2. 12.2 Supported Neural Network Layers and Constraints
      1. 12.2.1 Supported Layer Types
        1. 12.2.1.1 Convolution Layers
        2. 12.2.1.2 Other Core Layers
        3. 12.2.1.3 Flexibilities
    3. 12.3 Model Complexity and Size Limitations
      1. 12.3.1 Memory Constraints and Model Size
      2. 12.3.2 Optimization Process and Performance Trade-offs
  16. 13 Benchmarks
    1. 13.1 Model Performance Comparison
      1. 13.1.1 128-Neuron Model
      2. 13.1.2 64-Neuron Model
      3. 13.1.3 16-Neuron Model
      4. 13.1.4 Reference Benchmark
    2. 13.2 Performance Analysis
      1. 13.2.1 Model Selection Trade-offs
      2. 13.2.2 CPU versus NPU Performance
    3. 13.3 Pipeline Stage Timing Measurements
  17. 14 Summary
    1. 14.1 Key Capabilities and Constraints
    2. 14.2 Development Workflow
    3. 14.3 Model Design Considerations
    4. 14.4 Implementation Challenges and Solutions
    5. 14.5 Broader Applications
  18. 15 References
Application Note

Neural-Network Processing Unit (NPU) Guide