SDAA185 February   2026

 


Quantitative Performance Metrics

Beyond visual inspection, the testing framework calculates error metrics that quantify prediction accuracy. Testing produced the following values:

  • Mean Absolute Error (MAE): 0.013827
  • Maximum Error: 0.093750
  • Minimum Error: 0.000126
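
As an illustration of how metrics of this kind are typically derived, the sketch below computes the mean, maximum, and minimum absolute error over paired predictions and reference values. It assumes the reported errors are absolute differences between the model output and the expected value; the function name `error_metrics` and the sample values are hypothetical and are not taken from the testing framework described here.

```python
def error_metrics(predictions, targets):
    """Return MAE, maximum and minimum absolute error for paired samples.

    Assumption: each error is |prediction - target|. The testing
    framework in this document may compute its metrics differently.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(predictions) == 0:
        raise ValueError("at least one sample is required")

    # Per-sample absolute error between model output and reference value.
    abs_errors = [abs(p - t) for p, t in zip(predictions, targets)]

    return {
        "mae": sum(abs_errors) / len(abs_errors),
        "max_error": max(abs_errors),
        "min_error": min(abs_errors),
    }


# Made-up values for illustration only; not the data behind the results above.
example_predictions = [0.0, 0.5, 1.0, 0.25]
example_targets = [0.0, 0.75, 0.5, 0.25]

example_metrics = error_metrics(example_predictions, example_targets)
for name, value in example_metrics.items():
    print(f"{name}: {value:.6f}")
```

Reporting the maximum alongside the mean is useful because a low MAE can hide a few large outliers, which matter when the model output feeds a control loop or other real-time signal chain.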

These quantitative results complement the visual assessment provided by the graph and confirm numerically that the model approximates the target function closely despite the constraints of quantized training. Together, the visual and numerical validation give confidence in the model's performance before hardware implementation.