



# C6000 Compile Tools / PBC Agenda

- CCS 1.2 Announcement
- C6000 Release 4.0
- Profile Based Compiler
- Roadmap



# TI DSP Compile Tools Value Proposition

For the embedded software developer, TI's DSP Compile Tools - co-developed with TI's DSPs - offer the highest performance and code density in the industry due to architecture-specific optimizations as well as application-level analysis including interactive feedback, tuning, profiling, and system memory allocation.



# TI Compile Tools Current Focus

- Architecture Co-development - Compiler and architecture work in unison
- High performance - alleviates the need to hand code assembly
- High code density - reduces system cost by minimizing memory requirements
- Architecture Specific Optimizations - Compiler possesses the knowledge of the expert hand coded assembly writer.
- Unique Interactive Tuning and Feedback
- Application-level optimizations - Utilizes knowledge of entire application to optimize key components
- Profile Based Compiler - Helps pick the right trade-off on a two-dimensional code size vs. performance graph
- Visual Linker - Eases System Memory Allocation
- Moving Forward → Unified Build Environment and Alchemy



# Compiler Status/Roadmap - Platforms

- C6000
  - Industry's Best Tuned and Out of the Box C performance
  - 4.0 Meets Internal Goals - 65% NatC, >80% OptC, >95% LinASM
  - Take C64x performance to C62x Levels
  - Continue to improve “out of the box” C performance
- C5000
  - Code Size better than ARM with Thumb mode
  - Mnemonic Assembler ensures compatibility
  - Need to add more functionality into Assembler
  - Initial Benchmarks in place end of March
  - Will use to drive 2.0 Goals

# Enhancements in Code Composer Studio 1.2

Industry-leading real-time tools reduce cost, risk and development time

**DSP/BIOS II**

Flexibility, scalability and ease of implementation

**New Compiler Tools**

Visualize and optimize for maximum productivity

**New Cores**

All customers can start today!

Slash product development time by over 50%



# New C6000™ Compile Tools

**#1 DSP Compiler Extends Performance Lead**

[www.ti.com/sc/c6000compiler](http://www.ti.com/sc/c6000compiler)

## Out-of-the-box Compiler Performance Improvement



- Achieves 80-90% of the performance of hand-coded assembly
- Performance statistics backed up with real code examples downloadable today
- Out-of-the-box C code focus has produced more than 20% performance improvement
- Unique compiler feedback
- Support for C++




# New C6000™ Compile Tools

**Visualize and optimize code size and performance trade-offs**



## PROFILE-BASED COMPILER SOLUTIONS

- Build and profile multiple build option sets
- Automatically plot a 2D graph of code size vs performance
- Graphically select the optimum combination of size and speed for your application
- Click to build desired performance and code size trade-off in seconds




# Profile Based Compiler Details


- Express Assistant to Start
- On-line Tutorial
- Includes Ready to Run Demo
- File Overrides for ISR, etc.

# PBC Results on EFR GSM

- 288 Kcycles at 60 Kbytes
- 311 Kcycles at 56 Kbytes
- Fastest: 276 Kcycles at 65 Kbytes
- Lowest code size: 45 Kbytes at 1.25 Mcycles
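
The selection step behind these numbers can be sketched in a few lines of Python. This is only an illustration of the idea, not the Profile Based Compiler's actual algorithm: the `Build` type, the labels, and the helpers `pareto_front` and `fastest_within` are all hypothetical, and only the four cycle/size pairs come from the EFR GSM results above.

```python
from typing import List, NamedTuple, Optional


class Build(NamedTuple):
    label: str      # hypothetical name for one build option set
    kcycles: float  # profiled execution cost, in thousands of cycles
    kbytes: float   # code size, in kilobytes


# The four EFR GSM data points from the slide; the labels are invented.
BUILDS = [
    Build("A", 288, 60),
    Build("B", 311, 56),
    Build("fastest", 276, 65),
    Build("smallest", 1250, 45),
]


def pareto_front(builds: List[Build]) -> List[Build]:
    """Keep builds that no other build matches or beats on both axes."""
    def dominated(b: Build) -> bool:
        return any(
            o.kcycles <= b.kcycles and o.kbytes <= b.kbytes
            and (o.kcycles < b.kcycles or o.kbytes < b.kbytes)
            for o in builds
        )
    return sorted((b for b in builds if not dominated(b)),
                  key=lambda b: b.kbytes)


def fastest_within(builds: List[Build], max_kbytes: float) -> Optional[Build]:
    """Pick the fastest build that fits a code size budget, if any."""
    fitting = [b for b in builds if b.kbytes <= max_kbytes]
    return min(fitting, key=lambda b: b.kcycles) if fitting else None


if __name__ == "__main__":
    for b in pareto_front(BUILDS):
        print(f"{b.label:9s} {b.kcycles:6.0f} Kcycles  {b.kbytes:3.0f} Kbytes")
    print(fastest_within(BUILDS, max_kbytes=58))
```

All four published points sit on the trade-off curve, since each is either faster or smaller than every other. A size budget then picks one of them: under a 58 Kbyte limit, the 311 Kcycle build is the fastest that fits.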





# Performance Roadmap - Two Vectors

- Compiler gathers system/application-level information
  - Use profiling to get run-time behavior knowledge
  - Feed the compiler more system details (memory maps, libraries) to gain more contextual knowledge
  - Continue to develop optimizations to utilize these new sources of information
  - Continue to drive Architecture Specific Optimizations
- Interactive Visual tuning tools for the User
  - Identify performance critical code and provide suggestions for improvement
  - Graphical System Optimization
  - Automatically choose the best compiler optimization levels for an application based on user criteria

# Driving Performance!

## Benchmarking

- Methodology
  - Representative benchmarks created with both C and optimal hand coded assembly implementations
  - Each benchmark wrapped in a process that self checks correctness and reports timing
  - Performance of the compiler output compared to the optimal assembly
  - Process automated for nightly update
- Benefits
  - Benchmark analysis provides direction for compiler improvements
  - Measurable way to track compiler progress
  - Gives developers immediate feedback on impact of potential optimizations
  - Enables competitive benchmarking
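
As a rough illustration of the methodology above, a benchmark wrapper might look like the following Python sketch. Everything here is an assumption made for illustration: the two `vector_max_*` functions merely stand in for the hand-coded assembly and compiler-generated versions, host wall-clock time stands in for DSP cycle counts, and `run_benchmark` is a hypothetical name, not part of TI's nightly process.

```python
import time


def vector_max_reference(values):
    """Stand-in for the hand-coded assembly version (assumed optimal)."""
    return max(values)


def vector_max_candidate(values):
    """Stand-in for the compiler-generated version under test."""
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best


def run_benchmark(name, candidate, reference, data, repeats=1000):
    """Self-check the candidate, then report average time per call."""
    # Correctness check: the candidate must reproduce the reference result.
    passed = candidate(data) == reference(data)

    def time_it(fn):
        start = time.perf_counter_ns()
        for _ in range(repeats):
            fn(data)
        return (time.perf_counter_ns() - start) / repeats

    ref_ns = time_it(reference)
    cand_ns = time_it(candidate)
    return {
        "name": name,
        "passed": passed,
        "reference_ns": ref_ns,
        "candidate_ns": cand_ns,
        # Reference cost over candidate cost; 1.0 means parity.
        "efficiency": ref_ns / cand_ns if cand_ns else float("nan"),
    }


if __name__ == "__main__":
    result = run_benchmark("vector_max", vector_max_candidate,
                           vector_max_reference, list(range(40)))
    print(result)
```

Running such a wrapper over every benchmark each night and storing the results would give the progress tracking described under Benefits.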





# Full Algorithms

- Provides large pieces of DSP code to validate - improves compiler robustness
- Tracks out of the box algorithm performance
- Tracks code size vs performance
- Run on large data sets
- Run on small data sets with many option combinations
- Adding more control applications to grade code size

# Algorithms

[http://www.micro.ti.com/asp/sds/c6x/metrics/release_results.html](http://www.micro.ti.com/asp/sds/c6x/metrics/release_results.html)

Normalized Application Performance



Normalized Application Size



# C6000 Benchmarks (on the TI Website)

| Algorithm | Test case | Used in | Assembly Cycles | Assembly Time (μs) | C Cycles (Rel 4.0) | C Time (μs) | Efficiency vs. Hand-Coded Assembly |
|---|---|---|---:|---:|---:|---:|---:|
| Block Mean Square Error | MSE of a 20-column image matrix | Motion compensation of image data | 348 | 1.16 | 402 | 1.34 | **87%** |
| Codebook Search | | CELP-based voice coders | 977 | 3.26 | 961 | 3.20 | **100+%** |
| Vector Max | 40-element input vector | Search algorithms | 61 | 0.20 | 59 | 0.20 | **100+%** |
| All-zero FIR Filter | 40 samples, 10 coefficients | VSELP-based voice coders | 238 | 0.79 | 280 | 0.93 | **85%** |
| Minimum Error Search | Table size = 2304 | Search algorithms | 1185 | 3.95 | 1318 | 4.39 | **90%** |
| IIR Filter | 16 coefficients | Filter | 43 | 0.14 | 38 | 0.13 | **100+%** |
| IIR (cascaded biquads) | 10 cascaded biquads (Direct Form II) | Filter | 70 | 0.23 | 75 | 0.25 | **93%** |
| MAC | Two 40-sample vectors | VSELP-based voice coders | 61 | 0.20 | 58 | 0.19 | **100+%** |
| Vector Sum | Two 44-sample vectors | | 51 | 0.17 | 47 | 0.16 | **100+%** |
| MSE | MSE between two 256-element vectors | Mean square error computation in vector quantizer | 279 | 0.93 | 274 | 0.91 | **100+%** |

TI 'C62x Compiler Performance, Rel 4.0: Execution Time in μs @ 300 MHz
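
The derived columns in the table can be reproduced from the cycle counts alone. The Python sketch below assumes that time is cycles divided by the 300 MHz clock from the caption, and that efficiency is assembly cycles divided by C cycles. The slides do not state the efficiency formula; it is inferred here because it matches the published percentages. The helper names `time_us` and `efficiency_label` are hypothetical.

```python
CLOCK_MHZ = 300  # clock rate given in the table caption

# (algorithm, assembly cycles, C cycles), copied from the table
ROWS = [
    ("Block Mean Square Error", 348, 402),
    ("Codebook Search", 977, 961),
    ("Vector Max", 61, 59),
    ("All-zero FIR Filter", 238, 280),
    ("Minimum Error Search", 1185, 1318),
    ("IIR Filter", 43, 38),
    ("IIR (cascaded biquads)", 70, 75),
    ("MAC", 61, 58),
    ("Vector Sum", 51, 47),
    ("MSE", 279, 274),
]


def time_us(cycles, clock_mhz=CLOCK_MHZ):
    """Execution time in microseconds: cycles / (cycles per microsecond)."""
    return cycles / clock_mhz


def efficiency_label(asm_cycles, c_cycles):
    """Assumed formula: assembly cycles as a percentage of C cycles."""
    pct = 100.0 * asm_cycles / c_cycles
    # The table reports any case where C matches or beats assembly as "100+%".
    return "100+%" if pct >= 100 else f"{pct:.0f}%"


if __name__ == "__main__":
    for name, asm, c in ROWS:
        print(f"{name:24s} {time_us(asm):5.2f} {time_us(c):5.2f} "
              f"{efficiency_label(asm, c)}")
```

For example, the FIR filter row gives 238 / 280 = 85%, and 238 cycles at 300 MHz is about 0.79 μs.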




# Compiler Status/Roadmap

- C6000
  - Industry's Best Tuned and Out of the Box C performance
  - 4.0 Met Internal Goals
    - 65% NatC, >80% OptC, >95% LinASM
  - Take C64x performance to C62x Levels
  - Continue to improve “out of the box” C performance