

# CPU, Memory and Peripheral System

# TMS320C64x<sup>™</sup>DSP Technical Brief

# The World's Fastest DSPs Complemented by a High-Performance Memory and Peripheral System

The TMS320C64x<sup>™</sup> DSP core scales operating speeds beyond 1 GHz and achieves 10X performance improvements over the industry's previous DSP performance leader, the TMS320C62x<sup>™</sup> DSP. Chips in development couple this processing performance with a new memory and peripheral system designed to accelerate real-time throughput for higher system performance. The C64x<sup>™</sup> DSP core provides greater overall efficiency for demanding applications such as digital subscriber line access multiplexers (DSLAMs), broadband video transcoding, streaming video servers, high-speed raster image processing (RIP) engines and network cameras. Full object code compatibility with existing C62x<sup>™</sup> DSPs allows system developers to work on next-generation C64x<sup>™</sup> DSP designs today.

# Highlights of the C64x™ DSP Core

- VelociTI.2<sup>™</sup> architecture extensions with new instructions to accelerate performance in key applications
- Increased parallelism with dual 16-bit or quad 8-bit operations, and two 16 x 16 bit multiplies or four 8 x 8 bit multiplies
- Packed data processing for formatting data within registers so that instructions can operate directly
- Initial devices are expected to operate at 600 - 800 MHz with scalable performance to over 1.1 GHz
- Improved orthogonality with frequently used instructions available in more functional units
- Double the bandwidth resulting from more registers, wider load/store data paths and enlarged 2-level cache
- Completely software compatible with C62x DSPs

# **Key Features**

- The world's highest performance DSP core, scalable to 1.1 GHz and beyond.
- New two-level cache supports the high-performance C64x<sup>™</sup> DSP core.
- Enhanced Direct Memory Access (DMA) provides more than 2 GB/sec of sustained bandwidth.
- Synchronous External Memory Interfaces (EMIFs) and Host Interface provide over 1.8 GB/sec of bandwidth.
- Complete software compatibility with the programmable TMS320C6000™ DSP platform.

# TMS320C64x<sup>™</sup> DSP Initial Implementation



The C64x<sup>™</sup> DSP core couples its processing performance with a new memory and peripheral system designed to accelerate real-time throughput for higher system performance.

# A New Memory and Peripheral System That Measures Up

The new C64x<sup>™</sup> DSP memory and peripheral system includes a variety of features designed to help developers maximize the many performance advantages of the C64x DSP core:

#### L1/L2 Cache

The two-level cache has been scaled for initial implementation to support the high performance of the core, with 16 Kbytes each in Level 1 data and Level 1 program caches and 128 Kbytes in the unified Level 2 cache. For applications that require greater data determinism, the four associative sets or "ways" of the Level 2 cache can be redefined individually as blocks of memory with fixed addressing.
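As a rough illustration of the trade-off described above, the sketch below lists the possible splits of the 128-Kbyte Level 2 memory between cache and fixed-address memory. It assumes the four ways are equal in size (32 Kbytes each), which the brief implies but does not state; the function name `l2_split` and its parameter are hypothetical and do not correspond to any TI register or API.

```python
# Sizes come from the brief; everything else is an illustrative assumption.
L2_TOTAL_KB = 128   # unified Level 2 size on the initial implementation
L2_WAYS = 4         # four ways, each individually redefinable as memory


def l2_split(ways_as_memory):
    """Return (cache_kb, memory_kb) when `ways_as_memory` ways are remapped.

    Assumes equal-sized ways, i.e. 128 KB / 4 = 32 KB per way.
    """
    if not 0 <= ways_as_memory <= L2_WAYS:
        raise ValueError("ways_as_memory must be between 0 and 4")
    way_kb = L2_TOTAL_KB // L2_WAYS
    memory_kb = ways_as_memory * way_kb
    return L2_TOTAL_KB - memory_kb, memory_kb


for ways in range(L2_WAYS + 1):
    cache_kb, memory_kb = l2_split(ways)
    print(f"{ways} way(s) as memory: {cache_kb} KB cache, "
          f"{memory_kb} KB fixed-address memory")
```

Under that assumption, each way given up to fixed addressing trades 32 Kbytes of cache for 32 Kbytes of deterministic memory.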

#### Enhanced DMA

A 32-channel DMA controller with a highly efficient transfer engine provides more than 2 GB/sec of sustained bandwidth, resulting in faster system performance. All 32 channels can be programmed to move data in the background of core program execution. Channels can be configured to run continuously throughout the device's entire operation, with only an initial configuration required. Each independently synchronized channel has a dedicated programmable parameter set. The user can configure address modification and stride independently for source and destination. Each channel can be configured to perform fixed, one-dimensional or two-dimensional transfers. Two-dimensional transfers allow automatic interleaving and de-interleaving of data streams and buffers, as well as the movement of sections of 2D images. A total of 85 parameter sets are available and allow sophisticated linking of transfers. This flexibility permits auto-initializing circular buffers and the background movement of complex data structures.

The C64x<sup>™</sup> DSP L1/L2 cache sustains core performance at high clock rates.
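To make the two-dimensional transfers concrete, here is a minimal software model of a strided copy with independent source and destination strides. It is a sketch of the idea only: the function `dma_2d` and its parameter names are hypothetical and are not the controller's actual parameter-set layout.

```python
def dma_2d(src, dst, src_start, dst_start, line_len, line_count,
           src_stride, dst_stride):
    """Copy `line_count` lines of `line_len` elements between flat buffers.

    After each line the source and destination positions advance by their
    own stride, modelling independent address modification.
    """
    for line in range(line_count):
        s = src_start + line * src_stride
        d = dst_start + line * dst_stride
        dst[d:d + line_len] = src[s:s + line_len]


# Move a 3-wide, 2-high section out of an 8-pixel-wide image.
WIDTH, HEIGHT = 8, 4
image = list(range(WIDTH * HEIGHT))      # pixel value == flat index
window = [0] * 6
dma_2d(image, window, src_start=1 * WIDTH + 2, dst_start=0,
       line_len=3, line_count=2, src_stride=WIDTH, dst_stride=3)
print(window)

# De-interleave the left channel from an L/R/L/R sample stream.
stereo = [0, 100, 1, 101, 2, 102]
left = [0] * 3
dma_2d(stereo, left, src_start=0, dst_start=0,
       line_len=1, line_count=3, src_stride=2, dst_stride=1)
print(left)
```

The same routine covers both uses named above: a source stride equal to the image width extracts a sub-image, and a source stride of 2 with a destination stride of 1 de-interleaves a stream.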

#### Three External Buses

A 64-bit synchronous External Memory Interface (EMIF), a 16-bit secondary EMIF for peripherals, and a 32-bit Host Port Interface provide over 1.8 GB/sec of bandwidth on initial implementations. The EMIFs can be clocked independently of the CPU, allowing them to support a wide variety of advanced synchronous and asynchronous external memory devices, including PC100 and PC133 SDRAMs and various synchronous SRAM standards.

#### Multi-channel Buffered Serial Ports

Three multi-channel buffered serial ports (McBSPs) support a variety of audio and telecom standards. One hundred twenty-eight independently selectable time slots provide full connectivity to an entire ST-Bus span of telephony channels. Direct connection to the ST-Bus allows an easy interface to a variety of H.110/100 and T1/E1 framing devices. The highly flexible and programmable framing, clocking, and baud rate of the McBSPs allow direct interface to multiple high-performance audio codecs, including those supporting the AC97 and IIS standards.

#### Scalable to 1.1 GHz

The C64x DSP core uses advanced process technologies (0.15 micron for the initial devices, moving to 0.1 micron in the future) plus innovative logic circuit and design methodologies that focus on minimal wire lengths and low gate counts per clock. Compared to traditional approaches, only half as many switching wires and transistors are needed for the same amount of work. As a result, the C64x DSP core can reach clock rates of up to 1.1 GHz.

|                     | SOFTWARE COMPATIBLE                                                    |                                                                                      |             |
|---------------------|------------------------------------------------------------------------|--------------------------------------------------------------------------------------|-------------|
|                     | TMS320C62x<sup>™</sup> DSP Core<br>VelociTI<sup>™</sup> Architecture | TMS320C64x<sup>™</sup> DSP Core<br>VelociTI.2<sup>™</sup> Architecture Extensions | Improvement |
| MHz                 | 150-300                                                                | 600-1100                                                                             | 4X          |
| MIPS                | 1200-2400                                                              | 4800-8800                                                                            | 4X          |
| 16-bit MMACs        | 300-600                                                                | 2400-4400                                                                            | 8X          |
| 8-bit MMACs         | 300-600                                                                | 4800-8800                                                                            | 16X         |
| Communications      | General                                                                | Special purpose instructions                                                         | 8X          |
| Imaging             | General                                                                | Special purpose instructions                                                         | 15X         |
| Code size reduction |                                                                        | Advanced instruction packing                                                         | 25%*        |

\* Typical compiler kernels

Overall, the C64x<sup>™</sup> DSP core, with advanced VelociTI.2<sup>™</sup> VLIW architecture extensions, provides ten times the performance of the industry-leading C62x<sup>™</sup> DSP core.

#### Unbeatable Performance

The TMS320C6000<sup>™</sup> DSP platform's 1st and 2nd generation comparison chart summarizes the performance advantages of the C64x<sup>™</sup> DSP core with TI's advanced VelociTI.2<sup>™</sup> architecture extensions.
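The numeric rows of the comparison chart can be cross-checked with simple arithmetic. The sketch below copies the ranges from the chart and derives throughput per MHz and the generation-to-generation ratio; the names `TABLE`, `per_mhz` and `improvement` are illustrative only.

```python
# (low, high) ranges copied from the comparison chart.
TABLE = {
    "C62x": {"MHz": (150, 300), "MIPS": (1200, 2400),
             "16-bit MMACs": (300, 600), "8-bit MMACs": (300, 600)},
    "C64x": {"MHz": (600, 1100), "MIPS": (4800, 8800),
             "16-bit MMACs": (2400, 4400), "8-bit MMACs": (4800, 8800)},
}


def per_mhz(core, metric):
    """Units of `metric` per MHz at the low and high end of the range."""
    mhz_lo, mhz_hi = TABLE[core]["MHz"]
    m_lo, m_hi = TABLE[core][metric]
    return m_lo / mhz_lo, m_hi / mhz_hi


def improvement(metric):
    """C64x-to-C62x ratio, taken at the low end of each range."""
    return TABLE["C64x"][metric][0] / TABLE["C62x"][metric][0]


for metric in ("MIPS", "16-bit MMACs", "8-bit MMACs"):
    print(metric, "per MHz:",
          "C62x", per_mhz("C62x", metric), "C64x", per_mhz("C64x", metric))

for metric in ("MHz", "MIPS", "16-bit MMACs", "8-bit MMACs"):
    print(f"{metric}: {improvement(metric):g}X")
```

Read this way, both cores deliver 8 MIPS per MHz, while multiply-accumulate throughput per MHz rises from 2 to 4 (16-bit) and from 2 to 8 (8-bit). The chart's 4X, 4X, 8X and 16X factors correspond to the low ends of the ranges.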

# **Application Benchmarks**

Performance benchmarks of a 750 MHz C64x DSP core against the industry leading 300 MHz C62x<sup>™</sup> DSP core show that the C64x DSP core achieves far better performance than would be expected on the basis of clock rate alone. This is the result of hardware enhancements, plus the increased performance of the C6000<sup>™</sup> DSP platform compiler. To view performance benchmarks visit: www.ti.com/sc/c6000benchmarks
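One way to read this claim is to divide each quoted speedup by the 2.5X clock ratio (750 MHz / 300 MHz). The sketch below does that with the speedups quoted for individual routines in the Digital Communications and Imaging and Video sections of this brief. The names are illustrative, and "up to" and "more than" figures are treated as plain numbers, so the results are indicative only.

```python
C64X_MHZ = 750
C62X_MHZ = 300

# Speedups as quoted per routine in this brief.
QUOTED_SPEEDUP = {
    "Filtering": 5,
    "Reed-Solomon Decode": 12,    # quoted as "up to 12X"
    "Viterbi Decode (GSM)": 7,
    "FFT": 5,
    "IDCT": 5,
    "Motion Estimation": 19,
    "Morphology": 15,             # quoted as "more than 15X"
}


def per_clock_gain(speedup):
    """Speedup left over once the clock-rate ratio is factored out."""
    clock_ratio = C64X_MHZ / C62X_MHZ
    return speedup / clock_ratio


for routine, speedup in QUOTED_SPEEDUP.items():
    print(f"{routine}: {speedup}X overall, "
          f"{per_clock_gain(speedup):.1f}X per clock")
```

Any per-clock figure above 1.0X is gain that clock rate alone does not explain, which the brief attributes to the hardware enhancements and the compiler.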

## **Digital Communications**

The C64x DSP core achieves up to 12X the performance of the C62x DSP core on key routines:

- Filtering – The 5X improvement on 16-bit data is largely the result of increased parallelism for multiply-accumulate operations.
- Reed-Solomon Decode – The up to 12X improvement here results in large part from use of the new Galois Field Multiply instruction.
- Viterbi Decode (GSM) – This 7X improvement stems from the additional registers available for state variables and the new MAX and MIN instructions.
- FFT – The 5X improvement here results from the dual 16-bit architecture of the C64x DSP core, plus the new Bit Reverse instruction.
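For readers unfamiliar with two of the operations named in the list above, the sketch below shows in plain software what a Galois field multiply and a bit reversal compute. These are generic textbook versions, not models of the C64x instructions. The field polynomial 0x11D is an assumption (a common choice for Reed-Solomon codes over GF(2^8)), and the function names are illustrative.

```python
def gf256_mul(a, b, poly=0x11D):
    """Multiply two GF(2^8) elements by shift-and-XOR.

    `poly` is the field polynomial including the x^8 term; 0x11D is an
    assumed, commonly used value, not one taken from the brief.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a        # addition in GF(2^n) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly          # reduce back into 8 bits
        b >>= 1
    return result


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of `index`, as used for FFT reordering."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)
        index >>= 1
    return out


print(hex(gf256_mul(0x02, 0x80)))
print([bit_reverse(i, 3) for i in range(8)])
```

In software each multiply is a loop of up to eight shift-and-XOR steps per byte, and each reversal a loop over the index bits, which suggests why dedicated instructions help on these routines.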

## **Imaging and Video**

For imaging and video applications, the C64x DSP core achieves improvements up to 19X compared to the C62x DSP core on key routines:

- IDCT – This 5X improvement results from the dual 16-bit mathematical capabilities of the C64x DSP core.
- Motion Estimation – The 19X improvement here comes not only from quad 8-bit support, but also from non-aligned load support for loops on byte or half-word boundaries and the new SUBABS4 instruction.
- Morphology – The C64x DSP core provides more than 15X the performance for grayscale operations due to the presence of logical instructions in the .D unit, increased parallelism, and greater scheduling flexibility provided by additional registers.
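Motion estimation typically scores candidate blocks by a sum of absolute differences (SAD) over 8-bit pixels. The sketch below models a quad 8-bit absolute difference on pixels packed four to a 32-bit word, which is the kind of operation the brief associates with SUBABS4. It is a software illustration under that assumption; the exact instruction semantics should be taken from TI's instruction set documentation, and the helper names are hypothetical.

```python
def subabs4_model(a, b):
    """Absolute difference of four unsigned bytes packed in 32-bit words."""
    out = 0
    for lane in range(4):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        out |= abs(x - y) << shift   # each lane result fits in 8 bits
    return out


def sum_bytes(word):
    """Add up the four byte lanes of a 32-bit word."""
    return sum((word >> (8 * lane)) & 0xFF for lane in range(4))


def sad(block_a, block_b):
    """Sum of absolute differences over two lists of packed-pixel words."""
    return sum(sum_bytes(subabs4_model(a, b))
               for a, b in zip(block_a, block_b))


current = [0x10203040, 0x01020304]
candidate = [0x40302010, 0x01020304]
print(hex(subabs4_model(current[0], candidate[0])))
print(sad(current, candidate))
```

Handling four pixels per operation, rather than one, is where the quad 8-bit support mentioned above pays off in the inner loop.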



Benchmarks of typical routines used in communications applications show that the C64x<sup>™</sup> DSP core is up to 12 times faster than the C62x<sup>™</sup> DSP core.



Benchmarks of typical routines used in imaging and video applications show that the C64x<sup>TM</sup> DSP core is up to 19 times faster than the C62x<sup>TM</sup> DSP core.



The C64x<sup>™</sup> DSP core builds on a solid, successful foundation of C62x<sup>™</sup> DSP products and offers a software-compatible path to the future.

# Get To Market Faster with the Industry's Best Development Environment

# eXpressDSP™ Real-Time Software Technology

For rapid product development, the C64x<sup>™</sup> DSP core is supported by eXpressDSP<sup>™</sup> Real-Time Software Technology that slashes development time by well over 50 percent while improving product robustness. Made up of four key components, eXpressDSP real-time software technology enables developers to tap into the full power of TI DSPs:

#### Code Composer Studio™ IDE -

Integrated and powerful code development tools for visualization and real-time analysis of data reduce development time from weeks to minutes. The C compiler, assembly optimizer, simulator, and linker deliver optimum DSP efficiency and performance. An open, plug-in architecture for third-party tools makes Code Composer Studio™ easily adaptable to different development needs. Benchmark results for the C6000™ DSP compiler and other information are available at: www.ti.com/sc/c6000compiler

#### DSP/BIOS -

A real-time software kernel provides the run-time target software necessary to support any DSP application. It reduces programming effort by providing device drivers, I/O, task, and buffer routines.

#### TMS320™ DSP Algorithm Standard -

Another TI industry first, standards for DSP application interoperability include not only general programming but also DSP-specific guidelines to ensure hardware and software compatibility, allowing maximum software reuse.

#### TI DSP Third Party Network -

eXpressDSP Real-Time Software Technology is used throughout TI's DSP third party network – the world's largest. This network offers hundreds of reusable and modular software algorithms, plug-ins, and products that can accelerate your design cycle and get you to market faster.

# For More Information

To find out more about how you can harness the power of the TMS320C64x<sup>™</sup> DSP generation and its new memory and peripheral system, please contact your local TI field sales office, or visit the TI Web site at: www.ti.com/sc/c64xupdate