The TMS320C64x DSPs (including the SMJ320C6414, SMJ320C6415, and SMJ320C6416 devices) are the
highest-performance fixed-point DSP generation in the TMS320C6000 DSP platform. The TMS320C64x
(C64x) device is based on the second-generation high-performance, advanced VelociTI
very-long-instruction-word (VLIW) architecture (VelociTI.2) developed by Texas Instruments (TI), making
these DSPs an excellent choice for multichannel and multifunction applications. The C64x is a
code-compatible member of the C6000 DSP platform.
With performance of up to 5760 million instructions per second (MIPS) at a clock rate of 720 MHz, the C64x
devices offer cost-effective solutions to high-performance DSP programming challenges. The C64x DSPs
possess the operational flexibility of high-speed controllers and the numerical capability of array processors.
The C64x DSP core processor has 64 general-purpose registers of 32-bit word length and eight highly
independent functional unitstwo multipliers for a 32-bit result and six arithmetic logic units (ALUs) with
VelociTI.2 extensions. The VelociTI.2 extensions in the eight functional units include new instructions to
accelerate the performance in key applications and extend the parallelism of the VelociTI architecture. The
C64x can produce four 32-bit multiply-accumulates (MACs) per cycle for a total of 2400 million MACs per
second (MMACS), or eight 8-bit MACs per cycle for a total of 4800 MMACS. The C64x DSP also has
application-specific hardware logic, on-chip memory, and additional on-chip peripherals similar to the other
C6000 DSP platform devices.
The C6416 device has two high-performance embedded coprocessors [Viterbi Decoder Coprocessor (VCP) and Turbo Decoder Coprocessor (TCP)] that significantly speed up channel-decoding operations on-chip. The VCP operating at CPU clock divided-by-4 can decode over 500 7.95-Kbps adaptive multi-rate (AMR) [K = 9, R = 1/3] voice channels. The VCP supports constraint lengths K = 5, 6, 7, 8, and 9, rates R = 1/2, 1/3, and 1/4, and flexible polynomials, while generating hard decisions or soft decisions. The TCP operating at CPU clock divided-by-2 can decode up to thirty-six 384-Kbps or six 2-Mbps turbo encoded channels (assuming 6 iterations). The TCP implements the max*log-map algorithm and is designed to support all polynomials and rates required by Third-Generation Partnership Projects (3GPP and 3GPP2), with fully programmable frame length and turbo interleaver. Decoding parameters such as the number of iterations and stopping criteria are also programmable. Communications between the VCP/TCP and the CPU are carried out through the EDMA controller.
The C64x uses a two-level cache-based architecture and has a powerful and diverse set of peripherals. The Level 1 program cache (L1P) is a 128-Kbit direct mapped cache and the Level 1 data cache (L1D) is a 128-Kbit 2-way set-associative cache. The Level 2 memory/cache (L2) consists of an 8-Mbit memory space that is shared between program and data space. L2 memory can be configured as mapped memory or combinations of cache (up to 256K bytes) and mapped memory. The peripheral set includes three multichannel buffered serial ports (McBSPs); an 8-bit Universal Test and Operations PHY Interface for Asynchronous Transfer Mode (ATM) Slave [UTOPIA Slave] port (C6415/C6416 only); three 32-bit general-purpose timers; a user-configurable 16-bit or 32-bit host-port interface (HPI16/HPI32); a peripheral component interconnect (PCI) [C6415/C6416 only]; a general-purpose input/output port (GPIO) with 16 GPIO pins; and two glueless external memory interfaces (64-bit EMIFA and 16-bit EMIFB), both of which are capable of interfacing to synchronous and asynchronous memories and peripherals.
The C64x has a complete set of development tools which includes: an advanced C compiler with C64x-specific enhancements, an assembly optimizer to simplify programming and scheduling, and a Windows debugger interface for visibility into source code execution.