The C64x+™ DSPs (including the SM320C6455-EP device) are the highest-performance fixed-point DSP
generation in the C6000™ DSP platform. The C6455 device is based on the third-generation
high-performance, advanced VelociTI™ very-long-instruction-word (VLIW) architecture developed by
Texas Instruments (TI), making these DSPs an excellent choice for applications including video and
telecom infrastructure, imaging/medical, and wireless infrastructure (WI). The C64x+™ devices are
upward code-compatible from previous devices that are part of the C6000™ DSP platform.
Based on 90-nm process technology and with performance of up to 9600 million instructions per second
(MIPS) [or 9600 16 bit MMACs per cycle] at a 1.2-GHz clock rate, the C6455 device offers cost-effective
solutions to high-performance DSP programming challenges. The C6455 DSP possesses the operational
flexibility of high-speed controllers and the numerical capability of array processors.
The C64x+ DSP core employs eight functional units, two register files, and two data paths. Like the earlier
C6000 devices, two of these eight functional units are multipliers or .M units. Each C64x+ .M unit doubles
the multiply throughput versus the C64x core by performing four 16 bit x 16 bit multiply-accumulates
(MACs) every clock cycle. Thus, eight 16 bit x 16 bit MACs can be executed every cycle on the C64x+
core. At a 1.2-GHz clock rate, this means 9600 16 bit MMACs can occur every second. Moreover, each
multiplier on the C64x+ core can compute one 32 bit x 32 bit MAC or four 8 bit x 8 bit MACs every clock
The C6455 device includes Serial RapidIO. This high bandwidth peripheral dramatically improves system
performance and reduces system cost for applications that include multiple DSPs on a board, such as
video and telecom infrastructures and medical/imaging.
The C6455 DSP integrates a large amount of on-chip memory organized as a two-level memory system.
The level-1 (L1) program and data memories on the C6455 device are 32KB each. This memory can be
configured as mapped RAM, cache, or some combination of the two. When configured as cache, L1
program (L1P) is a direct mapped cache where as L1 data (L1D) is a two-way set associative cache. The
level 2 (L2) memory is shared between program and data space and is 2096KB in size. L2 memory also
can be configured as mapped RAM, cache, or some combination of the two. The C64x+ Megamodule also
has a 32 bit peripheral configuration (CFG) port, an internal DMA (IDMA) controller, a system component
with reset/boot control, interrupt/exception control, a power-down control, and a free-running 32 bit timer
for time stamp.
The peripheral set includes: an inter-integrated circuit bus module (I2C); two multichannel buffered serial
ports (McBSPs); an 8 bit Universal Test and Operations PHY Interface for Asynchronous Transfer Mode
(ATM) Slave [UTOPIA Slave] port; two 64 bit general-purpose timers (also configurable as four 32 bit
timers); a user-configurable 16 bit or 32 bit host-port interface (HPI16/HPI32); a peripheral component
interconnect (PCI); a 16-pin general-purpose input/output port (GPIO) with programmable interrupt/event
generation modes; an 10/100/1000 Ethernet media access controller (EMAC), which provides an efficient
interface between the C6455 DSP core processor and the network; a management data input/output
(MDIO) module (also part of the EMAC) that continuously polls all 32 MDIO addresses in order to
enumerate all PHY devices in the system; a glueless external memory interface (64 bit EMIFA), which is
capable of interfacing to synchronous and asynchronous peripherals; and a 32 bit DDR2 SDRAM
The I2C ports on the C6455 allow the DSP to easily control peripheral devices and communicate with a
host processor. In addition, the standard multichannel buffered serial port (McBSP) may be used to
communicate with serial peripheral interface (SPI) mode peripheral devices.
The C6455 device has two high-performance embedded coprocessors [enhanced Viterbi Decoder
Coprocessor (VCP2) and enhanced Turbo Decoder Coprocessor (TCP2)] that significantly speed up
channel-decoding operations on-chip. The VCP2 operating at CPU clock divided-by-3 can decode over
694 7.95-Kbps adaptive multi-rate (AMR) [K = 9, R = 1/3] voice channels. The VCP2 supports constraint
lengths K = 5, 6, 7, 8, and 9, rates R = 3/4, 1/2, 1/3, 1/4, and 1/5 and flexible polynomials, while
generating hard decisions or soft decisions. The TCP2 operating at CPU clock divided-by-3 can decode
up to fifty 384-Kbps or eight 2-Mbps turbo encoded channels (assuming 6 iterations). The TCP2
implements the max*log-map algorithm and is designed to support all polynomials and rates required by
Third-Generation Partnership Projects (3GPP and 3GPP2), with fully programmable frame length and
turbo interleaver. Decoding parameters such as the number of iterations and stopping criteria are also
programmable. Communications between the VCP2/TCP2 and the CPU are carried out through the
The C6455 has a complete set of development tools which includes: a new C compiler, an assembly
optimizer to simplify programming and scheduling, and a Windows® debugger interface for visibility into
source code execution.