The C665x devices are high-performance fixed- and floating-point DSPs based on TI's KeyStone
multicore architecture. Incorporating the new and innovative C66x DSP core, these devices can run
at a core speed of up to 1.25 GHz. For developers of a broad range of applications, the C665x DSPs
provide a platform that is power-efficient and easy to use. In addition, the C665x DSPs are fully
backward compatible with all existing C6000™ fixed- and floating-point DSPs.
TI's KeyStone architecture provides a programmable platform that integrates various
subsystems (C66x cores, memory subsystem, peripherals, and accelerators) and uses several
innovative components and techniques to maximize intradevice and interdevice communication, which
lets the various DSP resources operate efficiently and seamlessly. Central to this architecture are
key components such as Multicore Navigator, which allows for efficient data management between the
various device components. The TeraNet is a nonblocking switch fabric that enables fast and
contention-free internal data movement. The Multicore Shared Memory Controller allows access to
shared and external memory directly without drawing from switch fabric capacity.
For fixed-point use, the C66x core has 4× the multiply-accumulate (MAC) capability of
C64x+ cores. In addition, the C66x core integrates floating-point capability, and the raw
computational performance is an industry-leading 40 GMACS and 20 GFLOPS per core (at a 1.25-GHz
operating frequency). The C66x core can execute 8 single-precision floating-point MAC operations
per cycle, can perform double- and mixed-precision operations, and is IEEE 754 compliant. The C66x
core incorporates 90 new instructions (compared to the C64x+ core) targeted at floating-point and
vector-math-oriented processing. These enhancements yield sizeable performance improvements in
popular DSP kernels used in signal processing, mathematical, and image acquisition functions. The
C66x core is backward code-compatible with TI's previous-generation C6000 fixed- and floating-point
DSP cores, ensuring software portability and shortened software development cycles for applications
migrating to faster hardware.
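The per-core peak figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not vendor code: the function names are hypothetical, and it assumes that one MAC counts as two floating-point operations (one multiply plus one add). Under that assumption, 8 single-precision MACs per cycle at 1.25 GHz gives the quoted 20 GFLOPS, and the quoted 40 GMACS implies 32 fixed-point MACs per cycle.

```c
/* Assumption: one MAC is counted as one multiply plus one add. */
#define FLOPS_PER_MAC 2

/* Peak GFLOPS per core from the clock (in GHz) and the MACs issued per cycle. */
double peak_gflops(double clock_ghz, int macs_per_cycle)
{
    return clock_ghz * macs_per_cycle * FLOPS_PER_MAC;
}

/* MACs per cycle implied by a quoted GMACS figure at a given clock (in GHz). */
double implied_macs_per_cycle(double gmacs, double clock_ghz)
{
    return gmacs / clock_ghz;
}
```

Calling `peak_gflops(1.25, 8)` reproduces the 20 GFLOPS figure, and `implied_macs_per_cycle(40.0, 1.25)` yields 32. These are theoretical ceilings; sustained throughput depends on the kernel and on memory behavior.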
The C665x DSP integrates a large amount of on-chip memory. In addition to 32KB
of L1 program and data cache, 1024KB of dedicated memory can be configured as mapped RAM or cache.
The device also integrates 1024KB of Multicore Shared Memory that can be
used as shared L2 SRAM and/or shared L3 SRAM. All L2 memories incorporate error detection
and error correction. For fast access to external memory, the device includes a 32-bit DDR3
external memory interface (EMIF) that runs at a rate of 1333 MHz and supports ECC DRAM.
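As a rough illustration of what the external memory figures imply, the theoretical peak bandwidth of the interface can be estimated from the bus width and the transfer rate. This sketch assumes that the quoted 1333 MHz is the data rate in mega-transfers per second (as in DDR3-1333); that reading is an assumption, not something stated above, and the function name is hypothetical.

```c
/*
 * Theoretical peak bandwidth in megabytes per second.
 * Assumption: the rate is given in mega-transfers per second and every
 * transfer moves one full bus width of data.
 */
double ddr_peak_mbytes_per_s(unsigned bus_width_bits, double mega_transfers_per_s)
{
    return (bus_width_bits / 8.0) * mega_transfers_per_s;
}
```

Under that assumption, `ddr_peak_mbytes_per_s(32, 1333.0)` gives 5332 MB/s, or about 5.3 GB/s. Achievable throughput is lower in practice because of refresh, bank conflicts, and read/write turnaround.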
This family supports a number of high-speed standard interfaces, including RapidIO ver 2, PCI
Express Gen2, and Gigabit Ethernet. This family of DSPs also includes I2C, UART,
Multichannel Buffered Serial Port (McBSP), Universal Parallel Port (uPP), and a 16-bit asynchronous
EMIF, along with general-purpose CMOS I/O. For high-throughput, low-latency
communication between devices or with an FPGA, a 40-Gbaud full-duplex interface called HyperLink is
included.
The C665x devices have a complete set of development tools, which includes an enhanced C compiler,
an assembly optimizer to simplify programming and scheduling, and a Windows® debugger interface for
visibility into source code execution.
TI's KeyStone Multicore Architecture provides a high-performance
structure for integrating RISC and DSP cores with application-specific coprocessors and I/O. The
KeyStone architecture is the first of its kind to provide adequate internal bandwidth for
nonblocking access to all processing cores, peripherals, coprocessors, and I/O. This internal
bandwidth is achieved with four main hardware elements: Multicore Navigator, TeraNet, Multicore
Shared Memory Controller, and HyperLink.
Multicore Navigator is an innovative packet-based manager that controls 8192 queues.
When tasks are allocated to the queues, Multicore Navigator provides hardware-accelerated dispatch
that directs tasks to the appropriate available hardware. The packet-based system on a chip (SoC)
uses the 2-Tbps capacity of the TeraNet switched central resource to move packets. The Multicore
Shared Memory Controller lets processing cores access shared memory directly without drawing from
the capacity of TeraNet, so packet movement cannot be blocked by memory access.
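The queue-based hand-off described above can be pictured with a toy software model. The sketch below is purely conceptual: every name in it is hypothetical, it is not TI's API, and the real Multicore Navigator manages its queues in hardware. It only shows the idea of a producer pushing packet descriptors onto a FIFO queue and a consumer popping them in order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QUEUE_DEPTH 8u   /* hypothetical depth, chosen for illustration */

/* Hypothetical descriptor: a pointer to a packet buffer plus its length. */
typedef struct {
    void    *buffer;
    uint32_t length;
} toy_descriptor;

/* Hypothetical software stand-in for one queue (a FIFO ring buffer). */
typedef struct {
    toy_descriptor slots[TOY_QUEUE_DEPTH];
    size_t head;   /* index of the next descriptor to pop */
    size_t count;  /* number of descriptors currently queued */
} toy_queue;

/* Producer side: enqueue a descriptor; returns false if the queue is full. */
bool toy_queue_push(toy_queue *q, toy_descriptor d)
{
    if (q->count == TOY_QUEUE_DEPTH)
        return false;
    q->slots[(q->head + q->count) % TOY_QUEUE_DEPTH] = d;
    q->count++;
    return true;
}

/* Consumer side: dequeue the oldest descriptor; returns false if empty. */
bool toy_queue_pop(toy_queue *q, toy_descriptor *out)
{
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_DEPTH;
    q->count--;
    return true;
}
```

A zero-initialized `toy_queue` starts empty. Descriptors come out in the order they were pushed, which is the property that lets a consumer process tasks without coordinating directly with the producer.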
HyperLink provides a 40-Gbaud chip-level interconnect that lets SoCs
work in tandem. Its low protocol overhead and high throughput make HyperLink an ideal interface
for chip-to-chip interconnections. Working with Multicore Navigator, HyperLink dispatches tasks to
tandem devices transparently and executes tasks as if they were running on local resources.