SPRUGR9H November 2010 – April 2015
66AK2E05, 66AK2H06, 66AK2H12, 66AK2H14, 66AK2L06, AM5K2E02, AM5K2E04, SM320C6678-HIREL, TMS320C6652, TMS320C6654, TMS320C6655, TMS320C6657, TMS320C6670, TMS320C6671, TMS320C6672, TMS320C6674, TMS320C6678

  1.   Preface
    1.     About This Manual
    2.     Trademarks
    3.     Notational Conventions
    4.     Related Documentation from Texas Instruments
  2. 1 Introduction
    1. 1.1  Terminology Used in This Document
    2. 1.2  KeyStone I Features
    3. 1.3  KeyStone I Functional Block Diagram
    4. 1.4  KeyStone II Changes to QMSS
    5. 1.5  KeyStone II QMSS Modes of Use
      1. 1.5.1 Shared Mode
      2. 1.5.2 Split Mode
    6. 1.6  Overview
    7. 1.7  Queue Manager
    8. 1.8  Packet DMA (PKTDMA)
    9. 1.9  Navigator Cloud
    10. 1.10 Virtualization
    11. 1.11 ARM-DSP Shared Use
    12. 1.12 PDSP Firmware
  3. 2 Operational Concepts
    1. 2.1 Packets
    2. 2.2 Queues
      1. 2.2.1 Packet Queuing
      2. 2.2.2 Packet De-queuing
      3. 2.2.3 Queue Proxy
    3. 2.3 Queue Types
      1. 2.3.1 Transmit Queues
      2. 2.3.2 Transmit Completion Queues
      3. 2.3.3 Receive Queues
      4. 2.3.4 Free Descriptor Queues (FDQ)
        1. 2.3.4.1 Host Packet Free Descriptors
        2. 2.3.4.2 Monolithic Free Descriptors
      5. 2.3.5 Queue Pend Queues
    4. 2.4 Descriptors
      1. 2.4.1 Host Packet
      2. 2.4.2 Host Buffer
      3. 2.4.3 Monolithic Packet
    5. 2.5 Packet DMA
      1. 2.5.1 Channels
      2. 2.5.2 RX Flows
    6. 2.6 Packet Transmission Overview
    7. 2.7 Packet Reception Overview
    8. 2.8 ARM Endianess
  4. 3 Descriptor Layouts
    1. 3.1 Host Packet Descriptor
    2. 3.2 Host Buffer Descriptor
    3. 3.3 Monolithic Descriptor
  5. 4 Registers
    1. 4.1 Queue Manager
      1. 4.1.1 Queue Configuration Region
        1. 4.1.1.1 Revision Register (0x00000000)
        2. 4.1.1.2 Queue Diversion Register (0x00000008)
        3. 4.1.1.3 Linking RAM Region 0 Base Address Register (0x0000000C)
        4. 4.1.1.4 Linking RAM Region 0 Size Register (0x00000010)
        5. 4.1.1.5 Linking RAM Region 1 Base Address Register (0x00000014)
        6. 4.1.1.6 Free Descriptor/Buffer Starvation Count Register N (0x00000020 + N×4)
      2. 4.1.2 Queue Status RAM
      3. 4.1.3 Descriptor Memory Setup Region
        1. 4.1.3.1 Memory Region R Base Address Register (0x00000000 + 16×R)
        2. 4.1.3.2 Memory Region R Start Index Register (0x00000004 + 16×R)
        3. 4.1.3.3 Memory Region R Descriptor Setup Register (0x00000008 + 16×R)
      4. 4.1.4 Queue Management/Queue Proxy Regions
        1. 4.1.4.1 Queue N Register A (0x00000000 + 16×N)
        2. 4.1.4.2 Queue N Register B (0x00000004 + 16×N)
        3. 4.1.4.3 Queue N Register C (0x00000008 + 16×N)
        4. 4.1.4.4 Queue N Register D (0x0000000C + 16×N)
      5. 4.1.5 Queue Peek Region
        1. 4.1.5.1 Queue N Status and Configuration Register A (0x00000000 + 16×N)
        2. 4.1.5.2 Queue N Status and Configuration Register B (0x00000004 + 16×N)
        3. 4.1.5.3 Queue N Status and Configuration Register C (0x00000008 + 16×N)
        4. 4.1.5.4 Queue N Status and Configuration Register D (0x0000000C + 16×N)
    2. 4.2 Packet DMA
      1. 4.2.1 Global Control Registers Region
        1. 4.2.1.1 Revision Register (0x00)
        2. 4.2.1.2 Performance Control Register (0x04)
        3. 4.2.1.3 Emulation Control Register (0x08)
        4. 4.2.1.4 Priority Control Register (0x0C)
        5. 4.2.1.5 QMn Base Address Register (0x10, 0x14, 0x18, 0x1c)
      2. 4.2.2 TX DMA Channel Configuration Region
        1. 4.2.2.1 TX Channel N Global Configuration Register A (0x000 + 32×N)
        2. 4.2.2.2 TX Channel N Global Configuration Register B (0x004 + 32×N)
      3. 4.2.3 RX DMA Channel Configuration Region
        1. 4.2.3.1 RX Channel N Global Configuration Register A (0x000 + 32×N)
      4. 4.2.4 RX DMA Flow Configuration Region
        1. 4.2.4.1 RX Flow N Configuration Register A (0x000 + 32×N)
        2. 4.2.4.2 RX Flow N Configuration Register B (0x004 + 32×N)
        3. 4.2.4.3 RX Flow N Configuration Register C (0x008 + 32×N)
        4. 4.2.4.4 RX Flow N Configuration Register D (0x00C + 32×N)
        5. 4.2.4.5 RX Flow N Configuration Register E (0x010 + 32×N)
        6. 4.2.4.6 RX Flow N Configuration Register F (0x014 + 32×N)
        7. 4.2.4.7 RX Flow N Configuration Register G (0x018 + 32×N)
        8. 4.2.4.8 RX Flow N Configuration Register H (0x01C + 32×N)
      5. 4.2.5 TX Scheduler Configuration Region
        1. 4.2.5.1 TX Channel N Scheduler Configuration Register (0x000 + 4×N)
    3. 4.3 QMSS PDSPs
      1. 4.3.1 Descriptor Accumulation Firmware
        1. 4.3.1.1 Command Buffer Interface
        2. 4.3.1.2 Global Timer Command Interface
        3. 4.3.1.3 Reclamation Queue Command Interface
        4. 4.3.1.4 Queue Diversion Command Interface
      2. 4.3.2 Quality of Service Firmware
        1. 4.3.2.1 QoS Algorithms
          1. 4.3.2.1.1 Modified Token Bucket Algorithm
        2. 4.3.2.2 Command Buffer Interface
        3. 4.3.2.3 QoS Firmware Commands
        4. 4.3.2.4 QoS Queue Record
        5. 4.3.2.5 QoS Cluster Record
        6. 4.3.2.6 RR-Mode QoS Cluster Record
        7. 4.3.2.7 SRIO Queue Monitoring
          1. 4.3.2.7.1 QoS SRIO Queue Monitoring Record
      3. 4.3.3 Open Event Machine Firmware
      4. 4.3.4 Interrupt Operation
        1. 4.3.4.1 Interrupt Handshaking
        2. 4.3.4.2 Interrupt Processing
        3. 4.3.4.3 Interrupt Generation
        4. 4.3.4.4 Stall Avoidance
      5. 4.3.5 QMSS PDSP Registers
        1. 4.3.5.1 Control Register (0x00000000)
        2. 4.3.5.2 Status Register (0x00000004)
        3. 4.3.5.3 Cycle Count Register (0x0000000C)
        4. 4.3.5.4 Stall Count Register (0x00000010)
    4. 4.4 QMSS Interrupt Distributor
      1. 4.4.1 INTD Register Region
        1. 4.4.1.1  Revision Register (0x00000000)
        2. 4.4.1.2  End Of Interrupt (EOI) Register (0x00000010)
        3. 4.4.1.3  Status Register 0 (0x00000200)
        4. 4.4.1.4  Status Register 1 (0x00000204)
        5. 4.4.1.5  Status Register 2 (0x00000208)
        6. 4.4.1.6  Status Register 3 (0x0000020c)
        7. 4.4.1.7  Status Register 4 (0x00000210)
        8. 4.4.1.8  Status Clear Register 0 (0x00000280)
        9. 4.4.1.9  Status Clear Register 1 (0x00000284)
        10. 4.4.1.10 Status Clear Register 4 (0x00000290)
        11. 4.4.1.11 Interrupt N Count Register (0x00000300 + 4×N)
  6. 5 Mapping Information
    1. 5.1 Queue Maps
    2. 5.2 Interrupt Maps
      1. 5.2.1 KeyStone I TCI661x, C6670, C665x devices
      2. 5.2.2 KeyStone I TCI660x, C667x devices
      3. 5.2.3 KeyStone II devices
    3. 5.3 Memory Maps
      1. 5.3.1 QMSS Register Memory Map
      2. 5.3.2 KeyStone I PKTDMA Register Memory Map
      3. 5.3.3 KeyStone II PKTDMA Register Memory Map
    4. 5.4 Packet DMA Channel Map
  7. 6 Programming Information
    1. 6.1 Programming Considerations
      1. 6.1.1 System Planning
      2. 6.1.2 Notification of Completed Work
    2. 6.2 Example Code
      1. 6.2.1 QMSS Initialization
      2. 6.2.2 PKTDMA Initialization
      3. 6.2.3 Normal Infrastructure DMA with Accumulation
      4. 6.2.4 Bypass Infrastructure notification with Accumulation
      5. 6.2.5 Channel Teardown
    3. 6.3 Programming Overrides
    4. 6.4 Programming Errors
    5. 6.5 Questions and Answers
  8. A Example Code Utility Functions
  9. B Example Code Types
  10. C Example Code Addresses
    1. C.1 KeyStone I Addresses:
    2. C.2 KeyStone II Addresses:
  11.   Revision History

Questions and Answers

This section contains frequently asked questions and responses to them.

Question:

How does descriptor accumulator polling work?

All high-priority accumulator channels are continuously polled. For the 48-channel accumulator, after each pass through all of the high-priority channels, one of the low-priority channels is polled. For the 16- and 32-channel versions, there is no high/low mix of channels, so all channels are scanned at full speed. The timers are used only for the interrupt pacing modes.

Question:

How should RX flows and channels be used?

The first step is to recognize that the RX DMA is driven from the RX streaming I/F. Each transaction for a packet is received from the streaming I/F, and contains a channel number. This number will not change for the entire packet’s reception (and once a packet starts on a channel, that channel is dedicated to that packet until it completes). The channel number is determined by the programming of the peripheral (because it is the peripheral that drives the RX streaming I/F). For the infrastructure PKTDMA, it is always the same as the TX channel number because the streaming I/F is connected in loopback.

Next, the initial transmission of the packet to the RX DMA contains some sideband data. One of these parameters is the flow_id. Because the flow_id comes as parametric data with the packet, it can change packet by packet, and it is not related to the channel number in any way. This is the reason why there are more flows than channels — in case there is a need to use more than one flow for a stream of packets.

How the flow_id is determined also varies by peripheral. Each peripheral provides a mechanism to set the flow_id; it may be a register, or a value in protocol-specific data. For the infrastructure PKTDMA, it is passed from the TX side using the SRC_TAG_LO field in the TX descriptor, so it is a value that you choose.
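As a rough illustration (not taken from the example code in Section 6.2), the fragment below shows how a flow might be selected for an infrastructure transfer by writing the SRC_TAG_LO field of the TX descriptor. The word-1 bit positions are assumed from the Host Packet Descriptor layout in Section 3.1, and the descriptor pointer and flow index are hypothetical.

#include <stdint.h>

/* SRC_TAG_LO is assumed to occupy bits 23:16 of descriptor word 1
 * (see the Host Packet Descriptor layout, Section 3.1). */
#define SRC_TAG_LO_SHIFT  16u
#define SRC_TAG_LO_MASK   (0xFFu << SRC_TAG_LO_SHIFT)

/* For the infrastructure PKTDMA, the RX DMA uses this value as the
 * flow_id for the whole packet. */
static void set_infra_flow(volatile uint32_t *tx_desc, uint32_t flow_idx)
{
    tx_desc[1] = (tx_desc[1] & ~SRC_TAG_LO_MASK) |
                 ((flow_idx << SRC_TAG_LO_SHIFT) & SRC_TAG_LO_MASK);
}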

An example from the LTE Demo project: At one point in the processing, an output of the FFTC block is in this form: A B C B A, where A is a range of bytes that are not needed, B contains data to be processed on core 1, and C contains data to be processed on core 2. The setup for this is done so that all data lands in the LL2 of the core that needs to process it:

First, FDQs are constructed (at initialization) with host descriptors and buffers in each core’s LL2.

Next (at runtime, prior to running the FFTC), another highly specialized FDQ is built and loaded with exactly five descriptors:

  • The first and fifth point to a single garbage buffer – for part A
  • The second and fourth point to buffers in core 1’s LL2 – for part B
  • The third points to a buffer in core 2’s LL2 – for part C

Descriptors 2, 3, and 4 are popped from the LL2 FDQs of cores 1, 2, and 1, respectively. Each buffer size is set to the exact number of bytes for the A, B, and C fields so that, as the RX DMA processes the data, each descriptor in this specialized FDQ is popped at the right time, loaded with the correct data, and linked to the previous descriptor to create a single host packet.

The RX flow setting is simple: one RX FDQ, and an RX destination queue monitored by an accumulator channel so that core 1 is notified when the FFTC results are ready.
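A hedged sketch of how that specialized five-descriptor FDQ might be assembled is shown below. The descriptor struct view, the queue_pop/queue_push helpers, the queue numbers, and the byte counts are placeholders (the real layout is in Section 3.1, and the example utility functions are in Appendix A); it also assumes the RX DMA consumes the original buffer pointer/length fields of descriptors popped from an FDQ.

#include <stdint.h>

typedef struct {
    uint32_t desc_info;        /* word 0 */
    uint32_t tag_info;         /* word 1 */
    uint32_t packet_info;      /* word 2 */
    uint32_t buffer_len;       /* word 3 */
    uint32_t buffer_ptr;       /* word 4 */
    uint32_t next_desc_ptr;    /* word 5 */
    uint32_t orig_buffer_len;  /* word 6 */
    uint32_t orig_buffer_ptr;  /* word 7 */
} HostDescriptor;                             /* assumed host descriptor view  */

extern HostDescriptor *queue_pop(uint32_t qnum);             /* assumed helper */
extern void queue_push(uint32_t qnum, HostDescriptor *desc); /* assumed helper */

#define FDQ_CORE1    2000u   /* FDQ with buffers in core 1 LL2 (placeholder) */
#define FDQ_CORE2    2001u   /* FDQ with buffers in core 2 LL2 (placeholder) */
#define FDQ_FFTC_RX  2010u   /* the specialized RX FDQ (placeholder)         */
#define SIZE_A        256u   /* bytes of unneeded data (placeholder)         */
#define SIZE_B       1024u   /* bytes for core 1 (placeholder)               */
#define SIZE_C       1024u   /* bytes for core 2 (placeholder)               */

static uint8_t garbage_buf[SIZE_A];   /* single scratch buffer for part A */

void build_fftc_rx_fdq(void)
{
    /* Pop the five descriptors; the text does not say where descriptors 1
     * and 5 come from, so any FDQ is assumed to be acceptable for them. */
    HostDescriptor *d1 = queue_pop(FDQ_CORE1);   /* part A (garbage)   */
    HostDescriptor *d2 = queue_pop(FDQ_CORE1);   /* part B, core 1 LL2 */
    HostDescriptor *d3 = queue_pop(FDQ_CORE2);   /* part C, core 2 LL2 */
    HostDescriptor *d4 = queue_pop(FDQ_CORE1);   /* part B, core 1 LL2 */
    HostDescriptor *d5 = queue_pop(FDQ_CORE1);   /* part A (garbage)   */

    /* Point descriptors 1 and 5 at the shared garbage buffer; trim every
     * buffer length to the exact field size so the RX DMA pops each
     * descriptor at the right byte offset. */
    d1->orig_buffer_ptr = (uint32_t)(uintptr_t)garbage_buf;
    d5->orig_buffer_ptr = (uint32_t)(uintptr_t)garbage_buf;
    d1->orig_buffer_len = SIZE_A;
    d5->orig_buffer_len = SIZE_A;
    d2->orig_buffer_len = SIZE_B;   /* buffer already points into core 1 LL2 */
    d4->orig_buffer_len = SIZE_B;
    d3->orig_buffer_len = SIZE_C;   /* buffer already points into core 2 LL2 */

    /* Push in consumption order: A B C B A. */
    queue_push(FDQ_FFTC_RX, d1);
    queue_push(FDQ_FFTC_RX, d2);
    queue_push(FDQ_FFTC_RX, d3);
    queue_push(FDQ_FFTC_RX, d4);
    queue_push(FDQ_FFTC_RX, d5);
}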

The choice of RX channel number and RX flow number for the FFTC is arbitrary — any of the four channels and eight flows is as good as any of the others; it depends on how the FFTC is programmed. When using the Multicore Navigator LLDs, the LLD can determine which channel and flow to use (it will select one and pass back a handle to it).

That is a creative example of how to take the output of a peripheral and use Multicore Navigator resources to get the data to its destination(s) efficiently. The power of the Multicore Navigator is that this problem can be solved in many different ways.

Question:

How does a peripheral choose which RX channel to use?

Each Multicore Navigator peripheral has its own method:

  • QMSS and FFTC — the RX channel used is the same as the TX channel that drives it.
  • SRIO — The SRIO picks an RX channel from those that are enabled. See the Serial RapidIO (SRIO) for KeyStone Devices User Guide (SPRUGW1) for more details.
  • AIF2 — For DIO (WCDMA), the RX channel must be channel 128. For AIF2 channels 0 through 127, the RX channel and RX flow numbers are always the same as the DB channel number.
  • PA — Packet classification information is associated with a destination queue and flow number. For example, the PA is instructed to match dest mac = x, dest IP = y, dest TCP port = z, and then it sends the packet to queue A using flow B. The channels, however, are hard-mapped to streaming endpoints within submodules. These submodules are either PDSPs or encryption/authentication blocks.

So, it is not a matter of the peripheral finding an enabled channel, but rather the peripheral using a specific channel whether or not it is enabled (and if not enabled, no data will pass through).

Question:

I’m interested in the case where the infrastructure PKTDMA is moving data from one core’s L2 to a second core’s L2. I don’t understand how the TX queue is tied to a TX channel in that case, or how it connects with the RX channel and queue.

With all the TX PKTDMAs, there is a one-to-one hardware mapping between a queue number and a TX channel. In the QMSS case, queue 800 maps to TX channel 0, 801 to channel 1, and so on. The QMSS loopback connection causes TX channel X to send data to RX channel X. So, to use channel 0, open (in LLD terminology) queue 800, TX channel 0, RX channel 0, an RX flow, and whatever RX queue, RX FDQ, and TX FDQ queues should be used. Note that the TX descriptor defines the TX FDQ (and RX flow) to be used, and the RX flow defines the RX queue and RX FDQ.

Once everything is properly initialized, all that must be done is to pop a descriptor from the TX FDQ, fill it with data, and push it to queue 800. It will flow through the TX DMA, the Streaming I/F, and the RX DMA, and be pushed to the RX queue. To make this transfer move data from core A’s L2 to core B’s L2, the TX FDQ’s descriptors (or host buffers) must be in a QMSS memory region located in core A’s L2, and the RX FDQ’s descriptors (or host buffers) must be in a QMSS memory region located in core B’s L2.
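A minimal sketch of that runtime step follows. It assumes the channels, flow, FDQs, and memory regions have already been set up as in Section 6.2; the helper functions, queue numbers, and descriptor accessors are placeholders rather than the document's actual utility functions.

#include <stdint.h>

#define INFRA_TX_QUEUE  800u    /* maps to infrastructure TX channel 0           */
#define TX_FDQ         2048u    /* TX free descriptor queue (placeholder number) */

typedef struct HostDescriptor HostDescriptor;                /* see Section 3.1 */
extern HostDescriptor *queue_pop(uint32_t qnum);             /* assumed helper  */
extern void queue_push(uint32_t qnum, HostDescriptor *desc); /* assumed helper  */
extern void desc_set_buffer(HostDescriptor *d, void *buf, uint32_t len);
extern void desc_set_packet_len(HostDescriptor *d, uint32_t len);

/* 'buf' is assumed to be a buffer in core A's L2 that core A has already
 * filled with the payload. */
int send_core_to_core(void *buf, uint32_t len)
{
    HostDescriptor *d = queue_pop(TX_FDQ);  /* descriptor lives in core A's L2 */
    if (d == 0)
        return -1;                          /* TX FDQ starved                  */

    desc_set_buffer(d, buf, len);
    desc_set_packet_len(d, len);

    /* Pushing to queue 800 starts TX channel 0; the loopback Streaming I/F
     * delivers the packet to RX channel 0, and the RX flow routes it to an
     * RX queue/FDQ whose descriptors and buffers sit in core B's L2. */
    queue_push(INFRA_TX_QUEUE, d);
    return 0;
}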

Question:

Can I push multiple descriptor types to the same queue?

From the Queue Manager’s perspective, yes. The QM does not care what is stored at the descriptor addresses (it does not have to be a descriptor - but it is an error to push anything other than a valid descriptor through a PKTDMA). From the PKTDMA’s perspective, it depends. The TX DMA handles each packet separately, so there is no problem pushing host and monolithic descriptors into the same TX queue. On the RX side, each RX FDQ should be populated with one type of descriptor, because the RX DMA handles host and monolithic descriptors differently.

Question:

What happens when no configured RX free descriptor queue has a descriptor/buffer that is large enough to hold the data?

It depends. In monolithic packet mode, the RX DMA assumes the descriptor is big enough, and will overwrite adjacent memory if it is not. In host packet mode, the RX DMA will keep popping descriptors and linking them together until it has buffered all the data or has run out of descriptors (RX starvation).

Question:

What happens when the RX queue of the receiving flow is out of buffers?

This condition is called buffer starvation. The action of the RX DMA is configurable, depending on the setting of the rx_error_handling field of the RX flow that is currently in use. If the field is clear, the packet is dropped. If it is set, the RX DMA retries at the rate specified in the Performance Control Register.
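For illustration only, the following sketch sets the rx_error_handling field for one RX flow so that starvation causes a retry instead of a drop. The region base address is a placeholder, and the bit position of the field within RX Flow N Configuration Register A is an assumption to verify against Section 4.2.4.1.

#include <stdint.h>

#define PKTDMA_RX_FLOW_BASE   0x02901000u             /* placeholder region base */
#define RX_FLOW_REG_A(n)      (PKTDMA_RX_FLOW_BASE + 32u * (n))
#define RX_ERROR_HANDLING_BIT (1u << 28)              /* assumed bit position    */

static void rx_flow_enable_retry(uint32_t flow)
{
    volatile uint32_t *rega = (volatile uint32_t *)RX_FLOW_REG_A(flow);

    /* Set: the RX DMA retries the starved FDQ at the rate programmed in the
     * Performance Control Register. Clear: the packet is dropped. */
    *rega |= RX_ERROR_HANDLING_BIT;
}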

Question:

What happens when a packet is received and the specified flow configuration is not set up? Is there any indication that the transmission did not succeed?

If the channel has been enabled, the RX DMA will use reset values for the RX flow if it has not been programmed. There is no error status, but a core can detect the problem in several ways:

  1. Set a timeout and read the RX/RX FDQ descriptor counts (a sketch of this approach follows the list).
  2. Use the Accumulator's last-interrupt pacing mode and examine cases where the list is empty.
  3. Read the QMSS INTD Status Register 4 to see whether a starvation interrupt occurred (if the flow is using reset values, it will use queue 0 as the FDQ).
  4. Program a threshold for the RX queue(s) and read the Queue Status RAM.
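As a sketch of the first method, the loop below polls the RX destination queue's entry count through the Queue Peek region and gives up after a timeout. The base address and queue number are placeholders, and it assumes the entry count is readable from Queue N Status and Configuration Register A (Section 4.1.5.1).

#include <stdint.h>

#define QMSS_QUEUE_PEEK_BASE  0x02a40000u                 /* placeholder base  */
#define QUEUE_PEEK_REG_A(n)   (QMSS_QUEUE_PEEK_BASE + 16u * (n))
#define RX_DEST_QUEUE         712u                        /* placeholder queue */

/* Returns 0 if a descriptor arrived before the timeout, -1 otherwise. */
static int wait_for_rx_packet(uint32_t timeout_loops)
{
    volatile uint32_t *count =
        (volatile uint32_t *)QUEUE_PEEK_REG_A(RX_DEST_QUEUE);

    while (timeout_loops--) {
        if (*count != 0)    /* at least one descriptor has landed */
            return 0;
    }
    return -1;              /* timed out: flow or channel is likely misconfigured */
}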

Question:

Is there any situation in which packet transfer over infrastructure queues can cause packets to be delivered out of order or not at all (while following packets are transferred)?

Out-of-order reception is not possible for the infrastructure PKTDMA, due to the loopback connection of the Streaming I/F. Dropped packets are possible due to errors on the RX side. Dropped packets can be detected by using the tag fields to send a sequence number through to the RX descriptor. Also, the QoS firmware drops packets by design in various situations.
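One hedged way to implement the sequence-number check is sketched below: an 8-bit counter is carried in the source tag high byte of each TX descriptor and checked for gaps on the RX side. The word-1 bit positions are assumed from the Host Packet Descriptor layout in Section 3.1, and the RX flow's tag selection must be programmed to forward the incoming source tag into the RX descriptor.

#include <stdint.h>

#define SRC_TAG_HI_SHIFT  24u                     /* word 1, bits 31:24 assumed */
#define SRC_TAG_HI_MASK   (0xFFu << SRC_TAG_HI_SHIFT)

static uint8_t tx_seq;   /* incremented once per transmitted packet */

static void stamp_sequence(volatile uint32_t *tx_desc)
{
    tx_desc[1] = (tx_desc[1] & ~SRC_TAG_HI_MASK) |
                 ((uint32_t)tx_seq++ << SRC_TAG_HI_SHIFT);
}

/* Returns nonzero if one or more packets were dropped since the last call. */
static int sequence_gap(const volatile uint32_t *rx_desc, uint8_t *expected)
{
    uint8_t got = (uint8_t)((rx_desc[1] & SRC_TAG_HI_MASK) >> SRC_TAG_HI_SHIFT);
    int gap = (got != *expected);
    *expected = (uint8_t)(got + 1u);
    return gap;
}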

Question:

Are peripherals limited by the four-level TX DMA round-robin scheduling?

No. It is true that the TX DMA by itself will starve lower-priority levels if flooded by higher-priority packets. However, it is possible for the IP to disable the higher-priority channels at the Streaming I/F level, thereby allowing lower-priority channels to run. The FFTC is one IP that does this.

Question:

If I'm not going to use a Multicore Navigator feature, can I use its resources for other purposes?

In several cases, yes. For example, if you do not plan to use Low Priority Accumulation, you can use queues 0 to 511 as general purpose queues, and you can also use the QMSS INTD to generate events normally associated with the Low Priority Accumulator to the DSPs for sync barrier purposes. You only need to make sure that the feature (Low Priority Accumulation in this example) is not enabled or is programmed to use other queues.

Question:

Can I assign priorities to queues?

Not with the Queue Manager alone. The management of each queue is independent, and one queue does not affect another. The TX DMA allows a four-level priority scheme to be imposed, and there are other methods for creating priority, such as external schedulers.

Question:

Should memory regions be specified in ascending order?

For KeyStone I, yes. Memory region base addresses must be set in ascending address order; that is, region 0 must be at a lower address than region 1, region 1 must be at a lower address than region 2, and so on. This requires extra planning when configuring regions in LL2, MSMC, and DDR. This restriction does not apply to KeyStone II devices.
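The sketch below programs two regions in ascending base-address order for a KeyStone I device. The register offsets follow Section 4.1.3; the addresses, descriptor counts, and the encoding of the Descriptor Setup Register fields (descriptor size and region size index) are assumptions to check against the register descriptions.

#include <stdint.h>

#define QMSS_DESC_SETUP_BASE  0x02a6a000u             /* placeholder region base */
#define REGION_BASE_REG(r)    (QMSS_DESC_SETUP_BASE + 16u * (r) + 0x0u)
#define REGION_INDEX_REG(r)   (QMSS_DESC_SETUP_BASE + 16u * (r) + 0x4u)
#define REGION_SETUP_REG(r)   (QMSS_DESC_SETUP_BASE + 16u * (r) + 0x8u)

static void reg_write(uint32_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

void setup_memory_regions(void)
{
    /* Region 0: 64 descriptors of 64 bytes in MSMC, the lower base address
     * of the two regions. */
    reg_write(REGION_BASE_REG(0), 0x0c000000u);        /* placeholder address */
    reg_write(REGION_INDEX_REG(0), 0u);                /* linking RAM index   */
    reg_write(REGION_SETUP_REG(0), (3u << 16) | 1u);   /* assumed encoding    */

    /* Region 1: 128 descriptors of 128 bytes in core 0 LL2 (global address),
     * at a higher base address than region 0, as KeyStone I requires. */
    reg_write(REGION_BASE_REG(1), 0x10820000u);        /* placeholder address */
    reg_write(REGION_INDEX_REG(1), 64u);               /* after region 0      */
    reg_write(REGION_SETUP_REG(1), (7u << 16) | 2u);   /* assumed encoding    */
}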

Question:

Is the mapping of accumulator channel, queue and interrupt event fixed or can it be modified?

The mapping of channels to events is fixed. The mapping of queue number to channel number is fixed only for any queue that drives a queue_pend signal. This means that accumulator queue numbers may be changed (the queues shown are the suggested mapping).

Question:

What is the point of the PKTDMA’s logical queue managers, and how can they be used?

The logical queue managers allow the PKTDMA to access queues (and thus descriptors) on other memory-mapped devices, such as a second KeyStone device mapped via Hyperlink. The QMn Base Address Registers provide a VBUSM address that the PKTDMA uses as queue zero for that logical QM. The “Qmgr” field in descriptors and RX flow registers then specifies which of the four logical queue managers is to be used (which really means which base address the PKTDMA uses for pushing and popping). Logical queue managers can be used to create logical groups of queues within the local physical QM, to reach a Hyperlink memory-mapped QM, or, in the case of K2K and K2H, to address the “other” physical QM in the QMSS subsystem (that is, to allow Infra PKTDMA 1 to use QM2 and vice versa).
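As a hedged illustration, the fragment below points one logical queue manager at the local QM and another at a queue manager seen through Hyperlink. The register offsets follow Section 4.2.1.5; all base addresses are placeholders, and the values written are the VBUSM addresses the PKTDMA will treat as queue zero of each logical QM.

#include <stdint.h>

#define PKTDMA_GBL_CFG_BASE   0x02a08000u   /* placeholder global control region */
#define QM0_BASE_ADDR_REG     (PKTDMA_GBL_CFG_BASE + 0x10u)   /* logical QM0 */
#define QM1_BASE_ADDR_REG     (PKTDMA_GBL_CFG_BASE + 0x14u)   /* logical QM1 */

#define LOCAL_QM_QUEUE0_VBUSM  0x34020000u  /* local QM data region (placeholder)    */
#define HLINK_QM_QUEUE0_VBUSM  0x40020000u  /* remote QM over Hyperlink (placeholder) */

void setup_logical_qms(void)
{
    /* Logical QM 0: local queues, so descriptors or RX flows with qmgr = 0
     * push and pop on the local queue manager as usual. */
    *(volatile uint32_t *)QM0_BASE_ADDR_REG = LOCAL_QM_QUEUE0_VBUSM;

    /* Logical QM 1: queue 0 of the remote device; anything that selects
     * qmgr = 1 now pushes and pops across Hyperlink. */
    *(volatile uint32_t *)QM1_BASE_ADDR_REG = HLINK_QM_QUEUE0_VBUSM;
}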