SPRUGR9H November 2010 – April 2015
66AK2E05, 66AK2H06, 66AK2H12, 66AK2H14, 66AK2L06, AM5K2E02, AM5K2E04, SM320C6678-HIREL, TMS320C6652, TMS320C6654, TMS320C6655, TMS320C6657, TMS320C6670, TMS320C6671, TMS320C6672, TMS320C6674, TMS320C6678

 

Preface
    About This Manual
    Trademarks
    Notational Conventions
    Related Documentation from Texas Instruments
1 Introduction
    1.1  Terminology Used in This Document
    1.2  KeyStone I Features
    1.3  KeyStone I Functional Block Diagram
    1.4  KeyStone II Changes to QMSS
    1.5  KeyStone II QMSS Modes of Use
      1.5.1 Shared Mode
      1.5.2 Split Mode
    1.6  Overview
    1.7  Queue Manager
    1.8  Packet DMA (PKTDMA)
    1.9  Navigator Cloud
    1.10 Virtualization
    1.11 ARM-DSP Shared Use
    1.12 PDSP Firmware
2 Operational Concepts
    2.1 Packets
    2.2 Queues
      2.2.1 Packet Queuing
      2.2.2 Packet De-queuing
      2.2.3 Queue Proxy
    2.3 Queue Types
      2.3.1 Transmit Queues
      2.3.2 Transmit Completion Queues
      2.3.3 Receive Queues
      2.3.4 Free Descriptor Queues (FDQ)
        2.3.4.1 Host Packet Free Descriptors
        2.3.4.2 Monolithic Free Descriptors
      2.3.5 Queue Pend Queues
    2.4 Descriptors
      2.4.1 Host Packet
      2.4.2 Host Buffer
      2.4.3 Monolithic Packet
    2.5 Packet DMA
      2.5.1 Channels
      2.5.2 RX Flows
    2.6 Packet Transmission Overview
    2.7 Packet Reception Overview
    2.8 ARM Endianness
3 Descriptor Layouts
    3.1 Host Packet Descriptor
    3.2 Host Buffer Descriptor
    3.3 Monolithic Descriptor
4 Registers
    4.1 Queue Manager
      4.1.1 Queue Configuration Region
        4.1.1.1 Revision Register (0x00000000)
        4.1.1.2 Queue Diversion Register (0x00000008)
        4.1.1.3 Linking RAM Region 0 Base Address Register (0x0000000C)
        4.1.1.4 Linking RAM Region 0 Size Register (0x00000010)
        4.1.1.5 Linking RAM Region 1 Base Address Register (0x00000014)
        4.1.1.6 Free Descriptor/Buffer Starvation Count Register N (0x00000020 + N×4)
      4.1.2 Queue Status RAM
      4.1.3 Descriptor Memory Setup Region
        4.1.3.1 Memory Region R Base Address Register (0x00000000 + 16×R)
        4.1.3.2 Memory Region R Start Index Register (0x00000004 + 16×R)
        4.1.3.3 Memory Region R Descriptor Setup Register (0x00000008 + 16×R)
      4.1.4 Queue Management/Queue Proxy Regions
        4.1.4.1 Queue N Register A (0x00000000 + 16×N)
        4.1.4.2 Queue N Register B (0x00000004 + 16×N)
        4.1.4.3 Queue N Register C (0x00000008 + 16×N)
        4.1.4.4 Queue N Register D (0x0000000C + 16×N)
      4.1.5 Queue Peek Region
        4.1.5.1 Queue N Status and Configuration Register A (0x00000000 + 16×N)
        4.1.5.2 Queue N Status and Configuration Register B (0x00000004 + 16×N)
        4.1.5.3 Queue N Status and Configuration Register C (0x00000008 + 16×N)
        4.1.5.4 Queue N Status and Configuration Register D (0x0000000C + 16×N)
    4.2 Packet DMA
      4.2.1 Global Control Registers Region
        4.2.1.1 Revision Register (0x00)
        4.2.1.2 Performance Control Register (0x04)
        4.2.1.3 Emulation Control Register (0x08)
        4.2.1.4 Priority Control Register (0x0C)
        4.2.1.5 QMn Base Address Register (0x10, 0x14, 0x18, 0x1C)
      4.2.2 TX DMA Channel Configuration Region
        4.2.2.1 TX Channel N Global Configuration Register A (0x000 + 32×N)
        4.2.2.2 TX Channel N Global Configuration Register B (0x004 + 32×N)
      4.2.3 RX DMA Channel Configuration Region
        4.2.3.1 RX Channel N Global Configuration Register A (0x000 + 32×N)
      4.2.4 RX DMA Flow Configuration Region
        4.2.4.1 RX Flow N Configuration Register A (0x000 + 32×N)
        4.2.4.2 RX Flow N Configuration Register B (0x004 + 32×N)
        4.2.4.3 RX Flow N Configuration Register C (0x008 + 32×N)
        4.2.4.4 RX Flow N Configuration Register D (0x00C + 32×N)
        4.2.4.5 RX Flow N Configuration Register E (0x010 + 32×N)
        4.2.4.6 RX Flow N Configuration Register F (0x014 + 32×N)
        4.2.4.7 RX Flow N Configuration Register G (0x018 + 32×N)
        4.2.4.8 RX Flow N Configuration Register H (0x01C + 32×N)
      4.2.5 TX Scheduler Configuration Region
        4.2.5.1 TX Channel N Scheduler Configuration Register (0x000 + 4×N)
    4.3 QMSS PDSPs
      4.3.1 Descriptor Accumulation Firmware
        4.3.1.1 Command Buffer Interface
        4.3.1.2 Global Timer Command Interface
        4.3.1.3 Reclamation Queue Command Interface
        4.3.1.4 Queue Diversion Command Interface
      4.3.2 Quality of Service Firmware
        4.3.2.1 QoS Algorithms
          4.3.2.1.1 Modified Token Bucket Algorithm
        4.3.2.2 Command Buffer Interface
        4.3.2.3 QoS Firmware Commands
        4.3.2.4 QoS Queue Record
        4.3.2.5 QoS Cluster Record
        4.3.2.6 RR-Mode QoS Cluster Record
        4.3.2.7 SRIO Queue Monitoring
          4.3.2.7.1 QoS SRIO Queue Monitoring Record
      4.3.3 Open Event Machine Firmware
      4.3.4 Interrupt Operation
        4.3.4.1 Interrupt Handshaking
        4.3.4.2 Interrupt Processing
        4.3.4.3 Interrupt Generation
        4.3.4.4 Stall Avoidance
      4.3.5 QMSS PDSP Registers
        4.3.5.1 Control Register (0x00000000)
        4.3.5.2 Status Register (0x00000004)
        4.3.5.3 Cycle Count Register (0x0000000C)
        4.3.5.4 Stall Count Register (0x00000010)
    4.4 QMSS Interrupt Distributor
      4.4.1 INTD Register Region
        4.4.1.1  Revision Register (0x00000000)
        4.4.1.2  End Of Interrupt (EOI) Register (0x00000010)
        4.4.1.3  Status Register 0 (0x00000200)
        4.4.1.4  Status Register 1 (0x00000204)
        4.4.1.5  Status Register 2 (0x00000208)
        4.4.1.6  Status Register 3 (0x0000020C)
        4.4.1.7  Status Register 4 (0x00000210)
        4.4.1.8  Status Clear Register 0 (0x00000280)
        4.4.1.9  Status Clear Register 1 (0x00000284)
        4.4.1.10 Status Clear Register 4 (0x00000290)
        4.4.1.11 Interrupt N Count Register (0x00000300 + 4×N)
5 Mapping Information
    5.1 Queue Maps
    5.2 Interrupt Maps
      5.2.1 KeyStone I TCI661x, C6670, C665x devices
      5.2.2 KeyStone I TCI660x, C667x devices
      5.2.3 KeyStone II devices
    5.3 Memory Maps
      5.3.1 QMSS Register Memory Map
      5.3.2 KeyStone I PKTDMA Register Memory Map
      5.3.3 KeyStone II PKTDMA Register Memory Map
    5.4 Packet DMA Channel Map
6 Programming Information
    6.1 Programming Considerations
      6.1.1 System Planning
      6.1.2 Notification of Completed Work
    6.2 Example Code
      6.2.1 QMSS Initialization
      6.2.2 PKTDMA Initialization
      6.2.3 Normal Infrastructure DMA with Accumulation
      6.2.4 Bypass Infrastructure Notification with Accumulation
      6.2.5 Channel Teardown
    6.3 Programming Overrides
    6.4 Programming Errors
    6.5 Questions and Answers
A Example Code Utility Functions
B Example Code Types
C Example Code Addresses
    C.1 KeyStone I Addresses
    C.2 KeyStone II Addresses
Revision History

Modified Token Bucket Algorithm

Basic Operation

The modified token bucket algorithm allows each queue in a cluster to be assigned a fixed rate in bytes per time iteration (typically 25 µs, though the period is configurable). This rate is called iteration credit. In addition, each queue may retain a limited number of bytes as credit against a future traffic burst. The retained credit is called total credit, and its upper limit is called maximum credit.

Iteration credit is added to a queue's total credit at the start of each sampling period. While total credit remains above zero, the packet waiting at the head of the QoS queue is examined for size. If the packet's byte size is less than or equal to the queue's total credit, the packet is forwarded and its byte size is deducted from the total credit. Unused credit is carried over to the next iteration (held as total credit), up to the maximum credit allocated to the queue.

For example, if a flow is rated for 40 Mb/s but can burst up to 20,000 bytes at a time, the queue would be configured as follows on a system with a 25-µs iteration:

  • Iteration Credit = 125 bytes (40 Mb/s is 125 bytes every 25 µs)
  • Maximum Credit = 20,000 bytes

The iteration credit of all queues in the cluster should sum to the total expected data rate of the egress device; it is important to follow this rule when configuring a cluster.
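
In code form, the basic operation reduces to a few lines per queue. The following C sketch is illustrative only: the QosQueue record and the peek/forward helpers are hypothetical stand-ins for state the QoS firmware keeps in its queue records (Section 4.3.2.4), and carry-over capping is deferred to the global credit discussion below.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical per-queue record; the real state lives in the QoS queue
     * records maintained by the firmware. */
    typedef struct {
        uint32_t iterationCredit; /* bytes granted per iteration (125 for 40 Mb/s at 25 us) */
        uint32_t totalCredit;     /* bytes currently available to spend */
        uint32_t maxCredit;       /* burst allowance carried between iterations (20,000) */
    } QosQueue;

    /* Hypothetical helpers standing in for the firmware's queue operations. */
    bool qosPeekPacketSize(QosQueue *q, uint32_t *size); /* size of head packet, if any */
    void qosForwardPacket(QosQueue *q);                  /* move head packet to egress */

    void qosServiceQueue(QosQueue *q)
    {
        uint32_t pktSize;

        q->totalCredit += q->iterationCredit;  /* grant this iteration's credit */

        /* Forward packets while the head packet fits within the available credit. */
        while (qosPeekPacketSize(q, &pktSize) && pktSize <= q->totalCredit) {
            qosForwardPacket(q);
            q->totalCredit -= pktSize;
        }
    }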

Global Credit and Borrowing

After all packets have been transmitted from a QoS queue, the queue's remaining total credit cannot exceed the maximum credit allocated to that queue. Any credit bytes above the maximum credit limit are added to a global credit sum, and the total credit is set to the maximum credit.

Any queue may borrow from the global credit pool, on a first-come, first-served basis, when doing so allows it to transmit an additional packet or to fill its total credit up to its allotted maximum. The global credit system allows queues that are allocated less credit than necessary to saturate a device to make use of additional bandwidth when it is not being used by the other QoS queues in the cluster.

Thus, in the example above, the queue that was set to 40 Mb/s can use the entire bandwidth of the egress device when the other cluster queues are idle.

There is also a configurable maximum size for global credit, which is checked after every queue is processed. For example, if the maximum global credit is set to 0, the credit borrowing feature is disabled.
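
Continuing the sketch, spilling and borrowing might look as follows. The names globalCredit and maxGlobalCredit are illustrative stand-ins for the cluster's shared pool and its configurable cap, and QosQueue is the hypothetical record from the previous sketch.

    static uint32_t globalCredit;    /* shared pool, in bytes */
    static uint32_t maxGlobalCredit; /* configurable cap; 0 disables borrowing */

    /* After a queue is serviced: spill credit above the queue's maximum into
     * the global pool, then cap the pool itself. */
    void qosSpillExcessCredit(QosQueue *q)
    {
        if (q->totalCredit > q->maxCredit) {
            globalCredit += q->totalCredit - q->maxCredit;
            q->totalCredit = q->maxCredit;
        }
        if (globalCredit > maxGlobalCredit)
            globalCredit = maxGlobalCredit;
    }

    /* Borrow just enough to send one more packet of pktSize bytes, or to top
     * the queue off at its maximum credit; first come, first served. */
    void qosBorrowGlobalCredit(QosQueue *q, uint32_t pktSize)
    {
        uint32_t want = 0;

        if (pktSize > q->totalCredit)
            want = pktSize - q->totalCredit;      /* shortfall for the next packet */
        else if (q->totalCredit < q->maxCredit)
            want = q->maxCredit - q->totalCredit; /* refill the burst allowance */

        if (want > globalCredit)
            want = globalCredit;
        globalCredit -= want;
        q->totalCredit += want;
    }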

Congestion and Packet Discard

A queue becomes congested when the bandwidth of data arriving exceeds the bandwidth allocated or available. Each queue has a drop threshold expressed in bytes. Once the backlog in a QoS queue reaches its drop threshold, any packets that cannot be transmitted are discarded until the backlog falls back below the threshold.

For example, the 40-Mb/s flow with the 20,000-byte burst allowance could be considered congested if more than one burst's worth of data has accumulated on the QoS queue. In this case, the drop threshold would be set to 40,000 bytes.
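
As a sketch, the discard decision is a single comparison on arrival. Here qosBacklogBytes and the enqueue/discard helpers are hypothetical, standing in for however the firmware tracks the bytes queued on a QoS queue.

    uint32_t qosBacklogBytes(QosQueue *q);            /* bytes waiting on the queue */
    void     qosEnqueuePacket(QosQueue *q, uint32_t size);
    void     qosDiscardPacket(QosQueue *q, uint32_t size);

    /* Admit a newly arrived packet unless the backlog has reached the drop
     * threshold (40,000 bytes in the example above). */
    void qosAdmitPacket(QosQueue *q, uint32_t pktSize, uint32_t dropThreshold)
    {
        if (qosBacklogBytes(q) >= dropThreshold)
            qosDiscardPacket(q, pktSize);  /* congested: discard */
        else
            qosEnqueuePacket(q, pktSize);
    }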

Congestion and Credit Scaling

The destination queue for a QoS cluster may also become congested. For example, a cluster may be configured to carry 100 Mb/s of data to an Ethernet device, but find that, for various reasons, the device is capable of sending only 70 Mb/s. The cluster algorithm automatically scales the credit assigned to each queue according to how congested the egress queue becomes.

Each QoS cluster is configured with four egress congestion threshold values. Iteration credit is assigned to each queue in the cluster depending on the egress congestion level relative to these four thresholds, as shown in Table 4-54.

Table 4-54 Destination Congestion and Credit Scaling

Egress Queue Congestion (Backlog) Level    QoS Queue Credit Assigned
Backlog < Threshold 1                      Double credit
Threshold 1 <= Backlog < Threshold 2       Normal credit
Threshold 2 <= Backlog < Threshold 3       Half credit
Threshold 3 <= Backlog < Threshold 4       Quarter credit
Backlog >= Threshold 4                     No credit

Note that double credit is assigned in near-idle situations to ensure that each queue's burst potential is refilled as quickly as possible. It also allows the full bandwidth of a device to be used when the allocated bandwidth is not quite enough to fill the device (for example, allocating 98 Mb/s from a 100-Mb/s device).

If the egress queue for a cluster becomes congested due to external influences (such as heavy load on the network), credit scaling affects each QoS queue equally. Some flows, however, may require hard real-time scheduling. Such a queue can be marked as real time and exempted from credit scaling.

For example, in a 100-Mb/s system with two flows, a 40-Mb/s flow and everything else, the first queue in the cluster would be configured as 40 Mb/s real time, and the second queue as 60 Mb/s without the real-time setting. As the available bandwidth on the network drops, the 40-Mb/s flow remains unaffected, while the 60-Mb/s flow is scaled down.
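
The scaling rule in Table 4-54, combined with the real-time exemption, reduces to a small lookup. The following sketch uses illustrative names and assumes thresh[] holds the four configured congestion thresholds in ascending order.

    #include <stdint.h>
    #include <stdbool.h>

    /* Return the iteration credit to grant this period, per Table 4-54. */
    uint32_t qosScaledCredit(uint32_t iterationCredit, bool realTime,
                             uint32_t backlog, const uint32_t thresh[4])
    {
        if (realTime)            return iterationCredit;     /* exempt from scaling */
        if (backlog < thresh[0]) return iterationCredit * 2; /* double credit */
        if (backlog < thresh[1]) return iterationCredit;     /* normal credit */
        if (backlog < thresh[2]) return iterationCredit / 2; /* half credit */
        if (backlog < thresh[3]) return iterationCredit / 4; /* quarter credit */
        return 0;                                            /* no credit */
    }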

Fixed Priority Configuration

This algorithm can also be used to implement a fixed-priority scheme, in which the queues are serviced in fixed priority order, with the first queue in the cluster having the highest priority. This is done by assigning all iteration credit to the first queue in the cluster and setting the maximum credit of each queue to the maximum packet size. This ensures that credit is passed on to subsequent queues only when there are no packets waiting on the current queue.

For example, assume there are three queues: A, B, and C. In a simple priority system, queue A always transmits when it has packets available, queue B transmits only when queue A is idle, and queue C transmits only when both queue A and queue B are idle.

On a 100-Mb/s system, the queues could be configured as follows:

  • Queue A
    • Iteration Credit = 313 (100 Mb/s is 312.5 bytes every 25 µs)
    • Max Credit = 1514
  • Queue B
    • Iteration Credit = 0
    • Max Credit = 1514
  • Queue C
    • Iteration Credit = 0
    • Max Credit = 1514

The way the algorithm works, queue A gets 313 bytes of credit at the start of each iteration. Because queue A can hold up to 1514 bytes as max credit, it will never pass credit on to queue B while it has a packet waiting. (If queue A has 1514 or more bytes of credit, it can always forward a packet.)

Queue A must be idle for an entire packet time (1514 bytes of iteration credit) before any credit begins flowing to queue B. The same relationship holds between queue B and queue C: queue B sends a packet only after queue A has been idle for a packet time, and queue C sends a packet only after queue B has been idle for a packet time.
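
Using the hypothetical QosQueue record from the earlier sketches, this fixed-priority configuration could be written as:

    /* Fixed-priority setup: all iteration credit goes to queue A; every queue's
     * max credit equals the maximum packet size (1514 bytes for Ethernet). */
    QosQueue queueA = { .iterationCredit = 313, /* 100 Mb/s = 312.5 bytes per 25 us */
                        .totalCredit     = 0,
                        .maxCredit       = 1514 };
    QosQueue queueB = { .iterationCredit = 0, .totalCredit = 0, .maxCredit = 1514 };
    QosQueue queueC = { .iterationCredit = 0, .totalCredit = 0, .maxCredit = 1514 };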