SPRZ491D – December 2020 – June 2023 – DRA821U, DRA821U-Q1

 

  1.   1
  2. 1 Modules Affected
  3. 2 Nomenclature, Package Symbolization, and Revision Identification
    1. 2.1 Device and Development-Support Tool Nomenclature
    2. 2.2 Devices Supported
    3. 2.3 Package Symbolization and Revision Identification
  4. 3 Silicon Revision 1.0, 2.0 Usage Notes and Advisories
    1. 3.1 Silicon Revision 1.0, 2.0 Usage Notes
    2. 3.2 Silicon Revision 1.0, 2.0 Advisories
    3.     i2049
    4.     i2062
    5.     i2091
    6.     i2116
    7.     i2123
    8. 3.3 i2126
    9. 3.4 i2127
    10.     i2134
    11.     i2137
    12.     i2146
    13. 3.5 i2151
    14.     i2157
    15.     i2159
    16.     i2160
    17.     i2161
    18.     i2163
    19.     i2166
    20.     i2177
    21.     i2182
    22.     i2183
    23.     i2184
    24.     i2185
    25.     i2186
    26.     i2187
    27.     i2189
    28.     i2196
    29.     i2197
    30.     i2201
    31.     i2205
    32.     i2207
    33.     i2208
    34.     i2209
    35.     i2216
    36.     i2217
    37.     i2221
    38.     i2222
    39.     i2227
    40.     i2228
    41.     i2232
    42.     i2234
    43.     i2235
    44.     i2237
    45.     i2241
    46.     i2242
    47.     i2243
    48.     i2244
    49.     i2245
    50.     i2246
    51.     i2249
    52.     i2253
    53.     i2257
    54.     i2274
    55.     i2275
    56.     i2277
    57.     i2278
    58.     i2279
    59.     i2283
    60.     i2306
    61.     i2307
    62.     i2310
    63.     i2311
    64.     i2312
    65.     i2320
    66.     i2326
    67.     i2329
    68.     i2351
    69.     i2360
    70.     i2361
    71.     i2362
    72.     i2366
    73.     i2371
    74.     i2372
    75.     i2383
  5.   Trademarks
  6.   Revision History

i2163


UDMAP: UDMA transfers with ICNTs and/or src/dst addr NOT aligned to 64B fail when used in "event trigger" mode

Details:

Note: The following description uses a C7x DSP core as an example, but it applies to any other processing core that can program the UDMA.

For DSP algorithm processing on C6x/C7x, software often uses the UDMA in NavSS or the DRU in MSMC. In many cases, UDMA is used instead of DRU because, in many use cases, DRU channels are reserved for C7x/MMA deep-learning operations. In typical DSP algorithm processing, data is moved by DMA block by block into L2 memory, and the DSP operates on the data in L2 memory instead of accessing DDR through the cache. The typical DMA setup and event-trigger sequence for this operation is shown below; the following example refers to it as "2D trigger and wait".

For each "frame":

  1. Set up a TR, typically a 3- or 4-dimensional TR:
    1. Set TYPE = 4D_BLOCK_MOVE_REPACKING_INDIRECTION
    2. Set EVENT_SIZE = ICNT2_DEC
    3. Set TRIGGER0 = GLOBAL0
    4. Set TRIGGER0_TYPE = ICNT2_DEC
    5. Set TRIGGER1 = NONE
    6. ICNT0 x ICNT1 is block width x block height
    7. ICNT2 = number of blocks
    8. ICNT3 = 1
    9. src addr = DDR
    10. dst addr = C6x L2 memory
  2. Submit this TR
    1. On each GLOBAL TRIGGER0, this TR transfers ICNT0 x ICNT1 bytes and then raises an event.
  3. For each block, do the following:
    1. Trigger the DMA by setting GLOBAL TRIGGER0
    2. Wait for the event that indicates that the block is transferred
    3. Perform the DSP processing on the block in L2 memory

This is a simplified sequence. In an actual algorithm, multiple channels can perform DDR-to-L2 or L2-to-DDR transfers in a "ping-pong" manner so that DSP processing and DMA run in parallel. The event itself is programmed in the channel OES registers, and the event status is checked using a free bit in the IA for UDMA.
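The TR setup in step 1 can be sketched in C to show how the listed fields relate to one another. This is a minimal, hypothetical sketch: `sketch_tr_t`, `make_2d_trigger_wait_tr()`, and `bytes_per_trigger()` are illustrative names, not the CSL or UDMA driver API, and the enum encodings do not match the hardware TR format. It only illustrates that each GLOBAL0 trigger moves one ICNT0 x ICNT1 block and raises one event.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of the TR fields named in this advisory.
 * NOT the real CSL TR type; the enum values are illustrative only. */
typedef enum { TR_TYPE_4D_BLOCK_MOVE_REPACKING_INDIRECTION } tr_type_t;
typedef enum { EVENT_SIZE_COMPLETION, EVENT_SIZE_ICNT2_DEC } tr_event_size_t;
typedef enum { TRIGGER_NONE, TRIGGER_GLOBAL0 } tr_trigger_t;
typedef enum { TRIGGER_TYPE_ICNT2_DEC } tr_trigger_type_t;

typedef struct {
    tr_type_t type;
    tr_event_size_t event_size;
    tr_trigger_t trigger0;
    tr_trigger_type_t trigger0_type;
    tr_trigger_t trigger1;
    uint16_t icnt[4];   /* ICNT0..ICNT3 */
    uint64_t src_addr;  /* DDR */
    uint64_t dst_addr;  /* C6x/C7x L2 memory */
} sketch_tr_t;

/* Step 1: a "2D trigger and wait" TR, one block per GLOBAL0 trigger. */
static sketch_tr_t make_2d_trigger_wait_tr(uint16_t block_width_bytes,
                                           uint16_t block_height,
                                           uint16_t num_blocks,
                                           uint64_t ddr_src,
                                           uint64_t l2_dst)
{
    assert(num_blocks > 0);
    sketch_tr_t tr = {0};
    tr.type = TR_TYPE_4D_BLOCK_MOVE_REPACKING_INDIRECTION;
    tr.event_size = EVENT_SIZE_ICNT2_DEC;      /* one event per block */
    tr.trigger0 = TRIGGER_GLOBAL0;
    tr.trigger0_type = TRIGGER_TYPE_ICNT2_DEC; /* one block per trigger */
    tr.trigger1 = TRIGGER_NONE;
    tr.icnt[0] = block_width_bytes;            /* block width */
    tr.icnt[1] = block_height;                 /* block height */
    tr.icnt[2] = num_blocks;                   /* number of blocks */
    tr.icnt[3] = 1;
    tr.src_addr = ddr_src;
    tr.dst_addr = l2_dst;
    return tr;
}

/* Bytes moved per GLOBAL0 trigger (and per event): ICNT0 x ICNT1. */
static uint32_t bytes_per_trigger(const sketch_tr_t *tr)
{
    return (uint32_t)tr->icnt[0] * tr->icnt[1];
}
```

For example, a 64-byte-wide, 8-row block gives 512 bytes per trigger, which is a multiple of 64.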

When any of the following conditions occurs, the event in step 3.2 is not received for the first trigger:

  • Condition 1: ICNT0 x ICNT1 is NOT a multiple of 64.
  • Condition 2: The src or dst address is NOT a multiple of 64.
  • Condition 3: ICNT0 x ICNT1 is NOT a multiple of 64, and the src/dst address is NOT a multiple of 64.

Aligning ICNT0 x ICNT1 and the src/dst addresses to 16 or 32 bytes does not avoid the issue; the event is still not received. Only 64-byte alignment works.
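As a quick check of Conditions 1 to 3, the helper below flags a "2D trigger and wait" configuration that would hit this advisory when the EOL workaround is not applied. It is a hypothetical sketch; `udma_trigger_wait_affected()` is not part of any TI library. For example, a 48 x 2 = 96-byte block is a multiple of 16 and of 32 but not of 64, so it is affected even when both addresses are 64-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Only 64-byte alignment avoids the missing-event issue. */
#define UDMA_TRIGGER_ALIGN 64u

static bool is_aligned64(uint64_t v)
{
    return (v % UDMA_TRIGGER_ALIGN) == 0u;
}

/* Returns true if the TR (without the EOL workaround) matches any of
 * Conditions 1-3: the per-trigger byte count, the src address, or the
 * dst address is not a multiple of 64. */
static bool udma_trigger_wait_affected(uint64_t bytes_per_trigger,
                                       uint64_t src_addr,
                                       uint64_t dst_addr)
{
    assert(bytes_per_trigger > 0u);
    bool size_ok = is_aligned64(bytes_per_trigger);
    bool addr_ok = is_aligned64(src_addr) && is_aligned64(dst_addr);
    return !(size_ok && addr_ok);
}
```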

Conditions under which the transfer works:

  • If ICNT0 x ICNT1 is a multiple of 64 and the src/dst addresses are multiples of 64, the test case passes.
  • If DRU is used instead of UDMA, the test passes. The TR must be submitted to DRU through the UDMA DRU external channel. With DRU and with unaligned ICNTs and src/dst addresses, the user can trigger and receive events as expected when the TR is programmed so that a frame contains exactly one event and one trigger, that is, ICNT2 = 1 in the above case, or EVENT_SIZE = COMPLETION and the trigger is NONE. The completion event then occurs as expected. However, this configuration is not feasible for the use cases in question.

The above is an example of "2D trigger and wait"; the same constraint applies to "1D trigger and wait" and "3D trigger and wait":

  • For "1D trigger and wait", ICNT0 MUST be a multiple of 64.
  • For "3D trigger and wait", ICNT0 x ICNT1 x ICNT2 MUST be a multiple of 64.
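The per-trigger byte count that must be a multiple of 64 depends on the trigger dimension. A hypothetical helper (the names are illustrative, not a TI API) that computes it for the 1D, 2D, and 3D cases:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes transferred per trigger for "ND trigger and wait":
 * 1D -> ICNT0, 2D -> ICNT0 x ICNT1, 3D -> ICNT0 x ICNT1 x ICNT2. */
static uint64_t trigger_bytes(int dims, uint32_t icnt0,
                              uint32_t icnt1, uint32_t icnt2)
{
    assert(dims >= 1 && dims <= 3);
    uint64_t bytes = icnt0;
    if (dims >= 2)
        bytes *= icnt1;
    if (dims >= 3)
        bytes *= icnt2;
    return bytes;
}

/* Without the EOL workaround, this count must be a multiple of 64. */
static int trigger_bytes_ok(int dims, uint32_t icnt0,
                            uint32_t icnt1, uint32_t icnt2)
{
    return trigger_bytes(dims, icnt0, icnt1, icnt2) % 64u == 0u;
}
```

For example, a 3D trigger with ICNT0 = 10, ICNT1 = 4, and ICNT2 = 8 moves 320 bytes (5 x 64), which satisfies the rule, while ICNT2 = 3 gives 120 bytes, which does not.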

Workaround(s):

Set the EOL flag in the TR for UDMAP, as shown in the following examples:

  • 1D trigger and wait
    • TR.FLAGS |= CSL_FMK(UDMAP_TR_FLAGS_EOL, CSL_UDMAP_TR_FLAGS_EOL_ICNT0);
  • 2D trigger and wait
    • TR.FLAGS |= CSL_FMK(UDMAP_TR_FLAGS_EOL, CSL_UDMAP_TR_FLAGS_EOL_ICNT0_ICNT1);
  • 3D trigger and wait
    • TR.FLAGS |= CSL_FMK(UDMAP_TR_FLAGS_EOL, CSL_UDMAP_TR_FLAGS_EOL_ICNT0_ICNT1_ICNT2);
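The workaround amounts to choosing the EOL value that matches the trigger dimension and OR-ing it into the TR flags word. The sketch below mirrors the shape of `CSL_FMK` using locally defined placeholder macros. The shift, mask, and enum values are assumptions for illustration only; production code should use the real `CSL_UDMAP_TR_FLAGS_EOL_*` definitions from the CSL headers.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder EOL encodings (assumption); real values come from the CSL
 * CSL_UDMAP_TR_FLAGS_EOL_* definitions. */
enum sketch_eol {
    SKETCH_EOL_NONE = 0,
    SKETCH_EOL_ICNT0 = 1,
    SKETCH_EOL_ICNT0_ICNT1 = 2,
    SKETCH_EOL_ICNT0_ICNT1_ICNT2 = 3
};

/* Placeholder field position (assumption), not the documented layout. */
#define SKETCH_TR_FLAGS_EOL_SHIFT 5u
#define SKETCH_TR_FLAGS_EOL_MASK  (0x3u << SKETCH_TR_FLAGS_EOL_SHIFT)

/* Same shape as CSL_FMK: shift the value into the field, then mask it. */
#define SKETCH_FMK(shift, mask, val) ((((uint32_t)(val)) << (shift)) & (mask))

/* Pick the EOL setting matching "1D/2D/3D trigger and wait". */
static enum sketch_eol eol_for_trigger_dims(int dims)
{
    assert(dims >= 1 && dims <= 3);
    switch (dims) {
    case 1:  return SKETCH_EOL_ICNT0;
    case 2:  return SKETCH_EOL_ICNT0_ICNT1;
    default: return SKETCH_EOL_ICNT0_ICNT1_ICNT2;
    }
}

/* Apply the workaround like "TR.FLAGS |= CSL_FMK(...)". Because this ORs
 * the field in, it assumes the EOL field of tr_flags is still zero. */
static uint32_t apply_eol_workaround(uint32_t tr_flags, int dims)
{
    return tr_flags | SKETCH_FMK(SKETCH_TR_FLAGS_EOL_SHIFT,
                                 SKETCH_TR_FLAGS_EOL_MASK,
                                 eol_for_trigger_dims(dims));
}
```

Like the original `|=` examples, this leaves the other TR flag fields (TYPE, EVENT_SIZE, TRIGGER0, and so on) untouched.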

There is no performance impact due to this workaround.