SWRZ150A December 2024 – October 2025 AWR2544

 

  • 1 Introduction
  • 2 Device Nomenclature
  • 3 Device Markings
  • 4 Advisory to Silicon Variant / Revision Map
  • 5 Known Design Exceptions to Functional Specifications
    • MSS#25
    • MSS#27
    • MSS#28
    • MSS#29
    • MSS#30
    • MSS#33
    • MSS#40
    • 5.1 MSS#49
    • 5.2 MSS#52
    • 5.3 MSS#53
    • 5.4 MSS#54
    • 5.5 MSS#55
    • 5.6 MSS#56
    • 5.7 MSS#57
    • 5.8 MSS#58
    • 5.9 MSS#59
    • 5.10 MSS#60
    • 5.11 MSS#61
    • 5.12 MSS#62
    • 5.13 MSS#63
    • 5.14 MSS#64
    • 5.15 MSS#65
    • MSS#68
    • MSS#71
    • 5.16 ANA#12A
    • ANA#37A
    • ANA#39
    • ANA#43
    • ANA#44
    • ANA#45
    • ANA#47
    • ANA#59
  • Trademarks
  • 6 Revision History

MSS#71

Single-bit ECC (error correction) mechanism can cause an incorrect memory update

Revision(s) Affected:

AWR2544

Description:

Note: The issue was uncovered while debugging an incorrect memory access sequence in simulation. To date, no customer has reported this issue in field or deployment scenarios.

In the uncommon event of a single-bit upset in one of the SoC memory ranges tabulated below, and under a specific combination of memory accesses, the single-bit error correction mechanism can cause an incorrect memory update.

The RAMs on the AWR2544 are ECC protected with a single-bit error correction, double-bit error detection (SECDED) mechanism. The incorrect update is caused by the single-bit error correction (SEC) part of this mechanism, and only under a specific sequence of events.

For the issue to impact the application, all of the following conditions must be satisfied:

  • A random hardware fault, due to environmental conditions or other factors, causes a single-bit upset, AND
  • The single-bit upset occurs in one of the impacted memory ranges, AND
  • A read or partial-write access is made to the memory location holding the single-bit error (triggering the single-bit error correction), AND
  • A specific memory access sequence occurs after the single-bit error correction, AND
  • The incorrect memory update caused by the error correction mechanism is critical enough to affect the application program flow and is not detected by other safety mechanisms.

The following access sequences (conditions 3 and 4 above) to an impacted memory range, occurring after the single-bit error correction, can cause the issue:

  • Read or partial write of location A (with SEC) → followed by a full write to one or more locations in the same memory range → followed by a partial write to any other location in the same memory range: the last fully written location is updated incorrectly.
  • Partial write to location A (with SEC) → followed by a partial write to any other location in the same memory range: location A is updated incorrectly.

Note: The issue does not occur for any other memory access sequence.
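The two hazardous sequences can be expressed as a small trace checker, for example to screen access traces captured in simulation. This is an illustrative sketch only: the `mem_access_t` trace format and the `find_sec_hazard` name are hypothetical, "followed by" is assumed to mean the immediately following accesses within the same memory range, and "any other location" is assumed to mean any address other than A.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Access kinds used in the errata description. */
typedef enum {
    ACC_READ,          /* read access */
    ACC_PARTIAL_WRITE, /* write narrower than the ECC-protected word */
    ACC_FULL_WRITE     /* write covering the whole ECC-protected word */
} access_kind_t;

/* One access to a single impacted memory range (hypothetical trace format). */
typedef struct {
    access_kind_t kind;
    uint32_t addr;
    bool sec; /* true if this access triggered a single-bit correction */
} mem_access_t;

/* Whether a hazardous sequence was found, and which address it corrupts. */
typedef struct {
    bool hazard;
    uint32_t corrupted_addr;
} hazard_t;

/* Scan consecutive accesses to ONE memory range and report the first match
 * of either erratum pattern (assumption: "followed by" = next accesses). */
hazard_t find_sec_hazard(const mem_access_t *trace, size_t n)
{
    hazard_t none = { false, 0u };
    assert(trace != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const mem_access_t *a = &trace[i];
        /* Both patterns start with a read or partial write that hit SEC. */
        if (!a->sec || a->kind == ACC_FULL_WRITE) {
            continue;
        }

        /* Pattern 2: partial write to A (SEC) -> partial write elsewhere. */
        if (a->kind == ACC_PARTIAL_WRITE && i + 1 < n &&
            trace[i + 1].kind == ACC_PARTIAL_WRITE &&
            trace[i + 1].addr != a->addr) {
            hazard_t h = { true, a->addr };
            return h;
        }

        /* Pattern 1: read/partial write to A (SEC) -> one or more full
         * writes -> partial write elsewhere. */
        size_t j = i + 1;
        while (j < n && trace[j].kind == ACC_FULL_WRITE) {
            j++;
        }
        if (j > i + 1 && j < n && trace[j].kind == ACC_PARTIAL_WRITE &&
            trace[j].addr != a->addr) {
            hazard_t h = { true, trace[j - 1].addr }; /* last full-write location */
            return h;
        }
    }
    return none;
}
```

A read with SEC followed directly by a partial write (no intervening full write) is reported as safe, matching the note that no other sequence triggers the issue. The checker only reasons about access order; it cannot tell whether a single-bit upset actually occurs on the device.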

Workaround(s):

Single-bit upset events are uncommon and have a low probability of occurrence.

  • Only single-bit errors can trigger the issue. Double-bit errors are detected but not corrected, and on a double-bit error the device is taken to a safe state, depending on the criticality.

Partial-write memory accesses, which are needed to trigger the issue, occur only in limited cases:

  • Cached memory ranges do not generate partial-write accesses, because cache-line writes are always full writes.
    • Example: MSS L2 memories
  • Code sections are read-only, so accesses to code sections cannot satisfy the conditions that cause the issue.
    • Example: MSS L2 memories
  • Impacted memories that do see partial-write accesses can have other safety mechanisms that detect or avoid such random errors.
    • Higher-level processing of the radar data cube has built-in outlier rejection through its tracking functions (temporal and logical monitoring).
      • Example: DSS L3
    • Information redundancy techniques can be used on impacted memories, such as the mailboxes, to detect errors.
      • Example: Mailbox memories
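As an example of information redundancy, a CRC over each mailbox message lets the receiver detect a corrupted payload. This is a minimal sketch, assuming a hypothetical `mbox_msg_t` layout with a trailing CRC-32 (IEEE 802.3 polynomial, bitwise implementation); the message format and function names are illustrative and are not the device's mailbox protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical mailbox message: fixed-size payload plus a trailing CRC. */
#define MBOX_PAYLOAD_BYTES 60u
typedef struct {
    uint8_t payload[MBOX_PAYLOAD_BYTES];
    uint32_t crc;
} mbox_msg_t;

/* Sender: compute the CRC before the message is copied into the mailbox. */
void mbox_msg_seal(mbox_msg_t *msg)
{
    assert(msg != NULL);
    msg->crc = crc32_ieee(msg->payload, sizeof msg->payload);
}

/* Receiver: reject the message if the payload no longer matches its CRC. */
bool mbox_msg_is_valid(const mbox_msg_t *msg)
{
    assert(msg != NULL);
    return crc32_ieee(msg->payload, sizeof msg->payload) == msg->crc;
}
```

CRC-32 detects all single-bit errors and all error bursts of up to 32 bits, so a corrupted payload word left behind by an incorrect ECC update is very likely to be caught when the receiver validates the message.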

For each impacted memory range, identify whether partial-write accesses are possible. For each range where they are, assess the criticality and decide whether action is needed. The possible courses of action are:

  • No action, if either of the following applies:
    • Single-bit upset events are unlikely in the operating environment.
    • Other safety mechanisms can detect or avoid such spurious random errors.
  • Action: consider one or more of the following options:
    • Avoid the partial-write access pattern to those memory ranges.
    • Re-initialise the impacted memory bank on a single-bit memory correction event.
    • Treat the single-bit memory correction event as an uncorrectable error and enter a safe state.
      • This does not affect the functional safety detectability claims, but it may affect availability if such a single-bit upset occurs.

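Refer to the table below for each memory range and its corresponding ESM line and ECC aggregator status bit, which are needed to detect the single-bit correction event when acting on it (re-initialising the bank or treating the event as an uncorrectable error).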

The table lists only the impacted memories and their corresponding ESM line and ECC aggregator status bit.

| Memory name | Start address | End address | ESM line | ECC aggregator status bit |
|---|---|---|---|---|
| DSS L3 Bank0 | 0x88000000 | 0x880BFFFF | DSS_ESM GROUP1, line 92 | DSS_ECC_AGG::SEC_STATUS_REG0::DSS_L3RAM0_PEND |
| DSS L3 Bank1 | 0x880C0000 | 0x8817FFFF | DSS_ESM GROUP1, line 92 | DSS_ECC_AGG::SEC_STATUS_REG0::DSS_L3RAM1_PEND |
| DSS L3 Bank2 | 0x88180000 | 0x881FFFFF | DSS_ESM GROUP1, line 92 | DSS_ECC_AGG::SEC_STATUS_REG0::DSS_L3RAM2_PEND |
| DSS L3 Bank3 | 0x88200000 | 0x8827FFFF | DSS_ESM GROUP1, line 92 | DSS_ECC_AGG::SEC_STATUS_REG0::DSS_L3RAM3_PEND |
| MSS L2 Bank0 | 0xC0200000 | 0xC027FFFF | MSS_ESM GROUP1, line 18 | MSS_ECC_AGG_MSS::SEC_STATUS_REG0::MSS_L2SLV0_PEND |
| MSS L2 Bank1 | 0xC0280000 | 0xC02EFFFF | MSS_ESM GROUP1, line 18 | MSS_ECC_AGG_MSS::SEC_STATUS_REG0::MSS_L2SLV1_PEND |
| MSS Mailbox | 0xC5000000 | 0xC5001FFF | MSS_ESM GROUP1, line 18 | MSS_ECC_AGG_MSS::SEC_STATUS_REG0::MSS_MBOX_PEND |
| MSS_RETRAM | 0xC5010000 | 0xC50107FF | MSS_ESM GROUP1, line 18 | MSS_ECC_AGG_MSS::SEC_STATUS_REG0::MSS_RETRAM_PEND |
| DSS Mailbox | 0x83100000 | 0x83100FFF | DSS_ESM GROUP1, line 92 | DSS_ECC_AGG::SEC_STATUS_REG0::DSS_MAILBOX_PEND |
Note: The MSS L2 addresses above are in the DSS and EDMA address view. In the MSS-R5 address view, MSS_L2_BANK0 and MSS_L2_BANK1 are at 0x10200000-0x1027FFFF and 0x10280000-0x102EFFFF, respectively.
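To find which table entry, and therefore which ESM line and ECC aggregator status bit, applies to a given address, the table can be encoded as a lookup. This sketch hard-codes the ranges above plus the MSS-R5 view of the MSS L2 banks from the note; the struct and function names are illustrative. DSS L3 Bank1 is assumed to start at 0x880C0000, directly after Bank0 ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One impacted memory range and its error-reporting details. */
typedef struct {
    const char *name;
    uint32_t start;
    uint32_t end; /* inclusive */
    const char *esm_line;
    const char *sec_status_bit;
} impacted_mem_t;

static const impacted_mem_t k_impacted[] = {
    /* DSS and EDMA address view */
    { "DSS L3 Bank0", 0x88000000u, 0x880BFFFFu, "DSS_ESM GROUP1 line 92", "DSS_L3RAM0_PEND" },
    { "DSS L3 Bank1", 0x880C0000u, 0x8817FFFFu, "DSS_ESM GROUP1 line 92", "DSS_L3RAM1_PEND" },
    { "DSS L3 Bank2", 0x88180000u, 0x881FFFFFu, "DSS_ESM GROUP1 line 92", "DSS_L3RAM2_PEND" },
    { "DSS L3 Bank3", 0x88200000u, 0x8827FFFFu, "DSS_ESM GROUP1 line 92", "DSS_L3RAM3_PEND" },
    { "MSS L2 Bank0", 0xC0200000u, 0xC027FFFFu, "MSS_ESM GROUP1 line 18", "MSS_L2SLV0_PEND" },
    { "MSS L2 Bank1", 0xC0280000u, 0xC02EFFFFu, "MSS_ESM GROUP1 line 18", "MSS_L2SLV1_PEND" },
    { "MSS Mailbox",  0xC5000000u, 0xC5001FFFu, "MSS_ESM GROUP1 line 18", "MSS_MBOX_PEND" },
    { "MSS_RETRAM",   0xC5010000u, 0xC50107FFu, "MSS_ESM GROUP1 line 18", "MSS_RETRAM_PEND" },
    { "DSS Mailbox",  0x83100000u, 0x83100FFFu, "DSS_ESM GROUP1 line 92", "DSS_MAILBOX_PEND" },
    /* MSS L2 banks as seen from the MSS-R5 */
    { "MSS L2 Bank0 (MSS-R5 view)", 0x10200000u, 0x1027FFFFu, "MSS_ESM GROUP1 line 18", "MSS_L2SLV0_PEND" },
    { "MSS L2 Bank1 (MSS-R5 view)", 0x10280000u, 0x102EFFFFu, "MSS_ESM GROUP1 line 18", "MSS_L2SLV1_PEND" },
};

/* Return the impacted range containing addr, or NULL if it is not impacted. */
const impacted_mem_t *find_impacted_mem(uint32_t addr)
{
    for (size_t i = 0; i < sizeof k_impacted / sizeof k_impacted[0]; i++) {
        if (addr >= k_impacted[i].start && addr <= k_impacted[i].end) {
            return &k_impacted[i];
        }
    }
    return NULL;
}
```

A linear scan is adequate for a table this small; it also keeps the data in the same order as the errata table, which makes it easy to review against the document.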

Memories that are not used by the application but are used by the BSS, such as BSS_Mailbox and BSS_Static_RAM, are also affected by this erratum.

  • The BSS mailbox is primarily used for communication between the BSS and the MSS/DSS through mmWaveLink, using a message protocol that includes a CRC for data integrity. This CRC on message exchanges over the BSS mailbox reduces the risk associated with this memory.
  • When a fault occurs (in this case, an ECC single-bit correction), the BSS sends an ESM fault asynchronous event message to the MSS/DSS as a notification. The application must read bit 20 (ECC_AGG_SEC_ERROR) of the AWR_AE_RF_ADV_ESMFAULT_STATUS_SB asynchronous event from the BSS, treat this single-bit memory correction event as an uncorrectable error, and enter a safe state.
    • This workaround is valid only if the application uses the BSS patch from DFP version 2.4.14 or earlier.
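The check on the BSS asynchronous event reduces to testing one bit. This minimal sketch assumes the application has already extracted a 32-bit ESM fault status word from the AWR_AE_RF_ADV_ESMFAULT_STATUS_SB event; how that word is obtained through mmWaveLink is not shown, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 20 (ECC_AGG_SEC_ERROR) of the ESM fault status reported by the BSS. */
#define ESMFAULT_ECC_AGG_SEC_ERROR_BIT 20u

/* Returns true if the BSS reported an ECC single-bit correction. */
bool bss_reported_sec(uint32_t esm_fault_status)
{
    return ((esm_fault_status >> ESMFAULT_ECC_AGG_SEC_ERROR_BIT) & 1u) != 0u;
}
```

If `bss_reported_sec` returns true, the application would treat the event as an uncorrectable error and enter its safe state.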