Four custom instructions have been added as part of CDE. These are common instructions supported by hardware acceleration, leveraging the CDE feature of the CM33. Ternary MAC and BNN are special types with weight quantization down to 2 bits and 1 bit respectively, which enables multiple operations, such as multiply and accumulate, to complete in a single clock cycle. Similarly, MMA (8x8 MAC) provides better throughput than the core instructions of the Arm CPU. The batch normalization layer is also an integral part of a CNN and is repeated depending on the network topology; batch normalization helps to speed up the training process, so instruction support for BN is important for the overall performance of a typical network.
Ternary Matrix Multiply and Accumulate - TMA
Matrix Multiply and Accumulate - MMA
Batch Normalization - BN
Support for Binary Neural Network - BNN
The custom instructions have the following format:
CX3{A} {cond}, <coproc>, <Rd>, <Rn>, <Rm>, #<imm>
CX3D{A} {cond}, <coproc>, <Rd>, <Rd+1>, <Rn>, <Rm>, #<imm>
Which of the four instructions to execute is selected by the #<imm> field, with the following encodings:
#imm=0 TMA (Signed)
#imm=1 BNORM
#imm=2 BNN
#imm=3 TMA (Unsigned)
#imm=4 MMA (Signed)
#imm=5 MMA (Unsigned)
The pseudocode for each instruction is given below:
#define COPROC       0
#define imm_TMA4X4S  0
#define imm_BNORM4   1
#define imm_BNN16X4  2
#define imm_TMA4X4U  3
#define imm_MMA2X2S  4
#define imm_MMA2X2U  5
uint64_t Rd;
uint64_t Y;
uint32_t Rn, Rm;
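As a usage sketch (not taken from this document), the ACLE CDE intrinsics declared in arm_cde.h can issue these instructions from C, using the defines above. The snippet assumes a toolchain with CDE support enabled for coprocessor 0 (for example, an -mcpu=cortex-m33+cdecp0 style option); the helper name tma_step is hypothetical.

#include <stdint.h>
#include <arm_cde.h>

/* Sketch: accumulate one TMA4X4S step into a 64-bit packed accumulator.
 * acc holds 4x16-bit signed accumulators, activations holds 4x8-bit
 * activations, and weights holds the packed 2-bit ternary weights. */
static inline uint64_t tma_step(uint64_t acc, uint32_t activations, uint32_t weights)
{
    return __arm_cx3da(COPROC, acc, activations, weights, imm_TMA4X4S);
}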
Ternary Matrix Multiply and Accumulate (TMA)
Y = __arm_cx3da(COPROC, Rd, Rn, Rm, imm_TMA4X4S); //Y is 64 bit result
Y = {Rd+1(t+1), Rd(t+1)}
Rd(t+1) = {Y[1], Y[0]}
Rd+1(t+1) = {Y[3], Y[2]}
Y[3] = Saturate (sign_extend(Rd+1[31:16]) + Rm[7:6] * sign_extend(Rn[31:24]) + Rm[5:4] * sign_extend(Rn[23:16]) + Rm[3:2] * sign_extend(Rn[15:8]) + Rm[1:0] * sign_extend(Rn[7:0]))
Y[2] = Saturate (sign_extend(Rd+1[15:0]) + Rm[7:6] * sign_extend(Rn[31:24]) + Rm[5:4] * sign_extend(Rn[23:16]) + Rm[3:2] * sign_extend(Rn[15:8]) + Rm[1:0] * sign_extend(Rn[7:0]))
Y[1] = Saturate (sign_extend(Rd[31:16]) + Rm[7:6] * sign_extend(Rn[31:24]) + Rm[5:4] * sign_extend(Rn[23:16]) + Rm[3:2] * sign_extend(Rn[15:8]) + Rm[1:0] * sign_extend(Rn[7:0]))
Y[0] = Saturate (sign_extend(Rd[15:0]) + Rm[7:6] * sign_extend(Rn[31:24]) + Rm[5:4] * sign_extend(Rn[23:16]) + Rm[3:2] * sign_extend(Rn[15:8]) + Rm[1:0] * sign_extend(Rn[7:0]))
Y = __arm_cx3da(COPROC, Rd, Rn, Rm, imm_TMA4X4U); //Y is 64 bit result
Y = {Rd+1(t+1), Rd(t+1)}
Rd(t+1) = {Y[1], Y[0]}
Rd+1(t+1) = {Y[3], Y[2]}
Y[3] = Saturate (sign_extend(Rd+1[31:16]) + Rm[7:6] * Rn[31:24] + Rm[5:4] * Rn[23:16] + Rm[3:2] * Rn[15:8] + Rm[1:0] * Rn[7:0])
Y[2] = Saturate (sign_extend(Rd+1[15:0]) + Rm[7:6] * Rn[31:24] + Rm[5:4] * Rn[23:16] + Rm[3:2] * Rn[15:8] + Rm[1:0] * Rn[7:0])
Y[1] = Saturate (sign_extend(Rd[31:16]) + Rm[7:6] * Rn[31:24] + Rm[5:4] * Rn[23:16] + Rm[3:2] * Rn[15:8] + Rm[1:0] * Rn[7:0])
Y[0] = Saturate (sign_extend(Rd[15:0]) + Rm[7:6] * Rn[31:24] + Rm[5:4] * Rn[23:16] + Rm[3:2] * Rn[15:8] + Rm[1:0] * Rn[7:0])
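To make the bit slicing concrete, the sketch below is a plain C reference model of the TMA4X4 (signed) pseudocode above. The helper names tma4x4s_ref and sat16 are hypothetical; the interpretation of each 2-bit weight field as a signed 2-bit value (covering the ternary set -1/0/+1) and the saturation to the signed 16-bit range are assumptions, not statements from this document.

#include <stdint.h>

/* Saturate a 32-bit intermediate to the signed 16-bit range (assumed range). */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Reference model of TMA4X4S: rd_pair is {Rd+1, Rd} packed as 4x16-bit
 * signed accumulators, rn holds 4 signed 8-bit activations, and rm[7:0]
 * holds 4 packed 2-bit ternary weights, as in the pseudocode. */
static uint64_t tma4x4s_ref(uint64_t rd_pair, uint32_t rn, uint32_t rm)
{
    /* Dot product of the four activation bytes with the four 2-bit weights. */
    int32_t dot = 0;
    for (int j = 0; j < 4; j++) {
        int8_t   a    = (int8_t)(rn >> (8 * j));          /* sign_extend(Rn byte) */
        uint32_t bits = (rm >> (2 * j)) & 0x3u;
        int32_t  w    = (bits & 0x2u) ? (int32_t)bits - 4  /* signed 2-bit weight  */
                                      : (int32_t)bits;     /* (assumed encoding)   */
        dot += w * a;
    }
    /* Add the dot product to each of the four 16-bit accumulators. */
    uint64_t y = 0;
    for (int i = 0; i < 4; i++) {
        int16_t acc = (int16_t)(rd_pair >> (16 * i));
        y |= (uint64_t)(uint16_t)sat16(acc + dot) << (16 * i);
    }
    return y;
}

The unsigned variant (imm_TMA4X4U) differs only in that the Rn bytes are zero-extended instead of sign-extended.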
Binary Neural Network (BNN)
Y = __arm_cx3da(COPROC, Rd, Rn, Rm, imm_BNN16X4); //Y is 64 bit result
Y = {Rd+1(t+1), Rd(t+1)}
Rd(t+1) = {Y[1], Y[0]}
Rd+1(t+1) = {Y[3], Y[2]}
Y[3] = sign_extend(Rd+1[31:16]) + sign_extend(POPCOUNT(Rn[31:16] XNOR Rm[31:16]))
Y[2] = sign_extend(Rd+1[15:0]) + sign_extend(POPCOUNT(Rn[31:16] XNOR Rm[15:0]))
Y[1] = sign_extend(Rd[31:16]) + sign_extend(POPCOUNT(Rn[15:0] XNOR Rm[31:16]))
Y[0] = sign_extend(Rd[15:0]) + sign_extend(POPCOUNT(Rn[15:0] XNOR Rm[15:0]))
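A corresponding plain C reference model of the BNN pseudocode is sketched below; the function name bnn16x4_ref is hypothetical, popcount is done with the GCC/Clang __builtin_popcount builtin, and the accumulators are treated as signed 16-bit lanes per the register-format note under "Other important points".

#include <stdint.h>

/* Reference model of BNN16X4: four 16-bit accumulators, each updated with
 * the popcount of a 16-bit XNOR between one half of Rn and one half of Rm. */
static uint64_t bnn16x4_ref(uint64_t rd_pair, uint32_t rn, uint32_t rm)
{
    uint16_t n[2] = { (uint16_t)rn, (uint16_t)(rn >> 16) };
    uint16_t m[2] = { (uint16_t)rm, (uint16_t)(rm >> 16) };
    uint64_t y = 0;
    for (int i = 0; i < 4; i++) {
        /* Lane i pairs Rn half (i >> 1) with Rm half (i & 1), as in the pseudocode. */
        uint16_t xnor = (uint16_t)~(n[i >> 1] ^ m[i & 1]);
        int16_t  acc  = (int16_t)(rd_pair >> (16 * i));
        int32_t  sum  = acc + __builtin_popcount(xnor);
        y |= (uint64_t)(uint16_t)sum << (16 * i);
    }
    return y;
}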
Matrix Multiply and Accumulate (MMA)
Y = __arm_cx3da(COPROC, Rd, Rn, Rm, imm_MMA2X2S); //Y is 64 bit result
Y = {Rd+1(t+1), Rd(t+1)}
Rd(t+1) = {Y[1], Y[0]}
Rd+1(t+1) = {Y[3], Y[2]}
{Y[1], Y[0]} = Saturate(sign_extend(Rd) + { sign_extend(Rn[7:0])* sign_extend(Rm[7:0]) + sign_extend(Rn[15:8])* sign_extend(Rm[15:8])})
{Y[3], Y[2]} = Saturate(sign_extend(Rd+1) + { sign_extend(Rn[23:16])* sign_extend(Rm[23:16]) + sign_extend(Rn[31:24])* sign_extend(Rm[31:24])})
Y = __arm_cx3da(COPROC, Rd, Rn, Rm, imm_MMA2X2U); //Y is 64 bit result
Y = {Rd+1(t+1), Rd(t+1)}
Rd(t+1) = {Y[1], Y[0]}
Rd+1(t+1) = {Y[3], Y[2]}
{Y[1], Y[0]} = Saturate(sign_extend(Rd) + {Rn[7:0] * sign_extend(Rm[7:0]) + Rn[15:8] * sign_extend(Rm[15:8])})
{Y[3], Y[2]} = Saturate(sign_extend(Rd+1) + {Rn[23:16] * sign_extend(Rm[23:16]) + Rn[31:24] * sign_extend(Rm[31:24])})
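A plain C reference model of the MMA2X2 (signed) pseudocode is sketched below; the names mma2x2s_ref and sat32 are hypothetical, and saturation to the signed 32-bit range is an assumption.

#include <stdint.h>

/* Saturate a 64-bit intermediate to the signed 32-bit range (assumed range). */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Reference model of MMA2X2S: rd_pair is {Rd+1, Rd} packed as 2x32-bit
 * signed accumulators, each updated with a 2-element dot product of
 * signed 8-bit values from Rn and Rm, as in the pseudocode. */
static uint64_t mma2x2s_ref(uint64_t rd_pair, uint32_t rn, uint32_t rm)
{
    uint64_t y = 0;
    for (int i = 0; i < 2; i++) {
        int32_t acc = (int32_t)(rd_pair >> (32 * i));
        int64_t dot = 0;
        for (int j = 0; j < 2; j++) {
            int8_t a = (int8_t)(rn >> (8 * (2 * i + j)));  /* sign_extend(Rn byte) */
            int8_t w = (int8_t)(rm >> (8 * (2 * i + j)));  /* sign_extend(Rm byte) */
            dot += (int64_t)a * w;
        }
        y |= (uint64_t)(uint32_t)sat32(acc + dot) << (32 * i);
    }
    return y;
}

For the unsigned variant (imm_MMA2X2U) the Rn bytes are zero-extended instead of sign-extended, matching the pseudocode above.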
Batch Normalization (BN)
Y = __arm_cx3da(COPROC, Rd, Rn, Rm, imm_BNORM4); //Y is 32 bit result
Y = Rd(t+1)
Cycle 1
Rd(t+1)[15:8] = clamp( [((Rd[31:24]* sign_extend(Rn[15:8]))<<8) + (Rd[23:16]* sign_extend(Rn[15:8]))] >> Rm[21:17] )
Rd(t+1)[7:0] = clamp( [((Rd[15:8]* sign_extend(Rn[7:0]))<<8) + (Rd[7:0]* sign_extend(Rn[7:0]))] >> Rm[16:12] )
Cycle 2
Rd(t+1)[31:24] = clamp( [((Rd+1[31:24]* sign_extend(Rn[31:24]))<<8) + (Rd+1[23:16]* sign_extend(Rn[31:24]))] >> Rm[31:27] )
Rd(t+1)[23:16] = clamp( [((Rd+1[15:8]* sign_extend(Rn[23:16]))<<8) + (Rd+1[7:0]* sign_extend(Rn[23:16]))] >> Rm[26:22] )
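Each BN output byte is effectively one signed 16-bit input lane scaled by a signed 8-bit factor from Rn, shifted right by a 5-bit amount from Rm, and clamped. The sketch below models a single lane; the name bn_lane_ref is hypothetical, the arithmetic shift and the treatment of the high/low bytes as one signed 16-bit lane are assumptions (consistent with the register-format note below), and the clamp bounds are passed in as parameters because they are decoded from additional opcode bits described under "Other important points".

#include <stdint.h>

/* Reference model of one BN output lane: x is one 16-bit lane of Rd or Rd+1,
 * scale is the corresponding signed 8-bit value from Rn, shift is the 5-bit
 * shift amount from Rm, and clamp_hi/clamp_lo come from the opcode decode. */
static int8_t bn_lane_ref(int16_t x, int8_t scale, uint32_t shift,
                          int32_t clamp_hi, int32_t clamp_lo)
{
    /* (int32_t)x * scale is equivalent to ((x_hi*scale)<<8) + (x_lo*scale)
     * in the pseudocode, with x_hi the signed high byte and x_lo the
     * unsigned low byte of the 16-bit lane (assumption). */
    int32_t prod = (int32_t)x * scale;
    int32_t v = prod >> (shift & 0x1F);   /* arithmetic shift assumed */
    if (v > clamp_hi) v = clamp_hi;
    if (v < clamp_lo) v = clamp_lo;
    return (int8_t)v;
}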
Other important points:
<coproc> must be 0
The 2x32-bit registers (Rd, Rd+1) represent 4x16-bit data; each 16-bit field is treated as signed, while Rd and Rd+1 individually are not signed 32-bit numbers. Applicable for instructions #imm = 0, 1, 2, 3
The 2x32-bit registers (Rd, Rd+1) represent 2x32-bit data, each treated as signed. Applicable for instructions #imm = 4, 5
Additional decodes from the instruction opcode - applicable only for BN:
Clamp High - the upper 9 bits (11:3) are used directly as a signed value for the clamp-high bound
Clamp Low - the lower 3 bits (2:0) are decoded as: 000 → 0, 001 → -2, 010 → -4, 011 → -8, 100 → -16, 101 → -32, 110 → -64, 111 → -128
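A small sketch of this decode is shown below; the 12-bit field is passed in as a parameter because its exact position within the opcode is not restated here, and the function name bn_decode_clamps is hypothetical.

#include <stdint.h>

/* Decode the BN clamp bounds from the 12-bit opcode field:
 * bits 11:3 are the signed clamp-high value, bits 2:0 select clamp-low. */
static void bn_decode_clamps(uint32_t field, int32_t *clamp_hi, int32_t *clamp_lo)
{
    static const int32_t lo_table[8] = { 0, -2, -4, -8, -16, -32, -64, -128 };
    uint32_t hi9 = (field >> 3) & 0x1FFu;                    /* bits 11:3 */
    *clamp_hi = (hi9 & 0x100u) ? (int32_t)hi9 - 512          /* sign-extend 9 bits */
                               : (int32_t)hi9;
    *clamp_lo = lo_table[field & 0x7u];                      /* bits 2:0 */
}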
Some SCB registers need to be programmed based on the security state of the processor:
SCB->CPACR (Coprocessor Access Control Register) - The CPACR register specifies the access privileges for coprocessors.
SCB->NSACR (Non-secure Access Control Register) - The NSACR register defines the Non-secure access permissions for both the FPU and coprocessors CP0 to CP7.
SCB->CPPWR (Coprocessor Power Control Register) - Applicable for coprocessors and not for the CDE logic, since the CDE logic shares its power domain with the CPU.
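A minimal sketch of this enablement is shown below, assuming CMSIS register definitions for the Cortex-M33 and that the CDE datapath is mapped to coprocessor 0 (consistent with <coproc> = 0 above); the function name cde_enable_cp0 is hypothetical and the exact bit values should be confirmed against the device documentation.

#include <stdint.h>
/* Include the device header that provides the CMSIS core (SCB) definitions
 * for this part; "device.h" below is a placeholder name. */
#include "device.h"

static void cde_enable_cp0(void)
{
    /* CPACR[1:0] = 0b11: full access to coprocessor 0 for both privilege levels. */
    SCB->CPACR |= (3UL << 0);

    /* From the Secure state only: NSACR bit 0 = 1 permits Non-secure access to CP0. */
    SCB->NSACR |= (1UL << 0);

    __DSB();   /* ensure the register writes complete */
    __ISB();   /* before any CDE instruction is issued */
}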
The 'popcount' block in the data flow diagram of the BNN operation calculates the number of 1s in the 16-bit signal that is input to the block.