SPRUIL1D May 2019 – December 2024 DRA829J , DRA829J-Q1 , DRA829V , DRA829V-Q1 , TDA4VM , TDA4VM-Q1
For SOC components outside the NAVSS and Compute Cluster (CC), the device provides a pool of address mapping resources composed of Arm SMMUv3 TBU instances, TI PVU instances and TI PAT instances. At system configuration time, software programs the SOC infrastructure (including SOC CBASS and NAVSS CBASS) such that I/O modules and hardware accelerators are associated with specific address mapping resources. I/O modules may also be configured to master transactions directly through NAVSS CBASS with trusted physical addresses by setting the transaction's CBA ATYPE bits either in the specific module's configuration or with the SOC CBASS port logic.
UDMA has its own dedicated set of four PVU address mapping resource modules that are in series with the connections from UDMA.
Components in the compute cluster that master transactions implement the Armv8 two-level translation scheme or equivalent functionality using PAT and PVU modules.
The options for transactions mastered by I/O, HWA or UDMA are summarized in Table 8-32.
| Source | Hop 1 | Hop 2 | Hop 3 | Hop 4 | Hop 5 | Hop 6 | Hop 7 | Notes |
|---|---|---|---|---|---|---|---|---|
| I/O device | SOC switch | CBASS | Destination | | | | | Trusted physical address |
| I/O device | SOC switch | CBASS | TBU | CBASS | Destination | | | Non-realtime transaction |
| I/O device | SOC switch | CBASS | PAT | CBASS | Destination | | | Real-time, trusted transaction |
| I/O device | SOC switch | CBASS | PVU | CBASS | Destination | | | Real-time, trusted transaction |
| I/O device | SOC switch | CBASS | PAT | CBASS | PVU | CBASS | Destination | Real-time, trusted transaction |
| HWA | CBASS | Destination | | | | | | Trusted physical address |
| HWA | CBASS | TBU | CBASS | Destination | | | | Non-realtime transaction |
| HWA | CBASS | PAT | CBASS | Destination | | | | Real-time, trusted transaction |
| HWA | CBASS | PVU | CBASS | Destination | | | | Real-time, trusted transaction |
| HWA | CBASS | PAT | CBASS | PVU | CBASS | Destination | | Real-time, trusted transaction |
| UDMA | CBASS | Destination | | | | | | Trusted physical address |
| UDMA | CBASS | TBU | CBASS | Destination | | | | Non-realtime transaction |
| UDMA | PVU | CBASS | Destination | | | | | Real-time, trusted transaction |
| UDMA | CBASS(1) | PAT | CBASS | Destination | | | | Real-time, trusted transaction |
| UDMA | CBASS(1) | PAT | CBASS | PVU | CBASS | Destination | | Real-time, trusted transaction |
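The routing options above amount to a selection over the master type and transaction attributes. A minimal Python sketch of that selection follows; the function name, the `mode` encoding and the string labels are invented for illustration (the PAT-based rows are omitted for brevity), and this is not TI software:

```python
def translation_path(source, mode):
    """Return a plausible hop sequence per Table 8-32 (PAT rows omitted).

    source: 'io', 'hwa' or 'udma'.
    mode: 'physical' (trusted physical address), 'smmu' (non-realtime,
    TBU-translated) or 'realtime' (real-time, trusted, PVU-translated).
    """
    # I/O devices enter through the SOC switch; HWA and UDMA do not.
    prefix = ["SOC switch"] if source == "io" else []
    if mode == "physical":
        return prefix + ["CBASS", "Destination"]
    if mode == "smmu":
        return prefix + ["CBASS", "TBU", "CBASS", "Destination"]
    if source == "udma":
        # UDMA's dedicated PVUs sit in series with its connections.
        return ["PVU", "CBASS", "Destination"]
    return prefix + ["CBASS", "PVU", "CBASS", "Destination"]

print(translation_path("io", "smmu"))
print(translation_path("udma", "realtime"))
```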
Figure 8-12 shows an overview of VirtSS operation.
Figure 8-12. Functional Diagram

Aggregated requests from I/O or HWA modules are mapped directly to NAVSS CBASS ports. When using I/O address translation, normal switching in NAVSS CBASS passes transactions to dedicated CBASS ports connected to the address mapping resource ports. The logic on the path through each TBU, PVU or PAT module reads transactions, performs translations, optionally modifies transaction side-band information, and sends translated transactions back into the interconnect to be switched to the transaction's next destination.
The address mapping resource modules have a moderate number of cached or stored translations that can service translations with low latency. All of the TBU instances connect to a single TCU through the DTI protocol. When a TBU lookup miss occurs, a request is sent from the TBU using the DTI protocol over an AXI switch interconnect to the TCU. The requested translation may be cached in the TCU. If the translation is not present, table walking state machines load the translation, typically in multiple memory accesses, from tables in DDR memory via additional transactions through CBASS and MSMC.
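The miss-handling cascade described above (TBU cached translation, then a DTI request to the TCU, then a table walk) can be modeled as a simple two-level lookup. The class, its attribute names, and the level labels are invented for this sketch; this is not TI software:

```python
class TranslationPath:
    """Toy model of the TBU -> TCU -> table-walk lookup cascade."""

    def __init__(self):
        self.tbu_cache = {}    # translations cached locally in the TBU
        self.tcu_cache = {}    # translations cached in the shared TCU
        self.page_tables = {}  # backing tables in DDR (via CBASS/MSMC)

    def translate(self, vaddr):
        """Return (paddr, level) showing which level serviced the lookup."""
        if vaddr in self.tbu_cache:
            return self.tbu_cache[vaddr], "TBU hit"
        if vaddr in self.tcu_cache:            # TBU miss: DTI request to TCU
            paddr, level = self.tcu_cache[vaddr], "TCU hit"
        else:                                  # TCU miss: walk tables in DDR
            paddr, level = self.page_tables[vaddr], "table walk"
            self.tcu_cache[vaddr] = paddr
        self.tbu_cache[vaddr] = paddr          # refill the TBU on the way back
        return paddr, level

path = TranslationPath()
path.page_tables[0x1000] = 0x88001000
print(path.translate(0x1000))  # first access is serviced by a table walk
print(path.translate(0x1000))  # repeat access hits the refilled TBU
```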
Exception events can occur when a PAT or PVU lookup is requested but there is no matching memory range.
The architecture and microarchitecture of the TBU and TCU are specified by Arm documentation [http://infocenter.arm.com].
Figure 8-13. Functional Example

The address mapping resource modules have limited command bandwidth. The peak command rate of the TBU, PVU and PAT is one command per clock cycle, at the CBASS clock rate. There is some level of hit-under-miss support in the TBU that allows commands to proceed through the TBU while misses are being serviced by the TCU. Depending on the TBU access pattern, the achieved command bandwidth may be much less than the peak rate.
Software assignment of I/O modules or HWAs to address translation resources requires some estimation of the bandwidth or command rate of the specific I/O modules in the system, the characteristics of the address sequence per virtual machine (which may include multiple channels), the page sizes used for the address translation, and the number of VMs expected to use the I/O modules. As a crude rule of thumb, a specific TBU should sustain near peak rate if 3 × SUMi(VMi × Channelsi) < 128, where the sum runs over the virtual machines and each virtual machine's channels.
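The rule of thumb above can be expressed as a small sizing check. A minimal sketch, reading the sum as running over groups of identical VMs (count × channels per VM); the function name and input shape are invented for illustration:

```python
def tbu_near_peak(vm_channel_pairs):
    """Rule-of-thumb check from the text: a TBU should sustain near peak
    rate if 3 * SUM(VMs * Channels) < 128.

    vm_channel_pairs: iterable of (vm_count, channels_per_vm) tuples,
    one per group of identical virtual machines sharing this TBU.
    """
    demand = 3 * sum(n_vms * channels for n_vms, channels in vm_channel_pairs)
    return demand < 128

# 4 VMs with 8 channels each: 3 * 32 = 96 < 128, near peak rate expected.
print(tbu_near_peak([(4, 8)]))
# 6 VMs with 8 channels each: 3 * 48 = 144 >= 128, expect degradation.
print(tbu_near_peak([(6, 8)]))
```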
Latencies grow roughly by orders of magnitude as translations proceed from the TBU through the TCU:

- Accesses that hit the TBU uTLB, or that miss the TBU uTLB and hit the TBU MTLB, have a throughput of one access per cycle.
- A TBU uTLB miss that also misses the TBU MTLB but hits in the TCU incurs a penalty of less than 100 cycles. Subsequent accesses may not observe this penalty because of hit-under-miss processing.
- A TBU uTLB miss that also misses the TBU MTLB and the TCU MTLB requires a table walk and incurs a penalty of less than 1000 cycles. Hit-under-miss behavior is not beneficial in this case.
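These penalties allow a back-of-the-envelope average-latency estimate. A minimal sketch using the upper-bound cycle counts quoted above (1 cycle on a TBU hit, <100 on a TCU hit, <1000 on a table walk); the hit rates in the example are made-up inputs, and hit-under-miss overlap is ignored, so this is pessimistic:

```python
def avg_lookup_cycles(tbu_hit, tcu_hit,
                      tbu_cycles=1, tcu_penalty=100, walk_penalty=1000):
    """Expected cycles per lookup, ignoring hit-under-miss overlap.

    tbu_hit: fraction of lookups that hit in the TBU (uTLB or MTLB).
    tcu_hit: fraction of the remaining lookups that hit in the TCU.
    """
    miss = 1.0 - tbu_hit
    return (tbu_hit * tbu_cycles
            + miss * tcu_hit * tcu_penalty
            + miss * (1.0 - tcu_hit) * walk_penalty)

# 95% TBU hits, 80% of the misses caught by the TCU:
# 0.95*1 + 0.05*0.8*100 + 0.05*0.2*1000 = 14.95 cycles on average.
print(avg_lookup_cycles(0.95, 0.80))
```

Even with high hit rates, the rare table walks dominate the average, which is why load balancing the TBU instances matters.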
| Mapping Resource | Peak bandwidth supported (assuming 256b interconnect at 500MHz) over all instances |
|---|---|
| 1 TBU for I/O, UDMA and HWA modules | 16 GB/s |
| 4 PAT (parallel small and large size instances) for I/O and HWA modules | 64 GB/s |
| 4 PVU for I/O and HWA modules | 64 GB/s |
| 4 PVU for UDMA | 64 GB/s |
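The table's figures follow directly from the interconnect width and clock: a 256-bit interconnect at 500 MHz moves 32 bytes per cycle, i.e. 16 GB/s per instance, and four parallel PAT or PVU instances give 64 GB/s. A quick check of the arithmetic (decimal GB, 10^9 bytes, is assumed):

```python
def peak_gbytes_per_s(width_bits, clock_hz, instances=1):
    """Peak throughput in GB/s for a given interconnect width and clock."""
    return width_bits / 8 * clock_hz * instances / 1e9

print(peak_gbytes_per_s(256, 500e6))               # single TBU: 16.0
print(peak_gbytes_per_s(256, 500e6, instances=4))  # 4 PAT or PVU: 64.0
```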
This architecture depends on proper configuration to reach performance targets. The software configuration of the SOC and NAVSS components that controls the mapping of I/O devices and hardware accelerators to address mapping resources must minimize contention for the limited TBU entries. Failure to load balance the TBU instances has several undesirable consequences. First, the latency of the initial hop through NAVSS CBASS to the TBU is modulated by the throughput of the TBU pair. Frequent TBU misses that require TCU lookups reduce TBU throughput and can increase congestion in CBASS. Second, long-term severe congestion in links results in head-of-line (HOL) blocking for the other I/O devices or hardware accelerators that have been aggregated onto the same CBASS input port. This results in undesirable performance interference between I/O devices or hardware accelerators.
In addition to performance considerations, this architecture depends on proper configuration to avoid deadlocks. If all the available hit-under-miss capacity in the TBU and TCU is exhausted, a TBU miss must be handled by the TCU, and the TCU must access memory, then that path to memory must not be obstructed by the transaction that is waiting on the TCU response to the TBU. If this obstruction occurs, there is no forward progress on that TBU transaction and a deadlock occurs. For deadlock-free operation, the TCU path through CBASS to MSMC and DDR must not share resources with any path from an I/O device or hardware accelerator to a TBU instance.
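The deadlock rule reduces to a disjointness check between paths. A minimal sketch, modeling each path as a set of named interconnect resources (links, buffers, ports); all resource names here are invented for illustration and do not correspond to actual TDA4x topology:

```python
def deadlock_free(tcu_memory_path, ingress_paths):
    """True if the TCU-to-memory path shares no resource with any
    I/O-device or HWA path to a TBU instance."""
    tcu = set(tcu_memory_path)
    return all(tcu.isdisjoint(path) for path in ingress_paths)

# Hypothetical resource names for illustration only.
tcu_path = {"cbass_port_7", "msmc_vc1", "ddr_ctrl"}
ok_ingress = [{"cbass_port_2", "tbu0_in"}, {"cbass_port_3", "tbu1_in"}]
bad_ingress = [{"cbass_port_7", "tbu0_in"}]  # shares cbass_port_7 with TCU

print(deadlock_free(tcu_path, ok_ingress))   # disjoint: safe
print(deadlock_free(tcu_path, bad_ingress))  # shared link: potential deadlock
```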