SPRUJC6 User guide

SPRUJC6A December 2024 – July 2025 AM2752-Q1, AM2754-Q1

12.4.4.1.1 RL2 Overview

The Remote L2 (RL2) caches 1MB to 16MB of target space and uses system memory for the actual cache data storage (the remote cache data storage memory). This significantly reduces the size of the L2 cache module while allowing flexible configuration for target applications.

The RL2 does not modify any request from the CPU that is within an RL2 caching range. This allows critical word first (CWF) ordering to be processed by the downstream module for best performance. If a downstream CBA4 fabric must be able to split a wrapping burst request that is not aligned to the burst size boundary and that targets system SRAM, a downstream CBA3-to-CBA4 bridge is suggested.

Since the RL2 is expected to cache data from a slow peripheral such as a Flash device, the request is aligned to the burst boundary and the RL2 restores the critical word first order of the returned data before passing it to the CPU. The RL2 supports split burst returns, but every requested burst must be returned as full bus words (8 bytes). Split burst returns of less than 8 bytes disable the RL2 or post an error.
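The align-and-restore behavior described above can be illustrated with a small software model. This is only a sketch: it assumes a 32-byte wrapping burst made of four 8-byte bus words, and the function names `aligned_fetch_order` and `restore_cwf_order` are hypothetical, not RL2 signal or register names.

```python
BUS_WORD_BYTES = 8   # full bus word size stated in the text
BURST_BYTES = 32     # assumption: one burst covers one 32-byte cache line


def aligned_fetch_order(addr):
    """Bus-word addresses of the burst after aligning it down to the burst boundary."""
    base = addr & ~(BURST_BYTES - 1)
    return [base + i * BUS_WORD_BYTES for i in range(BURST_BYTES // BUS_WORD_BYTES)]


def restore_cwf_order(addr, words):
    """Rotate aligned burst data so the word the CPU asked for comes first."""
    first = (addr % BURST_BYTES) // BUS_WORD_BYTES
    return words[first:] + words[:first]
```

For a request at 0x1010, the aligned fetch covers 0x1000, 0x1008, 0x1010 and 0x1018, and the restored order starts at 0x1010 and wraps around to 0x1008.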

The RL2 is an 8-way set-associative, read-allocate cache with LRU replacement. RL2 allocation is based on full cache line reads. The SET index is based on the programmed operating size.
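The following sketch models an 8-way, read-allocate LRU set and shows how the number of SETs could follow from the operating size. It rests on assumptions not stated in this section: the operating size is treated as the total cache data storage in bytes, and the SET index is taken from the address bits directly above the 5-bit line offset. The names `num_sets`, `set_index` and `LruSet` are hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 32   # cache line size from the text
WAYS = 8          # associativity from the text


def num_sets(cache_bytes):
    """Number of SETs for a given storage size (assumed relationship)."""
    return cache_bytes // (LINE_BYTES * WAYS)


def set_index(addr, cache_bytes):
    """Assumed indexing: line number modulo the number of SETs."""
    return (addr // LINE_BYTES) % num_sets(cache_bytes)


class LruSet:
    """One SET of 8 WAYs with least-recently-used replacement."""

    def __init__(self):
        self.lines = OrderedDict()   # tag -> None, least recently used first

    def read(self, tag):
        """Return True on a hit; on a miss, allocate and evict the LRU line if full."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        if len(self.lines) == WAYS:
            self.lines.popitem(last=False)
        self.lines[tag] = None
        return False
```

With an arbitrary 64KB storage size chosen purely for illustration, `num_sets` gives 256 SETs, so addresses 0x0000 and 0x2000 would compete for the same SET under this assumed indexing.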

The RL2 uses 32-byte cache lines, which match the R5 CPU cache line size for best performance. That is, the RL2 does not read more data than the CPU requested. Once a cache line is in the RL2, the CPU can read any amount of data from that cache line.

Since the RL2 cache data is in system memory, the system must ensure that the remote cache data storage memory is protected from other accesses to prevent cache corruption. The RL2 passes the secure, priv and privid attributes from the originating initiator for any operation that does not go to the translated remote cache data storage memory. Any transaction to the translated remote cache data storage memory carries the secure, priv and privid attributes from the RL2 Privilege InterFace (PIF).
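The attribute rule above amounts to a two-way selection, sketched below. The `Attrs` type and the `outgoing_attrs` function are hypothetical names used only for illustration; the field widths and values are assumptions.

```python
from typing import NamedTuple


class Attrs(NamedTuple):
    """Access attributes carried with a transaction (illustrative model)."""
    secure: bool
    priv: bool
    privid: int


def outgoing_attrs(targets_remote_storage, initiator, pif):
    """Use PIF attributes for remote cache data storage accesses, else the initiator's."""
    return pif if targets_remote_storage else initiator
```

A firewall protecting the remote cache data storage memory could then admit only the PIF attributes, which is one way to keep other initiators from corrupting the cache data.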

The RL2 filters an address range to be cached and passes through any access that is not within that range or that crosses a cache line boundary. The SOC must therefore ensure that the placement of the RL2 does not create a loop in the fabric. In proper use, the CPUs are upstream of the RL2, and the flash and the remote cache data storage memory are downstream of the RL2.
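The filtering decision can be sketched as a simple predicate. This is a simplified model that treats each access as a linear byte range of at least one byte; the function name `is_cached_access` and its parameters are hypothetical, and the range values in the usage note are arbitrary.

```python
LINE_BYTES = 32   # cache line size from the text


def is_cached_access(addr, nbytes, range_base, range_bytes):
    """True if the access lies inside the cached range and within one cache line."""
    in_range = range_base <= addr and addr + nbytes <= range_base + range_bytes
    first_line = addr // LINE_BYTES
    last_line = (addr + nbytes - 1) // LINE_BYTES
    return in_range and first_line == last_line
```

With an illustrative cached range starting at 0x1000, a 32-byte access at 0x1000 is handled by the cache, while a 32-byte access at 0x1010 crosses a line boundary and an access at 0x0FF0 falls outside the range, so both would be passed through.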

The RL2 can be used to cache peripherals such as flash, PCIe or Hyperlink, so the SOC needs to ensure that these targets are downstream of the RL2.

The RL2 does not change the size or 32-byte offset of any transaction going through the module; that is, the byte count and offset (caddress[4:0]) of every command are left as is. This allows cbytecnt, wcnt, rcnt, rbytecnt and ralign[4:0] to be passed through unmodified. Since the remote cache data storage memory addresses are aligned to 2K boundaries and WAYs are aligned to 32-byte boundaries, the offset within a cache access to the remote cache data storage memory is always the same as the requested offset.
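The alignment argument can be checked with a short model. The address layout used here (storage base plus a 32-byte line slot plus the requested offset) is an assumption made for illustration, as are the names `translate`, `storage_base` and `line_slot`; the actual RL2 translation is not described in this section.

```python
LINE_BYTES = 32        # WAY alignment, assumed to be in bytes
STORAGE_ALIGN = 2048   # 2K alignment of the remote cache data storage base


def translate(addr, storage_base, line_slot):
    """Map a request address to a remote storage address, keeping caddress[4:0]."""
    if storage_base % STORAGE_ALIGN:
        raise ValueError("storage base must be 2K aligned")
    offset = addr & (LINE_BYTES - 1)   # caddress[4:0] of the request
    return storage_base + line_slot * LINE_BYTES + offset
```

Because both the base and every line slot are multiples of 32, the low five bits of the translated address always equal the low five bits of the original request, whichever slot is selected.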

The RL2 supports a Dual Mode, which allows two cache lines to share a single WAY. This doubles the remote cache data storage memory while using the same total number of cache line entries. Because the two sub cache lines share the same bits within a WAY, the cacheable range is reduced to support their management.