SPRUJ79 November 2024 F29H850TU, F29H859TU-Q1
Flash memory is typically used to store application code. During code execution, instructions are fetched from contiguous memory addresses, except when a discontinuity occurs. Usually, the portion of code that resides in contiguous address locations makes up the majority of the application code, and is referred to as linear code. To improve the performance of linear code execution, the Flash read interface includes a code prefetch mechanism and block cache. The prefetch mechanism and block cache are available on FRI-1 (and FRI-2, if present). Figure 9-2 shows a functional block diagram of the Flash prefetch mechanism and block cache.
The prefetch mechanism does a look-ahead prefetch on linear address increments, starting from the address of the last instruction fetch. The Flash prefetch mechanism is disabled by default. To enable prefetch mode, set the PREFETCH_EN bit in the FRIx_INTF_CTRL register to 1, or call the Flash_enablePrefetch() driverlib function.
Each instruction fetch from Flash memory reads out 256 bits in total: 128 bits from each half of an interleaved pair (not counting ECC bits). The starting address of the Flash access is automatically aligned to a 256-bit boundary, so that the instruction location falls within the 256 bits being fetched. When the prefetch mechanism is enabled, the 256 bits read by the instruction fetch are stored in a 128-bit-wide by 4-level-deep instruction prefetch buffer. The contents of this prefetch buffer are then sent to the CPU for processing as required.
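The 256-bit alignment described above can be illustrated with a little address arithmetic. The following is a minimal sketch, not device code: it assumes byte addressing (so one 256-bit read covers 32 bytes), and the helper names flash_fetch_base(), flash_fetch_offset(), and flash_fetches_for_range() are hypothetical and are not part of driverlib.

```c
#include <assert.h>
#include <stdint.h>

/* One Flash read returns 256 bits, i.e. 32 bytes (assumes byte addressing). */
#define FLASH_FETCH_BYTES 32u

/* Hypothetical helper: start address of the 256-bit line that contains addr. */
static uint32_t flash_fetch_base(uint32_t addr)
{
    return addr & ~(FLASH_FETCH_BYTES - 1u);
}

/* Hypothetical helper: byte offset of addr inside its 256-bit line. */
static uint32_t flash_fetch_offset(uint32_t addr)
{
    return addr & (FLASH_FETCH_BYTES - 1u);
}

/*
 * Hypothetical helper: number of 256-bit Flash reads needed to cover
 * len bytes of linear code starting at addr.
 * Assumes len > 0 and that the range does not wrap past 0xFFFFFFFF.
 */
static uint32_t flash_fetches_for_range(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    uint32_t first = flash_fetch_base(addr);
    uint32_t last  = flash_fetch_base(addr + len - 1u);
    return (last - first) / FLASH_FETCH_BYTES + 1u;
}
```

For example, under these assumptions an instruction at address 0x10000025 lies in the line starting at 0x10000020, at byte offset 5. An 8-byte sequence that starts 4 bytes before a 32-byte boundary needs two Flash reads, which is the kind of case where a prefetch of the next 256 bits helps.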
The C29 CPU receives instruction packets up to 128 bits wide, over a 128-bit program read bus. Each instruction packet can contain a combination of 16-bit, 32-bit or 48-bit instructions. While the instructions are processing through the CPU, the Flash prefetch mechanism automatically initiates another access to the Flash bank to prefetch the next 256 bits. In this manner, the Flash prefetch mechanism works in the background to keep the instruction prefetch buffer as full as possible. Using this technique, the overall efficiency of sequential code execution from Flash is significantly improved.
In addition to the prefetch buffer, the Flash read interface includes a block cache mechanism, consisting of two 256-bit-wide by 16-deep blocks. The block cache sits between the prefetch buffer and the Flash banks, and is loaded with each 256-bit fetch at the same time as the prefetch buffer. Whenever there is a discontinuity in execution (such as a branch instruction), the mechanism checks whether the requested data is already in the block cache. If so, the instruction data is read from the block cache and fed into the prefetch buffer, which then sends the data to the CPU on the next cycle. This reduces the latency of a discontinuity that hits the block cache to just one wait state, boosting code performance in short-branch loops and other minor discontinuities. To enable the block cache, set the CODE_CACHE_EN bit in the FRIx_INTF_CTRL register to 1, or call the Flash_enableCodeCache() driverlib function.
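The buffer and cache dimensions given in this section translate into storage sizes as sketched below. This is only a sizing aid under stated assumptions: byte addressing is assumed, all function names are hypothetical, and loop_may_fit_in_cache() is an upper-bound check based on line count alone. The text does not describe how lines are mapped to the two blocks or how they are replaced, so a loop that passes this check is not guaranteed to hit in the cache on every iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Dimensions taken from the description in this section. */
enum {
    PREFETCH_ENTRY_BITS   = 128, /* prefetch buffer width        */
    PREFETCH_DEPTH        = 4,   /* prefetch buffer levels       */
    CACHE_BLOCKS          = 2,   /* number of block cache blocks */
    CACHE_LINE_BITS       = 256, /* width of one cache line      */
    CACHE_LINES_PER_BLOCK = 16   /* depth of each block          */
};

/* Total prefetch buffer storage in bytes. */
static uint32_t prefetch_buffer_bytes(void)
{
    return (uint32_t)(PREFETCH_ENTRY_BITS * PREFETCH_DEPTH) / 8u;
}

/* Total block cache storage in bytes. */
static uint32_t block_cache_bytes(void)
{
    return (uint32_t)(CACHE_BLOCKS * CACHE_LINE_BITS * CACHE_LINES_PER_BLOCK) / 8u;
}

/*
 * Upper-bound check (hypothetical helper): returns 1 if the 256-bit lines
 * touched by len bytes starting at addr do not exceed the total number of
 * cache lines. Assumes byte addressing, len > 0, and no address wrap.
 */
static int loop_may_fit_in_cache(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    const uint32_t line_bytes = (uint32_t)CACHE_LINE_BITS / 8u;
    uint32_t first_line = addr / line_bytes;
    uint32_t last_line  = (addr + len - 1u) / line_bytes;
    uint32_t lines      = last_line - first_line + 1u;
    return lines <= (uint32_t)(CACHE_BLOCKS * CACHE_LINES_PER_BLOCK);
}
```

With these figures the prefetch buffer holds 64 bytes (two 256-bit fetches) and the block cache holds 1024 bytes (32 lines). A 1024-byte loop fits within 32 lines only if it starts on a 256-bit boundary; the same loop starting 16 bytes later touches 33 lines.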
Because the block cache interfaces between the prefetch buffer and the Flash banks, the prefetch mechanism must be enabled for the block cache to function. If the block cache is enabled without the prefetch buffer, neither mechanism ever gets loaded, and Flash performance is equivalent to both mechanisms being turned off. Always enable the prefetch buffer when enabling the block cache.
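The dependency between the two enable bits can be captured in a small helper that always sets PREFETCH_EN together with CODE_CACHE_EN. The sketch below operates on a plain variable standing in for FRIx_INTF_CTRL so that it can run on a host. The bit positions are placeholders chosen for illustration and must be replaced with the values from the register description, and the function names are hypothetical. On the device, the driverlib functions Flash_enablePrefetch() and Flash_enableCodeCache() named in this section are the intended interface; their parameters are not shown here, so they are not called in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit positions (assumptions, NOT taken from the register map). */
#define PREFETCH_EN_MASK   (1u << 0)
#define CODE_CACHE_EN_MASK (1u << 1)

/* Hypothetical helper: enable prefetch only. */
static void fri_enable_prefetch(volatile uint32_t *intf_ctrl)
{
    assert(intf_ctrl != NULL);
    *intf_ctrl |= PREFETCH_EN_MASK;
}

/*
 * Hypothetical helper: enable the block cache. Prefetch is enabled first,
 * because the block cache is never loaded while prefetch is off.
 */
static void fri_enable_prefetch_and_cache(volatile uint32_t *intf_ctrl)
{
    assert(intf_ctrl != NULL);
    *intf_ctrl |= PREFETCH_EN_MASK;
    *intf_ctrl |= CODE_CACHE_EN_MASK;
}

/*
 * Hypothetical check: returns false for the one ineffective combination,
 * block cache enabled while prefetch is disabled.
 */
static bool fri_config_is_effective(uint32_t intf_ctrl_value)
{
    bool prefetch = (intf_ctrl_value & PREFETCH_EN_MASK) != 0u;
    bool cache    = (intf_ctrl_value & CODE_CACHE_EN_MASK) != 0u;
    return !cache || prefetch;
}
```

A check such as fri_config_is_effective() could be run once after initialization to catch a configuration in which only the cache bit was set, which would otherwise silently perform as if both mechanisms were off.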