# Application Note **OptiFlash Memory Technology**



Sanmveg Saini

#### ABSTRACT

What is OptiFlash memory technology?

OptiFlash memory technology is a TI-patented technology that enables cost-effective, scalable, high-performance microcontrollers (MCUs) with external flash. In traditional MCUs, the Flash:SRAM ratio typically ranges from 8:1 to as high as 12:1. With TI's AM26x MCUs, which combine OptiFlash with large on-chip SRAM (OCSRAM) and tightly coupled memory (TCM), it is possible to implement a cost-optimized system in which the OCSRAM scales efficiently with external on-PCB flash. The figure below depicts a TI Sitara<sup>™</sup> AM26x MCU interfacing to external flash via OptiFlash, compared with a traditional MCU architecture.



# **Table of Contents**

- 1 Why External on-PCB Flash?
- 2 OptiFlash Detailed Overview
  - 2.1 OptiFlash System Key Performance Indices (KPIs)
- 3 Summary
- 4 References
- 5 Revision History

# Trademarks

Sitara<sup>™</sup> is a trademark of Texas Instruments. All trademarks are the property of their respective owners.




# 1 Why External on-PCB Flash?

MCU memory needs and CPU performance requirements are continuously increasing. MCU industry product roadmaps with up to 5K DMIPS and 64MB of on-chip flash are common, but it is well understood that embedded flash technologies are not expected to scale beyond 22 nm because of the high-voltage (HV) gate-oxide transistors required for programming and erasing flash bits. For example, at 28 nm, 18 additional masks (reticles) are required compared with a CMOS-only process technology.<sup>1, 2</sup>

In comparison, a typical 8MB Octal Serial Peripheral Interface (OSPI) flash device costs approximately \$0.50 to \$0.80. Figure 1-1 shows the AM263P CPU + TCM architecture with the flash sub-system (FSS) that includes OptiFlash.



Figure 1-1. AM263P CPU + TCM Architecture

The additional cost of embedded flash technology results in either high-cost MCUs or architectures that reduce the amount of OCSRAM to achieve a specific cost point.

Because OCSRAM or TCM is always required for highly deterministic, low-latency applications (such as real-time control), MCU architectures with a larger OCSRAM-to-flash ratio perform better. For example, the AM263P TCM has an access time of 2.5 ns, whereas worst-case OCSRAM accesses can take between 60 ns and 90 ns.

Finally, alternative non-volatile memory (NVM) technologies, such as phase-change memory (PCM) or magnetic RAM (MRAM), are not yet ready for use in volume production of high-reliability, low defective parts per million (DPPM) applications such as automotive and industrial.



Figure 1-2. OptiFlash Architecture Diagram

# 2 OptiFlash Detailed Overview

OptiFlash consists of TI-patented hardware and software enhancements that can accelerate boot from on-PCB flash and enable secure (ISO 21434) and high-integrity (ISO 26262, IEC 61508) compliant data transfers. OptiFlash allows TI MCUs to improve the Flash:SRAM ratio to anywhere between 8:1 and 4:1. OptiFlash can also provide the flexibility of addressing up to 128MB of external on-PCB flash. Figure 1-2 gives an overview of the accelerators that are part of OptiFlash; each is described below.

**Boot/Overlay Accelerator**: comprises fast local copy (FLC), a dedicated DMA engine capable of reading-while-writing to download code from external flash while allowing CPU execution in parallel. Boot images of up to 2MB can be downloaded in approximately 9 ms. System initialization time depends on the application. On the software side, the application is laid out in memory according to its call graph to make the best use of the pre-fetch hardware.
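As a back-of-envelope check, the quoted figures (a 2MB image in roughly 9 ms) imply an average transfer rate. The sketch below assumes that "MB" means 2^20 bytes and that the whole 9 ms is spent transferring data; both are assumptions, and the function names are illustrative only.

```python
MIB = 1024 * 1024  # assumption: "MB" in the text means 2**20 bytes


def effective_rate_mib_s(image_bytes: int, download_s: float) -> float:
    """Average transfer rate implied by an image size and a download time."""
    if download_s <= 0:
        raise ValueError("download time must be positive")
    return image_bytes / download_s / MIB


def scaled_download_ms(image_bytes: int, rate_mib_s: float) -> float:
    """Download time for another image size, assuming the same average rate."""
    return image_bytes / (rate_mib_s * MIB) * 1e3


rate = effective_rate_mib_s(2 * MIB, 9e-3)        # quoted figures: 2MB in ~9 ms
half_mib_ms = scaled_download_ms(MIB // 2, rate)  # hypothetical 512 KiB image
print(f"implied rate: {rate:.1f} MiB/s, 512 KiB image: {half_mib_ms:.2f} ms")
```

The linear scaling in `scaled_download_ms` ignores any fixed setup cost, so it is only a rough estimate for smaller images.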

**Remote L2 (RL2) Cache**: comprises customized caches for on-PCB flash, holding read-only data and code, that can reduce flash read access time by as much as approximately 90%.
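To get a feel for what a reduction of about 90% requires, the usual average-access-time model can be applied. This is a minimal sketch, assuming a simple model in which a miss costs one full flash read; the latencies are hypothetical placeholders, not measured AM26x values.

```python
def effective_access_ns(hit_rate: float, hit_ns: float, miss_ns: float) -> float:
    """Average access time when a fraction hit_rate of reads hit the cache."""
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns


def hit_rate_for_reduction(reduction: float, hit_ns: float, miss_ns: float) -> float:
    """Hit rate needed to cut the average access time by `reduction` (0..1)."""
    target_ns = (1.0 - reduction) * miss_ns
    if miss_ns <= hit_ns or target_ns < hit_ns:
        raise ValueError("target is not reachable with these latencies")
    return (miss_ns - target_ns) / (miss_ns - hit_ns)


# Hypothetical latencies, chosen only to exercise the formula.
HIT_NS, FLASH_NS = 50.0, 1000.0
needed = hit_rate_for_reduction(0.90, HIT_NS, FLASH_NS)
print(f"hit rate needed for a 90% reduction: {needed:.3f}")
```

Under this model, the achievable reduction is bounded by the ratio of hit latency to flash latency, which is why the cached working set and cache size matter.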

**Smart Placement**: provides tools for profiling-based application optimization. It uses a TI Arm Clang compiler enhancement to profile application software and identify deadline-critical code to place in either TCM or OCSRAM, for a performance boost of up to approximately 20% to 40%.

The image above shows the impact of Smart Placement on other variables. These gains are made possible by the compiler enhancement and the new tools that are part of Smart Placement.
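The idea of profiling-based placement can be illustrated with a simple greedy heuristic: rank symbols by profile hits per byte and fill the fastest memory first. This is only a sketch of the general technique, not the algorithm used by the TI tools; the `Symbol` type, the `place` function, and the sample profile are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    name: str
    size: int  # bytes, assumed > 0
    hits: int  # profile samples or call count (hypothetical metric)


def place(symbols, tcm_bytes, ocsram_bytes):
    """Greedy placement: hottest-per-byte first into TCM, then OCSRAM, else flash."""
    free = {"TCM": tcm_bytes, "OCSRAM": ocsram_bytes}
    placement = {}
    for sym in sorted(symbols, key=lambda s: s.hits / s.size, reverse=True):
        for region in ("TCM", "OCSRAM"):
            if sym.size <= free[region]:
                free[region] -= sym.size
                placement[sym.name] = region
                break
        else:
            placement[sym.name] = "FLASH"  # executed in place from external flash
    return placement


profile = [
    Symbol("isr_pwm", 2048, 90_000),
    Symbol("control_loop", 4096, 60_000),
    Symbol("tcp_rx", 8192, 8_000),
    Symbol("log_fmt", 8192, 50),
]
placement = place(profile, tcm_bytes=4096, ocsram_bytes=16384)
for name, region in placement.items():
    print(f"{name:>12} -> {region}")
```

A greedy density ordering is not optimal in general (the underlying problem is a knapsack variant), but it shows why a profile is needed before deciding what goes into TCM.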

**OptiShare**: tools that automatically identify code common to multiple cores and use the Region Address Translator (RAT) hardware feature to reduce code size by placing shared code and read-only data in memory only once.
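The footprint saving from sharing can be sketched as follows: find symbols whose bytes are identical in every core's image, then count them once instead of once per core. This is an illustrative model only; the image layout, core names, and helper functions are hypothetical and do not represent the OptiShare tool's interface.

```python
import hashlib


def shared_symbols(images):
    """Return the names whose contents are byte-identical in every core's image."""
    cores = list(images.values())
    common = set(cores[0]).intersection(*cores[1:])
    return {
        name
        for name in common
        if len({hashlib.sha256(core[name]).digest() for core in cores}) == 1
    }


def footprint(images, shared):
    """Total bytes before sharing and after storing shared symbols only once."""
    before = sum(len(blob) for img in images.values() for blob in img.values())
    first = next(iter(images.values()))
    saved = sum(len(first[name]) for name in shared) * (len(images) - 1)
    return before, before - saved


# Hypothetical two-core images: symbol name -> machine-code bytes.
images = {
    "r5f0": {"memcpy": b"\x01" * 100, "ipc_send": b"\x02" * 300, "main": b"\x03" * 200},
    "r5f1": {"memcpy": b"\x01" * 100, "ipc_send": b"\x02" * 300, "main": b"\x04" * 250},
}
shared = shared_symbols(images)
before, after = footprint(images, shared)
print(sorted(shared), before, after)
```

In hardware, each core would still need to reach the single shared copy at the address its code expects, which is the role the text assigns to the RAT.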

**XIP (eXecute-in-place) Safety**: implements on-the-fly (in-line) hardware single-error-correct, double-error-detect (SECDED) error correction code (ECC) to improve data integrity for functional-safety-compliant applications. It includes four syndromes per 32-byte chunk, ECC in address and MAC, and a safety-compliant time-out gasket (TOG) that interrupts the CPU if the on-PCB flash hangs.
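For readers unfamiliar with SECDED, the following is a textbook extended Hamming code, shown on a 64-bit data word (which yields a 72-bit code word). It is a sketch of the general technique only: the actual OptiFlash ECC polynomial and bit layout are not described here, and the word size and the names `encode`, `decode`, and `extract` are assumptions made for illustration.

```python
def encode(data):
    """Extended Hamming encode; index 0 of the result is the overall parity bit."""
    r = 0
    while (1 << r) < len(data) + r + 1:
        r += 1
    n = len(data) + r
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two, so a data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        code[p] = sum(code[pos] for pos in range(1, n + 1) if pos & p) % 2
    code[0] = sum(code) % 2  # makes the total number of ones even
    return code


def decode(code):
    """Return (status, codeword); status is 'ok', 'corrected', or 'detected'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid code word
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        return "ok", code
    if odd_parity and syndrome < len(code):
        code[syndrome] ^= 1  # syndrome 0 means the overall parity bit flipped
        return "corrected", code
    return "detected", code  # even parity with nonzero syndrome: double error


def extract(code):
    """Pull the data bits back out of a code word."""
    return [code[pos] for pos in range(1, len(code)) if pos & (pos - 1)]


word = [(i * 7) % 3 % 2 for i in range(64)]  # arbitrary 64-bit test pattern
codeword = encode(word)
single = list(codeword)
single[37] ^= 1
double = list(single)
double[5] ^= 1
print(len(codeword), decode(single)[0], decode(double)[0])
```

A single flipped bit is located by the syndrome and repaired; two flipped bits leave the overall parity even while the syndrome is nonzero, so the error is reported but not corrected.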

**XIP Security**: implements on-the-fly hardware decryption and authentication for cybersecurity (for example, ISO 21434) compliance. It includes a per-client firewall to prevent unintended access from an unauthorized host.



**Firmware-over-the-air (FOTA) Updates**: hardware acceleration for XIP with simultaneous writes, which could enable a 10x to 80x reduction in XIP down-time while performing read-while-write (RWW).

# 2.1 OptiFlash System Key Performance Indices (KPIs)

A direct comparison of embedded-flash MCU devices with OptiFlash devices is not meaningful because the overall architectures are different. Yet, as mentioned previously, both require application developers to execute time-critical code from on-chip memory to meet the necessary processing timelines. To show how this balance of flash and on-chip memory performance can be achieved, TI has developed a set of system KPIs that measure the performance of OptiFlash and of its constituent accelerators and tools. The following KPIs were measured using Application-1 (App-1), which emulates a poorly cached AUTOSAR application, and Application-2 (App-2), a real-world networking example with an lwIP client-server and Mbed TLS use case.

| Test | KPI                                 | Without OptiFlash               | OptiFlash Enabled                                                                      | App. Use Case   |
|------|-------------------------------------|---------------------------------|----------------------------------------------------------------------------------------|-----------------|
| XIP  | Basic (without safety and security) | CPU DMIPS loss of 2-3x observed | DMIPS degradation limited to 1.1x with 128kB RL2                                       | App-1 and App-2 |
|      | With safety and security            |                                 | DMIPS degradation limited to 1.4x with hardware accelerators for in-line ECC and OTFA  | App-1 and App-2 |

#### Table 2-1. DMIPS Loss With and Without OptiFlash RL2

Note

In the above results, the "with safety and security" scenario includes in-line error-correction-code (ECC) and on-the-fly-authentication (OTFA).

Table 2-2 shows the impact of the configurable RL2 cache. A cache size larger than 128kB did not show further improvement in XIP performance. The optimal RL2 cache size also eliminated the difference in processing timelines with and without security and safety. Note that degradation is measured relative to execution from internal RAM. For example, with the RL2 cache disabled, the application ran 2.4 times slower from external flash than from internal RAM.

| Test            | Cache Size Used (kB) | Performance Degradation With Safety and Security | Performance Degradation Without Safety and Security | App. Use Case |
|-----------------|----------------------|--------------------------------------------------|-----------------------------------------------------|---------------|
| RL2 access size | 0                    | 2.4x                                             | 2.2x                                                | App-1         |
|                 | 16                   | 2.2x                                             | 1.9x                                                |               |
|                 | 32                   | 1.9x                                             | 1.7x                                                |               |
|                 | 128                  | 1.1x                                             | 1.1x                                                |               |

#### Table 2-2. Impact of the Configurable RL2 Cache
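The statement that the optimal cache size removes the gap between the two configurations can be checked directly against the Table 2-2 figures. The sketch below only restates the table and divides the two columns; the dictionary and function names are illustrative.

```python
# Cache size in kB -> (degradation with, degradation without safety and security),
# copied from Table 2-2.
TABLE_2_2 = {0: (2.4, 2.2), 16: (2.2, 1.9), 32: (1.9, 1.7), 128: (1.1, 1.1)}


def safety_security_overhead(with_x: float, without_x: float) -> float:
    """Extra slowdown attributable to safety and security, as a fraction."""
    return with_x / without_x - 1.0


overhead = {kb: safety_security_overhead(w, wo) for kb, (w, wo) in TABLE_2_2.items()}
for kb, extra in overhead.items():
    print(f"RL2 {kb:>3} kB: +{extra * 100:.0f}% with safety and security")
```

Because the table values are rounded to one decimal place, the intermediate percentages are approximate; only the 128kB row, where both columns are equal, is exact.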

The Smart Placement tool was used to analyze the application and place time-sensitive code or data in TCM, OCSRAM, or flash. Table 2-3 shows that the Smart Placement tool enabled a 19% improvement in application execution time when used for both code and data.

| Test                                         | TCM Size Used (kB) | Data vs. Code | Execution Time | % Improvement With Smart Placement | App. Use Case |
|----------------------------------------------|--------------------|---------------|----------------|------------------------------------|---------------|
| Execution time improved with Smart Placement | 0                  | n/a           | 27,583         | N/A                                | App-1         |
|                                              | 64                 | code          | 25,342         | 9%                                 |               |
|                                              | 64                 | code + data   | 22,537         | 19%                                |               |

#### Table 2-3. Impact of Smart Placement Tool on App1

In another OptiFlash XIP test, an EtherNet/IP protocol application was implemented in XIP mode and then in XIP mode with the Smart Placement tool. As Table 2-4 shows, CPU loading was reduced and worst-case jitter was notably improved with Smart Placement.


#### Table 2-4. Impact of Smart Placement Tool on OOB EtherNet/IP Protocol Application

| Test                  | Max. CPU Loading (%) | Worst case jitter | App. Use case                    |
|-----------------------|----------------------|-------------------|----------------------------------|
| XIP                   | 98.91                | 115.7             | EtherNet/IP protocol application |
| XIP + Smart Placement | 85.97 (13% better)   | 68 (40% better)   |                                  |

The OptiShare technology was used to optimize code sharing among R5F cores for an IPC application on the MCU+ SDK. When using OptiShare, the code size was reduced by 10%.

#### Table 2-5. Impact of OptiShare on OOB IPC Example

| Test                                  | Code Size (kB) | Memory Footprint Optimized<br>(%) | App. Use Case                  |
|---------------------------------------|----------------|-----------------------------------|--------------------------------|
| Code size reduction with<br>OptiShare | 73             | ~10 (lower code size)             | SDK Out-of-box IPC application |

# 3 Summary

A high-performance microcontroller used with external flash provides key advantages such as lower cost and memory scalability. However, external flash comes with its own challenges, which OptiFlash addresses. OptiFlash is an ecosystem of hardware, drivers, and tools that aims to optimize application performance and boot time and to reduce memory wastage at the system level. Some of these KPIs have been verified in simulation and on actual silicon. OptiFlash also provides additional features, such as the FOTA accelerator, which accelerates and simplifies FOTA implementation. With OptiFlash, the advantages of external flash can therefore be realized while the challenges that external flash brings are addressed by the technology.

# 4 References

- Selecting the Optimal Flash Device for your Embedded Application
- The Case for De-Integrating Embedded Flash

# **5 Revision History**

NOTE: Page numbers for previous revisions may differ from page numbers in the current version.

| Changes from Revision * (November 2023) to Revision A (November 2023) | Page |
|-----------------------------------------------------------------------|------|
| Updated Section 2                                                     | 3    |
| Updated Section 2.1                                                   | 4    |

Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265 Copyright © 2023, Texas Instruments Incorporated