SPRACZ5 December 2021 TDA4VM

 

  Trademarks
  1 Introduction
  2 Enabling Thermal Shutdown Mandatory Step
  3 Thermal Mitigation Strategies at a High Level
    3.1 Strategy 1: Auditing the Power Domains That Contribute to the Highest Power Consumption
    3.2 Strategy 2: Disable Loading of Remote Core Firmware
    3.3 Strategy 3: Disabling Modules on TDA4
      3.3.1 Example: Disabling PCIe Instances on 7.3
    3.4 Strategy 4: Dynamic Frequency Scaling (DFS)
    3.5 Strategy 5: How to Reduce Frequency of Other Cores
  4 References

Introduction

This application report applies to J7ES/TDA4VM and its derivatives. Before reading further, note the difference between TDA4 Silicon Revision 1.1 and 1.0 with regard to thermal sensors:

It is recommended to use Silicon Revision 1.1 for further experiments. If you need to use Silicon Revision 1.0, be aware of Errata i2128, "VTM: VTM Temperature Monitors (TEMPSENSORs) should Use a Software Trimming Method"; details can be found in J721E DRA829/TDA4VM Processors Silicon Revision 1.1/1.0. The method used to read on-die temperatures differs between the silicon revisions.

Note: For comparison, Jacinto 6 devices such as TDA2/DRA7 support thermal management from Linux, including Dynamic Voltage and Frequency Scaling (DVFS), through the "cpufreq" feature. When the SoC temperature goes beyond a programmable threshold, the Linux thermal framework employs the registered cooling agents to control the heat. On Jacinto 6, the thermal framework uses "cpufreq" to reduce the OPP (lower voltage and frequency). Because J7 supports a single OPP, software can lower the frequency while keeping the voltage at the same level.
Figure 1-1 TDA4VM Cooling Strategies
Note: The customer needs to ensure that, after the A72 frequency reduction, the system still meets the expected use case performance. The SDK tries to maximize performance and silicon entitlement, so the customer needs to cut down unwanted resources by following the example. This in turn helps reduce heat.

The DIE ID provides device-specific details that TI uses for further analysis. It is one of the first pieces of information that needs to be collected. Use the following commands from the Linux command line to read the DIE ID registers:

devmem2 0x43000020 w | tail -n1
devmem2 0x43000024 w | tail -n1
devmem2 0x43000028 w | tail -n1
devmem2 0x4300002c w | tail -n1
Note: The same register read operation can be performed with CCS or Lauterbach.
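
The four DIE ID reads can also be wrapped in a small helper so that the values are collected in one step. The following is a minimal sketch, assuming `devmem2` is available on the target and prints the register value after the final colon of its last output line. The function names `extract_value` and `read_die_id` are hypothetical and are not part of any TI SDK.

```shell
#!/bin/sh
# Hypothetical helper: keep only the value that follows the final ": "
# on the last line of devmem2 output.
extract_value() {
    tail -n1 | sed 's/.*: *//'
}

# Hypothetical helper: read the four DIE ID registers listed above
# and print one "address: value" pair per line.
read_die_id() {
    for addr in 0x43000020 0x43000024 0x43000028 0x4300002c; do
        printf '%s: %s\n' "$addr" "$(devmem2 "$addr" w | extract_value)"
    done
}
```

On the target, source this file and run `read_die_id`. Access to `/dev/mem` typically requires root privileges.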
Note: The following shows how to check whether a reboot happened because of Thermal Shutdown (TSHUT).

CTRLMMR_WKUP_RESET_SRC_STAT register: after a cold boot (first fresh boot), the value of this register is 0x0. After a TSHUT event, read the register on the next boot in Linux:

devmem2 0x43000050 w
Read at address  0x43000050 (0xffff86750050): 0x01000000

Bit 24, which corresponds to THERMAL_RST, is set, indicating that the reset was caused by TSHUT.
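
The bit check can be scripted so that a boot-time script can log the reset cause. The following is a minimal sketch that relies only on what is stated here: bit 24 of CTRLMMR_WKUP_RESET_SRC_STAT corresponds to THERMAL_RST. The function name `decode_reset_src` is hypothetical, and the other bits of the register are deliberately not decoded.

```shell
#!/bin/sh
# Hypothetical helper: report whether bit 24 (THERMAL_RST) is set in a
# CTRLMMR_WKUP_RESET_SRC_STAT value passed as a hex or decimal argument.
decode_reset_src() {
    val=$(( $1 ))
    if [ $(( (val >> 24) & 1 )) -eq 1 ]; then
        echo "THERMAL_RST"
    else
        echo "no THERMAL_RST"
    fi
}
```

For example, `decode_reset_src 0x01000000` prints `THERMAL_RST`, and `decode_reset_src 0x0` prints `no THERMAL_RST`. On the target, the argument can be taken from the last field of the `devmem2 0x43000050 w` output.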