SPRACX1A April 2021 – April 2021 TDA4VM, TDA4VM-Q1

3 Visual Localization on TDA4VM

This section describes how each subtask of the visual localization algorithm described here maps to the TDA4VM device. The application consists of three primary steps: image pre-processing, DKAZE feature extraction, and localization. Because the TDA4x family of devices was designed with applications like this in mind, each subtask can be mapped to specialized hardware within the device, which allows the tasks to execute efficiently and accurately. A diagram of the TDA4VM device, the first variant of the TDA4x family available to customers, is shown in Figure 3-1.

Figure 3-1, shown below, is a block diagram of the key components that make up the TDA4VM SoC. These include a Deep Learning hardware accelerator coupled to a C7x DSP, a few general-purpose Arm® cores, a vision pre-processing hardware accelerator, and hardware accelerators designed specifically for certain widely used computer vision (CV) tasks. The subtasks that make up the algorithm are then mapped to the different components of the SoC in Figure 3-2.

Figure 3-1 TDA4VM Diagram

The first subtask, image pre-processing, can be performed entirely on the on-chip Vision Pre-Processing Accelerator (VPAC), which includes an Image Signal Processor (ISP). This module takes the image from the camera, which arrives over a CSI-2 interface, and performs the pre-processing steps required before further processing. The VPAC module on the TDA4VM consists of a Raw Front End (RFE), a dual noise filter, a global and local tone mapping module, a flexible color processing module, lens distortion correction, and a scaling engine. More information about the VPAC can be found here.

The next subtask, DKAZE feature extraction, can be performed on the on-chip DNN hardware accelerator, the C7x/MMA. The C7x/MMA is a hardware accelerator (HWA) designed specifically to accelerate commonly used Deep Learning operations. It was designed with automotive and industrial applications in mind, by engineers with decades of experience, and is one of the most power-efficient Deep Learning accelerators on the market today, with one of the best power-to-TOPS ratios of any device. More information about the C7x/MMA can be found here.

Finally, the remaining visual localization subtasks are performed on one of the DSPs available on the SoC, either the C7x or the C66x.

The mapping of the visual localization subtasks to the TDA4VM device is shown as a flow diagram in Figure 3-2 below.

Figure 3-2 Visual Localization Algorithm Flow on TDA4VM
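
The subtask-to-hardware mapping described in this section can also be written down as a small lookup structure. The following is only an illustrative sketch in Python: the names `Stage`, `PIPELINE`, and `describe` are hypothetical and are not part of any TI SDK or driver API. It records only what the text above states, namely which hardware block each subtask can run on and the order of the three steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One subtask of the visual localization application (hypothetical type)."""
    name: str          # subtask name as used in this section
    candidates: tuple  # hardware blocks the text says can run this subtask


# Mapping taken from the section text; the ordering follows the three primary steps.
PIPELINE = (
    Stage("image pre-processing", ("VPAC",)),
    Stage("DKAZE feature extraction", ("C7x/MMA",)),
    Stage("localization", ("C7x", "C66x")),  # either DSP may be used
)


def describe(pipeline=PIPELINE):
    """Return one human-readable line per stage, in execution order."""
    return [
        f"{index}. {stage.name} -> {' or '.join(stage.candidates)}"
        for index, stage in enumerate(pipeline, start=1)
    ]


if __name__ == "__main__":
    for line in describe():
        print(line)
```

Keeping the candidates as a tuple makes it explicit that the final step has a choice of DSP, while the first two steps each map to a single accelerator.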