SPRADB0 May 2023 AM62A3, AM62A3-Q1, AM62A7, AM62A7-Q1, AM68A, AM69A

 

  2.   Abstract
  3.   Trademarks
  4. 1 Introduction
    1. 1.1 Intended Audience
    2. 1.2 Host Machine Information
  5. 2 Creating the Dataset
    1. 2.1 Collecting Images
    2. 2.2 Labelling Images
    3. 2.3 Augmenting the Dataset (Optional)
  6. 3 Selecting a Model
  7. 4 Training the Model
    1. 4.1 Input Optimization (Optional)
  8. 5 Compiling the Model
  9. 6 Using the Model
  10. 7 Building the End Application
    1. 7.1 Optimizing the Application With TI’s Gstreamer Plugins
    2. 7.2 Using a Raw MIPI-CSI2 Camera
  11. 8 Summary
  12. 9 References

Building the End Application

Once the model is confirmed to work, an end-to-end application can be built around it.

An end-to-end application takes live input from a camera, preprocesses it, runs the model itself, and uses the output in a way suited to the real-life use case. In the retail-scanner use case, that means using the identified objects to populate and display an order for the customer.

To build a data processing pipeline in Linux, GStreamer (GST) is recommended. With the GST plugins used across edgeai-gst-apps, everything except the application code can be done purely within a GST pipeline. GST pipelines let the stages of processing run in a parallel, streaming fashion, which improves overall performance compared to a single-threaded program. TI offers plugins that interact with the hardware accelerators for image processing and deep learning. For the end application, GST can use the appsink and appsrc plugins to expose an interface into and out of application code, respectively.
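As a sketch of this pattern, a capture pipeline ending in an appsink can be described with a pipeline string and handed to `Gst.parse_launch`. The elements and properties below are standard GStreamer ones (not TI's accelerated plugins), and the device path and resolution are illustrative assumptions.

```python
# Sketch of a GStreamer pipeline description that feeds camera frames into
# application code through an appsink. On a TI device, the software
# videoconvert stage would typically be replaced by a hardware-accelerated
# plugin from edgeai-gst-apps.

def build_capture_pipeline(device="/dev/video0", width=1280, height=720):
    """Return a pipeline description string suitable for Gst.parse_launch."""
    return (
        f"v4l2src device={device} ! "
        f"video/x-raw,width={width},height={height} ! "
        "videoconvert ! video/x-raw,format=RGB ! "
        # appsink exposes buffers to application code; dropping old frames
        # keeps a slow consumer from accumulating latency.
        "appsink name=app_in max-buffers=2 drop=true"
    )
```

The application would then look up the element by name (`app_in`) and pull samples from it, as described below.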

In the retail-scanner application (see gst_configs.py), a reference to an appsink is retrieved and used to “pull samples” that hold a raw byte array representing a chunk of data. This might be an image, a chunk of audio, or the output tensor of a neural network. The structure of the data, such as its dimensions and pixel format, needs to be known beforehand. For typical data like images (in GST terms, video/x-raw), the “caps” that describe the input and output of a GST plugin provide the resolution and pixel format.
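A minimal sketch of interpreting a pulled sample's bytes follows. With the real GStreamer Python bindings, the width, height, and format would be read from the sample's caps; here they are passed in directly so the logic is self-contained, and the format table is a simplified assumption covering a few common cases.

```python
# Validate a raw appsink buffer against its caps and unpack it into rows of
# pixel tuples. In a real application the byte array would come from
# sample.get_buffer() and the caps from sample.get_caps().

def bytes_per_pixel(pixel_format):
    # A few common packed formats, for illustration only.
    table = {"RGB": 3, "BGR": 3, "RGBA": 4, "GRAY8": 1}
    return table[pixel_format]

def unpack_frame(raw, width, height, pixel_format="RGB"):
    """Check the buffer size implied by the caps, then unpack pixel rows."""
    bpp = bytes_per_pixel(pixel_format)
    expected = width * height * bpp
    if len(raw) != expected:
        raise ValueError(f"buffer is {len(raw)} bytes, caps imply {expected}")
    stride = width * bpp  # assumes no row padding
    return [
        [tuple(raw[r * stride + c * bpp : r * stride + (c + 1) * bpp])
         for c in range(width)]
        for r in range(height)
    ]

# A 2x2 RGB frame is 12 bytes.
frame = unpack_frame(bytes(range(12)), width=2, height=2)
```

Note that real video buffers may carry per-row padding (stride larger than width × bytes-per-pixel); the caps and buffer metadata describe this on a real pipeline.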

When first designing the retail-scanner demo application, the most straightforward approach was to pull input frames from the camera in RGB format at full resolution. Python application code handled preprocessing, inference, postprocessing, and adding a receipt-like image to the final frame for display. The application borrowed from the main apps_python code, but the extra image processing within display.py was slow and had highly variable latency. This worked for exercising all the components and testing functionality, but performance was unacceptable for an interactive application: about 5-6 fps with up to 3 seconds of latency. The mobilenetv2SSD model recognizing foods was plenty fast at >60 fps, but the application code on the Arm CPU was too slow to keep up with model inference or the 30-fps camera.
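One way to relieve the Python code is to move preprocessing into the pipeline itself, so the appsink already delivers frames at the model's input size while a separate branch carries full-resolution frames to the display. The sketch below uses generic GStreamer elements to show the shape of that pipeline; the tee/queue structure, the 300x300 input size, and the element names are illustrative assumptions, and on a TI device the scaling and conversion stages would be handled by the hardware-accelerated plugins described in the next section.

```python
# Sketch of a two-branch pipeline: one branch displays full-resolution video,
# the other scales frames down to the model input size before handing them to
# application code through an appsink.

def build_split_pipeline(model_w=300, model_h=300):
    return (
        "v4l2src device=/dev/video0 ! videoconvert ! tee name=t "
        # Display branch: full resolution straight to the screen.
        "t. ! queue ! autovideosink "
        # Inference branch: scale in the pipeline, not in Python, so the
        # application only touches small RGB buffers.
        "t. ! queue ! videoscale ! "
        f"video/x-raw,format=RGB,width={model_w},height={model_h} ! "
        "appsink name=infer_in max-buffers=2 drop=true"
    )
```

With this split, the per-frame work left in Python shrinks to inference pre/post-processing on a small tensor-sized buffer, which is the direction the optimizations in Section 7.1 take further.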