People detection across diverse settings in real-time
Detect people in a wide variety of scenes with vision-based AI at >120 FPS using AI accelerators
Application overview
Many applications across retail, security, factory safety and automotive assistance rely on the ability to accurately detect people. Vision-based systems, like automated checkout kiosks or self-parking cars, use cameras to perceive the environment in great detail. There are many strategies for detecting people in images; neural networks are the most robust across all types of scenes, lighting conditions and orientations. However, this task is computationally intensive to perform in real-time with low latency.
AI acceleration on the C7™ NPU with TI Deep Learning (TIDL) software is optimized for vision-centric tasks. Convolutional and transformer-based neural network architectures can run as quickly as the camera captures images, with performance to spare.
Additional on-chip accelerators like the Vision Preprocessing Accelerator (VPAC) handle image signal processor (ISP) functions such as image scaling, noise filtering, low-light enhancement and lens dewarping. This ensures the processor can analyze images with low latency and efficient memory utilization, leaving plenty of headroom for the rest of the application, including complex postprocessing or even more AI models.
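On TI devices the VPAC performs scaling in hardware, but it is worth seeing what an aspect-preserving ("letterbox") resize has to compute when fitting a camera frame into a fixed model input. The helper below is a hypothetical pure-Python sketch, not part of any TI API:

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Compute scale and padding to fit a src_w x src_h frame into a
    dst_w x dst_h model input while preserving aspect ratio."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) // 2   # left/right padding in pixels
    pad_y = (dst_h - new_h) // 2   # top/bottom padding in pixels
    return scale, new_w, new_h, pad_x, pad_y

# Example: a 1920x1080 camera frame into a 640x640 detector input
scale, w, h, px, py = letterbox_params(1920, 1080, 640, 640)
# -> scaled to 640x360, padded 140 px on top and bottom
```

The same scale and padding values are later needed to map detected bounding boxes back into original camera coordinates.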
Starting evaluation
Data collection
People detection is a common vision AI task, and many public datasets are available. Most of TI's pretrained models have been trained on the COCO dataset from Microsoft, which contains nearly 100 different types of objects, including people. Large crowd-sourced datasets like COCO contain images from a wide variety of camera types, lighting conditions and scenes. This variety will improve the robustness and accuracy of the model.
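COCO distributes its labels as JSON, where each annotation carries an `image_id`, a `category_id` (1 is "person") and a `bbox` in `[x, y, width, height]` form. The sketch below filters a tiny mocked annotation dict down to person boxes; the field names follow the COCO format, but the helper itself is hypothetical:

```python
def person_boxes(coco, person_id=1):
    """Return image_id -> list of [x, y, w, h] person boxes from a
    COCO-style annotation dict. Category 1 is "person" in COCO."""
    boxes = {}
    for ann in coco["annotations"]:
        if ann["category_id"] == person_id:
            boxes.setdefault(ann["image_id"], []).append(ann["bbox"])
    return boxes

# Tiny mocked annotation file: one person and one non-person object
coco = {"annotations": [
    {"image_id": 7, "category_id": 1, "bbox": [10, 20, 50, 120]},
    {"image_id": 7, "category_id": 3, "bbox": [0, 0, 30, 30]},   # car
]}
# person_boxes(coco) → {7: [[10, 20, 50, 120]]}
```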
Data can also be collected manually if your use-case requires maximal accuracy in specific scenes or conditions. It is best to collect data with the same type of camera, lens included, that will be used in practice.
Before training, the data must be annotated or labeled. Typically, humans in the frame are detected and localized with an object detection neural network, which uses a bounding box label represented as two pairs of X,Y pixel coordinates. In some applications like blurring the background for video conferencing, segmentation models are the better choice; these will use an image mask (i.e. a class label for each individual pixel) or polygons to represent the precise regions where people are in the image.
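The two corner pairs described above and the `[x, y, width, height]` layout used by COCO encode the same box, and tooling frequently converts between them. A minimal sketch with hypothetical helper names:

```python
def corners_to_xywh(x1, y1, x2, y2):
    """Convert top-left/bottom-right corner pairs to COCO [x, y, w, h]."""
    return [x1, y1, x2 - x1, y2 - y1]

def xywh_to_corners(x, y, w, h):
    """Convert COCO [x, y, w, h] back to two corner pairs."""
    return [x, y, x + w, y + h]

# A box with corners (100, 50) and (180, 250):
# corners_to_xywh(100, 50, 180, 250) → [100, 50, 80, 200]
```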
Data quality assessment
When collecting data for people detection, consider carefully what kind of variations will exist in the setting and scene.
Will the lighting conditions be consistent or variable? Is the same camera and lens always used? Will people be in full view always, or will they be only partly in the frame or even blocked, i.e. occluded, by other objects or people?
Your dataset should incorporate these variations as needed. Variations can also be artificially introduced using dataset augmentation libraries like albumentations or imgaug, but take care that labels still match where the people are and added noise does not hide anyone.
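Libraries like albumentations handle the label bookkeeping for you, but it helps to see what "labels still match" means in practice. This pure-Python sketch shows how a COCO-style box must move when an image is flipped horizontally (the helper name is hypothetical):

```python
def hflip_bbox(bbox, image_width):
    """Mirror a COCO-style [x, y, w, h] box when the image is flipped
    horizontally. y and the box size are unchanged; only x moves."""
    x, y, w, h = bbox
    return [image_width - x - w, y, w, h]

# A person near the left edge of a 640-wide frame...
box = [10, 40, 100, 300]
# ...ends up near the right edge after the flip:
# hflip_bbox(box, 640) → [530, 40, 100, 300]
```

Flipping twice returns the original box, which is a quick sanity check when writing custom augmentations.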
Build and train your model
Developers may start from pretrained models in the TI Model Zoo. If a pretrained model gives sufficient accuracy for the application, this is the fastest and simplest way to build your product.
Models can be trained on new data with CCStudio™ Edge AI Studio or edgeai-modelmaker within the edgeai-tensorlab repository on GitHub. These tools provide prevalidated object detection models like YOLOX. Your model will be trained on your dataset, starting from a version of the model that was pretrained on the MS COCO dataset. This reduces the number of data samples needed by several orders of magnitude. A proof-of-concept model may only need a few dozen images, though a production model will require several hundred at a minimum, ideally thousands for robust accuracy.
Otherwise, developers should train their model with frameworks like PyTorch or TensorFlow, and export the final version to ONNX or LiteRT (formerly tensorflow-lite) format, respectively. The exported model must be compiled with our deep learning software, such as edgeai-tidl-tools, to be accelerated on the C7 NPU for AM6xA and TDA processors.
Find the right model for your needs
Many model architectures, like YOLOX, define subvariants such as nano (N), small (S) or large (L). These keep the same core structure of the network but scale the number of repeated layers and the size of those layers, making it simple for developers to trade off the model's accuracy against its complexity (and thus latency) without investigating entirely new architectures. Similarly, larger input resolutions increase latency but can also improve accuracy. It is often best to start from the latency requirement, find the models that meet this target and then choose among those with the best accuracy on standard metrics like MS COCO. The TI Model Zoo hosts per-device model benchmarks to aid this selection process.
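Starting from the latency requirement and then maximizing accuracy can be expressed as a simple selection over a benchmark table. The numbers below are hypothetical placeholders for illustration, not measured on any TI device:

```python
def pick_model(benchmarks, max_latency_ms):
    """Among models meeting the latency budget, return the name of the
    one with the best accuracy. benchmarks: name -> (latency_ms, mAP)."""
    fit = {n: v for n, v in benchmarks.items() if v[0] <= max_latency_ms}
    if not fit:
        return None  # no model meets the budget; relax it or shrink input
    return max(fit, key=lambda n: fit[n][1])

# Hypothetical (latency ms, COCO mAP) pairs for illustration only
benchmarks = {
    "yolox-nano-416":  (5.0, 25.8),
    "yolox-small-640": (12.0, 40.5),
    "yolox-large-640": (35.0, 49.7),
}
# A 30 fps camera leaves a ~33 ms budget per frame:
# pick_model(benchmarks, 33.3) → "yolox-small-640"
```

Note that the per-frame budget usually needs to cover preprocessing and postprocessing too, so the model's share is somewhat less than 1000 / fps milliseconds.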
Deploying your model
Model deployment requires the model be compiled beforehand to optimize it for the target hardware. With tools like Edge AI Studio and edgeai-modelmaker, compilation is automatic. Otherwise, compiling models will require a separate step through software packages like edgeai-tidl-tools on the TI GitHub.
Model artifacts are deployed through runtimes like ONNX Runtime, LiteRT (formerly tensorflow-lite), and TVM using TI Deep Learning (TIDL) as the hardware backend for acceleration.
To deploy the model into an end-to-end vision application, use edgeai-gst-apps, which composes the pipeline with multiple stages of hardware acceleration for pre-processing and post-processing the image, in addition to accelerating the AI model itself.
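A typical postprocessing stage filters raw detector output down to confident person detections. The output layout (class id, score, corner box) and the person class index below are assumptions for illustration; they vary by model:

```python
def filter_people(detections, score_thresh=0.5, person_class=0):
    """Keep person detections above a confidence threshold.
    Each detection is (class_id, score, x1, y1, x2, y2); this layout
    and the person class index are model-dependent assumptions."""
    return [d for d in detections
            if d[0] == person_class and d[1] >= score_thresh]

raw = [
    (0, 0.92, 10, 10, 60, 200),    # confident person: kept
    (0, 0.31, 300, 40, 330, 90),   # low-confidence person: dropped
    (2, 0.88, 100, 100, 400, 250), # another class (e.g. car): dropped
]
# filter_people(raw) → [(0, 0.92, 10, 10, 60, 200)]
```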
In addition to edgeai-gst-apps, GitHub hosts a demo application showing people tracking in retail environments that incorporates more application logic and visualization. Live tracking indicates how long a person has been in the scene and shows statistics for occupancy. Additionally, a heatmap displays where occupants have spent their time, which may hold valuable information for advertising or retail product positioning.
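Dwell-time statistics of the kind this demo visualizes can be accumulated from per-frame track IDs emitted by a tracker. A minimal sketch under that assumption (all names are hypothetical):

```python
def dwell_times(frames, fps):
    """Given per-frame lists of track IDs and the camera frame rate,
    return the seconds each tracked person has spent in the scene."""
    counts = {}
    for ids in frames:
        for track_id in ids:
            counts[track_id] = counts.get(track_id, 0) + 1
    return {tid: n / fps for tid, n in counts.items()}

# Track 1 visible in all 4 frames, track 2 in the last 2, at 2 fps:
frames = [[1], [1], [1, 2], [1, 2]]
# dwell_times(frames, fps=2) → {1: 2.0, 2: 1.0}
```

The occupancy heatmap follows the same pattern, accumulating per-pixel (or per-grid-cell) counts from box centers instead of per-track frame counts.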
Choosing the right device for you
Device selection will depend on the level of AI performance required and the camera throughput (resolution and framerate). Refer to the table below for performance comparison across different devices. Note: For comprehensive benchmarks of these devices, use the model selection tool available on Edge AI Studio.
The benchmarks in the table below were produced using SDK version 10.1.
| Product number | Processing core | NPU available | YOLOX-Nano (416x416) latency (ms) | SSD-Mobilenetv2 (512x512) latency (ms) | YOLOX-Small (640x640) latency (ms) |
|---|---|---|---|---|---|
| AM62A7 | 4x Arm® | 2 TOPS | | | |
| TDA4VE-Q1 | 4x Arm® | 8 TOPS | | | |
All the hardware, software and resources you’ll need to get started
Hardware
SK-AM62A-LP
The AM62A is the lowest-cost AI-accelerated device in the AM6xA family, and is best suited for evaluation. A generic USB camera or webcam can be used for image capture and model evaluation on live data.
Software & development tools
PROCESSOR-SDK-LINUX-AM62A
The Edge AI processor SDK is Linux-based and includes the necessary software components to run a compiled model with hardware acceleration. Other Edge AI accelerated processors may be substituted for AM62A.
CCStudio™ Edge AI Studio
This suite contains tools for training, compiling and deploying a model to TI edge AI processors. A model selection tool is available to view pre-generated benchmarks of popular models.
Command-Line tools
Tools for microprocessor devices with Linux and TIDL support. TI's edge AI solution simplifies the whole product life cycle of DNN development and deployment by providing a rich set of tools and optimized libraries.
Supporting resources
Demo application for deploying a people detection model and person-tracking postprocessing with visualizations.
Industrial | Building automation | Vision
Find and localize specific objects and people in real time at high frame rates with AI-accelerated processors and industry-standard software.