People detection across diverse settings in real-time
Detect people in a wide variety of scenes with vision-based AI at >120 FPS using AI accelerators
Application overview
Many applications across retail, security, factory safety and automotive assistance rely on the ability to accurately detect people. Vision-based systems, like automated checkout kiosks or self-parking cars, use cameras to perceive the environment in great detail. There are many strategies for detecting people in images; neural networks are the most robust across all types of scenes, lighting conditions and orientations. However, this task is computationally intensive to perform in real-time with low latency.
AI acceleration on the C7™ NPU with TI Deep Learning (TIDL) software is optimized for vision-centric tasks. Convolutional and transformer-based neural network architectures can run as quickly as the camera captures images, with performance to spare.
Additional on-chip accelerators like the Vision Preprocessing Accelerator (VPAC) handle image signal processor (ISP) functions such as image scaling, noise filtering, low-light enhancement and lens dewarping. This ensures the processor can analyze images with low latency and efficient memory utilization, leaving plenty of headroom for the rest of the application, including complex postprocessing or even more AI models.
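On TI devices the VPAC performs scaling in hardware, but it is worth seeing what an aspect-preserving ("letterbox") resize has to compute when fitting a camera frame into a fixed model input. The helper below is a hypothetical pure-Python sketch, not part of any TI API:

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Compute scale and padding to fit a src_w x src_h frame into a
    dst_w x dst_h model input while preserving aspect ratio."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    pad_x = (dst_w - new_w) // 2   # left/right padding in pixels
    pad_y = (dst_h - new_h) // 2   # top/bottom padding in pixels
    return scale, new_w, new_h, pad_x, pad_y

# Example: a 1920x1080 camera frame into a 640x640 detector input
scale, w, h, px, py = letterbox_params(1920, 1080, 640, 640)
# -> scaled to 640x360, padded 140 px on top and bottom
```

The same scale and padding values are later needed to map detected bounding boxes back into original camera coordinates.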
Starting evaluation
Data collection
People detection is a common vision AI task, and many public datasets are available. Most of TI's pretrained models have been trained on the COCO dataset from Microsoft, which contains nearly 100 different types of objects, including people. Large crowd-sourced datasets like COCO contain images from a wide variety of camera types, lighting conditions and scenes. This variety will improve the robustness and accuracy of the model.
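COCO distributes its labels as JSON, where each annotation carries an `image_id`, a `category_id` (1 is "person") and a `bbox` in `[x, y, width, height]` form. The sketch below filters a tiny mocked annotation dict down to person boxes; the field names follow the COCO format, but the helper itself is hypothetical:

```python
def person_boxes(coco, person_id=1):
    """Return image_id -> list of [x, y, w, h] person boxes from a
    COCO-style annotation dict. Category 1 is "person" in COCO."""
    boxes = {}
    for ann in coco["annotations"]:
        if ann["category_id"] == person_id:
            boxes.setdefault(ann["image_id"], []).append(ann["bbox"])
    return boxes

# Tiny mocked annotation file: one person and one non-person object
coco = {"annotations": [
    {"image_id": 7, "category_id": 1, "bbox": [10, 20, 50, 120]},
    {"image_id": 7, "category_id": 3, "bbox": [0, 0, 30, 30]},   # car
]}
# person_boxes(coco) → {7: [[10, 20, 50, 120]]}
```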
Data can also be collected manually if your use-case requires maximal accuracy in specific scenes or conditions. It is best to collect data with the same type of camera, lens included, that will be used in practice.
Before training, the data must be annotated or labeled. Typically, humans in the frame are detected and localized with an object detection neural network, which uses a bounding box label represented as two pairs of X,Y pixel coordinates. In some applications like blurring the background for video conferencing, segmentation models are the better choice; these will use an image mask (i.e. a class label for each individual pixel) or polygons to represent the precise regions where people are in the image.
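The two corner pairs described above and the `[x, y, width, height]` layout used by COCO encode the same box, and tooling frequently converts between them. A minimal sketch with hypothetical helper names:

```python
def corners_to_xywh(x1, y1, x2, y2):
    """Convert top-left/bottom-right corner pairs to COCO [x, y, w, h]."""
    return [x1, y1, x2 - x1, y2 - y1]

def xywh_to_corners(x, y, w, h):
    """Convert COCO [x, y, w, h] back to two corner pairs."""
    return [x, y, x + w, y + h]

# A box with corners (100, 50) and (180, 250):
# corners_to_xywh(100, 50, 180, 250) → [100, 50, 80, 200]
```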
Data quality assessment
When collecting data for people detection, consider carefully what kind of variations will exist in the setting and scene.
Will the lighting conditions be consistent or variable? Is the same camera and lens always used? Will people be in full view always, or will they be only partly in the frame or even blocked, i.e. occluded, by other objects or people?
Your dataset should incorporate these variations as needed. Variations can also be artificially introduced using dataset augmentation libraries like albumentations or imgaug, but take care that labels still match where the people are and added noise does not hide anyone.
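Libraries like albumentations handle the label bookkeeping for you, but it helps to see what "labels still match" means in practice. This pure-Python sketch shows how a COCO-style box must move when an image is flipped horizontally (the helper name is hypothetical):

```python
def hflip_bbox(bbox, image_width):
    """Mirror a COCO-style [x, y, w, h] box when the image is flipped
    horizontally. y and the box size are unchanged; only x moves."""
    x, y, w, h = bbox
    return [image_width - x - w, y, w, h]

# A person near the left edge of a 640-wide frame...
box = [10, 40, 100, 300]
# ...ends up near the right edge after the flip:
# hflip_bbox(box, 640) → [530, 40, 100, 300]
```

Flipping twice returns the original box, which is a quick sanity check when writing custom augmentations.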
Build and train your model
Developers may start from pretrained models in the TI Model Zoo. If a pretrained model gives sufficient accuracy for the application, this is the fastest and simplest way to build your product.
Models can be trained on new data with CCStudio™ Edge AI Studio or edgeai-modelmaker within the edgeai-tensorlab repository on GitHub. These tools provide prevalidated object detection models like YOLOX. Your model will be trained on your dataset, starting from a version of the model that was pretrained on the MS COCO dataset. This reduces the number of data samples needed by several orders of magnitude. A proof-of-concept model may only need a few dozen images, though a production model will require several hundred at a minimum, ideally thousands for robust accuracy.
Otherwise, developers should train their model with frameworks like PyTorch or TensorFlow, and export the final version to ONNX or LiteRT (formerly tensorflow-lite) format, respectively. The exported model must be compiled with our deep learning software, such as edgeai-tidl-tools, to be accelerated on the C7 NPU for AM6xA and TDA processors.
Find the right model for your needs
Many model architectures, like YOLOX, define subvariants such as nano (N), small (S) or large (L). These keep the same core structure of the network but scale the number of repeated layers and the size of those layers, making it simple for developers to trade off the model's accuracy against its complexity (and thus latency) without investigating entirely new architectures. Similarly, larger input resolutions increase latency but can also improve accuracy. It is often best to start from the latency requirement, find the models that meet this target and then choose among those with the best accuracy on standard metrics like MS COCO. The TI Model Zoo hosts per-device model benchmarks to aid this selection process.
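Starting from the latency requirement and then maximizing accuracy can be expressed as a simple selection over a benchmark table. The numbers below are hypothetical placeholders for illustration, not measured on any TI device:

```python
def pick_model(benchmarks, max_latency_ms):
    """Among models meeting the latency budget, return the name of the
    one with the best accuracy. benchmarks: name -> (latency_ms, mAP)."""
    fit = {n: v for n, v in benchmarks.items() if v[0] <= max_latency_ms}
    if not fit:
        return None  # no model meets the budget; relax it or shrink input
    return max(fit, key=lambda n: fit[n][1])

# Hypothetical (latency ms, COCO mAP) pairs for illustration only
benchmarks = {
    "yolox-nano-416":  (5.0, 25.8),
    "yolox-small-640": (12.0, 40.5),
    "yolox-large-640": (35.0, 49.7),
}
# A 30 fps camera leaves a ~33 ms budget per frame:
# pick_model(benchmarks, 33.3) → "yolox-small-640"
```

Note that the per-frame budget usually needs to cover preprocessing and postprocessing too, so the model's share is somewhat less than 1000 / fps milliseconds.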
Deploying your model
Model deployment requires the model be compiled beforehand to optimize it for the target hardware. With tools like Edge AI Studio and edgeai-modelmaker, compilation is automatic. Otherwise, compiling models will require a separate step through software packages like edgeai-tidl-tools on the TI GitHub.
Model artifacts are deployed through runtimes like ONNX Runtime, LiteRT (formerly tensorflow-lite), and TVM using TI Deep Learning (TIDL) as the hardware backend for acceleration.
To deploy the model into an end-to-end vision application, use edgeai-gst-apps, which composes the pipeline with multiple stages of hardware acceleration for pre-processing and post-processing the image, in addition to accelerating the AI model itself.
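A typical postprocessing stage filters raw detector output down to confident person detections. The output layout (class id, score, corner box) and the person class index below are assumptions for illustration; they vary by model:

```python
def filter_people(detections, score_thresh=0.5, person_class=0):
    """Keep person detections above a confidence threshold.
    Each detection is (class_id, score, x1, y1, x2, y2); this layout
    and the person class index are model-dependent assumptions."""
    return [d for d in detections
            if d[0] == person_class and d[1] >= score_thresh]

raw = [
    (0, 0.92, 10, 10, 60, 200),    # confident person: kept
    (0, 0.31, 300, 40, 330, 90),   # low-confidence person: dropped
    (2, 0.88, 100, 100, 400, 250), # another class (e.g. car): dropped
]
# filter_people(raw) → [(0, 0.92, 10, 10, 60, 200)]
```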
In addition to edgeai-gst-apps, GitHub hosts a demo application showing people tracking in retail environments that incorporates more application logic and visualization. Live tracking indicates how long a person has been in the scene and shows statistics for occupancy. Additionally, a heatmap displays where occupants have spent their time, which may hold valuable information for advertising or retail product positioning.
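Dwell-time statistics of the kind this demo visualizes can be accumulated from per-frame track IDs emitted by a tracker. A minimal sketch under that assumption (all names are hypothetical):

```python
def dwell_times(frames, fps):
    """Given per-frame lists of track IDs and the camera frame rate,
    return the seconds each tracked person has spent in the scene."""
    counts = {}
    for ids in frames:
        for track_id in ids:
            counts[track_id] = counts.get(track_id, 0) + 1
    return {tid: n / fps for tid, n in counts.items()}

# Track 1 visible in all 4 frames, track 2 in the last 2, at 2 fps:
frames = [[1], [1], [1, 2], [1, 2]]
# dwell_times(frames, fps=2) → {1: 2.0, 2: 1.0}
```

The occupancy heatmap follows the same pattern, accumulating per-pixel (or per-grid-cell) counts from box centers instead of per-track frame counts.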
Choosing the right device for you
Device selection will depend on the level of AI performance required and the camera throughput (resolution and framerate). Refer to the table below for performance comparison across different devices. Note: For comprehensive benchmarks of these devices, use the model selection tool available on Edge AI Studio.
The benchmarks in the table below were produced using SDK version 10.1.
| Product number | Processing core | NPU available | YOLOX-Nano (416x416) latency (ms) | SSD-Mobilenetv2 (512x512) latency (ms) | YOLOX-Small (640x640) latency (ms) |
|---|---|---|---|---|---|
| AM62A7 | 4x Arm® | 2 TOPS | | | |
| TDA4VE-Q1 | 4x Arm® | 8 TOPS | | | |
All the hardware, software and resources you’ll need to get started
Hardware
SK-AM62A-LP
The AM62A is the lowest-cost AI-accelerated device in the AM6xA family, and is best suited for evaluation. A generic USB camera or webcam can be used for image capture and model evaluation on live data.
Software & development tools
PROCESSOR-SDK-LINUX-AM62A
The Edge AI processor SDK is Linux-based and includes the necessary software components to run a compiled model with hardware acceleration. Other Edge AI accelerated processors may be substituted for AM62A.
CCStudio™ Edge AI Studio
This suite contains tools for training, compiling and deploying a model to TI edge AI processors. A model selection tool is available to view pre-generated benchmarks of popular models.
Command-Line tools
Tools for microprocessor devices with Linux and TIDL support. TI's edge AI solution simplifies the whole product life cycle of DNN development and deployment by providing a rich set of tools and optimized libraries.
Supporting resources
Demo application for deploying a people detection model and person-tracking postprocessing with visualizations.
Industrial | Building automation | Vision
Find and localize specific objects and people in real time at high frame rates with AI-accelerated processors and industry-standard software.