Deep Learning for Driver and Occupancy Monitoring

Driver monitoring systems (DMS) and occupancy monitoring systems (OMS) are typically separate processing paths from an image analysis and deep-learning perspective. In both cases, the IR frames from the RGB-IR camera are typically used. This way, the vehicle interior can be sufficiently illuminated with non-visible light to allow accurate monitoring while preserving the driver’s nighttime vision.

The images are therefore analyzed as single-channel, grayscale images. Processing one channel instead of 3-channel RGB data reduces both processing requirements and DDR bandwidth. However, analyzing single-channel (for example, grayscale) images implies that the neural network models are also trained on such data, whereas typical off-the-shelf models are trained on 3-channel RGB. TIDL is fully capable of processing an arbitrary number of input channels and resolutions.
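
The following is a minimal sketch of this point, assuming TI’s open-source-runtime flow in which a compiled model is invoked through onnxruntime with the TIDL execution provider. The model path, artifacts folder, and frame dimensions are illustrative assumptions, not a fixed API of this reference design.

```python
# Minimal sketch: feeding a single-channel (grayscale IR) tensor to a model
# through onnxruntime with TIDL offload. Paths are placeholders; the provider
# options follow the pattern in TI's edgeai-tidl-tools and should be checked
# against the SDK version in use.
import numpy as np
import onnxruntime as ort

tidl_options = {"artifacts_folder": "/opt/model_zoo/dms_gray/artifacts"}  # placeholder

sess = ort.InferenceSession(
    "/opt/model_zoo/dms_gray/model.onnx",  # placeholder model path
    providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
    provider_options=[tidl_options, {}],
)

# The IR frame arrives as 8-bit grayscale (H x W). Normalize and reshape to
# NCHW with one channel: (1, 1, H, W) rather than the usual (1, 3, H, W).
ir_frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in frame
tensor = (ir_frame.astype(np.float32) / 255.0)[np.newaxis, np.newaxis, :, :]

outputs = sess.run(None, {sess.get_inputs()[0].name: tensor})
```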

Deep Learning for Driver Monitoring

Driver monitoring must determine when the driver is or is not attentive to the road. Inattention nominally falls into two categories: fatigue and distraction. In both cases, the driver’s head position, gaze, and eyes are of primary interest. Eye and eyelid movements occur rapidly, so analysis must run at an appropriate frame rate, often around 30 FPS; regional assessment programs like Euro NCAP can alter this requirement. Simpler DMS implementations can use head pose only, but head pose alone cannot handle difficult “lizard” scenarios, in which the driver’s head points toward the road but the eyes are looking elsewhere, such as at a cell phone.

A typical flow for DMS is shown in Figure 4-4. Note that there are several viable approaches and techniques; for example, some systems use head-pose detection instead of gaze detection to determine driver distraction.

Figure 4-4 Driver Monitoring Image Analysis Flow
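
To illustrate how the stages of such a flow chain together, the sketch below stubs out each stage with a placeholder function; the function names, box format, and crop logic are hypothetical, and in a real application each stub wraps a TIDL-accelerated model.

```python
# Hypothetical skeleton of a cascaded DMS flow: detect the face, crop it,
# then run landmark/gaze analysis on the crop. The stubs stand in for
# deep-learning models and exist only to show the data hand-off.
import numpy as np

def detect_face(frame):
    # Placeholder for a face-detection model; returns one (x, y, w, h) box.
    h, w = frame.shape
    return (w // 4, h // 4, w // 2, h // 2)

def analyze_face(face_crop):
    # Placeholder for landmark and gaze models run on the face crop.
    return {"eyes_closed": False, "gaze": (0.0, 0.0)}  # gaze as (yaw, pitch)

frame = np.zeros((480, 640), dtype=np.uint8)  # stand-in IR frame
x, y, w, h = detect_face(frame)
state = analyze_face(frame[y:y + h, x:x + w])
```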

The deep learning models provide information about the driver’s attentiveness. However, some degree of postprocessing across frames is required. For example, a single frame showing closed eyes can be a blink, but several in a row can indicate drowsiness or microsleep. Similarly, looking away from the road in front of the vehicle can indicate distraction or can be a necessary driving activity, like looking in the direction of an upcoming turn. Deep learning algorithms for DMS must therefore run at a sufficiently high frame rate to enable such tracking across multiple frames.
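
As a minimal sketch of such cross-frame postprocessing, the following counts consecutive closed-eye frames to separate a blink from possible microsleep. The 30-FPS rate and the 0.5-second threshold are illustrative assumptions, not values prescribed by this reference design.

```python
# Distinguish a blink (1-2 frames) from sustained eye closure by tracking
# the run length of closed-eye frames, assuming 30 FPS input.
FPS = 30
MICROSLEEP_FRAMES = int(0.5 * FPS)  # eyes closed for >= 0.5 s raises a flag

closed_run = 0  # consecutive frames with eyes closed

def classify_frame(eyes_closed: bool) -> str:
    global closed_run
    closed_run = closed_run + 1 if eyes_closed else 0
    if closed_run >= MICROSLEEP_FRAMES:
        return "microsleep"        # sustained closure across many frames
    if closed_run > 0:
        return "blink-or-closing"  # too short to distinguish from a blink
    return "eyes-open"
```

The same idea applies to gaze: a distraction alert is typically held off until off-road glances persist beyond a calibrated duration.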

Deep Learning for Occupancy Monitoring

Occupancy monitoring collects information about which seats are occupied within the vehicle and how seatbelts are being utilized. This state changes more slowly than a driver’s head position and eye movements, so the frame rate requirements are lower; 1 to 5 FPS is acceptable in most circumstances. However, the region of interest is larger, typically the entire vehicle interior as opposed to the driver’s seat only, so models must run at higher resolution and have higher processing requirements. OMS is responsible for checking which seats are occupied, whether seatbelts are used correctly, and how airbags must be deployed in case of a crash.

An example data flow for occupancy monitoring is shown in Figure 4-5. A single image can be processed by multiple stages of neural networks to determine how many passengers are present, how the passengers are positioned, and how this affects airbag deployment.

Figure 4-5 Occupancy Monitoring Image Analysis Flow
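
The sketch below illustrates one plausible postprocessing step in such a flow: assigning detected occupants to fixed seat regions. The seat coordinates, detection format, and helper name are hypothetical; a deployed system calibrates these regions for the camera mounting and vehicle geometry.

```python
# Map person-detection centers to hypothetical seat regions of interest.
SEATS = {
    "driver":     (0, 0, 320, 240),    # (x, y, w, h) in image pixels
    "passenger":  (320, 0, 320, 240),
    "rear_left":  (0, 240, 320, 240),
    "rear_right": (320, 240, 320, 240),
}

def assign_to_seat(center):
    cx, cy = center
    for seat, (x, y, w, h) in SEATS.items():
        if x <= cx < x + w and y <= cy < y + h:
            return seat
    return None  # detection falls outside every seat region

occupied = {assign_to_seat(c) for c in [(160, 120), (480, 300)]}
occupied.discard(None)
print(occupied)  # {'driver', 'rear_right'} for the sample centers above
```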

Operating Multiple Models with TIDL

TIDL allows multiple deep learning models to be loaded at the same time. So long as the models’ weights and configurations fit into the available persistent DDR space, the application can initialize and run multiple models in any order. No special handling is required for concurrent calls to TIDL.
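
As a minimal sketch of this behavior, assuming the onnxruntime-based TIDL flow and placeholder model paths, two independently compiled models can be initialized side by side and invoked in any order:

```python
# Initialize two TIDL-offloaded models in one process; no extra locking is
# needed around the inference calls. All paths are placeholders.
import onnxruntime as ort

def make_session(model_path, artifacts_dir):
    return ort.InferenceSession(
        model_path,
        providers=["TIDLExecutionProvider", "CPUExecutionProvider"],
        provider_options=[{"artifacts_folder": artifacts_dir}, {}],
    )

dms_sess = make_session("/opt/model_zoo/dms/model.onnx", "/opt/model_zoo/dms/artifacts")
oms_sess = make_session("/opt/model_zoo/oms/model.onnx", "/opt/model_zoo/oms/artifacts")
# The application can now run dms_sess on every frame and oms_sess at a
# lower rate from the same process.
```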

Several of the models described in this report have different frame rate requirements and different levels of complexity. TIDL enables prioritization and preemption of models to fit such applications. For example, the DMS models, which require a high frame rate, can run at a higher priority to make sure inference completes within the inter-frame latency. The OMS models, which have a lower FPS requirement and can be larger, can then run at a lower priority to take advantage of unused cycles between DMS frames. Developers need to analyze the runtime latencies of the models and confirm there is sufficient headroom to run each model at the required frame rate and within a latency bound.
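
The sketch below shows how such priorities might be expressed. TI’s open-source-runtime TIDL flow exposes priority-related options such as priority and max_pre_empt_delay, but the exact option names, value ranges, and semantics should be verified against the Processor SDK release in use; the values here are illustrative assumptions.

```python
# Assumed TIDL provider options for model prioritization; verify against
# your SDK documentation. Paths are placeholders.
dms_options = {
    "artifacts_folder": "/opt/model_zoo/dms/artifacts",
    "priority": 0,              # highest: must meet the inter-frame latency
    "max_pre_empt_delay": 5.0,  # assumed max tolerable delay (ms) when preempting
}
oms_options = {
    "artifacts_folder": "/opt/model_zoo/oms/artifacts",
    "priority": 1,              # lower: fills unused cycles between DMS frames
}
# Each dictionary is passed as the TIDL entry of provider_options when
# creating the corresponding InferenceSession, as in the previous sketch.
```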