
  1.   Abstract
  2.   Trademarks
  3. 1 Introduction
    1. 1.1 Intended Audience
    2. 1.2 Host Machine Information
  4. 2 Creating the Dataset
    1. 2.1 Collecting Images
    2. 2.2 Labelling Images
    3. 2.3 Augmenting the Dataset (Optional)
  5. 3 Selecting a Model
  6. 4 Training the Model
    1. 4.1 Input Optimization (Optional)
  7. 5 Compiling the Model
  8. 6 Using the Model
  9. 7 Building the End Application
    1. 7.1 Optimizing the Application With TI’s Gstreamer Plugins
    2. 7.2 Using a Raw MIPI-CSI2 Camera
  10. 8 Summary
  11. 9 References

Training the Model

Once a model architecture has been selected and a dataset created, the next step is to retrain the model on that dataset.

TI provides a suite of easy-to-use tools for model training and compilation. Any training framework can be used to train the deep learning model, so long as the model contains only layers supported by the device's accelerator. Experts may prefer tools they are already familiar with, such as training in PyTorch and exporting to ONNX for compilation. Less experienced users can use TI's tools, such as Model Composer and edgeai-modelmaker; the steps covered here use edgeai-modelmaker. An advantage of the TI tools is that compilation can be handled automatically at the end of training, so that Section 5 can be skipped entirely.
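
For example, a model trained directly in PyTorch can be exported to ONNX before being handed to the compilation tools. The sketch below shows the general idea; the MobileNetV2 model, input resolution, and file name are illustrative placeholders (and the weights="DEFAULT" argument assumes a recent torchvision), not values taken from this guide.

```python
# Minimal sketch: export a trained PyTorch model to ONNX for later compilation.
# The model, input resolution, and file names are illustrative placeholders.
import torch
import torchvision

model = torchvision.models.mobilenet_v2(weights="DEFAULT")  # stand-in for a trained model
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # one RGB image at 224x224
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)
```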

Edgeai-modelmaker, once set up, uses a separate training framework such as edgeai-mmdetection to perform the training itself. Training starts from a pretrained model and fine-tunes it via transfer learning: the last layer of the network is reset and a low learning rate is used. For models supported in the model zoo, the pretrained weights are downloaded automatically. Note that with this process, the layers before the last layer are not frozen and will also change during training.
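
Conceptually, this transfer-learning step resembles the sketch below. It uses a torchvision classifier purely for illustration; edgeai-modelmaker drives training through its own frameworks, so the model, class count, and learning rate here are assumptions rather than the exact code it runs.

```python
# Illustrative transfer-learning recipe: start from pretrained weights, reset only
# the last layer, and fine-tune all layers (none frozen) with a low learning rate.
import torch
import torchvision

num_classes = 5  # hypothetical number of classes in the new dataset

model = torchvision.models.resnet18(weights="DEFAULT")         # pretrained backbone
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)  # reset the last layer

# Earlier layers are NOT frozen: every parameter goes to the optimizer, but the
# low learning rate keeps the pretrained weights from drifting too quickly.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

# ...standard training loop over the new dataset follows...
```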

To train a model, the steps are as follows; refer to the modelmaker READMEs for the most up-to-date instructions:

  • Set up the edgeai-modelmaker repository. Its setup_all.sh script also sets up the other training frameworks and TI tools, which are cloned into the same parent directory.
    • A Python 3.6 virtual environment is recommended.
  • Place the training samples/files under "edgeai-modelmaker/data/projects/PROJECT_NAME".
    • Run modelmaker on one of the example config YAML files to see how these directories are structured. The dataset follows the COCO format, with an "annotations" directory (containing a single COCO-format "instances.json" file) and a directory of images.
    • All images must be in a single "images" directory with no subdirectories; the training framework assumes every image file is located there. A sanity-check sketch for this layout follows after this list.
  • Create a config file that points to the project data and selects the model, training parameters, and so forth. See config_detect_foods.yaml for an example.
  • Run the starter script "run_modelmaker.sh" with the config file above as its first argument.
    • See modelmaker’s README for how to set the target device: processors like the AM62A include a different C7xMMA deep learning accelerator, so the compilation tools for that target architecture differ from those for TDA4VM, AM68A, and so forth. The script relies on a TIDL_TOOLS_PATH environment variable, which run_modelmaker.sh sets if it is not already defined.
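
As a quick sanity check on the dataset layout described above, a short script like the one below can confirm that a project folder matches the expected COCO-style structure. The PROJECT_NAME path and the specific checks are illustrative; the modelmaker README remains the authoritative reference.

```python
# Sanity-check sketch for the expected modelmaker project layout:
#   edgeai-modelmaker/data/projects/PROJECT_NAME/
#       annotations/instances.json   (COCO-format annotations)
#       images/                      (all image files, no subdirectories)
import json
from pathlib import Path

project = Path("edgeai-modelmaker/data/projects/PROJECT_NAME")
ann_file = project / "annotations" / "instances.json"
images_dir = project / "images"

assert ann_file.is_file(), f"missing {ann_file}"
assert images_dir.is_dir(), f"missing {images_dir}"
assert not any(p.is_dir() for p in images_dir.iterdir()), "images/ must not contain subdirectories"

coco = json.loads(ann_file.read_text())
for key in ("images", "annotations", "categories"):
    assert key in coco, f"instances.json is missing the COCO '{key}' section"

print(f"{len(coco['images'])} images, {len(coco['annotations'])} annotations, "
      f"{len(coco['categories'])} categories")
```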

If a GPU is present on the training machine, configuring it for training is highly recommended, as it provides a substantial speedup. The most important part of configuration is ensuring that the installed CUDA version matches the version that PyTorch/torchvision was compiled against; the version used by modelmaker at the time of this writing is CUDA 11.3. Correctly setting up these drivers can be a pain point, since the CUDA version must also be compatible with the NVIDIA display/graphics driver.
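
A quick way to verify the match is to check which CUDA version the installed PyTorch build was compiled against and whether the GPU is actually visible, as in this minimal sketch:

```python
# Report the CUDA version PyTorch was built against and whether a GPU is usable.
import torch

print("PyTorch version:   ", torch.__version__)
print("Built against CUDA:", torch.version.cuda)        # e.g. "11.3"
print("GPU available:     ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU device:        ", torch.cuda.get_device_name(0))
```

If the reported CUDA version is newer than what the installed NVIDIA driver supports, the GPU typically does not show up as available and training silently falls back to the CPU.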

If the dataset is fairly small (<10,000 images), it is best to start from a network pretrained on a large, generic dataset such as ImageNet-1k for image classification or COCO for object detection. Larger networks generally require more training samples due to their increased number of parameters.

For the food-recognition dataset of roughly 2,500 images (after train-test split and augmentation), training for 30 epochs took approximately 1.5 hours on an A2000 GPU (6 GB VRAM) with a 12-core CPU, 16 GB RAM, and an SSD. The model reached a mean average precision (mAP, a common object detection accuracy metric) of 0.68, which is extremely high. This stems from two facts:

  • The validation set used by the training framework automatically included post-augmentation files, so some validation data was unfairly similar to some training samples, inflating the reported accuracy.
  • The environment/background is highly controlled to provide consistency. The model performs well within this limited context, but may not generalize to a totally new setting, such as recognizing the objects against a green or blue background. This may or may not be important, depending on the use case. (1)

A full evaluation on the final test set was not performed for the retail-scanner model. Instead, a visual inspection was done to confirm that the model performed reasonably well before moving on to the next stage. Several training iterations were performed, varying the number of epochs, the degree of augmentation, and the variety of augmentations.

(1) In a retail-scanner application, this is ostensibly unimportant given the typically controlled environment. However, a bad actor may attempt to trick the system by bringing an oddly colored background, such as a purple piece of paper, so that items placed on it are difficult to recognize. New backgrounds can be substituted in as another form of augmentation.