Large-scale, use-case-specific synthetic data is becoming increasingly important in real-world computer vision and AI workflows. As described on the NVIDIA Technical Blog, NVIDIA builds digital twins, physics-based virtual replicas of environments such as factories and retail spaces, to enable accurate simulation of those real-world environments.
Augmenting AI with synthetic data
Built on NVIDIA Omniverse, NVIDIA Isaac Sim is a comprehensive application designed to facilitate the design, simulation, testing, and training of AI-powered robots. Isaac Sim’s Omni.Replicator.Agent (ORA) extension is specifically used to generate synthetic data for training computer vision models, including the TAO PeopleNet Transformer and the TAO ReIdentificationNet Transformer.
This approach is part of NVIDIA’s broader strategy for multi-target multi-camera (MTMC) tracking vision AI applications. NVIDIA aims to improve the accuracy and robustness of these models by generating high-quality synthetic data and fine-tuning the base models for specific use cases.
ReIdentificationNet overview
ReIdentificationNet (ReID) is a network used to track and identify objects across multiple camera views in MTMC and real-time location system (RTLS) applications. It extracts embeddings from detected object crops that capture essential cues such as shape, texture, color, and appearance, allowing the same object to be re-identified across multiple cameras.
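The cross-camera matching step can be sketched as follows. The embeddings here are random stand-ins for the 256-dimensional vectors a ReID model would produce, and cosine similarity with a fixed threshold is one common (but not the only) matching strategy:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_across_cameras(query_emb, gallery_embs, threshold=0.7):
    """Return the index of the gallery embedding that best matches the query,
    or None if no similarity clears the threshold."""
    sims = [cosine_similarity(query_emb, g) for g in gallery_embs]
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None

# Toy example: a crop seen by camera A versus a gallery from camera B
rng = np.random.default_rng(0)
person_a = rng.normal(size=256)
gallery = [rng.normal(size=256),                      # unrelated person
           person_a + 0.05 * rng.normal(size=256)]    # same person, slight noise
print(match_across_cameras(person_a, gallery))        # the noisy copy matches
```

In a real MTMC pipeline the gallery would hold embeddings from all recent detections across cameras, and the threshold would be tuned on validation data.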
Accurate ReID models are essential for multi-camera tracking, as they help correlate objects across different camera views and maintain continuous tracking. The accuracy of these models can be significantly improved by fine-tuning them with synthetic data generated from ORA.
Model architecture and pre-training
The ReIdentificationNet model takes RGB image crops of size 256 x 128 as input and outputs an embedding vector of size 256 for each image crop. The model supports ResNet-50 and Swin transformer backbones, and the Swin variant is a human-centric baseline model pretrained on about 3 million image crops.
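Concretely, the input/output contract looks like the sketch below. The backbone here is a trivial placeholder (pooling plus one linear layer) standing in for the actual ResNet-50 or Swin network, purely to illustrate the tensor shapes:

```python
import torch
import torch.nn as nn

class ToyReIDNet(nn.Module):
    """Placeholder with ReIdentificationNet's shapes: a 256x128 RGB crop
    maps to a 256-dim embedding. The real model uses ResNet-50 or Swin."""
    def __init__(self, embedding_dim: int = 256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((8, 4))          # crude spatial reduction
        self.proj = nn.Linear(3 * 8 * 4, embedding_dim)   # stand-in for the backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.proj(self.pool(x).flatten(1))
        return nn.functional.normalize(emb, dim=1)        # unit-length embeddings

batch = torch.randn(8, 3, 256, 128)   # 8 RGB crops, height 256, width 128
embeddings = ToyReIDNet()(batch)
print(embeddings.shape)               # torch.Size([8, 256])
```

Normalizing embeddings to unit length, as here, makes cosine similarity reduce to a dot product, which is a common convention in ReID pipelines.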
For pre-training, NVIDIA adopted a self-supervised learning technique called SOLIDER, which builds on DINO (self-distillation with no labels). SOLIDER uses prior knowledge of human image crops to generate pseudo-semantic labels and learns human representations that carry semantic information. The pre-training dataset combines NVIDIA proprietary datasets with Open Images V5.
Fine-tuning the ReID model
Fine-tuning involves training the pre-trained model on a variety of supervised person re-identification datasets, including both synthetic and real NVIDIA proprietary datasets. This process helps mitigate issues such as identity transitions, which occur when the system incorrectly associates identities due to high visual similarity between different individuals or changes in appearance over time.
To fine-tune the ReID model, NVIDIA recommends using ORA to generate synthetic data so that the model learns the unique characteristics and nuances of a specific environment, resulting in more reliable identification and tracking.
Simulation and data generation
Isaac Sim’s ORA extension is used to generate the synthetic data for training the ReID model. Best practices for configuring the simulation cover factors such as the number of characters, character uniqueness, camera placement, and character motion.
For ReIdentificationNet, the number of characters and their uniqueness matter most: the model benefits from more unique IDs. Camera placement is also important; cameras should cover the entire floor area where characters are expected to be detected and tracked. Character motion in Isaac Sim ORA can be customized to add flexibility and variety to movement.
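As a rough illustration of the camera-coverage consideration, the sketch below lays a grid of overhead cameras over a rectangular floor so that neighboring coverage circles overlap, and assigns each simulated character a unique ID. The parameter names and the `appearance_seed` field are illustrative, not actual ORA configuration keys:

```python
import math

def plan_cameras(floor_w: float, floor_d: float, coverage_radius: float):
    """Place cameras on a grid over a floor_w x floor_d floor.
    Spacing of sqrt(2) * radius keeps neighboring coverage circles overlapping."""
    step = coverage_radius * math.sqrt(2)
    nx = math.ceil(floor_w / step)
    ny = math.ceil(floor_d / step)
    return [((i + 0.5) * floor_w / nx, (j + 0.5) * floor_d / ny)
            for i in range(nx) for j in range(ny)]

# Hypothetical setup: a 20 m x 12 m floor, cameras covering ~5 m radius each,
# and 30 characters with unique IDs and distinct appearances
cameras = plan_cameras(20.0, 12.0, 5.0)
characters = [{"id": f"character_{i:03d}", "appearance_seed": i}
              for i in range(30)]
print(len(cameras), len(characters))  # 6 30
```

In practice these choices are expressed through ORA's own configuration; the point of the sketch is only that coverage and ID uniqueness are planned up front, before data generation.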
Training and evaluation
Once the synthetic data is generated, it is prepared and sampled to train the TAO ReIdentificationNet model. Training tricks such as ID loss, triplet loss, center loss, random erasing augmentation, learning-rate warmup, BNNeck, and label smoothing can improve the accuracy of the ReID model during fine-tuning.
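A minimal sketch of how a few of these tricks combine during fine-tuning, assuming a PyTorch setup; the loss weights and warmup schedule here are illustrative defaults, not TAO's actual values, and center loss and BNNeck are omitted for brevity:

```python
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss(label_smoothing=0.1)  # ID loss with label smoothing
triplet = nn.TripletMarginLoss(margin=0.3)     # pulls same-ID crops together

def reid_loss(logits, anchor, positive, negative, labels, triplet_weight=1.0):
    """Combined ID + triplet loss over a batch of embeddings."""
    return ce(logits, labels) + triplet_weight * triplet(anchor, positive, negative)

def warmup_lr(base_lr, epoch, warmup_epochs=10):
    """Linear learning-rate warmup over the first few epochs."""
    return base_lr * min(1.0, (epoch + 1) / warmup_epochs)

# Toy batch: classification scores over 100 identities plus embedding triplets
logits = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
anchor, positive, negative = (torch.randn(8, 256) for _ in range(3))
loss = reid_loss(logits, anchor, positive, negative, labels)
print(loss.item() > 0)  # True
```

Random erasing would typically be applied in the data-loading transform pipeline rather than in the loss, so it does not appear here.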
The evaluation script validates the accuracy of the ReID model before and after fine-tuning, using metrics such as rank-1 accuracy and mean average precision (mAP). In NVIDIA’s internal testing, fine-tuning with synthetic data significantly increased these scores.
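Both metrics can be computed from a query-to-gallery distance matrix. A simplified sketch (not the TAO evaluation script itself), assuming every query has at least one true match in the gallery:

```python
import numpy as np

def rank1_and_map(dist, q_ids, g_ids):
    """Rank-1 accuracy and mean average precision from a distance matrix,
    where dist[i, j] is the distance between query i and gallery item j."""
    r1_hits, aps = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])               # gallery sorted by distance
        matches = g_ids[order] == q_ids[i]
        r1_hits.append(matches[0])                # is the closest item a true match?
        ranks = np.where(matches)[0] + 1          # 1-based ranks of true matches
        precisions = np.arange(1, len(ranks) + 1) / ranks
        aps.append(precisions.mean())             # average precision for this query
    return float(np.mean(r1_hits)), float(np.mean(aps))

# Toy check: each query's true match is also its nearest gallery item
dist = np.array([[0.1, 0.9],
                 [0.8, 0.2]])
r1, mAP = rank1_and_map(dist, np.array([0, 1]), np.array([0, 1]))
print(r1, mAP)  # 1.0 1.0
```

Real ReID evaluation protocols add details such as excluding same-camera matches from the gallery, but the core computation follows this shape.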
Deployment and conclusion
After fine-tuning, the ReID model can be exported to ONNX format for deployment in MTMC or RTLS applications. This workflow allows developers to improve the accuracy of ReID models without extensive labeling work, while leveraging ORA’s flexibility and developer-friendly TAO API.