In an effort to advance the field of action recognition, NVIDIA has been leveraging synthetic data to improve the capabilities of models such as PoseClassificationNet. According to an NVIDIA blog post by Monika Jhuria, this approach is especially useful when collecting real-world data is expensive or impractical.
Challenges in Action Recognition
Action recognition models are designed to identify and classify human movements, such as walking or waving. However, building robust models that accurately recognize a wide range of behaviors across varied scenarios remains challenging, and the biggest obstacle is acquiring sufficiently large and diverse training data. Synthetic data generation (SDG) offers a practical solution to this problem by simulating real-world scenarios through 3D simulation.
Synthetic Data Generation with NVIDIA Isaac Sim
NVIDIA Isaac Sim, a reference application built on NVIDIA Omniverse, plays a key role in generating synthetic data. It produces artificial data from 3D simulations that mimic real environments such as retail stores, sports venues, warehouses, and hospitals, allowing models to improve efficiently through iterative training.
Creating a human action recognition dataset
NVIDIA has developed a method to generate datasets for action recognition models using Isaac Sim. It involves creating action animations in simulation and extracting skeletal key points that serve as input to the model. Isaac Sim's Omni.Replicator.Agent extension facilitates the creation of synthetic data in a variety of 3D environments, providing features such as multi-camera consistency and location randomization.
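The blog does not detail the exact export format, but the post-processing step it describes (turning extracted key points into model input) can be sketched as follows. This is a hypothetical illustration: the 17-joint skeleton, clip length, and root-centering choice are assumptions, not NVIDIA's published pipeline.

```python
import numpy as np

def to_training_clip(frames, clip_len=64, root_joint=0):
    """Turn per-frame 3D keypoints exported from a simulator into a
    fixed-length (frames, joints, xyz) clip for model training.

    frames: list of (num_joints, 3) arrays of 3D keypoints.
    """
    clip = np.stack(frames[:clip_len])
    # Pad short clips by repeating the last frame.
    if len(clip) < clip_len:
        pad = np.repeat(clip[-1:], clip_len - len(clip), axis=0)
        clip = np.concatenate([clip, pad])
    # Center each frame on the root joint so that location randomization
    # varies the scene without shifting the skeleton coordinates.
    return clip - clip[:, root_joint:root_joint + 1, :]

# Hypothetical 40-frame capture of a 17-joint skeleton.
frames = [np.random.rand(17, 3) for _ in range(40)]
clip = to_training_clip(frames)
print(clip.shape)  # (64, 17, 3)
```

Root-centering is one common normalization for skeleton-based models; a real pipeline might also rescale by bone length or align the torso orientation.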
Extending model capabilities with synthetic data
The generated synthetic data is used to extend the capabilities of the spatial-temporal graph convolutional network (ST-GCN) model, which detects human actions based on skeletal information. NVIDIA's approach includes training models such as PoseClassificationNet on 3D skeletal data generated in Isaac Sim, using NVIDIA TAO for efficient training and fine-tuning.
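The core idea of ST-GCN is to convolve joint features over the skeleton graph. A minimal NumPy sketch of one spatial graph-convolution step is shown below; the 5-joint skeleton, edge list, and channel sizes are illustrative assumptions, not the actual PoseClassificationNet architecture.

```python
import numpy as np

# Hypothetical 5-joint skeleton: 0-head, 1-torso, 2/3-hands, 4-hips.
# Edges define the spatial graph that ST-GCN convolves over.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5

def normalized_adjacency(edges, n):
    """Symmetrically normalized adjacency with self-loops (as in GCN/ST-GCN)."""
    a = np.eye(n)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = np.diag(a.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ a @ d_inv_sqrt

def spatial_graph_conv(x, weight, adj):
    """One spatial graph convolution on a clip x of shape (frames, joints, in_ch)."""
    # Aggregate each joint's neighbors over the skeleton graph,
    # then mix channels with a learned weight matrix.
    agg = np.einsum("vw,twc->tvc", adj, x)
    return agg @ weight

rng = np.random.default_rng(0)
x = rng.standard_normal((30, NUM_JOINTS, 3))  # 30 frames of 3D keypoints
w = rng.standard_normal((3, 8))               # 3 input channels -> 8 features
adj = normalized_adjacency(EDGES, NUM_JOINTS)
out = spatial_graph_conv(x, w, adj)
print(out.shape)  # (30, 5, 8)
```

A full ST-GCN stacks many such layers, interleaved with temporal convolutions across frames, and ends in a classifier over action labels.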
Training and testing results
In NVIDIA's tests, the ST-GCN model trained only on synthetic data achieved an impressive average accuracy of 97% across 85 action classes. This performance was further validated on the NTU RGB+D dataset, showing that the model generalizes well even to real data it was never explicitly trained on.
Scaling data generation with NVIDIA OSMO
NVIDIA also explored NVIDIA OSMO, a cloud-native orchestration platform, to scale the data creation process. This significantly accelerates data generation, making it possible to produce thousands of samples with different action animations and camera angles.
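The fan-out pattern described above (many render jobs varying action and camera) can be sketched with a simple worker pool. This is a toy stand-in: the action list, camera names, and `render_sample` function are hypothetical, and a real pipeline would dispatch Isaac Sim jobs through OSMO rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

ACTIONS = ["walking", "waving", "sitting"]      # hypothetical action classes
CAMERAS = ["cam_front", "cam_side", "cam_top"]  # hypothetical camera placements

def render_sample(job):
    """Stand-in for one simulation render job; in practice this would
    invoke Isaac Sim, with OSMO scheduling such jobs across a cluster."""
    action, camera = job
    return {"action": action, "camera": camera}

# Fan out one job per (action, camera) combination.
jobs = list(product(ACTIONS, CAMERAS))
with ThreadPoolExecutor(max_workers=4) as pool:
    samples = list(pool.map(render_sample, jobs))

print(len(samples))  # 9
```

Scaling to thousands of samples is then a matter of enlarging the job grid (more animations, randomized locations, more camera angles) and letting the orchestrator distribute the work.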
For more information about NVIDIA’s approach to extending action recognition models using synthetic data, see the NVIDIA blog.
Image source: Shutterstock