In an effort to increase the efficiency of large-scale recommender systems, NVIDIA introduced EMBark, a new approach that aims to optimize the embedding process of deep learning recommendation models. According to NVIDIA, recommender systems play a central role in the Internet industry, and training them efficiently is a critical task for many companies.
Challenges of training recommendation systems
Deep learning recommendation models (DLRMs) often incorporate billions of ID features and require robust training solutions. Recent advances in training frameworks, such as NVIDIA Merlin HugeCTR and TorchRec, have improved DLRM training by storing large-scale ID-feature embeddings in GPU memory. However, as the number of GPUs increases, the communication overhead of the embedding step becomes a bottleneck, sometimes accounting for more than half of total training time.
EMBark’s innovative approach
EMBark, presented at RecSys 2024, addresses these challenges by implementing a 3D flexible sharding strategy and communication compression techniques, aiming to balance the load during training and reduce communication time for embedding. The EMBark system includes three core components: an embedding cluster, a flexible 3D sharding scheme, and a sharding planner.
Embedding clusters
Embedding clusters promote efficient training by grouping features with similar characteristics and applying tailored compression strategies to each group. EMBark distinguishes three cluster types — data-parallel (DP), reduction-based (RB), and unique-based (UB) — each suited to different training scenarios.
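As an illustration only (the thresholds and table attributes below are hypothetical, not from the paper), the idea of routing each embedding table to a DP, RB, or UB cluster based on its characteristics can be sketched as:

```python
from dataclasses import dataclass

@dataclass
class EmbeddingTable:
    name: str
    num_rows: int          # vocabulary size of the ID feature
    avg_duplicates: float  # mean repeats of an ID within a batch

def classify(table: EmbeddingTable,
             dp_max_rows: int = 10_000,
             rb_min_dup: float = 2.0) -> str:
    """Assign a table to a cluster type (illustrative heuristic)."""
    if table.num_rows <= dp_max_rows:
        return "DP"   # small, hot table: replicate on every GPU
    if table.avg_duplicates >= rb_min_dup:
        return "RB"   # many duplicate IDs: reduce locally before communicating
    return "UB"       # mostly unique IDs: deduplicate ("unique") before lookup

tables = [
    EmbeddingTable("gender", 4, 100.0),
    EmbeddingTable("item_id", 50_000_000, 3.5),
    EmbeddingTable("user_id", 500_000_000, 1.1),
]
clusters = {t.name: classify(t) for t in tables}
```

The point of the grouping is that each cluster can then use the communication-compression scheme that fits its access pattern, rather than one scheme for all tables.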
Flexible 3D sharding method
This innovative scheme allows precise control of workload balancing across GPUs by leveraging 3D tuples to represent each shard. This flexibility addresses imbalance issues found in traditional sharding methods.
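A minimal sketch of the 3D-tuple idea (the field names and round-robin placement here are assumptions for illustration, not EMBark's actual data structures): each shard is identified along three axes — which table it belongs to, which row slice, and which column (embedding-dimension) slice — giving finer-grained placement than row-only or table-only sharding.

```python
from typing import NamedTuple

class Shard(NamedTuple):
    table_id: int  # which embedding table
    row_part: int  # which row slice of that table
    col_part: int  # which column (embedding-dim) slice

def shard_table(table_id: int, row_parts: int, col_parts: int) -> list[Shard]:
    """Split one table into row_parts x col_parts shards."""
    return [Shard(table_id, r, c)
            for r in range(row_parts)
            for c in range(col_parts)]

# Table 0 split two ways by row and two by column -> four shards,
# placed round-robin across four GPUs.
shards = shard_table(0, 2, 2)
placement = {gpu: shard for gpu, shard in enumerate(shards)}
```

Because a table can be cut along rows, columns, or both, a single oversized table no longer forces one GPU to carry a disproportionate share of the load.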
Sharding planner
The sharding planner uses a greedy search algorithm to find a near-optimal sharding strategy, tailoring the plan to the available hardware and the embedding configuration.
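To make the greedy idea concrete, here is a generic sketch (not EMBark's actual planner, whose cost model is more detailed): shards are assigned in order of decreasing cost to whichever GPU currently has the lightest load, a standard greedy heuristic for load balancing.

```python
import heapq

def greedy_plan(shard_costs: dict[str, float], num_gpus: int) -> dict[str, int]:
    """Assign each shard (heaviest first) to the currently least-loaded GPU."""
    # Min-heap of (current load, gpu id) so the lightest GPU pops first.
    heap = [(0.0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    plan = {}
    for shard, cost in sorted(shard_costs.items(), key=lambda kv: -kv[1]):
        load, gpu = heapq.heappop(heap)
        plan[shard] = gpu
        heapq.heappush(heap, (load + cost, gpu))
    return plan

# Hypothetical per-shard costs (e.g., estimated lookup + communication time).
plan = greedy_plan({"A": 8.0, "B": 5.0, "C": 4.0, "D": 3.0}, num_gpus=2)
```

A greedy search like this evaluates each placement locally rather than exploring all assignments, which keeps planning fast even when the number of shards and GPUs is large.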
Performance and evaluation
EMBark's efficiency was evaluated on NVIDIA DGX H100 nodes, where it demonstrated significant improvements in training throughput. Across a variety of DLRM models, EMBark achieved an average 1.5x speedup in training, reaching up to 1.77x over existing methods in some configurations.
By streamlining the embedding process, EMBark significantly improves the training efficiency of large-scale recommender models, setting a new standard for deep learning recommender systems. For more detailed insight into EMBark's performance, see the research paper.