NVIDIA has announced a new fine-tuning method called Weight-Decomposed Low-Rank Adaptation (DoRA) that offers a high-performing alternative to the widely used Low-Rank Adaptation (LoRA). According to the NVIDIA Technical Blog, DoRA improves both the learning capacity and training stability of LoRA without introducing additional inference overhead.
Advantages of DoRA
DoRA has demonstrated significant performance gains across a variety of large language models (LLMs) and vision language models (VLMs). For example, on commonsense reasoning tasks, DoRA outperformed LoRA by +3.7 points on Llama 7B and +4.4 points on Llama 3 8B. DoRA also showed better results in multi-turn benchmarks, image and video-text understanding, and visual instruction tuning tasks.
This innovative method was accepted as an oral paper at ICML 2024, demonstrating its reliability and potential impact in the field of machine learning.
DoRA’s Mechanism
DoRA works by decomposing the pretrained weights into magnitude and direction components and fine-tuning both. Because the directional component accounts for most of the parameters, DoRA adapts it with LoRA, which keeps fine-tuning efficient. After training, DoRA merges the fine-tuned components back into the pretrained weights, so no additional latency is introduced at inference.
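To make the decomposition concrete, here is a minimal PyTorch sketch of the idea: the pretrained weight is split into a per-row magnitude vector and a direction matrix, LoRA factors adapt the direction, and everything folds back into a single weight after training. This is an illustrative reading of the method as described, not NVIDIA's implementation; the class name DoRALinear, the rank, and the row-wise norm convention are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DoRALinear(nn.Module):
    """Sketch of a DoRA layer: the frozen weight W is decomposed into a
    magnitude vector m and a direction V; LoRA adapts the direction,
    while m is trained directly."""

    def __init__(self, linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.weight = linear.weight               # frozen pretrained W, shape (out, in)
        self.weight.requires_grad = False
        out_dim, in_dim = self.weight.shape
        # Magnitude m: per-output-row norm of the pretrained weight.
        self.magnitude = nn.Parameter(self.weight.norm(dim=1).clone())
        # LoRA factors that adapt the direction; B starts at zero so the
        # initial merged weight equals the pretrained weight exactly.
        self.lora_A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_dim, rank))

    def merged_weight(self) -> torch.Tensor:
        # Direction V plus the low-rank update, renormalized row-wise,
        # then rescaled by the learned magnitude m.
        directed = self.weight + self.lora_B @ self.lora_A
        direction = directed / directed.norm(dim=1, keepdim=True)
        return self.magnitude.unsqueeze(1) * direction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.merged_weight())

    @torch.no_grad()
    def merge(self) -> nn.Linear:
        # After training, fold everything back into a plain Linear
        # (bias omitted here for brevity), so inference incurs no
        # extra latency relative to the original layer.
        out_dim, in_dim = self.weight.shape
        merged = nn.Linear(in_dim, out_dim, bias=False)
        merged.weight.copy_(self.merged_weight())
        return merged
```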
Visualizing the magnitude and directional differences between DoRA-tuned weights and the pretrained weights shows that DoRA makes substantial directional adjustments with relatively little change in magnitude, a pattern that closely resembles full fine-tuning (FT).
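These two quantities are simple to compute: the average change in row norms (magnitude) and the average 1 − cosine similarity between corresponding rows (direction). The sketch below is one way to measure them; the function name and the row-wise convention for PyTorch (out, in) weights are our assumptions.

```python
import torch
import torch.nn.functional as F

def weight_shift(w_pretrained: torch.Tensor, w_finetuned: torch.Tensor):
    """Average magnitude change and directional change between a
    fine-tuned weight matrix and its pretrained counterpart."""
    # Magnitude change: mean absolute difference of per-row norms.
    delta_m = (w_finetuned.norm(dim=1) - w_pretrained.norm(dim=1)).abs().mean()
    # Directional change: mean (1 - cosine similarity) across rows.
    cos = F.cosine_similarity(w_finetuned, w_pretrained, dim=1)
    delta_d = (1.0 - cos).mean()
    return delta_m.item(), delta_d.item()
```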
Performance Across Models
Across a variety of benchmarks, DoRA consistently outperforms LoRA. In large language models, DoRA delivers notable gains in commonsense reasoning and in conversation and instruction-following. In vision language models, DoRA shows strong results in image-to-text understanding, video-to-text understanding, and visual instruction tuning tasks.
Large Language Models
Comparative studies show that DoRA outperforms LoRA on commonsense reasoning and multi-turn benchmarks, achieving higher average scores across a variety of datasets.
Vision Language Models
DoRA also excels in vision language models, outperforming LoRA in tasks such as image-to-text understanding, video-to-text understanding, and visual instruction tuning. Its effectiveness is evident in higher average scores across multiple benchmarks.
Compression-Aware LLMs
DoRA can be integrated into the QLoRA framework to improve the accuracy of low-bit pretrained models. A joint effort between NVIDIA and Answer.AI on the QDoRA project has shown that QDoRA outperforms FT and QLoRA on Llama 2 and Llama 3 models.
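As one concrete way to try this combination, Hugging Face PEFT exposes DoRA through the use_dora flag on LoraConfig, which can be paired with a 4-bit quantized base model via bitsandbytes. The sketch below illustrates that setup, not Answer.AI's QDoRA code; the model ID, target modules, and hyperparameters are placeholders, and DoRA support on quantized layers may depend on the PEFT version.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model in 4-bit (QLoRA-style quantization).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder model ID
    quantization_config=bnb_config,
)

# Attach DoRA adapters instead of plain LoRA.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # placeholder target modules
    use_dora=True,                        # DoRA on top of the 4-bit base
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
```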
Text-to-Image Generation
DoRA’s applications extend to text-to-image personalization via DreamBooth, delivering noticeably better results than LoRA on challenging datasets such as 3D icons and Lego sets.
Significance and Future Applications
DoRA is poised to become the default choice for fine-tuning AI models wherever LoRA and its variants are used. Its efficiency and effectiveness make it a valuable tool for adapting foundation models to a variety of applications, including NVIDIA Metropolis, NVIDIA NeMo, NVIDIA NIM, and NVIDIA TensorRT.
For more information, visit the NVIDIA Technical Blog.
Image source: Shutterstock