According to Together AI, the success of Llama-3 shows that open-source models are closing the gap with their closed-source counterparts. By leveraging proprietary data, customers have been able to fine-tune small open-source software (OSS) models like Llama-3 to achieve higher accuracy than top closed-source models.
Fine-Tuning Process
Together AI’s platform allows users to fine-tune Llama-3-8B on proprietary data to create custom models that outperform large-scale OSS alternatives like Llama-3-70B and are comparable to leading closed-source models like GPT-4, all at a fraction of the cost. The company’s detailed guide shows how a fine-tuned Llama-3-8B model improves accuracy from 47% to 65%, outperforming Llama-3-70B’s 64% and approaching GPT-4’s 71%.
The fine-tuning process involves several steps: transforming the dataset, uploading and validating it, starting the fine-tuning job, and running an evaluation to compare results. The first step is to download the MathInstruct dataset from HuggingFace, clean it, and convert it into the JSONL file format the Together platform expects.
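A minimal sketch of that download step using the HuggingFace `datasets` library; the `TIGER-Lab/MathInstruct` dataset ID is an assumption here, so adjust it if the guide points at a different source:

```python
# Download the MathInstruct dataset from HuggingFace.
# Assumes `pip install datasets` and the TIGER-Lab/MathInstruct dataset ID.
from datasets import load_dataset

dataset = load_dataset("TIGER-Lab/MathInstruct", split="train")
print(dataset)     # inspect size and column names
print(dataset[0])  # each record carries "instruction" and "output" fields
```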
Transforming the Dataset
The transformation process involves loading the original JSON data, defining the Llama-3 prompt format, and converting each record into that format. The formatted dataset is then validated with Together’s SDK before being uploaded for fine-tuning.
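A sketch of what that transformation and validation might look like. The `instruction`/`output` field names and the exact Llama-3 prompt template are assumptions based on Meta’s documented special tokens, not a verbatim copy of the guide’s script:

```python
import json

from datasets import load_dataset
from together.utils import check_file

dataset = load_dataset("TIGER-Lab/MathInstruct", split="train")

# Llama-3 instruct prompt template, built from Meta's documented special tokens.
LLAMA3_TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "{output}<|eot_id|>"
)

# Write one JSON object per line (JSONL), the format the Together platform expects.
with open("mathinstruct_formatted.jsonl", "w") as f:
    for row in dataset:
        text = LLAMA3_TEMPLATE.format(
            instruction=row["instruction"], output=row["output"]
        )
        f.write(json.dumps({"text": text}) + "\n")

# Validate the file with Together's SDK before uploading.
report = check_file("mathinstruct_formatted.jsonl")
print(report)  # look for a passing "is_check_passed" flag
```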
Upload and Fine-Tune
Once the dataset is ready, it is uploaded to Together AI via the Python SDK. Then, a fine-tuning job is created using the Llama-3-8B base model, specifying the dataset, number of epochs, and other parameters. Users can monitor the fine-tuning job via the Together AI dashboard.
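In outline, the upload and job creation might look like the following sketch with the Together Python SDK; the base-model string and the hyperparameter values are illustrative assumptions:

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# Upload the validated JSONL training file.
train_file = client.files.upload(file="mathinstruct_formatted.jsonl")

# Launch the fine-tuning job on the Llama-3-8B base model.
job = client.fine_tuning.create(
    training_file=train_file.id,
    model="meta-llama/Meta-Llama-3-8B",  # assumed base-model name on Together
    n_epochs=3,                          # illustrative hyperparameters
    learning_rate=1e-5,
)
print(job.id)  # use this ID to monitor the job in the Together AI dashboard
```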
Evaluation and Results
After fine-tuning, the model’s performance is evaluated on 1,000 math problems, and the accuracy of the fine-tuned Llama-3-8B is compared with the baseline Llama-3-8B, Llama-3-70B, and GPT-4. The fine-tuned model achieves 65.2% accuracy, outperforming the baseline’s 47.2% and Llama-3-70B’s 64.2%, and approaching GPT-4’s 71.4%.
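A rough sketch of such a comparison, assuming a small list of question/answer pairs and naive substring grading; the fine-tuned model name is hypothetical, and the guide’s actual harness (which parses the final numeric answer from each solution) would be more robust:

```python
from together import Together

client = Together()

# A tiny illustrative test set; the guide evaluates on 1,000 problems.
test_problems = [
    {"question": "What is 12 * 13?", "answer": "156"},
]

def accuracy(model_name: str, problems: list[dict]) -> float:
    correct = 0
    for p in problems:
        resp = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": p["question"]}],
            max_tokens=512,
        )
        # Naive substring grading; a real harness would parse the final
        # numeric answer out of the model's worked solution.
        if p["answer"] in resp.choices[0].message.content:
            correct += 1
    return correct / len(problems)

# Compare the fine-tuned checkpoint against the OSS baselines.
for model in [
    "your-account/Meta-Llama-3-8B-ft",  # hypothetical fine-tuned model name
    "meta-llama/Llama-3-8b-chat-hf",
    "meta-llama/Llama-3-70b-chat-hf",
]:
    print(model, accuracy(model, test_problems))
```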
According to the results, the fine-tuned Llama-3-8B model beats the baseline by 18 percentage points (65.2% vs. 47.2%), outperforms the best OSS model, Llama-3-70B, and achieves over 90% of GPT-4’s accuracy. In addition, the fine-tuned model is faster and roughly 50x cheaper to run than GPT-4, and gives users full ownership of the model and its weights.
Conclusion
This fine-tuning approach demonstrates that small open-source models like Llama-3-8B can be customized to perform specific tasks with high accuracy, speed, and cost efficiency. Users can fine-tune the models on proprietary data and host them on Together AI or run them independently, maintaining full control and ownership.
Trained on math problems, the Llama-3-8B model outperforms leading OSS models and approaches the performance of GPT-4 with a total fine-tuning cost of less than $100 on Together AI.