Felix Pinkston
May 29, 2025 09:46
Mixture-of-Agents Alignment (MoAA) is a breakthrough training method that enhances large language models by leveraging open-source collective intelligence, as described in a new ICML 2025 paper.
Mixture-of-Agents Alignment (MoAA) marks a significant advance in artificial intelligence, optimizing the performance of large language models (LLMs), as presented in an ICML 2025 paper. According to Together.ai, MoAA is an innovative training method that harnesses the collective intelligence of open-source LLMs to achieve efficient model performance.
Introducing MoAA
MoAA builds on the foundation laid by the Mixture-of-Agents (MoA) approach, which previously surpassed GPT-4o, and integrates that ensemble into a single model. The method distills the collective intelligence of several models into a smaller, more efficient form, addressing the high computational cost and architectural complexity associated with MoA.
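To make the idea concrete, here is a minimal sketch, not the paper's exact pipeline, of how an MoA-style ensemble could generate responses that then serve as supervised fine-tuning targets for a single smaller model. The model names, prompt wording, and the `query_model` helper are placeholders, not real APIs.

```python
# Hypothetical sketch: using a Mixture-of-Agents ensemble to generate
# distillation data for a smaller "student" model.

PROPOSERS = ["open-model-a", "open-model-b", "open-model-c"]  # placeholder names
AGGREGATOR = "open-model-aggregator"                           # placeholder name

def query_model(name: str, prompt: str) -> str:
    """Placeholder for an LLM inference call (local or hosted)."""
    raise NotImplementedError

def moa_response(instruction: str) -> str:
    # 1. Each proposer drafts its own answer to the instruction.
    drafts = [query_model(m, instruction) for m in PROPOSERS]
    # 2. An aggregator model synthesizes the drafts into one improved answer.
    agg_prompt = (
        "Synthesize the following candidate answers into a single, "
        "higher-quality response.\n\nInstruction:\n" + instruction +
        "\n\nCandidates:\n" + "\n---\n".join(drafts)
    )
    return query_model(AGGREGATOR, agg_prompt)

def build_sft_dataset(instructions: list[str]) -> list[dict]:
    # The ensemble's outputs become supervised fine-tuning targets for a
    # single smaller model, distilling the ensemble's behavior into it.
    return [{"prompt": x, "response": moa_response(x)} for x in instructions]
```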
Performance improvements
MoAA enables small models to achieve performance comparable to models up to ten times their size, while retaining the cost efficiency and speed of small models. Models developed with MoAA have shown competitive performance against much larger models, underscoring the potential of open-source AI development.
Experimental validation
In experiments, MoAA was evaluated on several alignment benchmarks, including AlpacaEval 2, Arena-Hard, and MT-Bench. These benchmarks compare responses directly against GPT-4 to ensure consistent, high-quality evaluation. The results indicate that models fine-tuned with MoAA achieve significant performance improvements and even surpass models trained on data from stronger sources such as GPT-4o.
Cost efficiency
In terms of cost, MoAA provides a more economical alternative to using closed-source models. For example, generating a subset of UltraFeedback with MoAA cost about $366, compared with $429 using GPT-4o, a cost reduction achieved while delivering superior performance.
Direct preference optimization
MoAA further improves model performance through Direct Preference Optimization (DPO), aligning preferences with a reward model. This approach substantially improves on models trained with supervised fine-tuning (SFT) alone, demonstrating MoAA's effectiveness in preference alignment.
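As a rough illustration only, and assuming an ensemble-based scorer rather than the paper's exact reward setup, DPO preference pairs could be built by sampling several candidate responses and keeping the highest- and lowest-scored ones. The helpers `sample_responses` and `score_with_ensemble` below are hypothetical.

```python
# Hypothetical sketch of building DPO preference pairs: sample several
# candidate responses, score them with an ensemble-based reward signal,
# and keep the best/worst pair as (chosen, rejected).

def sample_responses(model, prompt: str, n: int = 4) -> list[str]:
    """Placeholder: draw n diverse completions from the SFT model."""
    raise NotImplementedError

def score_with_ensemble(prompt: str, response: str) -> float:
    """Placeholder: an ensemble-style reward score for a candidate response."""
    raise NotImplementedError

def build_dpo_pairs(model, prompts: list[str]) -> list[dict]:
    pairs = []
    for prompt in prompts:
        candidates = sample_responses(model, prompt)
        ranked = sorted(candidates, key=lambda r: score_with_ensemble(prompt, r))
        pairs.append({
            "prompt": prompt,
            "chosen": ranked[-1],    # highest-scored response
            "rejected": ranked[0],   # lowest-scored response
        })
    return pairs
```

Pairs in this format can then be fed to any standard DPO training implementation.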
Self-improvement pipeline
The introduction of MoAA paves the way for a self-improving AI development pipeline. By training on MoAA-generated data, even the strongest models in the MoA mixture can achieve significant performance gains, suggesting that continuous improvement is possible without relying on more powerful external LLMs.
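A loose sketch of what such a self-improvement loop might look like, with `moa_generate_data` and `fine_tune` as hypothetical placeholders rather than any real API:

```python
# Hypothetical sketch of a self-improvement loop: the ensemble generates
# alignment data, the strongest member is fine-tuned on it, and the
# improved model rejoins the ensemble for the next round.

def moa_generate_data(ensemble: list, prompts: list[str]) -> list[dict]:
    """Placeholder: produce SFT/DPO data from the current ensemble."""
    raise NotImplementedError

def fine_tune(model, data: list[dict]):
    """Placeholder: return a copy of `model` trained on `data`."""
    raise NotImplementedError

def self_improve(ensemble: list, prompts: list[str], rounds: int = 3) -> list:
    for _ in range(rounds):
        data = moa_generate_data(ensemble, prompts)
        # Fine-tune the strongest member on ensemble-generated data,
        # then swap the improved model back into the mixture.
        ensemble[-1] = fine_tune(ensemble[-1], data)
    return ensemble
```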
As the AI community continues to explore the potential of open-source models, MoAA offers a promising way to advance LLM capabilities, providing a scalable and efficient path for future AI development.
Image Source: Shutterstock