Crypto Flexs
ADOPTION NEWS

Inference Engine 2.0 with Together AI, Turbo, and Lite Endpoints Announced

By Crypto Flexs | July 21, 2024 | 2 Mins Read

Terryl Dickey
18 July 2024 18:41

Together AI launches Inference Engine 2.0, offering Turbo and Lite endpoints for improved performance, quality, and cost efficiency.





Together AI announced the release of its new Inference Engine 2.0, which includes Turbo and Lite endpoints. This new inference stack is designed to deliver significantly faster decoding throughput and superior performance compared to existing solutions.

Performance improvement

According to together.ai, Together Inference Engine 2.0 delivers 4x faster decoding throughput than open-source vLLM and is 1.3x to 2.5x faster than commercial solutions such as Amazon Bedrock, Azure AI, Fireworks, and OctoAI. The engine achieves over 400 tokens per second on Meta Llama 3 8B, thanks to advances including FlashAttention-3, faster GEMM and MHA kernels, quality-preserving quantization, and speculative decoding.
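
For readers unfamiliar with speculative decoding, the idea can be sketched in a few lines of Python. This is a simplified greedy variant built on toy stand-in models, not Together AI's implementation: the functions `target_next` and `draft_next`, the vocabulary size of 50, and the draft length `k` are all assumptions made for illustration. A small draft model proposes several tokens, the large target model checks them, and the output stays identical to what the target model would have produced on its own.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Toy stand-in for the large model: a deterministic function of the context."""
    return (sum(context) * 31 + len(context)) % 50


def draft_next(context: List[Token]) -> Token:
    """Toy stand-in for the small model: wrong whenever len(context) is a multiple of 4."""
    guess = target_next(context)
    return (guess + 1) % 50 if len(context) % 4 == 0 else guess


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns the tokens and the number of verify rounds."""
    out = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # Step 1: the draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # Step 2: the target model checks the proposal position by position.
        # A real engine would score all k positions in one batched forward pass.
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # always keep the target model's token
            if expected != tok or len(out) >= goal:
                break  # first mismatch (or enough tokens): start a new round
    return out, rounds


prompt = [1, 2, 3]
baseline = greedy_decode(target_next, prompt, 20)
tokens, rounds = speculative_decode(target_next, draft_next, prompt, 20)
print(tokens == baseline, rounds)
```

Because every appended token comes from the target model, output quality is unchanged; any speedup comes from needing fewer sequential verification rounds than generated tokens whenever the draft model guesses correctly.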

New Turbo and Lite endpoints

Together AI introduces new Turbo and Lite endpoints starting with Meta Llama 3. These endpoints balance performance, quality, and cost, allowing enterprises to avoid compromises. Together Turbo closely matches the quality of full-precision FP16 models, while Together Lite provides the most cost-effective and scalable Llama 3 models available.

Turbo endpoints deliver fast FP8 inference while maintaining quality: they closely match the FP16 reference model and outperform other FP8 solutions on AlpacaEval 2.0. They are priced at $0.88 per million tokens for Llama 3 70B and $0.18 per million tokens for Llama 3 8B, making them significantly cheaper than GPT-4o.

Together Lite endpoints use INT4 quantization to provide high-quality AI models at a low cost. Llama 3 8B Lite is priced at $0.10 per million tokens, 6x cheaper than GPT-4o-mini.
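
To put the quoted prices in perspective, the short Python sketch below estimates the cost of a workload on each endpoint. The prices come from the figures above; the dictionary keys are hypothetical labels rather than official model identifiers, the 250-million-token workload is an assumed example, and the calculation assumes a single flat price per token.

```python
from decimal import Decimal

# Prices in US dollars per million tokens, as quoted in the article.
# The keys are hypothetical labels, not official model identifiers.
PRICE_PER_MILLION = {
    "llama-3-70b-turbo": Decimal("0.88"),
    "llama-3-8b-turbo": Decimal("0.18"),
    "llama-3-8b-lite": Decimal("0.10"),
}


def estimate_cost(endpoint: str, tokens: int) -> Decimal:
    """Return the estimated cost in dollars for processing `tokens` tokens."""
    price = PRICE_PER_MILLION[endpoint]
    return (price * tokens / Decimal(1_000_000)).quantize(Decimal("0.01"))


monthly_tokens = 250_000_000  # assumed workload, for illustration only
for name in PRICE_PER_MILLION:
    print(f"{name}: ${estimate_cost(name, monthly_tokens)}")
```

Under these assumptions, 250 million tokens would cost $220.00 on Llama 3 70B Turbo, $45.00 on Llama 3 8B Turbo, and $25.00 on Llama 3 8B Lite.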

Adoption and approval

More than 100,000 developers and companies, including Zomato, DuckDuckGo, and The Washington Post, are already leveraging the Together Inference Engine for their generative AI applications. Rinshul Chandra, COO of Food Delivery at Zomato, praised the engine for its high quality, speed, and accuracy.

Technological innovation

Together Inference Engine 2.0 incorporates several technological advancements, including FlashAttention-3, custom speculators, and quality-preserving quantization techniques. These innovations contribute to the engine’s superior performance and cost-effectiveness.
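
The article does not describe how Together AI's quantization works, so the Python sketch below shows only the basic idea behind low-bit formats such as INT4: weights are mapped to a small set of integers plus a scale factor, trading a little precision for less memory and bandwidth. It uses plain symmetric round-to-nearest quantization, which is an assumption chosen for simplicity and not the company's quality-preserving technique; the function names and sample weights are hypothetical.

```python
from typing import List, Tuple


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization to the signed 4-bit range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step and clamp to 4 bits.
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Map the 4-bit integers back to approximate floating-point weights."""
    return [v * scale for v in quantized]


weights = [0.70, -0.35, 0.10, -0.02, 0.00]  # made-up sample values
quantized, scale = quantize_int4(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantized, scale, max_error)
```

With round-to-nearest, the reconstruction error per weight is bounded by half a quantization step (`scale / 2`). Production methods typically use finer-grained scales and calibration to keep model quality close to the full-precision reference.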

Future outlook

Together AI plans to continue pushing the boundaries of AI acceleration. The company aims to ensure that the Together Inference Engine remains at the forefront of AI technology by expanding support for new models, technologies, and kernels.

Turbo and Lite endpoints for Llama 3 models are available starting today, with plans to expand to other models soon. Visit the Together AI pricing page for more information.

Image source: Shutterstock

