TEAL Introduces Training-Free Activation Sparsity to Improve LLM Efficiency

Jack Anderson
September 1, 2024 08:34

TEAL offers a training-free approach to activation sparsity that significantly improves the efficiency of large language models (LLMs) with minimal degradation.

TEAL (Training-Free Activation Sparsity in LLMs) has emerged as a groundbreaking approach to improving the efficiency of large language models (LLMs) without additional training. According to together.ai, the method applies magnitude pruning to hidden states throughout the model, achieving 40-50% activation sparsity with minimal degradation. This allows fewer weights to be transferred to on-chip memory, which addresses the memory-bound nature of LLM inference and translates into 1.53-1.8x wall-clock speedups in single-batch decoding.
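To make the idea of pruning a hidden state by magnitude concrete, here is a minimal sketch in plain Python. It zeroes the smallest-magnitude fraction of a single vector. The function name `magnitude_prune` and the sample values are illustrative assumptions, and this is not TEAL's actual implementation.

```python
def magnitude_prune(hidden, sparsity):
    """Zero out the lowest-magnitude fraction `sparsity` of `hidden`.

    Hypothetical helper for illustration; not taken from the TEAL code.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(hidden) * sparsity)  # number of entries to drop
    if k == 0:
        return list(hidden)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(hidden)), key=lambda i: abs(hidden[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else v for i, v in enumerate(hidden)]


# Made-up hidden-state values, chosen only to show the effect.
hidden = [0.02, -1.3, 0.4, -0.05, 2.1, 0.0, -0.6, 0.1, 0.9, -0.03]
pruned = magnitude_prune(hidden, 0.5)
print(pruned)
```

At 50% sparsity, the five smallest-magnitude entries become zero and the five largest are kept unchanged.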

Background

LLMs are known for their enormous size, which makes inference difficult, mainly because of the limited speed at which parameters can be transferred from device memory to registers. Various techniques such as quantization, weight sparsity, and speculative decoding have been developed to address this ‘memory wall’. Activation sparsity, which exploits zero values in the hidden states, is a less explored method that avoids transferring the weight channels that correspond to zero activations during decoding.
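The following sketch shows why zero activations save memory traffic: in a matrix-vector product, a column of the weight matrix that is multiplied by zero never needs to be read. The names `dense_matvec` and `sparse_matvec` and the toy matrix are assumptions for illustration. A real GPU kernel is organized very differently.

```python
def dense_matvec(weight, x):
    """y = W x, reading every column of W."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def sparse_matvec(weight, x):
    """y = W x, reading only the columns of W whose activation is nonzero.

    Returns the result and the number of columns that were read.
    """
    active = [j for j, v in enumerate(x) if v != 0.0]
    y = [sum(row[j] * x[j] for j in active) for row in weight]
    return y, len(active)


# Toy 3x4 weight matrix and an activation vector with two zeros.
weight = [
    [1.0, 2.0, 3.0, 4.0],
    [0.5, -1.0, 0.0, 2.0],
    [-2.0, 0.0, 1.0, 1.0],
]
x = [0.0, 1.5, 0.0, -2.0]

y_sparse, columns_read = sparse_matvec(weight, x)
print(y_sparse, columns_read)
```

Both functions return the same result, but the sparse version reads only two of the four weight columns.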

Older models like OPT-175B exhibit high activation sparsity, which allows methods like DejaVu to achieve significant speedups. However, newer models like LLaMA have moved to SwiGLU variants, making these methods harder to apply. Recent studies have attempted to ‘recover’ models that exhibit activation sparsity, but doing so requires extensive retraining on large datasets.

Motivating Study: Distributional Properties of Activations in LLMs

Studies have shown that the hidden states of LLMs exhibit outliers, are zero-centered, and have similar distribution shapes across layers. Specifically, the states before the MLP and attention blocks are Gaussian-shaped, while the intermediate states are Laplacian-shaped. This suggests that many low-magnitude activations can be pruned with negligible model degradation, an observation also made in other studies such as CATS.
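If the activations really followed idealized zero-centered Gaussian or Laplacian distributions, the cutoff for a target sparsity level would have a closed form. The sketch below works this out and adds a distribution-free alternative based on sample quantiles. All function names are hypothetical, and the article does not confirm that TEAL computes its thresholds this way.

```python
import math
from statistics import NormalDist


def laplace_threshold(scale, sparsity):
    """Threshold t with P(|x| <= t) = sparsity for x ~ Laplace(0, scale).

    Uses P(|x| <= t) = 1 - exp(-t / scale).
    """
    return -scale * math.log(1.0 - sparsity)


def gaussian_threshold(sigma, sparsity):
    """Threshold t with P(|x| <= t) = sparsity for x ~ Normal(0, sigma).

    Uses P(|x| <= t) = 2 * Phi(t / sigma) - 1.
    """
    return sigma * NormalDist().inv_cdf((1.0 + sparsity) / 2.0)


def empirical_threshold(samples, sparsity):
    """Distribution-free alternative: the `sparsity` quantile of |x|."""
    magnitudes = sorted(abs(v) for v in samples)
    index = min(int(len(magnitudes) * sparsity), len(magnitudes) - 1)
    return magnitudes[index]


print(laplace_threshold(1.0, 0.5))   # ln(2) for unit scale
print(gaussian_threshold(1.0, 0.5))  # the 75th percentile of a standard normal
```

Entries whose magnitude falls below the threshold would then be set to zero. Because both distributions concentrate mass near zero, a relatively small threshold already removes a large share of the entries.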

TEAL

TEAL sparsifies every tensor in the model, achieving near-zero degradation at 25% sparsity and minimal degradation at 40% sparsity. At 50% sparsity, the Llama-3 variants show slightly more degradation than the older Llama-2 and Mistral variants. TEAL outperforms CATS by sparsifying every tensor and by sparsifying based on the input, which yields lower error.

Hardware-Aware Speedup

To benchmark real-world speedups, TEAL was integrated with GPT-Fast, achieving speedups of up to 1.53x at 40% sparsity and 1.8x at 50% sparsity. The kernel is faster than cuBLAS at 0% sparsity, but there is still room for further optimization.
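As a rough sanity check on these figures, suppose decoding time were exactly proportional to the number of weights read. This is an idealized assumption, not a claim from the benchmark. Skipping a fraction p of the weights would then give a speedup of at most 1 / (1 - p). The sketch below compares that bound with the reported numbers.

```python
def ideal_speedup(sparsity):
    """Upper bound if runtime were proportional to the weights read."""
    return 1.0 / (1.0 - sparsity)


# Speedups quoted in the article, keyed by sparsity level.
reported = {0.4: 1.53, 0.5: 1.8}

for sparsity, measured in reported.items():
    bound = ideal_speedup(sparsity)
    print(f"{sparsity:.0%} sparsity: bound {bound:.2f}x, "
          f"reported {measured:.2f}x ({measured / bound:.0%} of bound)")
```

Under this simplification the bounds are about 1.67x and 2x, so the reported speedups reach roughly nine tenths of what a purely memory-bound model would allow.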

Compatibility with quantization

TEAL is also compatible with quantization, another technique for efficient LLM inference. Combining activation sparsity with quantization opens up new regimes for reducing the memory transferred to GPU registers, enabling higher inference speedups.
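The two techniques compound because they shrink different factors: quantization reduces the bits per weight, while activation sparsity reduces the number of weights read. The back-of-the-envelope sketch below uses a hypothetical 8-billion-parameter model and assumes every weight matrix is sparsified equally. It ignores the KV cache, quantization metadata, and any layers left dense.

```python
def weight_bytes_read(n_params, bits_per_weight, sparsity):
    """Approximate weight bytes read per decoded token (idealized)."""
    return n_params * (1.0 - sparsity) * bits_per_weight / 8.0


N_PARAMS = 8_000_000_000  # hypothetical model size, for illustration only

baseline = weight_bytes_read(N_PARAMS, 16, 0.0)  # 16-bit weights, dense
combined = weight_bytes_read(N_PARAMS, 4, 0.5)   # 4-bit weights, 50% sparsity
reduction = baseline / combined

print(f"baseline: {baseline / 1e9:.1f} GB, combined: {combined / 1e9:.1f} GB, "
      f"reduction: {reduction:.1f}x")
```

In this idealized setting, 4-bit weights cut the traffic by 4x and 50% sparsity by a further 2x. Real gains depend on kernel efficiency and on how much accuracy the combination preserves.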

Applications

The most immediate application of TEAL is to accelerate inference in resource-constrained edge settings, especially in single-batch scenarios. It also enables inference providers like Together AI, which hosts over 100 open-source models on large fleets of GPUs, to serve their models more efficiently.

