AMD, a leading semiconductor supplier, has made significant progress in optimizing the hardware efficiency of artificial intelligence (AI) algorithms. According to AMD.com, the company's latest research paper, 'Unified Progressive Depth Pruner for CNN and Vision Transformer', was accepted at the prestigious AAAI 2024 conference. The paper introduces a new depth pruning method designed to improve performance across a variety of AI models.
Motivation for model optimization
Deep neural networks (DNNs) have become essential for a variety of industrial applications, creating constant demand for model optimization. In this context, techniques such as model pruning, quantization, and efficient model design are very important. Existing channel-wise pruning methods, however, struggle with depth-wise convolutional layers, which leave little to prune because of their sparse computation and small parameter counts. These methods also suffer from high parallel-computing demands, resulting in suboptimal hardware utilization.
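To illustrate the point about depth-wise layers, compare the parameter counts of a standard and a depth-wise convolution at the same channel width (a minimal PyTorch sketch for illustration, not code from the paper):

```python
# Illustrative PyTorch sketch (not from the paper): parameter counts of a
# standard vs. a depth-wise convolution at the same channel width show why
# channel pruning has little to work with on depth-wise layers.
import torch.nn as nn

in_ch, k = 256, 3
standard = nn.Conv2d(in_ch, in_ch, k, padding=1)                 # dense conv
depthwise = nn.Conv2d(in_ch, in_ch, k, padding=1, groups=in_ch)  # depth-wise conv

def param_count(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

print(f"standard conv:   {param_count(standard):,} params")   # 590,080
print(f"depth-wise conv: {param_count(depthwise):,} params")  # 2,560
```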
To address these issues, techniques such as DepthShrinker and Layer-Folding were proposed to optimize MobileNetV2 by reducing model depth through reparameterization. Despite their potential, these methods have limitations, such as potential loss of accuracy and constraints on certain normalization layers such as LayerNorm, which make them unsuitable for vision transformer models.
Innovative depth pruning approach
AMD's new depth pruning method introduces a progressive training strategy and a novel block pruning technique that can optimize both CNN and vision transformer models. This approach ensures high utilization of the baseline model's weights, thereby increasing accuracy. Additionally, the method can effectively prune vision transformer models because it handles normalization layers such as LayerNorm.
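The article does not spell out the training schedule, but a progressive hand-off between an original block and its simplified replacement is commonly implemented as a decaying blend. The sketch below is one plausible realization of such a strategy; the blending weight `alpha` and its schedule are assumptions for illustration, not the paper's exact recipe:

```python
# One plausible realization of a progressive hand-off between an original
# block and its simplified replacement. The blending weight `alpha` and its
# schedule are assumptions for illustration, not the paper's exact method.
import torch.nn as nn

class ProgressiveBlock(nn.Module):
    """Gradually shifts from `original` to `simplified` as alpha decays 1 -> 0."""
    def __init__(self, original: nn.Module, simplified: nn.Module):
        super().__init__()
        self.original = original
        self.simplified = simplified
        self.alpha = 1.0  # updated by the training loop (e.g., linear decay)

    def forward(self, x):
        return self.alpha * self.original(x) + (1.0 - self.alpha) * self.simplified(x)
```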
The AMD depth pruning strategy converts complex, slow blocks into simpler, faster blocks through block merging. This involves replacing activation layers with identity layers and LayerNorm layers with BatchNorm layers to facilitate reparameterization. The reparameterization technique then merges the BatchNorm layers, adjacent convolutional or fully connected layers, and skip connections.
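As a concrete illustration of the merging step, the sketch below folds a BatchNorm layer into the convolution that precedes it, using the standard folding identity. This is generic PyTorch illustration code, not the paper's implementation; the full method extends the same algebra to identity activations and skip connections:

```python
# Minimal sketch of standard conv + BatchNorm folding, the building block of
# this kind of reparameterization. Illustrative code, not the paper's own.
import torch
import torch.nn as nn

@torch.no_grad()
def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Return a single Conv2d equivalent to conv followed by bn (inference mode)."""
    # Per-channel scale that BatchNorm applies at inference time.
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation,
                      conv.groups, bias=True)
    # Scale the conv weights channel-wise and fold the BN shift into the bias.
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_(bn.bias + (bias - bn.running_mean) * scale)
    return fused

# Sanity check: the fused layer matches conv -> bn in eval mode.
conv, bn = nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32)
conv.eval(); bn.eval()
x = torch.randn(1, 16, 8, 8)
assert torch.allclose(bn(conv(x)), fold_bn_into_conv(conv, bn)(x), atol=1e-5)
```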
Core technology
The depth pruning process includes four main steps: supernet training, subnet search, subnet training, and subnet merging. Initially, a supernet is constructed from the base model by incorporating the block modifications described above. After training the supernet, a search algorithm identifies an optimal subnet. A progressive training strategy is then applied to optimize the subnet while minimizing accuracy loss. Finally, the subnet is merged into a shallower model using the reparameterization technique.
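Put together, the four steps form a pipeline along the following lines. This is a structural sketch only: every helper passed in is a hypothetical placeholder standing in for the paper's actual components, and only the overall flow follows the text above:

```python
# Structural sketch of the four-step depth-pruning pipeline described above.
# All helpers are hypothetical placeholders, not the paper's API.

def depth_prune(base_model, train_data, latency_budget,
                build_supernet, search_subnet, progressive_train, merge_blocks):
    # 1. Supernet training: augment the base model with the modified blocks
    #    (identity activations, BatchNorm in place of LayerNorm) and train it.
    supernet = build_supernet(base_model, train_data)

    # 2. Subnet search: choose which blocks to simplify under the budget.
    subnet = search_subnet(supernet, latency_budget)

    # 3. Progressive subnet training: hand off gradually from the original
    #    blocks to the simplified ones to minimize accuracy loss.
    subnet = progressive_train(subnet, train_data)

    # 4. Subnet merging: reparameterize each simplified block into a single
    #    shallower layer (e.g., fold BatchNorm into the adjacent conv/FC).
    return merge_blocks(subnet)
```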
Benefits and Performance
AMD’s depth pruning method offers several key contributions:
- A unified and efficient depth pruning method for CNN and vision transformer models.
- A progressive training strategy for subnet optimization, combined with a novel block pruning strategy that uses reparameterization.
- Comprehensive experiments demonstrating strong pruning performance across a variety of AI models.
Experimental results show that AMD's method achieves up to a 1.26x speedup on the AMD Instinct™ MI100 GPU accelerator with only a 1.9% drop in top-1 accuracy. The approach has been tested on several models, including ResNet34, MobileNetV2, ConvNeXtV1, and DeiT-Tiny, demonstrating its versatility and efficiency.
In conclusion, AMD's unified depth pruning method represents a significant advance in optimizing AI model performance. Its applicability to both CNN and vision transformer models highlights its potential impact on future AI developments. AMD plans to apply the method to more transformer models and tasks.