According to the NVIDIA Blog, Oracle Cloud Infrastructure (OCI) has announced the availability of NVIDIA L40S GPU bare metal instances. The expansion is aimed at meeting growing demand for advanced technologies such as generative AI, large language models (LLMs), and digital twins.
NVIDIA L40S GPUs now available to order from OCI
The NVIDIA L40S GPU is designed to provide multi-workload acceleration for a wide range of applications, including generative AI, graphics, and video. Featuring fourth-generation Tensor Cores and support for the FP8 data format, it is ideal for training and fine-tuning small- to medium-sized LLMs and performing inference across a wide range of use cases.
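To illustrate what FP8 acceleration looks like in practice, here is a minimal sketch using NVIDIA Transformer Engine, one common way to exercise fourth-generation Tensor Cores from PyTorch. The article does not prescribe a framework, so the library choice, layer sizes, and scaling recipe below are assumptions for illustration only.

```python
# Minimal sketch: FP8 compute via NVIDIA Transformer Engine (an assumed
# framework choice; the article names the hardware feature, not an API).
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Hybrid recipe: E4M3 for forward activations/weights, E5M2 for gradients.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

layer = te.Linear(4096, 4096, bias=True).cuda()  # sizes chosen arbitrarily
x = torch.randn(16, 4096, device="cuda", requires_grad=True)

# Matrix multiplies inside this context run on the FP8 Tensor Cores.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)

y.sum().backward()  # gradients follow the FP8 recipe configured above
```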
For example, a single L40S GPU can generate up to 1.4x more tokens per second than a single NVIDIA A100 Tensor Core GPU when running Llama 3 8B with NVIDIA TensorRT-LLM. The L40S also excels at graphics and media acceleration, making it well suited to advanced visualization and digital twin applications. It delivers up to 3.8x higher real-time ray-tracing performance than its predecessor and supports NVIDIA DLSS 3 for faster rendering and smoother frame rates.
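The throughput figure above comes from TensorRT-LLM. As a rough sketch of what such a setup looks like, the snippet below uses TensorRT-LLM's high-level Python LLM API; the model ID, prompt, and sampling values are illustrative assumptions, not the benchmark configuration behind the 1.4x figure.

```python
# Minimal sketch of Llama 3 8B inference with TensorRT-LLM's high-level
# Python API. Model ID and sampling settings are placeholders.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # builds/loads an engine
params = SamplingParams(temperature=0.8, max_tokens=128)

outputs = llm.generate(["Summarize what an OCI bare metal shape is."], params)
for out in outputs:
    print(out.outputs[0].text)
```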
OCI will offer the L40S GPU in the BM.GPU.L40S.4 bare metal compute shape, which features four NVIDIA L40S GPUs, each with 48GB of GDDR6 memory. The shape also includes 7.38TB of local NVMe storage, 112 cores of 4th-generation Intel Xeon CPUs, and 1TB of system memory, and it eliminates virtualization overhead for high-throughput, latency-sensitive AI and machine learning workloads.
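For a sense of how such a shape is provisioned, here is a hypothetical sketch using the OCI Python SDK. Every OCID, the availability domain, and the image are placeholders; only the shape name comes from the article.

```python
# Hypothetical sketch: launching a BM.GPU.L40S.4 instance with the OCI
# Python SDK. All OCIDs, the availability domain, and the image below
# are placeholders for values from your own tenancy.
import oci

config = oci.config.from_file()  # reads credentials from ~/.oci/config
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    availability_domain="Uocm:PHX-AD-1",          # placeholder AD
    compartment_id="ocid1.compartment.oc1..xxx",  # placeholder OCID
    shape="BM.GPU.L40S.4",                        # the 4x L40S bare metal shape
    display_name="l40s-bm-demo",
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        image_id="ocid1.image.oc1.phx.xxx",       # GPU-enabled image OCID
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1.phx.xxx",     # placeholder subnet
    ),
)

instance = compute.launch_instance(details).data
print(instance.id, instance.lifecycle_state)
```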
“We chose OCI AI infrastructure with bare metal instances and NVIDIA L40S GPUs for 30% more efficient video encoding,” said Sharon Kamel, CEO of Beamr Cloud. “This reduces storage and network bandwidth consumption by up to 50%, resulting in faster file transfers and improved end-user productivity.”
Single-GPU H100 VM coming soon from OCI
OCI will soon launch the VM.GPU.H100.1 compute virtual machine shape, accelerated by a single NVIDIA H100 Tensor Core GPU. This offering aims to give enterprises cost-effective, on-demand access to NVIDIA H100 GPUs for generative AI and high-performance computing (HPC) workloads.
A single H100 GPU can generate over 27,000 tokens per second for Llama 3 8B, delivering up to 4x the throughput of a single A100 GPU at FP16 precision. The VM.GPU.H100.1 shape includes 2 x 3.4TB of NVMe drive capacity, 13 cores of a 4th-generation Intel Xeon processor, and 246GB of system memory, making it suitable for a wide range of AI tasks.
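A quick back-of-the-envelope check, using only the two figures quoted above, shows what the comparison implies for the A100 baseline:

```python
# Implied baseline from the article's figures: ~27,000 tokens/s on one H100
# at up to 4x a single A100's FP16 throughput.
h100_tokens_per_s = 27_000
speedup_vs_a100 = 4

a100_tokens_per_s = h100_tokens_per_s / speedup_vs_a100
print(f"Implied A100 baseline: ~{a100_tokens_per_s:,.0f} tokens/s")  # ~6,750
```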
GH200 bare metal instance available for validation
OCI has also made the BM.GPU.GH200 compute shape available for customer validation. It features the NVIDIA Grace Hopper Superchip, whose NVLink-C2C interconnect provides high-bandwidth, cache-coherent connectivity at 900GB/s between the NVIDIA Grace CPU and the Hopper GPU. For applications processing terabytes of data, this setup can deliver up to 10x higher performance compared to the NVIDIA A100 GPU.
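To see where that advantage comes from, the sketch below compares how long it takes to stream 1TB to the GPU over NVLink-C2C (900GB/s, from the article) versus a PCIe Gen5 x16 link (~64GB/s, an assumed typical host-to-GPU alternative; real workloads will not sustain peak rates).

```python
# Rough intuition for the terabyte-scale claim: raw time to move 1 TB
# over each link at peak bandwidth. The PCIe figure is an assumption;
# only the 900 GB/s NVLink-C2C number comes from the article.
data_gb = 1000.0
links = {
    "NVLink-C2C":    900.0,  # GB/s, per the article
    "PCIe Gen5 x16":  64.0,  # GB/s, assumed alternative path
}

for name, bandwidth_gbps in links.items():
    seconds = data_gb / bandwidth_gbps
    print(f"{name}: ~{seconds:.1f} s to stream 1 TB")
# NVLink-C2C: ~1.1 s vs PCIe Gen5 x16: ~15.6 s of raw transfer time.
```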
Optimized software for enterprise AI
Maximizing the potential of GPU-accelerated compute instances requires an optimized software layer. NVIDIA NIM, part of the NVIDIA AI Enterprise software platform available on the OCI Marketplace, provides a set of microservices designed to securely and reliably deploy high-performance AI model inference.
NIM's prebuilt containers, optimized for NVIDIA GPUs, offer lower total cost of ownership, faster time to market, and enhanced security. These microservices can be deployed easily on OCI, enabling enterprises to build world-class generative AI applications.
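Once a NIM container is running, it exposes an OpenAI-compatible HTTP API, so existing client code works against it. The sketch below assumes a NIM serving a Llama 3 model on its default local port; the endpoint URL and model name are placeholders for whatever NIM you deploy on OCI.

```python
# Minimal sketch: calling a deployed NIM through its OpenAI-compatible API.
# The URL and model name are placeholders for your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # NIM's default local endpoint
    api_key="not-used",                   # local NIMs typically ignore this
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # placeholder NIM model name
    messages=[{"role": "user", "content": "What is OCI?"}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```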
For more information, visit the NVIDIA blog.