Felix Pinkston
February 27, 2025 10:52
NVIDIA’s KvikIO provides high-performance remote IO functions that speed up data processing for cloud workloads using object storage services such as S3 and Azure Blob Storage.
NVIDIA has introduced KvikIO, a tool designed to optimize remote IO for workloads that use object storage services such as Amazon S3, Google Cloud Storage, and Azure Blob Storage. According to NVIDIA, it is especially useful for data-intensive applications running in the cloud, where efficient data access is essential to preventing bottlenecks.
Understanding object storage
Object storage services are designed to store and serve vast amounts of data. Using them effectively, however, requires understanding their behavior, which differs considerably from that of a traditional local file system. One of the main differences is that reads and writes to object storage have higher and more variable latency.
Data transmission optimization
NVIDIA recommends placing compute nodes close to the storage service, ideally in the same cloud region, to improve data transfer speed. This placement minimizes network latency, which ultimately limits transfer speed, and improves the reliability of data transfers.
File format and size
Cloud-native file formats such as Apache Parquet and Cloud Optimized GeoTIFF can greatly improve data access efficiency. These formats reduce unnecessary data transfer by letting a reader fetch the metadata first and then download only the data it needs. In addition, keeping file sizes in the range of tens to hundreds of megabytes can improve performance by amortizing the overhead of each HTTP request.
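The amortization argument can be illustrated with a small back-of-the-envelope model. This is only a sketch: the function name, the 50 ms per-request overhead, and the 1,000 MB/s bandwidth are hypothetical values chosen for illustration, not measurements from NVIDIA or any cloud provider.

```python
def effective_throughput_mb_s(file_size_mb: float,
                              request_overhead_s: float = 0.05,
                              bandwidth_mb_s: float = 1000.0) -> float:
    """Throughput of a single request once its fixed overhead is included.

    request_overhead_s and bandwidth_mb_s are assumed, illustrative values.
    """
    transfer_time_s = file_size_mb / bandwidth_mb_s
    return file_size_mb / (request_overhead_s + transfer_time_s)


# Small files spend most of their time on per-request overhead;
# larger files spread that fixed cost over more bytes.
for size_mb in (0.1, 1, 10, 100, 1000):
    rate = effective_throughput_mb_s(size_mb)
    print(f"{size_mb:>7} MB per request -> {rate:7.1f} MB/s effective")
```

The model only captures the overhead side of the trade-off. It does not show why very large files can also hurt, for example by leaving fewer independent pieces to read in parallel.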
Concurrency for performance improvement
Concurrency is essential to getting the most out of remote storage services. Because object storage services are designed to handle many requests at once, users can increase throughput by issuing many requests concurrently. This approach works especially well with a Python thread pool or asyncio.
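As a minimal sketch of the thread-pool approach, the example below compares serial and concurrent fetches. The `fetch_object` function is a hypothetical stand-in that simulates network latency with `time.sleep`; in real code it would be replaced by a call to an object storage client. The key names and the 20 ms latency are assumptions for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_LATENCY_S = 0.02  # assumed per-request latency, for illustration only


def fetch_object(key: str) -> bytes:
    """Hypothetical stand-in for a GET request to an object store."""
    time.sleep(SIMULATED_LATENCY_S)  # the thread waits on the "network", not the CPU
    return f"contents of {key}".encode()


def fetch_all(keys, max_workers: int = 16):
    """Fetch many objects concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_object, keys))


keys = [f"data/part-{i:04d}.parquet" for i in range(32)]

start = time.perf_counter()
serial = [fetch_object(k) for k in keys]
serial_s = time.perf_counter() - start

start = time.perf_counter()
parallel = fetch_all(keys)
parallel_s = time.perf_counter() - start

print(f"serial: {serial_s:.2f}s, thread pool: {parallel_s:.2f}s")
```

Threads suit this workload because each request spends most of its time waiting on the network, so many requests can be in flight at once even in a single Python process.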
Advantages of NVIDIA KvikIO
KvikIO automatically splits large requests into smaller ones and issues them concurrently. It also supports efficient reads into host or device memory, particularly when GPUDirect Storage is enabled. According to the benchmarks, KvikIO achieves higher throughput than other libraries such as Boto3 when reading data from S3.
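The general idea of breaking one large read into smaller concurrent reads can be sketched in plain Python. This is not KvikIO's implementation or API: `split_into_tasks`, `chunked_read`, and `read_range` are hypothetical names, and an in-memory byte string stands in for a remote object so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(total_size: int, task_size: int):
    """Yield (offset, length) pairs that together cover [0, total_size)."""
    for offset in range(0, total_size, task_size):
        yield offset, min(task_size, total_size - offset)


def chunked_read(read_range, total_size: int, task_size: int,
                 num_threads: int = 8) -> bytearray:
    """Fill one preallocated buffer using many small concurrent range reads."""
    buf = bytearray(total_size)
    view = memoryview(buf)  # lets each task write into its own slice in place

    def run_task(task):
        offset, length = task
        view[offset:offset + length] = read_range(offset, length)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() forces completion and re-raises any exception from a task
        list(pool.map(run_task, split_into_tasks(total_size, task_size)))
    return buf


# In-memory stand-in for a remote object (hypothetical, 1 MiB).
blob = bytes(range(256)) * 4096


def read_range(offset: int, length: int) -> bytes:
    """Stand-in for an HTTP range request against the object."""
    return blob[offset:offset + length]


result = chunked_read(read_range, len(blob), task_size=100_000)
print(len(result), bytes(result) == blob)
```

Each task writes to a disjoint slice of the destination buffer, so no locking is needed. The `task_size` and `num_threads` parameters correspond to the kind of tuning knobs the benchmarks discuss.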
Benchmark insights
According to the performance benchmarks, KvikIO can achieve impressive throughput when reading data from S3 to EC2 instances. For example, reading a 1 GB file on a g4dn.xlarge EC2 instance showed throughput rising up to an optimal point. Similarly, the task size affects the maximum throughput: performance is best when the task size is neither too small nor too large.
In a scenario in which Dask worker processes read 360 Parquet files, KvikIO handled the large workload efficiently, reaching almost 20 Gbps from S3 to a single node.
For data professionals looking to alleviate IO bottlenecks in cloud-based workflows, NVIDIA KvikIO offers an attractive solution. By applying these strategies, users can greatly improve data processing speed and overall performance.