According to the NVIDIA Technical Blog, Numbast significantly narrows the skills gap between Python developers and the CUDA C++ ecosystem. The tool automatically converts CUDA C++ APIs into Numba bindings, making high-performance features accessible to Python developers.
Bridging the gap
Numba has long enabled Python developers to write CUDA kernels in Python syntax. However, a large body of CUDA C++ libraries, such as the CUDA Core Compute Libraries (CCCL) and cuRAND, remained inaccessible to Python users, and manually binding each library to Python was a cumbersome, error-prone process.
About Numbast
Numbast solves this problem by building an automated pipeline that reads the top-level declarations from CUDA C++ header files, serializes them, and generates Numba extensions. This process ensures consistency and keeps Python bindings in sync with updates to the CUDA libraries.
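The pipeline described above can be sketched in miniature. This is a toy illustration, not Numbast's actual implementation (which parses headers with a real C++ frontend rather than regular expressions); the header snippet and helper names `parse_decls` and `generate_stub` are invented for this example.

```python
# Toy sketch of Numbast's three stages: read top-level declarations from a
# CUDA C++ header, serialize them, and generate Python binding stubs.
# NOT the real Numbast pipeline; helper names here are hypothetical.
import json
import re

HEADER = """
__device__ float16 operator+(float16 a, float16 b);
__device__ float16 hsqrt(float16 x);
"""

def parse_decls(header: str):
    """Stage 1: extract (return type, name, params) from simple declarations."""
    pattern = re.compile(r"__device__\s+(\w+)\s+(\S+?)\(([^)]*)\);")
    return [
        {"returns": ret, "name": name, "params": params.strip()}
        for ret, name, params in pattern.findall(header)
    ]

def generate_stub(decl: dict) -> str:
    """Stage 3: emit a placeholder Python function mirroring one declaration."""
    safe = re.sub(r"\W", "_", decl["name"])  # e.g. operator+ -> operator_
    return f"def {safe}(*args):  # wraps C++ `{decl['name']}`\n    ..."

decls = parse_decls(HEADER)
serialized = json.dumps(decls)              # Stage 2: serialize declarations
stubs = [generate_stub(d) for d in decls]   # Stage 3: generate bindings
print(stubs[1].splitlines()[0])
```

Because the bindings are generated from the headers rather than written by hand, regenerating them after a library update is enough to keep the Python side in sync, which is the consistency property the article describes.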
Demonstrating the power of Numbast
An example demonstrating Numbast's capabilities is the creation of a simple Numba binding for myfloat16, a demo struct modeled on CUDA's float16 header. The demo shows how C++ declarations can be converted into Python-accessible bindings, letting developers take advantage of CUDA's performance benefits from within a Python environment.
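To make the float16 demo concrete without requiring a GPU, the snippet below illustrates in pure Python what a half-precision value actually holds. CUDA's float16 is IEEE 754 half precision (1 sign, 5 exponent, 10 mantissa bits), and Python's standard `struct` module supports the same encoding via the `'e'` format; this is an assumption-free stdlib illustration, not Numbast output.

```python
# Illustrate the precision a float16 binding exposes, using Python's
# struct 'e' format (IEEE 754 half precision, same layout as CUDA float16).
import struct

def to_float16(x: float) -> float:
    """Round-trip a Python float through half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

third = to_float16(1 / 3)
print(third)  # close to 0.3333 but not equal: only ~3 decimal digits survive
```

A kernel operating on myfloat16 values would see exactly this rounding behavior, which is the trade-off half precision makes for halved storage and bandwidth.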
Practical application
One of the first bindings shipped with Numbast is the bfloat16 data type, which can interoperate with PyTorch's torch.bfloat16. This integration enables custom compute kernels that leverage CUDA's built-in capabilities for efficient processing.
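What makes bfloat16 interoperable across frameworks is its simple layout: it is the top 16 bits of a float32 (8 exponent bits kept, mantissa truncated to 7 bits). The stdlib sketch below emulates that truncation; it is an illustration of the format, not the actual CUDA or PyTorch binding.

```python
# Emulate bfloat16 precision by truncating a float32 bit pattern to its
# top 16 bits; this mirrors the layout CUDA's bfloat16 type stores.
import struct

def to_bfloat16(x: float) -> float:
    """Truncate a float to bfloat16 precision (top 16 bits of float32)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

print(to_bfloat16(3.14159))  # 3.140625: float32's range, much coarser precision
```

Because the bit layout is fixed, a buffer of such values can be handed between a CUDA kernel and a framework tensor without conversion, which is why zero-copy interop with torch.bfloat16 is practical.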
Architecture and features
Numbast consists of two main components: AST_Canopy, which parses and serializes C++ headers, and the Numbast layer itself, which generates the Numba bindings. AST_Canopy handles environment detection at runtime and offers flexibility in parsing for different compute capabilities, while Numbast serves as the translation layer between C++ and Python.
Performance and future prospects
Bindings generated with Numbast currently incur some overhead from external function calls, and future improvements are expected to further narrow the performance gap between Numba kernels and native CUDA C++ implementations. Future releases promise additional bindings, including NVSHMEM and CCCL, expanding the tool's utility.
For more information, visit the NVIDIA Technology Blog.