The gap between Python developers and the CUDA C++ ecosystem is set to narrow significantly with the introduction of Numbast, according to the NVIDIA Technical Blog. This tool automates the conversion of CUDA C++ APIs into Numba bindings, bringing more of CUDA's performance-oriented libraries within reach of Python developers.
Bridging the Gap
Numba has long enabled Python developers to write CUDA kernels directly in Python syntax. However, the vast array of libraries exclusive to CUDA C++, such as the CUDA Core Compute Libraries (CCCL) and cuRAND, remained out of reach for Python users. Manually binding each library to Python has been a cumbersome and error-prone process.
Introducing Numbast
Numbast addresses this issue by establishing an automated pipeline that reads top-level declarations from CUDA C++ header files, serializes them, and generates Numba extensions. This process ensures consistency and keeps Python bindings in sync with updates in CUDA libraries.
Demonstrating Numbast’s Capabilities
An illustrative example of Numbast's functionality is the creation of Numba bindings for a simple myfloat16 struct, inspired by CUDA's float16 header. This demo showcases how C++ declarations are transformed into Python-accessible bindings, allowing developers to tap CUDA's performance advantages from within a Python environment.
Practical Application
One of the first supported bindings through Numbast is the bfloat16 data type, which can interoperate with PyTorch's torch.bfloat16. This integration enables the development of custom compute kernels that leverage CUDA intrinsics for efficient processing.
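Conceptually, the interop looks like the following hypothetical sketch. The import path and the binding name are assumptions rather than the confirmed Numbast API, and running it requires a CUDA GPU plus the generated bindings; it is shown only to convey the shape of the workflow.

```python
# Hypothetical sketch -- module path and names are assumptions.
import torch
from numba import cuda
from numbast_bf16 import nv_bfloat16  # assumed name for the generated binding


@cuda.jit
def scale(arr, factor):
    i = cuda.grid(1)
    if i < arr.size:
        # Arithmetic here would dispatch to the bound CUDA bfloat16 intrinsics.
        arr[i] = arr[i] * nv_bfloat16(factor)


# PyTorch CUDA tensors can be passed to Numba kernels via the
# CUDA array interface, keeping the data on the device throughout.
x = torch.ones(64, dtype=torch.bfloat16, device="cuda")
scale[1, 64](x, 2.0)
```

The key point is that the tensor never leaves the GPU: PyTorch allocates it, and the Numba kernel operates on it in place using the bound bfloat16 type.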
Architecture and Functionality
Numbast comprises two main components: AST_Canopy, which parses and serializes C++ headers, and the Numbast layer itself, which generates Numba bindings. AST_Canopy performs environment detection at runtime and offers flexibility in compute-capability parsing, while Numbast serves as the translation layer between C++ and Python.
Performance and Future Prospects
Bindings generated with Numbast are invoked through a foreign function interface, and future enhancements are expected to further close the performance gap between Numba kernels and native CUDA C++ implementations. Upcoming releases promise additional bindings, including NVSHMEM and CCCL, expanding the tool's utility.
For more information, visit the NVIDIA Technical Blog.