OpenAI, the nonprofit venture whose professed mission is the ethical advancement of AI, has released the first version of the Triton language, an open source project that allows researchers to write GPU-powered deep learning projects without needing to know the intricacies of GPU programming for machine learning.
Triton 1.0 uses Python (3.6 and up) as its base. The developer writes code in Python using Triton’s libraries, and that code is then JIT-compiled to run on the GPU. This allows integration with the rest of the Python ecosystem, currently the biggest destination for developing machine learning solutions. It also lets Triton leverage the Python language itself instead of reinventing the wheel with a new domain-specific language.
Triton’s libraries provide a set of primitives, reminiscent of NumPy, that cover a variety of matrix operations, for instance, as well as functions that perform reductions on arrays according to some criterion. The user combines these primitives in their own code and adds the @triton.jit decorator to the functions that should be compiled to run on the GPU. In this sense Triton also resembles Numba, the project that allows numerically intensive Python code to be JIT-compiled to machine-native assembly for speed.
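To give a rough sense of the programming model: a Triton kernel is written from the point of view of a single program instance that handles one block of data. The GPU launches many such instances in parallel, each identified by a program ID, and a mask keeps the last, partially filled block from reading or writing past the end of the arrays. In Triton itself these steps map onto triton.language calls such as tl.program_id, tl.arange, and masked tl.load and tl.store. The sketch below is not Triton code; it emulates the same idea for vector addition on the CPU with NumPy, and the function names add_program and launch_add are hypothetical.

```python
import numpy as np

def add_program(pid, x, y, out, n_elements, block_size):
    """Emulate one Triton-style program instance: add one block of elements."""
    # Offsets of the elements this program instance is responsible for.
    offsets = pid * block_size + np.arange(block_size)
    # Mask off offsets past the end of the arrays (the last block may be partial).
    mask = offsets < n_elements
    valid = offsets[mask]
    out[valid] = x[valid] + y[valid]

def launch_add(x, y, block_size=4):
    """Emulate a kernel launch: one program per block, run sequentially here."""
    n_elements = x.size
    out = np.empty_like(x)
    # Ceiling division: number of blocks needed to cover every element.
    grid = (n_elements + block_size - 1) // block_size
    for pid in range(grid):  # on a GPU these instances would run in parallel
        add_program(pid, x, y, out, n_elements, block_size)
    return out

x = np.arange(10, dtype=np.float32)
y = np.ones(10, dtype=np.float32)
# block_size=4 gives 3 program instances; the third one is partially masked.
print(launch_add(x, y))
```

The block size is the main knob here: larger blocks mean fewer program instances, each doing more work, which is the kind of trade-off Triton leaves to the kernel author or to tuning.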
Simple examples of Triton at work include a vector addition kernel and a fused softmax operation. The latter, it’s claimed, can run many times faster than PyTorch’s native softmax for matrices whose rows fit entirely in the GPU’s fast on-chip memory (SRAM).
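The speedup claimed for the fused softmax comes from memory traffic rather than arithmetic. A straightforward softmax runs as several separate passes over the whole matrix (row maximum, subtraction, exponentiation, row sum, division), and on a GPU each pass typically reads from and writes to off-chip memory. A fused kernel instead loads each row once, performs every step on it, and writes the result once. The NumPy sketch below, with hypothetical function names, only illustrates that difference in data flow; NumPy runs on the CPU and does not model the GPU memory hierarchy, so it says nothing about actual speed.

```python
import numpy as np

def softmax_unfused(x):
    """Several whole-matrix passes, each producing a full-size intermediate.

    On a GPU, each pass would typically be a separate kernel that reads its
    input from, and writes its output to, off-chip memory.
    """
    row_max = x.max(axis=1, keepdims=True)              # pass 1: read x
    z = x - row_max                                     # pass 2: read x, write z
    numerator = np.exp(z)                               # pass 3: read z, write numerator
    denominator = numerator.sum(axis=1, keepdims=True)  # pass 4: read numerator
    return numerator / denominator                      # pass 5: read numerator, write result

def softmax_fused(x):
    """One "program" per row: load the row once, do every step, store once."""
    out = np.empty_like(x)
    for row in range(x.shape[0]):   # on a GPU, rows would be handled in parallel
        r = x[row]                  # single load of the row
        r = r - r.max()             # subtract the max for numerical stability
        num = np.exp(r)
        out[row] = num / num.sum()  # single store of the result row
    return out

x = np.random.default_rng(0).standard_normal((4, 8))
print(np.allclose(softmax_fused(x), softmax_unfused(x)))
```

Both functions compute the same result; the point is that the fused version never materializes full-size intermediates, which is what saves memory bandwidth when a whole row fits in on-chip memory.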
Triton is a young project and is currently available for Linux only. Its documentation is still minimal, so early adopters may have to examine the source and examples closely. For instance, the triton.autotune function, which lets developers supply candidate configurations (such as block sizes) that are benchmarked so the fastest one is used when the function is JIT-compiled, is not yet documented in the Python API section of the library’s documentation. However, triton.autotune is demonstrated in Triton’s matrix multiplication example.
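In the matrix multiplication example, Triton’s decorator is applied with a list of triton.Config objects and a key naming the kernel arguments whose values trigger a fresh round of tuning. The idea behind it can be sketched in plain Python: the first time a new key value is seen, time every candidate configuration, cache the winner, and reuse it on later calls. The toy_autotune decorator and blocked_sum function below are hypothetical illustrations running on the CPU, not Triton’s implementation or API.

```python
import functools
import time

def toy_autotune(configs, key):
    """Hypothetical decorator: pick the fastest config per distinct `key` value.

    `configs` is a list of keyword-argument dicts to try; `key` lists the names
    of arguments whose values trigger a fresh tuning run when they change.
    """
    def decorator(fn):
        best = {}  # cache: tuple of key-argument values -> winning config

        @functools.wraps(fn)
        def wrapper(**kwargs):
            cache_key = tuple(kwargs[name] for name in key)
            if cache_key not in best:
                timings = []
                for cfg in configs:  # benchmark each candidate once
                    start = time.perf_counter()
                    fn(**kwargs, **cfg)
                    timings.append((time.perf_counter() - start, cfg))
                best[cache_key] = min(timings, key=lambda t: t[0])[1]
            return fn(**kwargs, **best[cache_key])

        wrapper.best_configs = best  # exposed for inspection
        return wrapper
    return decorator

@toy_autotune(configs=[{"block_size": 16}, {"block_size": 256}, {"block_size": 4096}],
              key=["n"])
def blocked_sum(n, block_size):
    """Sum 0..n-1 in chunks of block_size (a stand-in for a tunable kernel)."""
    total = 0
    for start in range(0, n, block_size):
        total += sum(range(start, min(start + block_size, n)))
    return total

print(blocked_sum(n=100_000))  # same answer whichever block size wins
print(blocked_sum.best_configs)
```

Keying the cache on problem size reflects the usual reasoning behind autotuning: the best block size for a small input is often not the best for a large one, so tuning once per size is a reasonable compromise between search cost and performance.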