Dao-AILab/quack

🦆 QuACK: A Quirky Assortment of CuTe Kernels 🦆

Kernels are written in the CuTe-DSL.

Installation

pip install quack-kernels

Requirements

  • H100 or B200 GPU
  • CUDA toolkit 12.9+
  • Python 3.12

Kernels 🐥

  • 🦆 RMSNorm forward + backward
  • 🦆 Softmax forward + backward
  • 🦆 Cross entropy forward + backward
  • 🦆 Layernorm forward

Upcoming:

  • 🦆 Rotary forward + backward

Usage

from quack import rmsnorm, softmax, cross_entropy
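The import above names three operations but this README does not show their call signatures, so the sketch below does not call QuACK at all. It is a plain-Python reference for what RMSNorm, softmax, and cross entropy compute on a single row, which may be handy for sanity-checking kernel outputs on small inputs. The function names (`rmsnorm_ref`, `softmax_ref`, `cross_entropy_ref`), the `eps` default, and the per-row list interface are assumptions made for illustration, not part of the library.

```python
import math

# Reference definitions only: hypothetical helper names, NOT the QuACK API.

def rmsnorm_ref(x, weight, eps=1e-6):
    """RMSNorm of one row: x / sqrt(mean(x^2) + eps) * weight (eps is an assumed default)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def softmax_ref(x):
    """Numerically stable softmax of one row (subtract the row max before exp)."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_ref(logits, target):
    """Cross entropy of one row: logsumexp(logits) - logits[target]."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return lse - logits[target]

if __name__ == "__main__":
    print(rmsnorm_ref([3.0, 4.0], [1.0, 1.0]))
    print(softmax_ref([1.0, 2.0, 3.0]))
    print(cross_entropy_ref([1.0, 2.0, 3.0], target=2))
```

Each function works on one row at a time, which mirrors how these operations reduce over the last dimension of a batch.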

Documentation

[2025-07-10] We have published a comprehensive blog post on getting memory-bound kernels to speed-of-light performance, right from the comfort of Python, thanks to the CuTe-DSL.
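For a memory-bound kernel, "speed-of-light" is commonly taken to mean running at the GPU's peak memory bandwidth. A rough way to judge a measurement, sketched below, is to count the minimum bytes the kernel must read and write and compare the achieved bandwidth to the peak. The helper names and all the numbers in the example are hypothetical placeholders, not measurements or hardware specifications from this project.

```python
# Back-of-the-envelope check for a memory-bound kernel (illustrative sketch).
# All names and example values here are assumptions, not QuACK results.

def min_bytes_moved(rows, cols, bytes_per_elem, reads=1, writes=1):
    """Minimum traffic if each element is read `reads` times and written `writes` times."""
    return rows * cols * bytes_per_elem * (reads + writes)

def fraction_of_peak(bytes_moved, seconds, peak_bytes_per_s):
    """Achieved bandwidth divided by peak bandwidth (1.0 would be speed-of-light)."""
    return (bytes_moved / seconds) / peak_bytes_per_s

if __name__ == "__main__":
    # Placeholder inputs: substitute your own timing and your GPU's peak bandwidth.
    traffic = min_bytes_moved(rows=1024, cols=1024, bytes_per_elem=2)
    print(fraction_of_peak(traffic, seconds=1e-5, peak_bytes_per_s=1e12))
```

This counts only the unavoidable input and output traffic, so it gives an upper bound on how close a kernel can appear to peak; any extra passes over the data lower the real fraction.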

Performance

See our blogpost for the details.

Development

To set up the development environment:

pip install -e '.[dev]'
pre-commit install
