This repository contains experimental implementations for learning and optimizing GPU programs using CUDA.
All examples are written in C++ with CUDA, focusing on performance analysis and architectural understanding.
Each experiment targets specific aspects of CUDA optimization, such as memory access patterns, warp behavior, and kernel launch strategies.

Goals:
- Practice writing and optimizing GPU kernels with CUDA
- Explore real-world performance characteristics (shared memory, warp divergence, etc.)
- Summarize key findings in short blog posts (linked below)
Brief write-ups are published on my technical blog:
https://yaikeda.github.io/cuda-examples-blog/
Development environment:
- CUDA Toolkit: 12.6
- Platform: Windows + PowerShell
- Visual Studio: 2022 (only cl.exe is used, not the IDE)
If you're a student, a tinkerer, or just curious, please feel free to explore, learn, and reach out!
This project is licensed under the MIT License.