Personal repo to study CUDA and GPU programming, specifically to learn how to implement many kernels from scratch.
Studied a simple Conv2D implementation covering inference and backpropagation.
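As a minimal sketch of those two passes, the code below assumes one input channel, one filter, stride 1, no padding and row-major `float` buffers. It is not the repo's implementation, and the kernel and helper names are hypothetical. Like most deep-learning code, it computes cross-correlation. The backward kernel only produces the weight gradient `dW`; the input gradient `dX` is left out.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Forward (inference): Y[i][j] = sum_{p,q} X[i+p][j+q] * W[p][q]
// Single channel, stride 1, no padding, so Y is (H-K+1) x (Wi-K+1).
__global__ void conv2d_forward(const float* X, const float* W, float* Y,
                               int H, int Wi, int K) {
    int Ho = H - K + 1, Wo = Wi - K + 1;
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    if (i >= Ho || j >= Wo) return;
    float acc = 0.0f;
    for (int p = 0; p < K; ++p)
        for (int q = 0; q < K; ++q)
            acc += X[(i + p) * Wi + (j + q)] * W[p * K + q];
    Y[i * Wo + j] = acc;
}

// Backprop w.r.t. the weights: dW[p][q] = sum_{i,j} dY[i][j] * X[i+p][j+q]
// One thread per weight; launched as a single K x K block (requires K <= 32).
__global__ void conv2d_backward_weights(const float* X, const float* dY, float* dW,
                                        int H, int Wi, int K) {
    int Ho = H - K + 1, Wo = Wi - K + 1;
    int p = threadIdx.y, q = threadIdx.x;
    if (p >= K || q >= K) return;
    float acc = 0.0f;
    for (int i = 0; i < Ho; ++i)
        for (int j = 0; j < Wo; ++j)
            acc += dY[i * Wo + j] * X[(i + p) * Wi + (j + q)];
    dW[p * K + q] = acc;
}

// Hypothetical helper: allocates a device buffer and copies a host vector into it.
static float* to_device(const std::vector<float>& h) {
    float* d = nullptr;
    cudaMalloc(&d, h.size() * sizeof(float));
    cudaMemcpy(d, h.data(), h.size() * sizeof(float), cudaMemcpyHostToDevice);
    return d;
}

std::vector<float> conv2d_forward_gpu(const std::vector<float>& X, const std::vector<float>& W,
                                      int H, int Wi, int K) {
    int Ho = H - K + 1, Wo = Wi - K + 1;
    std::vector<float> Y(Ho * Wo);
    float* d_x = to_device(X);
    float* d_w = to_device(W);
    float* d_y = nullptr;
    cudaMalloc(&d_y, Y.size() * sizeof(float));
    dim3 block(16, 16);
    dim3 grid((Wo + block.x - 1) / block.x, (Ho + block.y - 1) / block.y);
    conv2d_forward<<<grid, block>>>(d_x, d_w, d_y, H, Wi, K);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(Y.data(), d_y, Y.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x); cudaFree(d_w); cudaFree(d_y);
    return Y;
}

std::vector<float> conv2d_backward_weights_gpu(const std::vector<float>& X,
                                               const std::vector<float>& dY,
                                               int H, int Wi, int K) {
    std::vector<float> dW(K * K);
    float* d_x = to_device(X);
    float* d_g = to_device(dY);
    float* d_dw = nullptr;
    cudaMalloc(&d_dw, dW.size() * sizeof(float));
    conv2d_backward_weights<<<1, dim3(K, K)>>>(d_x, d_g, d_dw, H, Wi, K);
    cudaMemcpy(dW.data(), d_dw, dW.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_x); cudaFree(d_g); cudaFree(d_dw);
    return dW;
}
```

Error checking (`cudaGetLastError`, return codes) is omitted for brevity. A real training loop should check every call.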
Understanding the difference between shared and global memory in CUDA
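Matrix multiplication is a common way to see this difference. Global memory is off-chip DRAM visible to every thread. Shared memory is on-chip and shared only by the threads of one block. The sketch below is illustrative: it assumes square row-major N x N matrices and 16 x 16 tiles, and the kernel names are hypothetical. It contrasts a kernel that reads every operand from global memory with one that first stages tiles in shared memory.

```cuda
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // assumed tile size; also the block dimension

// Global-memory version: every operand is read straight from global memory,
// so each element of A and B is fetched N times by different threads.
__global__ void matmul_global(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N || col >= N) return;
    float acc = 0.0f;
    for (int k = 0; k < N; ++k)
        acc += A[row * N + k] * B[k * N + col];
    C[row * N + col] = acc;
}

// Shared-memory version: each block stages a TILE x TILE piece of A and B
// on-chip, and every thread in the block reuses those values.
__global__ void matmul_shared(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < (N + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        // Each thread loads one element of each tile (zero outside the matrix).
        // No early return above: all threads must reach __syncthreads().
        As[threadIdx.y][threadIdx.x] = (row < N && a_col < N) ? A[row * N + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < N && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();  // tile fully loaded before anyone reads it
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // everyone done reading before the next tile overwrites it
    }
    if (row < N && col < N) C[row * N + col] = acc;
}

// Host helper: runs either kernel on N x N row-major matrices.
std::vector<float> matmul_gpu(const std::vector<float>& A, const std::vector<float>& B,
                              int N, bool use_shared) {
    size_t bytes = size_t(N) * N * sizeof(float);
    std::vector<float> C(size_t(N) * N);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (N + TILE - 1) / TILE);
    if (use_shared) matmul_shared<<<grid, block>>>(dA, dB, dC, N);
    else            matmul_global<<<grid, block>>>(dA, dB, dC, N);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return C;
}
```

In the tiled kernel each value loaded into shared memory is reused TILE times within the block. That reuse is where the reduction in global-memory traffic comes from.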
- Understand the CUDA memory hierarchy
- Implement a simple CUDA kernel
- Optimize CUDA kernels for memory access
- Profile CUDA kernel performance
- Compare CPU vs GPU performance
- Conv_1D CPU vs GPU performance
  - Easier to perform experiments (parse args for kernel size, stride, padding)
- Conv_2D CPU vs GPU performance
  - Easier to perform experiments (parse args for kernel size, stride, padding)
- Conv_3D CPU vs GPU performance
  - Easier to perform experiments (parse args for kernel size, stride, padding)
- Complete MNIST example in CUDA/C++ (this could be an adventure)
- Added a Conv1D implementation in CUDA, with GPU vs CPU performance comparison
- Added a training loop for Conv2D deformation in CUDA, with lots of comments to understand it
- Started a C++ and CUDA project; the guide continues from here
- Next step: import OpenCV and open an image with it.
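The Conv1D CPU vs GPU comparison in the list can be sketched as follows. This is an illustrative version, not the repo's code. It assumes a single channel, zero padding `P` on both sides, stride `S`, one GPU thread per output element, and `L + 2P >= K`. GPU kernel time is measured with CUDA events and CPU time with `std::chrono`. All function names are hypothetical.

```cuda
#include <chrono>
#include <vector>
#include <cuda_runtime.h>

// Output length of a 1D convolution: (L + 2P - K) / S + 1 (assumes L + 2P >= K).
inline int conv1d_out_len(int L, int K, int S, int P) {
    return (L + 2 * P - K) / S + 1;
}

// CPU reference: zero padding of P on both sides, stride S.
std::vector<float> conv1d_cpu(const std::vector<float>& x, const std::vector<float>& w,
                              int S, int P) {
    int L = (int)x.size(), K = (int)w.size();
    std::vector<float> y(conv1d_out_len(L, K, S, P));
    for (int o = 0; o < (int)y.size(); ++o) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            int idx = o * S + k - P;  // position in the unpadded input
            if (idx >= 0 && idx < L) acc += x[idx] * w[k];
        }
        y[o] = acc;
    }
    return y;
}

// GPU kernel: one thread per output element, same indexing as the CPU version.
__global__ void conv1d_kernel(const float* x, const float* w, float* y,
                              int L, int K, int S, int P, int out_len) {
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= out_len) return;
    float acc = 0.0f;
    for (int k = 0; k < K; ++k) {
        int idx = o * S + k - P;
        if (idx >= 0 && idx < L) acc += x[idx] * w[k];
    }
    y[o] = acc;
}

// Runs the kernel and writes the kernel-only time (ms) measured with CUDA events.
std::vector<float> conv1d_gpu(const std::vector<float>& x, const std::vector<float>& w,
                              int S, int P, float* kernel_ms) {
    int L = (int)x.size(), K = (int)w.size();
    int out_len = conv1d_out_len(L, K, S, P);
    std::vector<float> y(out_len);
    float *dx = nullptr, *dw = nullptr, *dy = nullptr;
    cudaMalloc(&dx, L * sizeof(float));
    cudaMalloc(&dw, K * sizeof(float));
    cudaMalloc(&dy, out_len * sizeof(float));
    cudaMemcpy(dx, x.data(), L * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dw, w.data(), K * sizeof(float), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    int threads = 256, blocks = (out_len + threads - 1) / threads;
    cudaEventRecord(start);
    conv1d_kernel<<<blocks, threads>>>(dx, dw, dy, L, K, S, P, out_len);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel (and stop event) completes
    cudaEventElapsedTime(kernel_ms, start, stop);

    cudaMemcpy(y.data(), dy, out_len * sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventDestroy(start); cudaEventDestroy(stop);
    cudaFree(dx); cudaFree(dw); cudaFree(dy);
    return y;
}

// CPU timing with std::chrono; the result is returned through `out`.
double time_cpu_ms(const std::vector<float>& x, const std::vector<float>& w,
                   int S, int P, std::vector<float>& out) {
    auto t0 = std::chrono::steady_clock::now();
    out = conv1d_cpu(x, w, S, P);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

The GPU number covers only the kernel. Host-to-device copies are excluded, so for an end-to-end comparison, move the `cudaMemcpy` calls inside the timed region. The first launch is often slower, so timing a few repetitions gives a fairer picture.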
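For the "parse args for kernel size, stride, padding" items, a small host-side helper keeps experiments to a command-line change. This is a sketch under assumptions: the `--kernel`, `--stride` and `--padding` flag names, their defaults and the `ConvArgs` struct are hypothetical, not taken from the repo. The same output-size formula is applied per spatial dimension, so one helper serves Conv1D, Conv2D and Conv3D.

```cuda
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical experiment settings; names and defaults are illustrative only.
struct ConvArgs {
    int kernel = 3;
    int stride = 1;
    int padding = 0;
};

// Parses "--kernel N --stride N --padding N" (any order, all optional).
// Returns false on an unknown flag, a missing or non-integer value,
// a non-positive kernel/stride, or a negative padding.
bool parse_conv_args(int argc, char** argv, ConvArgs& args) {
    for (int i = 1; i < argc; ++i) {
        int* target = nullptr;
        if (std::strcmp(argv[i], "--kernel") == 0)       target = &args.kernel;
        else if (std::strcmp(argv[i], "--stride") == 0)  target = &args.stride;
        else if (std::strcmp(argv[i], "--padding") == 0) target = &args.padding;
        else return false;                        // unknown flag
        if (i + 1 >= argc) return false;          // flag without a value
        const char* text = argv[++i];
        char* end = nullptr;
        long v = std::strtol(text, &end, 10);
        if (end == text || *end != '\0') return false;  // not an integer
        *target = static_cast<int>(v);
    }
    return args.kernel > 0 && args.stride > 0 && args.padding >= 0;
}

// Per-dimension output size: out = (in + 2 * padding - kernel) / stride + 1
// Pass {L} for Conv1D, {H, W} for Conv2D, {D, H, W} for Conv3D.
std::vector<int> conv_out_dims(const std::vector<int>& in_dims, const ConvArgs& a) {
    std::vector<int> out;
    for (int d : in_dims)
        out.push_back((d + 2 * a.padding - a.kernel) / a.stride + 1);
    return out;
}
```

For example, `--kernel 5 --stride 2 --padding 1` on a 28 x 28 input gives a 13 x 13 output.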