OpenCL implementation of Zero-mean Normalized Cross Correlation (ZNCC) - Project for 521288S Multiprocessor Programming, Spring 2023, @UniOulu
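For reference, the textbook ZNCC score is shown below. It compares a window around pixel (x, y) in the left image L with the window shifted by disparity d in the right image R. The symbols (L, R, the window W, its half-width h and the window means) are introduced here for illustration and assume the left image is the reference view. The winSize and maxDisp names match the parameters listed further down.

```math
\begin{aligned}
\mathrm{ZNCC}(x, y, d) &= \frac{\sum_{(i,j) \in W} \bigl(L(x+i, y+j) - \bar{L}\bigr)\bigl(R(x+i-d, y+j) - \bar{R}_d\bigr)}{\sqrt{\sum_{(i,j) \in W} \bigl(L(x+i, y+j) - \bar{L}\bigr)^{2}}\,\sqrt{\sum_{(i,j) \in W} \bigl(R(x+i-d, y+j) - \bar{R}_d\bigr)^{2}}} \\
\bar{L} &= \frac{1}{|W|}\sum_{(i,j) \in W} L(x+i, y+j), \qquad \bar{R}_d = \frac{1}{|W|}\sum_{(i,j) \in W} R(x+i-d, y+j) \\
W &= \{-h, \dots, h\}^{2}, \qquad h = \lfloor \mathrm{winSize} / 2 \rfloor \\
d^{*}(x, y) &= \arg\max_{0 \le d < \mathrm{maxDisp}} \mathrm{ZNCC}(x, y, d)
\end{aligned}
```

The score lies in [-1, 1]. Subtracting the means and dividing by the standard deviations makes it insensitive to brightness and contrast differences between the two views.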
- Middlebury Stereo Datasets
- Project structure and configuration mix guidelines from Canonical Project Structure, pitchfork and cmake_template
- C++17
- LodePNG
- OpenMP
- OpenCL
- CUDA
- resizeFactor: 1
- winSize: 9
- maxDisp: 32
- ccThresh: 32
- occThresh: 16
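The list does not define these parameters. The names suggest the usual ZNCC post-processing, so the sketch below assumes that ccThresh is a cross-check threshold. Under that assumption, a left-to-right disparity is kept only if the right-to-left map, looked up at the matched column, agrees within ccThresh. Otherwise the pixel is set to 0 so that a later occlusion-filling step can replace it. `ZnccParams` and `crossCheck` are hypothetical names, not the project's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical bundle of the parameters listed above, defaulting to the
// values used for the benchmarks.
struct ZnccParams {
    int resizeFactor = 1;
    int winSize = 9;
    int maxDisp = 32;
    int ccThresh = 32;
    int occThresh = 16;
};

// Cross-check, assuming ccThresh is the largest allowed difference between the
// left-to-right (dispLR) and right-to-left (dispRL) disparity maps. Pixels that
// fail the check, or whose match falls outside the image, are set to 0.
std::vector<std::uint8_t> crossCheck(const std::vector<std::uint8_t>& dispLR,
                                     const std::vector<std::uint8_t>& dispRL,
                                     int width, int height, int ccThresh) {
    assert(dispLR.size() == dispRL.size());
    assert(dispLR.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    std::vector<std::uint8_t> out(dispLR.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int d = dispLR[y * width + x];
            const int xr = x - d;  // matching column in the right image
            if (xr < 0) continue;  // no counterpart: leave the pixel invalid (0)
            const int dBack = dispRL[y * width + xr];
            if (std::abs(d - dBack) <= ccThresh) {
                out[y * width + x] = static_cast<std::uint8_t>(d);
            }
        }
    }
    return out;
}
```

A larger ccThresh keeps more pixels. With ccThresh equal to maxDisp, as in the values above, the check can only reject pixels whose match falls outside the image.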
Method ⬇️\Resolution ➡️ | 2940x2016 | 1470x1008 | 735x504 |
---|---|---|---|
Single-Threaded | 348 | - | - |
Multi-Threaded | 50.7 | - | - |
OpenMP | 51.8 | - | - |
SIMD | 15.4 | - | - |
OpenCL (GPU) | 4.1 | - | - |
OpenCL (APU) | 4.0 | - | - |
OpenCL (CPU) | 11.6 | - | - |
OpenCL Optimized (GPU) | 2.2 | - | - |
OpenCL Optimized (APU) | 1.8 | - | - |
OpenCL Optimized (CPU) | DNR | - | - |
CUDA | 2.2 | - | - |

[Benchmark results: runtime in seconds]
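To express the table in relative terms, the helper below divides the single-threaded runtime (348 s) by every other runtime in the 2940x2016 column. The function names are illustrative. The DNR entry is left out because it has no runtime.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Speedup of a method relative to a baseline: baseline runtime / method runtime.
double speedup(double baselineSeconds, double methodSeconds) {
    assert(methodSeconds > 0.0);
    return baselineSeconds / methodSeconds;
}

// Speedups for the 2940x2016 column, relative to the single-threaded run (348 s).
// "OpenCL Optimized (CPU)" is omitted because it is marked DNR (no runtime).
std::vector<std::pair<std::string, double>> speedupTable() {
    const double baseline = 348.0;
    const std::vector<std::pair<std::string, double>> runtimes = {
        {"Multi-Threaded", 50.7},        {"OpenMP", 51.8},
        {"SIMD", 15.4},                  {"OpenCL (GPU)", 4.1},
        {"OpenCL (APU)", 4.0},           {"OpenCL (CPU)", 11.6},
        {"OpenCL Optimized (GPU)", 2.2}, {"OpenCL Optimized (APU)", 1.8},
        {"CUDA", 2.2},
    };
    std::vector<std::pair<std::string, double>> result;
    for (const auto& [name, seconds] : runtimes) {
        result.emplace_back(name, speedup(baseline, seconds));
    }
    return result;
}
```

This gives roughly 6.9x for the multi-threaded version (348 / 50.7) and about 22.6x for SIMD (348 / 15.4). The optimized OpenCL kernel on the APU reaches about 193x (348 / 1.8).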
### AMD APP
Platform summary:
- Name: AMD Accelerated Parallel Processing
- Vendor: Advanced Micro Devices, Inc.
- Version: OpenCL 2.1 AMD-APP (3516.0)

Device summary:
- Name: gfx90c
- Max compute units: 8
- Global memory size (bytes): 12980584448
- Max work group size: 256
- Kernel work group size: 14757395258967641292 (invalid value, likely uninitialized)
### NVIDIA CUDA
Platform summary:
- Name: NVIDIA CUDA
- Vendor: NVIDIA Corporation
- Version: OpenCL 3.0 CUDA 12.0.94

Device summary:
- Name: NVIDIA GeForce RTX 3050 Laptop GPU
- Max compute units: 16
- Global memory size (bytes): 4294443008
- Max work group size: 1024
- Kernel work group size: 14757395258967641292 (invalid value, likely uninitialized)
### Intel OpenCL
Platform summary:
- Name: Intel(R) OpenCL
- Vendor: Intel(R) Corporation
- Version: OpenCL 3.0 WINDOWS

Device summary:
- Name: AMD Ryzen 7 5800HS with Radeon Graphics
- Max compute units: 16
- Global memory size (bytes): 33721454592
- Max work group size: 8192
- Kernel work group size: 14757395258967641292 (invalid value, likely uninitialized)
Full device specs are available here
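All three platforms report the same "Kernel work group size". Printing it in hexadecimal with the small helper below gives 0xCCCCCCCCCCCCCCCC. MSVC debug builds with runtime checks (/RTC) write this byte pattern into uninitialized stack variables. The number is therefore most likely an unset variable rather than a device limit. One plausible cause, which the source does not confirm, is printing the value without a successful `clGetKernelWorkGroupInfo(..., CL_KERNEL_WORK_GROUP_SIZE, ...)` call, since that query needs a built kernel. `toHex` and `looksUninitialized` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// MSVC debug builds with /RTC fill uninitialized stack memory with 0xCC bytes,
// so a 64-bit variable that was never written reads back as this value.
constexpr std::uint64_t kMsvcDebugFill = 0xCCCCCCCCCCCCCCCCULL;
static_assert(kMsvcDebugFill == 14757395258967641292ULL);

// Format a 64-bit value as uppercase hexadecimal, e.g. 255 -> "0xFF".
std::string toHex(std::uint64_t value) {
    std::ostringstream out;
    out << "0x" << std::hex << std::uppercase << value;
    return out.str();
}

// True if a queried size looks like it was never written (debug fill pattern).
bool looksUninitialized(std::uint64_t value) {
    return value == kMsvcDebugFill;
}
```

For example, `toHex(14757395258967641292ULL)` returns `"0xCCCCCCCCCCCCCCCC"`.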
- ZNCC single-threaded.
- ZNCC multi-threaded.
- ZNCC OpenMP.
- ZNCC SIMD.
- ZNCC OpenCL.
- ZNCC OpenCL optimization.
- ZNCC CUDA.
- Benchmarking all implementations.
- Advanced profiling (Orbit?)
- Unit tests (optional).
- Automatic data downloader (optional).
- Switch from the OpenCL implementation bundled with CUDA to the Khronos OpenCL SDK with the ICD Loader (optional).
- Day 1: Wasted a whole day trying to run OpenCL (CUDA version) on WSL; it turns out that's not supported.
- Day 2: Set up the OpenCL environment correctly on Windows, including:
  - NVIDIA GPU support.
  - vcpkg package manager.
  - Conan package manager.
  - CMake build system.
  - Meson build system.
  - Hello world for OpenCL.
- Day 3:
  - First OpenCL kernel.
  - PNG image loading with LodePNG.
  - Improved the project's structure.
- Day 4: Initial ZNCC implementation, naive and single-threaded on CPU.
- Day 5: C++ Multithreading and OpenMP implementations.
- Day 6: Learning more about OpenCL.
- Day 7: Initial OpenCL implementation of ZNCC, mostly a copy-paste of the C++ code with minor edits and a kernel wrapper.
- Day 8: Researching profiling and tracing tools for OpenCL programs; the ecosystem looks like a mess!
- Day 9: Some refactoring and learning more about SIMD.
- Day 10: SIMD implementation, with better loop structures to take data locality into account.
- Day 11: Reworking the CMake configuration to solve OpenCL issues on WSL and SIMD issues on Windows.
- Day 12: Improved and Optimized OpenCL implementation, based on ideas from the SIMD implementation.
- Day 13: Trying to figure out the right way to compile and link CUDA files using CMake.
- Day 14:
  - CUDA implementation.
  - OpenCL pipes (attempted, failed).
- Day 15: Code cleanup and report.
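The following C++17 sketch shows one way to write a naive, brute-force CPU ZNCC (the Day 4 step) together with a row-split `std::thread` wrapper (the Day 5 step). It is not the project's code. `znccRows`, `zncc` and the `Image` alias are hypothetical names. The left image is the reference view. Border pixels are handled by clamping coordinates, which is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

using Image = std::vector<std::uint8_t>;  // row-major 8-bit grayscale

// Naive ZNCC for rows [rowBegin, rowEnd): try every disparity d in [0, maxDisp)
// and keep the best-scoring one. The left window at x is compared with the
// right window at x - d; coordinates are clamped at the image border.
void znccRows(const Image& left, const Image& right, Image& out, int width, int height,
              int winSize, int maxDisp, int rowBegin, int rowEnd) {
    const int half = winSize / 2;
    const double n = static_cast<double>((2 * half + 1) * (2 * half + 1));
    for (int y = rowBegin; y < rowEnd; ++y) {
        for (int x = 0; x < width; ++x) {
            double bestScore = -2.0;  // any ZNCC value in [-1, 1] beats this
            int bestDisp = 0;
            for (int d = 0; d < maxDisp; ++d) {
                // Pass 1: window means.
                double meanL = 0.0, meanR = 0.0;
                for (int j = -half; j <= half; ++j) {
                    const int yy = std::clamp(y + j, 0, height - 1);
                    for (int i = -half; i <= half; ++i) {
                        meanL += left[yy * width + std::clamp(x + i, 0, width - 1)];
                        meanR += right[yy * width + std::clamp(x + i - d, 0, width - 1)];
                    }
                }
                meanL /= n;
                meanR /= n;
                // Pass 2: zero-mean cross term and the two variances.
                double num = 0.0, varL = 0.0, varR = 0.0;
                for (int j = -half; j <= half; ++j) {
                    const int yy = std::clamp(y + j, 0, height - 1);
                    for (int i = -half; i <= half; ++i) {
                        const double a = left[yy * width + std::clamp(x + i, 0, width - 1)] - meanL;
                        const double b = right[yy * width + std::clamp(x + i - d, 0, width - 1)] - meanR;
                        num += a * b;
                        varL += a * a;
                        varR += b * b;
                    }
                }
                const double denom = std::sqrt(varL * varR);
                const double score = denom > 0.0 ? num / denom : 0.0;  // flat window: score 0
                if (score > bestScore) {
                    bestScore = score;
                    bestDisp = d;
                }
            }
            out[y * width + x] = static_cast<std::uint8_t>(bestDisp);
        }
    }
}

// Multi-threaded wrapper: each std::thread handles one contiguous band of rows,
// so threads write disjoint parts of `out` and need no locking.
Image zncc(const Image& left, const Image& right, int width, int height,
           int winSize, int maxDisp, unsigned numThreads) {
    assert(left.size() == right.size());
    assert(left.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    Image out(left.size(), 0);
    const int threads = static_cast<int>(std::max(1u, numThreads));
    const int band = (height + threads - 1) / threads;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        const int begin = t * band;
        const int end = std::min(height, begin + band);
        if (begin >= end) break;
        workers.emplace_back(znccRows, std::cref(left), std::cref(right), std::ref(out),
                             width, height, winSize, maxDisp, begin, end);
    }
    for (auto& worker : workers) worker.join();
    return out;
}
```

The benchmark settings are 2940x2016, winSize 9 and maxDisp 32. With these, the two passes in this sketch read about 2940 × 2016 × 32 × 2 × 2 × 81 ≈ 6.1 × 10^10 pixel values per disparity map. The left-window mean does not depend on d, so hoisting it out of the disparity loop is an obvious first optimization.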
- CUDA Refresher
- CUDA C++ Programming Guide
- CUDA C++ Best Practices Guide
- NVIDIA Ampere Architecture In-Depth
- GPU Performance Background User's Guide
- Khronos OpenCL-SDK: generic, requires vendor ICD
- NVIDIA CUDA
- AMD ROCm
- AMD Software: Adrenalin Edition: AMD GPUs & APUs
- AMD OpenCL SDK: unmaintained
- Intel OpenCL Runtimes
- Intel OpenCL CPU Runtime GitHub | Intel: oneAPI DPC++, x64 CPUs including AMD
- Intel OpenCL Runtime: iGPU
- Beignet: unmaintained, old Intel integrated GPUs
- POCL: generic
- clinfo
- Oclgrind: An OpenCL device simulator and debugger
- OCL Intercept: Intercept Layer for Debugging and Analyzing OpenCL Applications
- GPUPerfAPI: The GPU Performance API
- Radeon Developer Tool Suite: Radeon GPU Profiler, Analyzer & Memory Visualizer