husmen/DepthZNCC

DepthZNCC

OpenCL implementation of Zero-mean Normalized Cross Correlation (ZNCC) - Project for 521288S Multiprocessor Programming, Spring 2023, @UniOulu

Notes

How to

Requirements

  • C++17
  • LodePNG
  • OpenMP
  • OpenCL
  • CUDA

Benchmarks:

  • resizeFactor: 1
  • winSize: 9
  • maxDisp: 32
  • ccThresh: 32
  • occThresh: 16

| Method ⬇️ \ Resolution ➡️ | 2940x2016 | 1470x1008 | 735x504 |
|---|---|---|---|
| Single-Threaded | 348 | - | - |
| Multi-Threaded | 50.7 | - | - |
| OpenMP | 51.8 | - | - |
| SIMD | 15.4 | - | - |
| OpenCL (GPU) | 4.1 | - | - |
| OpenCL (APU) | 4.0 | - | - |
| OpenCL (CPU) | 11.6 | - | - |
| OpenCL Optimized (GPU) | 2.2 | - | - |
| OpenCL Optimized (APU) | 1.8 | - | - |
| OpenCL Optimized (CPU) | DNR | - | - |
| CUDA | 2.2 | - | - |

Benchmark results as runtime in seconds.
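
For readers new to the parameters above, here is a minimal single-threaded sketch of a ZNCC disparity search in C++17. It is not the code from this repository: the names `GrayImage`, `windowMean`, and `znccDisparity` are hypothetical, clamping at the image border is an assumption, and only `winSize` and `maxDisp` are used (`ccThresh` and `occThresh` play no part in this sketch).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: grayscale image, row-major, one float per pixel.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<float> pixels;  // width * height values

    // Assumption: coordinates outside the image are clamped to the border.
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                      static_cast<std::size_t>(x)];
    }
};

// Mean intensity of the (2*half+1)^2 window centred on (cx, cy).
inline float windowMean(const GrayImage& img, int cx, int cy, int half) {
    float sum = 0.0f;
    for (int dy = -half; dy <= half; ++dy)
        for (int dx = -half; dx <= half; ++dx)
            sum += img.at(cx + dx, cy + dy);
    const int side = 2 * half + 1;
    return sum / static_cast<float>(side * side);
}

// For each pixel of `left`, return the disparity d in [0, maxDisp) whose
// window in `right`, shifted d pixels to the left, gives the highest ZNCC.
inline std::vector<std::uint8_t> znccDisparity(const GrayImage& left, const GrayImage& right,
                                               int winSize, int maxDisp) {
    assert(winSize > 0 && winSize % 2 == 1);
    assert(maxDisp > 0 && maxDisp <= 256);
    assert(left.width == right.width && left.height == right.height);
    const int half = winSize / 2;
    std::vector<std::uint8_t> disp(left.pixels.size(), 0);
    for (int y = 0; y < left.height; ++y) {
        for (int x = 0; x < left.width; ++x) {
            const float meanL = windowMean(left, x, y, half);
            float bestScore = -2.0f;  // ZNCC lies in [-1, 1]
            int bestDisp = 0;
            for (int d = 0; d < maxDisp; ++d) {
                const float meanR = windowMean(right, x - d, y, half);
                float num = 0.0f, varL = 0.0f, varR = 0.0f;
                for (int dy = -half; dy <= half; ++dy) {
                    for (int dx = -half; dx <= half; ++dx) {
                        const float l = left.at(x + dx, y + dy) - meanL;
                        const float r = right.at(x + dx - d, y + dy) - meanR;
                        num += l * r;
                        varL += l * l;
                        varR += r * r;
                    }
                }
                const float denom = std::sqrt(varL * varR);
                const float score = denom > 0.0f ? num / denom : 0.0f;
                if (score > bestScore) {
                    bestScore = score;
                    bestDisp = d;
                }
            }
            disp[static_cast<std::size_t>(y) * static_cast<std::size_t>(left.width) +
                 static_cast<std::size_t>(x)] = static_cast<std::uint8_t>(bestDisp);
        }
    }
    return disp;
}
```

With the settings listed above, the call would be `znccDisparity(left, right, 9, 32)`. The brute-force loops do work proportional to width × height × maxDisp × winSize², which is why this kind of implementation is a natural baseline for the parallel versions.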

OpenCL Platform/Device Info

### AMD APP
Platform summary:
Name: AMD Accelerated Parallel Processing
Vendor: Advanced Micro Devices, Inc.
Version: OpenCL 2.1 AMD-APP (3516.0)

Device summary:
Name: gfx90c
Max compute units: 8
Global memory size (bytes): 12980584448
Max work group size: 256
Kernel work group size: 14757395258967641292

### NVIDIA CUDA
Platform summary:
Name: NVIDIA CUDA
Vendor: NVIDIA Corporation
Version: OpenCL 3.0 CUDA 12.0.94

Device summary:
Name: NVIDIA GeForce RTX 3050 Laptop GPU
Max compute units: 16
Global memory size (bytes): 4294443008
Max work group size: 1024
Kernel work group size: 14757395258967641292

### Intel OpenCL
Platform summary:
Name: Intel(R) OpenCL
Vendor: Intel(R) Corporation
Version: OpenCL 3.0 WINDOWS

Device summary:
Name: AMD Ryzen 7 5800HS with Radeon Graphics
Max compute units: 16
Global memory size (bytes): 33721454592
Max work group size: 8192
Kernel work group size: 14757395258967641292

Full device specs are available here
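
One detail in the summaries above: all three devices report the same `Kernel work group size`, 14757395258967641292. That number is exactly 0xCCCCCCCCCCCCCCCC, the byte pattern MSVC debug builds use to fill uninitialized stack memory, so the value was possibly printed without the query (for example `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`) having written to it. This is a guess from the number alone, not something the repository states. A small check, with hypothetical names, makes the pattern easy to spot:

```cpp
#include <cstdint>

// 0xCC repeated in all 8 bytes: the fill pattern MSVC debug builds write
// into uninitialized stack variables.
constexpr std::uint64_t kDebugFill = 0xCCCCCCCCCCCCCCCCull;

// The decimal value printed for every device in the summaries above.
static_assert(kDebugFill == 14757395258967641292ull,
              "reported work group size equals the 0xCC fill pattern");

// Hypothetical helper: true if a queried size looks like it was never written.
constexpr bool looksUninitialized(std::uint64_t value) {
    return value == kDebugFill;
}
```

Initializing the variable to 0 before the query and checking the returned error code would make a failed query show up as an error instead of a plausible-looking number.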

TODO

  • ZNCC single-threaded.
  • ZNCC multi-threaded.
  • ZNCC OpenMP.
  • ZNCC SIMD.
  • ZNCC OpenCL.
  • ZNCC OpenCL optimization.
  • ZNCC CUDA.
  • Benchmarking all implementations.
  • Advanced profiling (Orbit?)
  • Unit tests (optional).
  • Automatic data downloader (optional).
  • Switch from CUDA implementation to Khronos OpenCL SDK with ICD Loader (optional).

Development diary

  • Day 1: Wasted a whole day trying to run OpenCL (CUDA version) on WSL; it turns out that's not supported.
  • Day 2: Set up the OpenCL environment correctly on Windows, including:
    • NVIDIA GPU support.
    • vcpkg package manager.
    • conan package manager.
    • cmake build system.
    • meson build system.
    • Hello world for OpenCL.
  • Day 3:
    • First OpenCL kernel.
    • PNG image loading with LodePNG.
    • Improved the project's structure.
  • Day 4: Initial ZNCC implementation, naive and single-threaded on CPU.
  • Day 5: C++ Multithreading and OpenMP implementations.
  • Day 6: Learning more about OpenCL.
  • Day 7: Initial OpenCL implementation of ZNCC, mostly a copy-paste of the C++ code with minor editing and a kernel wrapper.
  • Day 8: Researching the topic of profiling and tracing OpenCL programs; the ecosystem looks like a mess!
  • Day 9: Some refactoring and learning more about SIMD.
  • Day 10: SIMD implementation, with better loop structures to take data locality into account.
  • Day 11: Reworking CMake configuration to solve OpenCL issues on WSL and SIMD issues on Windows.
  • Day 12: Improved and optimized OpenCL implementation, based on ideas from the SIMD implementation.
  • Day 13: Trying to figure out the right way to compile and link CUDA files using CMake.
  • Day 14:
    • CUDA implementation.
    • OpenCL pipes, failed.
  • Day 15: Code cleanup and report.
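
The diary mentions an OpenMP implementation (Day 5) but does not show code. As an illustration only, the sketch below parallelizes a window operation over image rows with `#pragma omp parallel for`. It uses a simple box mean rather than the full ZNCC to stay short; the function name `boxMeanRows` and the choice to clip windows at the border are assumptions, not taken from this repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: mean over a winSize x winSize window for every pixel.
// Each output row depends only on the read-only input, so rows can be
// processed independently by different threads.
inline std::vector<float> boxMeanRows(const std::vector<float>& img,
                                      int width, int height, int winSize) {
    assert(width > 0 && height > 0 && winSize > 0 && winSize % 2 == 1);
    assert(img.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    const int half = winSize / 2;
    std::vector<float> out(img.size(), 0.0f);

// Guarded so the file also builds cleanly without -fopenmp (serial loop).
#if defined(_OPENMP)
#pragma omp parallel for schedule(static)
#endif
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            int count = 0;
            for (int dy = -half; dy <= half; ++dy) {
                const int yy = y + dy;
                if (yy < 0 || yy >= height) continue;  // clip window at border
                for (int dx = -half; dx <= half; ++dx) {
                    const int xx = x + dx;
                    if (xx < 0 || xx >= width) continue;
                    sum += img[static_cast<std::size_t>(yy) * static_cast<std::size_t>(width) +
                               static_cast<std::size_t>(xx)];
                    ++count;
                }
            }
            out[static_cast<std::size_t>(y) * static_cast<std::size_t>(width) +
                static_cast<std::size_t>(x)] = sum / static_cast<float>(count);
        }
    }
    return out;
}
```

Built with `-fopenmp` (GCC/Clang) or `/openmp` (MSVC), the outer loop is split across threads; without the flag the same code runs serially and produces the same result.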
