Compilation help for Windows, Linux, WSL

How to compile ggllm.cpp:

Recommended: build with cmake. (Set the LLAMA_CUBLAS flag to 0 to build without CUDA support and without needing the CUDA toolkit.)

git clone https://github.com/cmp-nct/ggllm.cpp
cd ggllm.cpp
rm -rf build; mkdir build; cd build
# if CUDA is not in your PATH:
export PATH="/usr/local/cuda/bin:$PATH"
# if problems persist, setting these has sometimes helped:
#export CPATH="/usr/local/cuda/targets/x86_64-linux/include:"
#export LD_LIBRARY_PATH="/usr/local/cuda/lib64:"
cmake -DLLAMA_CUBLAS=1 -DCUDAToolkit_ROOT=/usr/local/cuda/ ..
cmake --build . --config Release
# the binaries are placed in ./bin:
# falcon_main, falcon_quantize, falcon_perplexity
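The choice between LLAMA_CUBLAS=1 and LLAMA_CUBLAS=0 can be scripted. The following is only a sketch: `cublas_flag` is a hypothetical helper (not part of ggllm.cpp) that assumes finding `nvcc` on the PATH is a good enough sign that the CUDA toolkit is installed.

```shell
# Hypothetical helper: prints 1 when nvcc is found on the PATH, otherwise 0.
# Assumption: nvcc being reachable means the CUDA toolkit is usable.
cublas_flag() {
    if command -v nvcc >/dev/null 2>&1; then
        echo 1
    else
        echo 0
    fi
}

# Show which value would be passed to cmake on this machine.
echo "LLAMA_CUBLAS=$(cublas_flag)"

# From inside the build directory one would then run:
#   cmake -DLLAMA_CUBLAS="$(cublas_flag)" ..
#   cmake --build . --config Release
```

If the helper prints 0 on a machine that does have CUDA installed, adding /usr/local/cuda/bin to the PATH as shown above is the first thing to try.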

Building with make (fallback):

export LLAMA_CUBLAS=1;
# if you do not have "nvcc" in your path:
# export PATH="/usr/local/cuda/bin:$PATH"
make falcon_main falcon_quantize falcon_perplexity
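Whichever build route is used, it can help to confirm that all three binaries were actually produced. The sketch below uses a hypothetical helper, `check_falcon_binaries`, which takes the output directory as its argument. It defaults to ./bin, where the cmake build puts them; where a make build puts them is not stated here, so pass that directory explicitly.

```shell
# Hypothetical helper: report which of the expected binaries exist in a directory.
# The directory defaults to ./bin (the cmake output location); for a make build,
# pass the directory where the binaries were produced (an assumption to verify).
check_falcon_binaries() {
    dir="${1:-./bin}"
    missing=0
    for name in falcon_main falcon_quantize falcon_perplexity; do
        if [ -x "$dir/$name" ]; then
            echo "ok:      $dir/$name"
        else
            echo "missing: $dir/$name"
            missing=1
        fi
    done
    # Exit status 0 means all three binaries were found.
    return "$missing"
}

check_falcon_binaries ./bin || echo "some binaries are missing; re-run the build"
```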

Windows and Demos
Note: these tutorials were recorded before the latest performance patches.
Note: Do not use PowerShell; use Visual Studio Code or cmd.exe. PowerShell is not compatible with cmake.
Note: Do not use Visual Studio compilers; use the Community edition. The official VS compiler is buggy and needs manual workarounds to work with CUDA.
Video tutorial for Windows compilation without WSL:
https://www.youtube.com/watch?v=BALw669Qeyw
Another demo of Falcon 40B at 5 bit quantization:
https://www.youtube.com/watch?v=YuTMFL1dKgQ&ab_channel=CmpNct
The demo starts at about 35 tokens/sec and gradually slows down as the context grows; that slowdown has since been fixed.

Installing on WSL (Windows Subsystem for Linux)

# In WSL, either use --no-mmap or copy the model into a native directory (not under /mnt/); otherwise loading gets stuck (thanks @nauful)
# Choose a current distro:
wsl.exe --list --online
wsl --install -d distro
# cmake 3.16 and the CUDA toolkit are required
# If you run an old distro, you can upgrade it, for example:
#   apt update; apt upgrade; apt full-upgrade
#   pico /etc/apt/sources.list
#   apt update; apt upgrade; apt full-upgrade; apt autoremove
#   lsb_release -a
# then run wsl --shutdown and restart the distro
# install the CUDA toolkit for WSL
wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
dpkg -i cuda-keyring_1.0-1_all.deb
apt-get update; apt-get -y install cuda
# you might need to add it to your path:
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
export PATH="/usr/local/cuda/bin:$PATH"
# now start with a fresh cmake build and everything should work
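Two of the WSL pitfalls above can be checked before building or loading a model: the cmake version, and a model file stored under /mnt/. This is a sketch under stated assumptions. `version_at_least` and `is_windows_mount` are hypothetical helpers, 3.16 is treated as a minimum version, `sort -V` is assumed to be available (it is in GNU coreutils), and the model path is made up for illustration.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeeds when an absolute path lies under /mnt/,
# which is where WSL mounts the Windows drives. Relative paths are not resolved.
is_windows_mount() {
    case "$1" in
        /mnt/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Check the installed cmake against 3.16 (treated here as a minimum).
if command -v cmake >/dev/null 2>&1; then
    found=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_at_least "$found" 3.16; then
        echo "cmake $found is new enough"
    else
        echo "cmake $found is older than 3.16; upgrade the distro or cmake"
    fi
else
    echo "cmake not found"
fi

# Check a model location (the path is a made-up example).
model="/mnt/c/models/model.bin"
if is_windows_mount "$model"; then
    echo "$model is on a Windows mount: use --no-mmap or copy it to a native directory"
fi
```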

Random Windows hints
This one cost me 12 hours: if you get CUDA errors that look as if you were running an old CUDA version or an old GPU (compute version), or you hit CUDA compilation problems, close all PowerShell windows (really all of them), close any command shells, and close Visual Studio (make sure everything is closed). Then try again in a fresh shell; it might just work now. It's not a "path" problem, it's something deeper in Windows 11.
