feat(Python backends): Add Python backends CPU builds #5990
Conversation
This reverts commit 39f32b0. I made an error when applying the CPU requirements - the "+cpu" suffix applies only to pinned versions.
@@ -0,0 +1,64 @@
ARG BASE_IMAGE=ubuntu:22.04
Why have a separate Dockerfile? When build-type is empty we already treat it as a CPU build.
> Why have a separate Dockerfile? When build-type is empty we already treat it as a CPU build.
This Dockerfile is supposed to build vLLM from source, since PyPI, same as for Torch, only has a CUDA release. The aim behind the CPU builds is to use CPU-specific builds of libraries wherever we don't need any other changes: for example, just installing torch from the PyPI repository on a CPU image adds more than 4 GB of NVIDIA CUDA dependencies that the package pulls in. (It is well visible on the master CI build of Kitten TTS right now - the TTS itself is not GPU accelerated, but since one of the libs wants Torch, you get more than 5 GB of extra dependencies in Torch + CUDA. Just for fun, I built it locally with the requirements edited to preinstall CPU Torch, and the image size fell to 1.16 GB.) That is why I went and blanket-added an extra index pointing to the CPU releases of Torch everywhere. With vLLM it is a bit more complicated: to get a CPU release, it has to be built from source. We have a part of install.sh for that, but that part never runs with the normal Dockerfile, as the FROM_SOURCE argument was removed at some point. Since the build also has its specific deps, I made an extra Dockerfile that installs the build deps according to the vLLM docs about building the CPU version. It builds successfully, but for some reason crashes on init when called by LocalAI, and I have no idea how to properly get the whole stack trace - gRPC returns only part of it. This is one of the reasons why this is a draft rather than a regular PR: I want to try to get this CPU build working first.
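For illustration, the blanket extra-index change boils down to something like this in each backend's requirements-cpu.txt (a sketch only; the exact packages and pins in the PR may differ):

```
# Sketch of a requirements-cpu.txt entry; actual pins in the PR may differ.
# Pulling Torch from the CPU wheel index avoids the multi-GB CUDA dependencies.
--extra-index-url https://download.pytorch.org/whl/cpu
torch
```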
Should I rename the file in preparation for potential addition of ROCm and XPU parts into it?
I get the point of building vLLM from source for the CPU, but I've just run a diff manually here against the two Dockerfiles (Dockerfile.python and Dockerfile.vllmcpu) and I don't see notable differences. My point is more that I think we can still use the same Dockerfile and handle the installation bits directly in the make/install of the backend, unless I am missing something?
--- backend/Dockerfile.vllmcpu 2025-08-08 16:43:25.145194390 +0200
+++ backend/Dockerfile.python 2025-08-08 16:43:15.812600946 +0200
@@ -1,11 +1,9 @@
ARG BASE_IMAGE=ubuntu:22.04
FROM ${BASE_IMAGE} AS builder
-ARG BACKEND=vllm
+ARG BACKEND=rerankers
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
-ARG FROM_SOURCE=true
-ENV FROM_SOURCE=${FROM_SOURCE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
@@ -30,20 +28,81 @@ RUN apt-get update && \
curl python3-pip \
python-is-python3 \
python3-dev llvm \
- python3-venv make \
- wget \
- gcc-12 g++-12 \
- libtcmalloc-minimal4 \
- libnuma-dev \
- ffmpeg \
- libsm6 libxext6 \
- libgl1 \
- jq lsof && \
+ python3-venv make && \
apt-get clean && \
- update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12 && \
rm -rf /var/lib/apt/lists/* && \
pip install --upgrade pip
+
+# Cuda
+ENV PATH=/usr/local/cuda/bin:${PATH}
+
+# HipBLAS requirements
+ENV PATH=/opt/rocm/bin:${PATH}
+
+# Vulkan requirements
+RUN <<EOT bash
+ if [ "${BUILD_TYPE}" = "vulkan" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
+ apt-get update && \
+ apt-get install -y --no-install-recommends \
+ software-properties-common pciutils wget gpg-agent && \
+ wget -qO - https://packages.lunarg.com/lunarg-signing-key-pub.asc | apt-key add - && \
+ wget -qO /etc/apt/sources.list.d/lunarg-vulkan-jammy.list https://packages.lunarg.com/vulkan/lunarg-vulkan-jammy.list && \
+ apt-get update && \
+ apt-get install -y \
+ vulkan-sdk && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/*
+ fi
+EOT
+
+# CuBLAS requirements
+RUN <<EOT bash
+ if [ "${BUILD_TYPE}" = "cublas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then
+ apt-get update && \
+ apt-get install -y --no-install-recommends \
+ software-properties-common pciutils
+ if [ "amd64" = "$TARGETARCH" ]; then
+ curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
+ fi
+ if [ "arm64" = "$TARGETARCH" ]; then
+ curl -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
+ fi
+ dpkg -i cuda-keyring_1.1-1_all.deb && \
+ rm -f cuda-keyring_1.1-1_all.deb && \
+ apt-get update && \
+ apt-get install -y --no-install-recommends \
+ cuda-nvcc-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+ libcufft-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+ libcurand-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+ libcublas-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+ libcusparse-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} \
+ libcusolver-dev-${CUDA_MAJOR_VERSION}-${CUDA_MINOR_VERSION} && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/*
+ fi
+EOT
+
+# If we are building with clblas support, we need the libraries for the builds
+RUN if [ "${BUILD_TYPE}" = "clblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
+ apt-get update && \
+ apt-get install -y --no-install-recommends \
+ libclblast-dev && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/* \
+ ; fi
+
+RUN if [ "${BUILD_TYPE}" = "hipblas" ] && [ "${SKIP_DRIVERS}" = "false" ]; then \
+ apt-get update && \
+ apt-get install -y --no-install-recommends \
+ hipblas-dev \
+ rocblas-dev && \
+ apt-get clean && \
+ rm -rf /var/lib/apt/lists/* && \
+ # I have no idea why, but the ROCM lib packages don't trigger ldconfig after they install, which results in local-ai and others not being able
+ # to locate the libraries. We run ldconfig ourselves to work around this packaging deficiency
+ ldconfig \
+ ; fi
# Install uv as a system package
RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/bin sh
ENV PATH="/root/.cargo/bin:${PATH}"
@@ -60,5 +119,5 @@ COPY python/common/ /${BACKEND}/common
RUN cd /${BACKEND} && make
FROM scratch
-ARG BACKEND=vllm
-COPY --from=builder /${BACKEND}/ /
+ARG BACKEND=rerankers
+COPY --from=builder /${BACKEND}/ /
Well, the dependency block I'm talking about consists of APT dependencies, as listed in the vLLM docs. That means requirements-cpu.txt is not the way to do it. They are GCC, the C++ libraries required to compile vLLM, and a few tools vLLM uses in its makefiles. Here is the block from the vLLM docs that dictates the extra dependencies:
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends ccache git curl wget ca-certificates gcc-12 g++-12 libtcmalloc-minimal4 libnuma-dev ffmpeg libsm6 libxext6 libgl1 jq lsof
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
We already have some, but we are missing these:
- python3-venv
- make
- wget
- gcc-12
- g++-12
- libtcmalloc-minimal4
- libnuma-dev
- ffmpeg
- libsm6
- libxext6
- libgl1
- jq
- lsof
which all have to be installed as APT packages, since vLLM is compiled from C++ code. Normally we just pull Python dependencies, as even CPU Torch already comes pre-compiled for its C++ parts, but CPU vLLM is compiled fully from scratch, so unless we decide not to ship it, we have to provide these somehow.
I can see if I can make it work with install.sh; since it is just a shell script and the builder runs as root, it should work. The only thing is, if I put the packages there, those building from custom Dockerfiles won't thank me - people build for example on Arch builders, and putting it there limits the build platform to Debian and derivative distros only (without manual intervention). The Dockerfile was chosen not only as an experimentation shortcut (only the fact that there was a separate one was a testing shortcut), but also to keep the backend source directory platform-agnostic.
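If it did move into install.sh, a rough sketch of such a guard could look like the following (this is an assumption, not code from the PR; the package list is taken from the vLLM docs block quoted above):

```bash
# Hypothetical guard inside install.sh: only run the APT steps where apt-get exists,
# so builds on non-Debian hosts are not broken outright.
if command -v apt-get >/dev/null 2>&1; then
    apt-get update
    apt-get install -y --no-install-recommends \
        wget gcc-12 g++-12 libtcmalloc-minimal4 libnuma-dev \
        ffmpeg libsm6 libxext6 libgl1 jq lsof
    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 \
        --slave /usr/bin/g++ g++ /usr/bin/g++-12
else
    echo "Non-APT system: install the vLLM CPU build dependencies manually." >&2
fi
```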
Fair enough, I think it's OK to put it in the Dockerfile.python builder, especially because at the end of the day that container is used only for building, so in the worst case we would have to copy the libraries into the final backend during the packaging phase.
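As a sketch of that direction (an assumption for illustration, not part of this PR), the vLLM-specific build deps could sit in the Dockerfile.python builder stage behind a guard:

```dockerfile
# Hypothetical addition to the Dockerfile.python builder stage (not in this PR):
# install the vLLM CPU build deps only when building the vllm backend without CUDA.
RUN if [ "${BACKEND}" = "vllm" ] && [ "${BUILD_TYPE}" = "" ]; then \
        apt-get update && \
        apt-get install -y --no-install-recommends \
            wget gcc-12 g++-12 libtcmalloc-minimal4 libnuma-dev \
            ffmpeg libsm6 libxext6 libgl1 jq lsof && \
        update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 \
            --slave /usr/bin/g++ g++ /usr/bin/g++-12 && \
        apt-get clean && rm -rf /var/lib/apt/lists/* ; \
    fi
```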
My suggestion here would probably be to do this step by step for each backend, or at least to treat vLLM separately so this PR doesn't go stale.
> My suggestion here would probably be to do this step by step for each backend, or at least to treat vLLM separately so this PR doesn't go stale.
I agree, I think splitting it backend by backend will be the best way. I will prepare per-backend branches and PRs for the ready ones ASAP. For the working ones I will just have to figure out the CI; the rest will be opened whenever I get a moment to sit down and finish them. The last few weeks were a bit busy, as I am in the middle of the autumn term for my bachelor finals. I think with that, I will be closing this one then?
Yes, sounds good to me, we can follow up in the other PRs. Thanks! (And good luck with your finals!)
I think the changes are going in the right direction; however, we need to update the CI workflow in https://github.com/mudler/LocalAI/blob/master/.github/workflows/backend.yml and the backend gallery index at https://github.com/mudler/LocalAI/blob/master/backend/index.yaml accordingly, to add the CPU variants where they are missing.
Of course, though before doing all the CI and gallery work, I want to have all backends yielding usable output first, unless we decide to drop any. At the moment, I need to finish testing
Description
This PR provides tweaks and additions for building CPU variants of the Python LocalAI backends. Based on discussion #5980.
That entails these changes:
- /opt/intel/
- patching requirements-cpu.txt (except for exllama2, which is a CUDA-only backend; for vLLM the build process is also patched) to use CPU Torch from the Torch repository (416f212, 9e65421, 09a32ed, 131a590, b50cdd2, 7986a67, 2644b31, 6938d6d, 3c09e79, 5d4aad5), since PyPI contains only the CUDA release
Notes for Reviewers
Current state of the CPU builds (based on testing from a few days ago):
- the rfdetr-base model from the gallery is not valid (it has an official build, but it is built against CUDA Torch from PyPI, so the image is needlessly big for CPU usage)

Kokoro and rfdetr still need to be tested. For faster-whisper, I am not sure why it refuses to work; I need to know from someone running a GPU build whether it is an issue with the CPU build or not. The situation is weird, as the only change is the use of CPU Torch, and transcription itself works if dumped into a file instead of being returned over gRPC. vLLM will require some extra work to get the build process to install it in a way compatible with modern Python and uv. For these reasons, the PR will stay a draft until the backends are tested and the issues with faster-whisper and vLLM are resolved in some way.
Signed commits