feat(Python backends): Add Python backends CPU builds #5990

Status: Closed

Changes from all commits (23 commits):
416f212 Fix requirements install order for bark to prevent pulling CUDA depen… (rampa3)
b50cdd2 Use CPU Torch for CPU build of Chatterbox (rampa3)
a572ddf Patch libbackend to build Intel builds only if one is requested by bu… (rampa3)
9e65421 Use CPU Torch for CPU build of Coqui (rampa3)
09a32ed Use CPU Torch for CPU build of diffusers (rampa3)
704753d Only use XPU in diffusers if available when requested (rampa3)
6ba3b94 Ensure CPU mode usage if running diffusers on CPU (rampa3)
e16f605 Force diffusers to use float32 only if running on CPU no matter what … (rampa3)
64d4b70 Block bfloat16-only diffusers pipelines from running on CPU - deadloc… (rampa3)
a25ff94 Extra CPU optimizations in diffusers (rampa3)
131a590 Use CPU Torch for CPU build of faster-whisper (rampa3)
1d94f2d Add device type switching logic using Torch into faster-whisper (rampa3)
39f32b0 Use CPU Torch for CPU build of kokoro (rampa3)
2644b31 Use CPU Torch for CPU build of rerankers (rampa3)
6938d6d Use CPU Torch for CPU build of rfdetr (rampa3)
3c09e79 Use CPU Torch for CPU build of transformers (rampa3)
076fd9c Create lib import code for CPU mode & only use XPU if available in tr… (rampa3)
c77851d Force transformers to use float32 only if running on CPU - deadlock p… (rampa3)
5d4aad5 Update vLLM files for building CPU version from source & add CPU vLLM… (rampa3)
fd5656b Add CPU vLLM build logic into main Makefile (rampa3)
e19f8b7 Revert "Use CPU Torch for CPU build of kokoro" (rampa3)
7986a67 Pin kokoro Torch in CPU requirements to CPU version of the same relea… (rampa3)
515ab68 Resolve conflicts and merge branch 'master' into python_builds_pr (rampa3)
New file (referred to as `Dockerfile.vllmcpu` in the conversation below):

```dockerfile
ARG BASE_IMAGE=ubuntu:22.04

FROM ${BASE_IMAGE} AS builder
ARG BACKEND=vllm
ARG BUILD_TYPE
ENV BUILD_TYPE=${BUILD_TYPE}
ARG FROM_SOURCE=true
ENV FROM_SOURCE=${FROM_SOURCE}
ARG CUDA_MAJOR_VERSION
ARG CUDA_MINOR_VERSION
ARG SKIP_DRIVERS=false
ENV CUDA_MAJOR_VERSION=${CUDA_MAJOR_VERSION}
ENV CUDA_MINOR_VERSION=${CUDA_MINOR_VERSION}
ENV DEBIAN_FRONTEND=noninteractive
ARG TARGETARCH
ARG TARGETVARIANT

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        ccache \
        ca-certificates \
        espeak-ng \
        curl \
        libssl-dev \
        git \
        git-lfs \
        unzip \
        upx-ucl \
        curl python3-pip \
        python-is-python3 \
        python3-dev llvm \
        python3-venv make \
        wget \
        gcc-12 g++-12 \
        libtcmalloc-minimal4 \
        libnuma-dev \
        ffmpeg \
        libsm6 libxext6 \
        libgl1 \
        jq lsof && \
    apt-get clean && \
    update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12 && \
    rm -rf /var/lib/apt/lists/* && \
    pip install --upgrade pip

# Install uv as a system package
RUN curl -LsSf https://astral.sh/uv/install.sh | UV_INSTALL_DIR=/usr/bin sh
ENV PATH="/root/.cargo/bin:${PATH}"

# Install the Rust toolchain (some Python deps build native extensions)
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

# Install grpcio-tools (the version in 22.04 is too old)
RUN pip install --user grpcio-tools==1.71.0 grpcio==1.71.0

COPY python/${BACKEND} /${BACKEND}
COPY backend.proto /${BACKEND}/backend.proto
COPY python/common/ /${BACKEND}/common

RUN cd /${BACKEND} && make

# Package only the built backend directory into an empty final image
FROM scratch
ARG BACKEND=vllm
COPY --from=builder /${BACKEND}/ /
```
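For reference, a minimal sketch of how a builder image like this is typically invoked (the file name, tag, and exact build args here are assumptions; the real wiring lives in the Makefile/CI):

```sh
# Assumed invocation: BACKEND selects the python/<backend> directory,
# and an empty BUILD_TYPE is treated as a CPU build.
docker build -f Dockerfile.vllmcpu \
  --build-arg BACKEND=vllm \
  --build-arg BUILD_TYPE= \
  --build-arg FROM_SOURCE=true \
  -t localai-backend-vllm-cpu .
```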
bark, CPU requirements (likely `requirements-cpu.txt`): the `torch`/`torchaudio` pins switch to the CPU wheel index. Resulting file:

```
bark==0.1.5
transformers
accelerate
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.4.1+cpu
torchaudio==2.4.1+cpu
```
bark, CUDA 11 requirements (likely `requirements-cublas11.txt`): the Torch pins now come before `transformers`/`accelerate` so the install order cannot pull in a different Torch. Resulting file:

```
bark==0.1.5
--extra-index-url https://download.pytorch.org/whl/cu118
torch==2.4.1+cu118
torchaudio==2.4.1+cu118
transformers
accelerate
```
bark, CUDA 12 requirements (likely `requirements-cublas12.txt`; default PyPI wheels): Torch pins likewise moved ahead of `transformers`. Resulting file:

```
bark==0.1.5
torch==2.4.1
torchaudio==2.4.1
transformers
accelerate
```
bark, ROCm requirements (likely `requirements-hipblas.txt`): Resulting file:

```
bark==0.1.5
--extra-index-url https://download.pytorch.org/whl/rocm6.0
torch==2.4.1+rocm6.0
torchaudio==2.4.1+rocm6.0
transformers
accelerate
```
bark, shared `requirements.txt`: `bark==0.1.5` moves out of the shared file (it is now pinned in the per-build files above). Resulting file:

```
grpcio==1.71.0
protobuf
certifi
```
chatterbox, CPU requirements (likely `requirements-cpu.txt`): Resulting file:

```
accelerate
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.6.0+cpu
torchaudio==2.6.0+cpu
transformers==4.46.3
chatterbox-tts
```
coqui, CPU requirements (likely `requirements-cpu.txt`): Resulting file:

```
transformers==4.48.3
accelerate
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.4.1+cpu
coqui-tts
```
kokoro, CPU requirements (likely `requirements-cpu.txt`): the previously unpinned `torch` is pinned to the CPU build of the same release. Resulting file:

```
--extra-index-url https://download.pytorch.org/whl/cpu
transformers
accelerate
torch==2.7.1+cpu
kokoro
soundfile
```
rerankers, CPU requirements (likely `requirements-cpu.txt`): Resulting file:

```
transformers
accelerate
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.4.1+cpu
rerankers[transformers]
```
Conversations
Reviewer: Why have a separate Dockerfile? When BUILD_TYPE is empty, we already treat it as a CPU build.
rampa3: This Dockerfile is supposed to build vLLM from source, since PyPI, same as with Torch, only has a CUDA release. The aim behind the CPU builds is to use CPU-specific builds of libraries wherever we don't need any other changes: for example, just installing `torch` from the PyPI repository on a CPU image adds more than 4 GB of NVIDIA CUDA dependencies that the package pulls in. (It is clearly visible on the master CI build of Kitten TTS right now: the TTS itself is not GPU accelerated, but since one of the libs wants Torch, you get more than 5 GB of extra dependencies in Torch + CUDA. Just for fun, I built it locally with edited requirements that preinstall CPU Torch, and the image size fell to 1.16 GB.) That is why I went and blanket-added an extra index pointing to the CPU releases of Torch everywhere. With vLLM it is a bit more complicated: to get a CPU release, it has to be built from source. We have a part of `install.sh` for that, but that part never runs with the normal Dockerfile, as at some point the `FROM_SOURCE` argument was removed. Since the build also has its specific deps, I made an extra Dockerfile that installs the build deps according to the vLLM docs on building the CPU version. It builds successfully, but for some reason crashes on init when called by LocalAI, and I have no idea how to properly get the whole stacktrace; gRPC returns only part of it. This is one of the reasons why this is a draft rather than a regular PR: I want to try to get this CPU build working.
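For illustration, a quick way to confirm that a CPU-only wheel landed instead of the CUDA one (a sketch; the version is whatever the requirements pin):

```sh
# CPU wheels carry a "+cpu" local version tag and report no CUDA support
pip install --extra-index-url https://download.pytorch.org/whl/cpu "torch==2.4.1+cpu"
python -c 'import torch; print(torch.__version__, torch.cuda.is_available())'
# expected: 2.4.1+cpu False
```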
rampa3: In the end, that Dockerfile could become `.vllm` instead of `.vllmcpu`: every GPU other than NVIDIA needs to be built from source. But for a start, I focused on CPU, as that is the only platform I can reliably test.
rampa3: Should I rename the file in preparation for the potential addition of ROCm and XPU parts into it?
Reviewer: I got the point of building vLLM for CPU, but I've just run a diff manually here against the two Dockerfiles (`Dockerfile.python` and `Dockerfile.vllmcpu`) and I don't see notable differences. My point is more that I think we can still use the same Dockerfile and handle the installation bits directly in the make/install of the backend, unless I am missing something?
rampa3: Well, the dependency block I am talking about consists of APT dependencies, as listed in the vLLM docs. That means `requirements-cpu.txt` is not an option. They are GCC, the C++ libraries required to compile vLLM, and a few tools vLLM uses in its makefiles. Here is the block from the vLLM docs that dictates the extra dependencies. We already have some, but we are missing these:
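(A sketch of that block, reconstructed on the assumption that it matches the vLLM CPU build docs and what `Dockerfile.vllmcpu` above installs; the exact package set is an assumption:)

```sh
# Compiler toolchain, NUMA headers, and tcmalloc for the vLLM CPU build
apt-get update && apt-get install -y --no-install-recommends \
    gcc-12 g++-12 python3-dev libnuma-dev libtcmalloc-minimal4
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 \
    --slave /usr/bin/g++ g++ /usr/bin/g++-12
```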
These all have to be installed as APT packages, since vLLM is compiled from C++ code. Normally we just pull Python dependencies, as even CPU Torch comes pre-compiled for the C++ parts. vLLM for CPU is compiled fully from scratch, so unless we decide not to ship CPU vLLM, we have to provide these somehow.
I can see if I can make it work with `install.sh`: since it is just a shell script and the builder runs as root, it should work. The only thing is, if I put the packages there, those building from custom Dockerfiles won't thank me; people build on Arch builders, for example, and putting the APT calls there limits the build platform to Debian and derivative distros only (without manual intervention). The Dockerfile was chosen not only as an experimentation shortcut (only the fact that there was a separate one was a testing shortcut), but also to keep the backend source directory platform agnostic.
Reviewer: Fair enough, I think it's OK to put it in the `Dockerfile.python` builder, especially because at the end of the day that container is used only for building; in the worst case we would have to copy the libraries into the final backend during the packaging phase.
Reviewer: My suggestion here would be to do this step by step for each backend, or at least treat vLLM separately, so that this PR doesn't go stale.
rampa3: I agree, I think splitting it backend by backend will be the best way. I will prepare per-backend branches and PRs for the ready ones ASAP. For the working ones I just have to figure out the CI; the rest will be opened whenever I get a moment to sit down and finish them. The last few weeks were a bit busy, as I am in the middle of autumn terms for my bachelor finals. With that, I will be closing this one then?
Reviewer: Yes, sounds good to me; we can follow up on the other PRs. Thanks! (And good luck with your finals!)