[Docs] Fix syntax highlighting of shell commands #19870

Merged: 1 commit, Jun 23, 2025
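Every hunk in this PR makes the same substitution: fenced shell snippets in the docs switch their info string from `console` to `bash`. A minimal before/after sketch of the markup, using a command taken from one of the touched files; the rendering note assumes a Pygments-style highlighter, where `console` typically expects `$`-prefixed session transcripts and leaves bare commands unhighlighted:

````markdown
<!-- Before: the `console` lexer tends to treat un-prompted lines as plain output -->
```console
vllm serve Qwen/Qwen1.5-32B-Chat-AWQ --max-model-len 4096
```

<!-- After: the `bash` lexer highlights the command itself -->
```bash
vllm serve Qwen/Qwen1.5-32B-Chat-AWQ --max-model-len 4096
```
````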
2 changes: 1 addition & 1 deletion .buildkite/nightly-benchmarks/nightly-annotation.md
@@ -16,7 +16,7 @@ Please download the visualization scripts in the post
- Download `nightly-benchmarks.zip`.
- In the same folder, run the following code:

-```console
+```bash
export HF_TOKEN=<your HF token>
apt update
apt install -y git
12 changes: 6 additions & 6 deletions docs/deployment/docker.md
@@ -10,7 +10,7 @@ title: Using Docker
vLLM offers an official Docker image for deployment.
The image can be used to run OpenAI compatible server and is available on Docker Hub as [vllm/vllm-openai](https://hub.docker.com/r/vllm/vllm-openai/tags).

-```console
+```bash
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
@@ -22,7 +22,7 @@ docker run --runtime nvidia --gpus all \

This image can also be used with other container engines such as [Podman](https://podman.io/).

-```console
+```bash
podman run --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
@@ -71,7 +71,7 @@ You can add any other [engine-args][engine-args] you need after the image tag (`

You can build and run vLLM from source via the provided <gh-file:docker/Dockerfile>. To build vLLM:

-```console
+```bash
# optionally specify: --build-arg max_jobs=8 --build-arg nvcc_threads=2
DOCKER_BUILDKIT=1 docker build . \
--target vllm-openai \
@@ -99,7 +99,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

??? Command

-```console
+```bash
# Example of building on Nvidia GH200 server. (Memory usage: ~15GB, Build time: ~1475s / ~25 min, Image size: 6.93GB)
python3 use_existing_torch.py
DOCKER_BUILDKIT=1 docker build . \
@@ -118,7 +118,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

Run the following command on your host machine to register QEMU user static handlers:

-```console
+```bash
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
```

@@ -128,7 +128,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--

To run vLLM with the custom-built Docker image:

-```console
+```bash
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \
2 changes: 1 addition & 1 deletion docs/deployment/frameworks/anything-llm.md
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
vllm serve Qwen/Qwen1.5-32B-Chat-AWQ --max-model-len 4096
```

4 changes: 2 additions & 2 deletions docs/deployment/frameworks/autogen.md
@@ -11,7 +11,7 @@ title: AutoGen

- Setup [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment

-```console
+```bash
pip install vllm

# Install AgentChat and OpenAI client from Extensions
@@ -23,7 +23,7 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mistral-7B-Instruct-v0.2
```
6 changes: 3 additions & 3 deletions docs/deployment/frameworks/cerebrium.md
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [Cerebrium](https://www.cerebr

To install the Cerebrium client, run:

-```console
+```bash
pip install cerebrium
cerebrium login
```

Next, to create your Cerebrium project, run:

-```console
+```bash
cerebrium init vllm-project
```

@@ -58,7 +58,7 @@ Next, let us add our code to handle inference for the LLM of your choice (`mistr

Then, run the following code to deploy it to the cloud:

-```console
+```bash
cerebrium deploy
```

2 changes: 1 addition & 1 deletion docs/deployment/frameworks/chatbox.md
@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
```

4 changes: 2 additions & 2 deletions docs/deployment/frameworks/dify.md
@@ -18,13 +18,13 @@ This guide walks you through deploying Dify using a vLLM backend.

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
vllm serve Qwen/Qwen1.5-7B-Chat
```

- Start the Dify server with docker compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):

-```console
+```bash
git clone https://github.com/langgenius/dify.git
cd dify
cd docker
4 changes: 2 additions & 2 deletions docs/deployment/frameworks/dstack.md
@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [dstack](https://dstack.ai/),

To install dstack client, run:

-```console
+```bash
pip install "dstack[all]
dstack server
```

Next, to configure your dstack project, run:

-```console
+```bash
mkdir -p vllm-dstack
cd vllm-dstack
dstack init
4 changes: 2 additions & 2 deletions docs/deployment/frameworks/haystack.md
@@ -13,15 +13,15 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac

- Setup vLLM and Haystack environment

-```console
+```bash
pip install vllm haystack-ai
```

## Deploy

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
vllm serve mistralai/Mistral-7B-Instruct-v0.1
```

4 changes: 2 additions & 2 deletions docs/deployment/frameworks/helm.md
@@ -22,15 +22,15 @@ Before you begin, ensure that you have the following:

To install the chart with the release name `test-vllm`:

-```console
+```bash
helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f values.yaml --set secrets.s3endpoint=$ACCESS_POINT --set secrets.s3bucketname=$BUCKET --set secrets.s3accesskeyid=$ACCESS_KEY --set secrets.s3accesskey=$SECRET_KEY
```

## Uninstalling the Chart

To uninstall the `test-vllm` deployment:

-```console
+```bash
helm uninstall test-vllm --namespace=ns-vllm
```

6 changes: 3 additions & 3 deletions docs/deployment/frameworks/litellm.md
@@ -18,7 +18,7 @@ And LiteLLM supports all models on VLLM.

- Setup vLLM and litellm environment

-```console
+```bash
pip install vllm litellm
```

@@ -28,7 +28,7 @@ pip install vllm litellm

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
```

@@ -56,7 +56,7 @@ vllm serve qwen/Qwen1.5-0.5B-Chat

- Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
vllm serve BAAI/bge-base-en-v1.5
```

4 changes: 2 additions & 2 deletions docs/deployment/frameworks/open-webui.md
@@ -7,13 +7,13 @@ title: Open WebUI

2. Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
```

1. Start the [Open WebUI](https://github.com/open-webui/open-webui) docker container (replace the vllm serve host and vllm serve port):

-```console
+```bash
docker run -d -p 3000:8080 \
--name open-webui \
-v open-webui:/app/backend/data \
12 changes: 6 additions & 6 deletions docs/deployment/frameworks/retrieval_augmented_generation.md
@@ -15,7 +15,7 @@ Here are the integrations:

- Setup vLLM and langchain environment

-```console
+```bash
pip install -U vllm \
langchain_milvus langchain_openai \
langchain_community beautifulsoup4 \
@@ -26,14 +26,14 @@ pip install -U vllm \

- Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
# Start embedding service (port 8000)
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
```

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
# Start chat service (port 8001)
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```
@@ -52,7 +52,7 @@ python retrieval_augmented_generation_with_langchain.py

- Setup vLLM and llamaindex environment

-```console
+```bash
pip install vllm \
llama-index llama-index-readers-web \
llama-index-llms-openai-like \
@@ -64,14 +64,14 @@ pip install vllm \

- Start the vLLM server with the supported embedding model, e.g.

-```console
+```bash
# Start embedding service (port 8000)
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
```

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
# Start chat service (port 8001)
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```
16 changes: 8 additions & 8 deletions docs/deployment/frameworks/skypilot.md
@@ -15,7 +15,7 @@ vLLM can be **run and scaled to multiple service replicas on clouds and Kubernet
- Check that you have installed SkyPilot ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)).
- Check that `sky check` shows clouds or Kubernetes are enabled.

-```console
+```bash
pip install skypilot-nightly
sky check
```
@@ -71,7 +71,7 @@ See the vLLM SkyPilot YAML for serving, [serving.yaml](https://github.com/skypil

Start serving the Llama-3 8B model on any of the candidate GPUs listed (L4, A10g, ...):

-```console
+```bash
HF_TOKEN="your-huggingface-token" sky launch serving.yaml --env HF_TOKEN
```

@@ -83,7 +83,7 @@ Check the output of the command. There will be a shareable gradio link (like the

**Optional**: Serve the 70B model instead of the default 8B and use more GPU:

-```console
+```bash
HF_TOKEN="your-huggingface-token" \
sky launch serving.yaml \
--gpus A100:8 \
@@ -159,15 +159,15 @@ SkyPilot can scale up the service to multiple service replicas with built-in aut

Start serving the Llama-3 8B model on multiple replicas:

-```console
+```bash
HF_TOKEN="your-huggingface-token" \
sky serve up -n vllm serving.yaml \
--env HF_TOKEN
```

Wait until the service is ready:

-```console
+```bash
watch -n10 sky serve status vllm
```

@@ -271,13 +271,13 @@ This will scale the service up to when the QPS exceeds 2 for each replica.

To update the service with the new config:

-```console
+```bash
HF_TOKEN="your-huggingface-token" sky serve update vllm serving.yaml --env HF_TOKEN
```

To stop the service:

-```console
+```bash
sky serve down vllm
```

@@ -317,7 +317,7 @@ It is also possible to access the Llama-3 service with a separate GUI frontend,

1. Start the chat web UI:

-```console
+```bash
sky launch \
-c gui ./gui.yaml \
--env ENDPOINT=$(sky serve status --endpoint vllm)
6 changes: 3 additions & 3 deletions docs/deployment/frameworks/streamlit.md
@@ -15,21 +15,21 @@ It can be quickly integrated with vLLM as a backend API server, enabling powerfu

- Start the vLLM server with the supported chat completion model, e.g.

-```console
+```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
```

- Install streamlit and openai:

-```console
+```bash
pip install streamlit openai
```

- Use the script: <gh-file:examples/online_serving/streamlit_openai_chatbot_webserver.py>

- Start the streamlit web UI and start to chat:

-```console
+```bash
streamlit run streamlit_openai_chatbot_webserver.py

# or specify the VLLM_API_BASE or VLLM_API_KEY
2 changes: 1 addition & 1 deletion docs/deployment/integrations/llamastack.md
@@ -7,7 +7,7 @@ vLLM is also available via [Llama Stack](https://github.com/meta-llama/llama-sta

To install Llama Stack, run

-```console
+```bash
pip install llama-stack -q
```
