Releases · intel/ipex-llm

Overview

This release introduces the latest update to the Multi-ARC vLLM serving solution, optimized for Intel Xeon + ARC platforms with ipex-llm vLLM. The new version delivers low latency and high throughput LLM serving with improved model compatibility and resource efficiency. Major component upgrades include: vLLM upgraded to 0.6.6, PyTorch upgraded to 2.6, oneAPI upgraded to 2025.0, oneCCL patch updated to 0.0.6.6.

New Features

Optimized vLLM serving for Intel Xeon + ARC multi-GPU platforms, enabling lower latency and higher throughput.
Supported various LLM models.
Enhanced support for loading models with minimal memory requirements.
Refined Docker image for improved ease of use and deployment.
Improved WebUI model connectivity and stability.
Added VLLM_LOG_OUTPUT=1 option to enable detailed input/output logging for vLLM.

Bug Fixes

Resolved multimodal issues including get_image failures and inference errors with models such as MiniCPM-V-2_6, Qwen2-VL, and GLM-4v-9B.
Fixed Qwen2-VL multi-request crash by removing Qwen2VisionAttention’s attention_mask and addressing mrope_positions instability.
Updated profile_run usage to avoid OOM (Out of Memory) crashes.
Resolved GQA kernel issues causing errors with multiple concurrent outputs.
Fixed --enable-prefix-caching none crash in specific cases.
Addressed low-bit overflow causing !!!!!! output error in DeepSeek-R1-Distill-Qwen-14B.
Resolved GPTQ and AWQ-related errors to improve compatibility across more models.

Docker Images

Highlights

Note: BigDL v2.3.0 has been updated to include functional and security updates. Users should update to the latest version.

Nano

Enhanced trace and quantization process (for PyTorch and TensorFlow model optimizations)
New inference optimization methods (including Intel ARC series GPU support, CPU fp16, JIT int8, etc.)
New inference/training features (including TorchCCL support, async inference pipeline, compressed model saving, automatic channels_last_3d, multi-instance training for customized TF train loop, etc.)
Performance enhancement and overhead reduction for inference optimized model
More user-friendly document and API design

Orca:

Step-by-step distributed TensorFlow and PyTorch tutorials for different data inputs.
Improvement and examples for distributed MMCV pipelines.
Further enhancement for Orca Estimator (more flexible PyTorch train loops via Hook, improved multi-output prediction, memory optimization for OpenVINO, etc.)

Chronos

70% latency reduction for Forecasters
New bigdl.chronos.aiops module for AIOps use case on top of Chronos algorithms.
Enhanced TF-based TCNForecaster to better accuracy

Friesian:

Automatic deployment of RecSys serving pipeline on Kubernetes with Helm Chart

PPML

TDX (both VM and CoCo) support for Big Data, DL Training & Serving (including TDX-VM orchestration & k8s deployment, TDXCC installation & deployment, attestation and key management support, etc.)
New Trusted Machine Learning toolkit (with secure and distributed SparkML & LightGBM support)
Trusted Big Data toolkit upgrade (>2x EPC usage reduction, Apache Flink support, Azure MAA support, multi-KMS support, etc.)
Trusted Deep Learning toolkit upgrade (with improved performance using BigDL Nano, tcmalloc, etc.)
Trusted DL Serving toolkit upgrade (with Torch Serve, TF-Serving, and improved throughput and latency)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Highlights

Uh oh!

Overview

New Features

Bug Fixes

Docker Images

Uh oh!

Uh oh!

Highlights

Uh oh!

Highlights

Uh oh!

Highlights

Uh oh!

Highlights

Uh oh!

Uh oh!

Uh oh!

Releases: intel/ipex-llm

2.3.0 nightly build

Uh oh!

IPEX-LLM release 2.2.0

Highlights

Uh oh!

Multi-Arc Serving release 0.1.0

Overview

New Features

Bug Fixes

Docker Images

Uh oh!

2.2.0 nightly build

Uh oh!

IPEX-LLM release 2.1.0

Highlights

Uh oh!

BigDL release 2.4.0

Highlights

Uh oh!

BigDL release 2.3.0

Highlights

Uh oh!

BigDL release 2.0.0

Highlights

Uh oh!

BigDL release 0.13.0

Uh oh!

BigDL release 0.12.2

Uh oh!