
Commit faab245

Adding aic-hw-version Compile Options Support (#528)
This pull request introduces support for compile-time options via keyword arguments (`kwargs`), including the `aic-hw-version` parameter, which now accepts values `"ai100"` or `"ai200"`. If no value is provided, the default is `"ai100"`, representing the AI100 hardware. These enhancements allow users to tailor the `compile` API to better suit their specific requirements.

### Example Usage:

```python
from QEfficient import QEFFAutoModelForCausalLM
from transformers import AutoTokenizer

model_name = "gpt2"
model = QEFFAutoModelForCausalLM.from_pretrained(model_name, num_hidden_layers=2)
model.compile(prefill_seq_len=128, ctx_len=256, num_cores=16, num_devices=1, **{'aic-hw-version': 'ai100'})

tokenizer = AutoTokenizer.from_pretrained(model_name)
model.generate(prompts=["Hi there!!"], tokenizer=tokenizer)
```

> **Note:** Previously, the default value for `aic-hw-version` was `"2.0"`, which implicitly referred to AI100. This value is now deprecated and replaced with the explicit `"ai100"` identifier.

---------

Signed-off-by: Abukhoyer Shaik <[email protected]>
1 parent 9d9e44a commit faab245
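
The example usage in the commit message passes `aic-hw-version` through `**{...}` instead of a plain keyword argument. That follows from Python syntax: a hyphen is not allowed in an identifier, so the hyphenated spelling can only reach `**kwargs` through dictionary unpacking. A minimal sketch of the difference, using a stand-in function (`fake_compile` is hypothetical and only echoes its options; it is not the QEfficient API):

```python
def fake_compile(**compiler_options):
    """Hypothetical stand-in for a compile() that collects extra options."""
    return dict(compiler_options)


# The underscore spelling is a valid identifier, so it works as a normal keyword.
underscore = fake_compile(aic_hw_version="ai100")

# The hyphenated spelling is not a valid identifier; unpack it from a dict instead.
hyphenated = fake_compile(**{"aic-hw-version": "ai100"})

print(underscore)
print(hyphenated)
```

Both spellings arrive as ordinary string keys, which is why the diff looks up each of them when building the compiler command.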

6 files changed, +68 -11 lines changed


QEfficient/base/modeling_qeff.py

Lines changed: 23 additions & 7 deletions
@@ -146,14 +146,20 @@ def compile(self, *args, **kwargs) -> Path:
  :mxfp6_matmul (bool): Use MXFP6 to compress weights for MatMul nodes to run faster on device. ``Defaults to False``.
  :mxint8_kv_cache (bool): Use MXINT8 to compress KV-cache on device to access and update KV-cache faster. ``Defaults to False``.
  :compiler_options: Pass any compiler option as input.
- Following flag can be passed in compiler_options to enable QNN Compilation path.
- :enable_qnn (bool): Enables QNN Compilation. ``Defaults to False. if not passed.``
- :qnn_config (str): Path of QNN Config parameters file. ``Defaults to None. if not passed``
- for QAIC compilation path, any flag that is supported by ``qaic-exec`` can be passed. Params are converted to flags as below:
- - aic_num_cores=16 -> -aic-num-cores=16
- - convert_to_fp16=True -> -convert-to-fp16
+
+ Following flag can be passed in compiler_options to enable QNN Compilation path.
+ :enable_qnn (bool): Enables QNN Compilation. ``Defaults to False. if not passed.``
+ :qnn_config (str): Path of QNN Config parameters file. ``Defaults to None. if not passed``
+
+ for QAIC compilation path, any flag that is supported by ``qaic-exec`` can be passed. Params are converted to flags as below:
+
+ - aic_num_cores=16 -> -aic-num-cores=16
+ - convert_to_fp16=True -> -convert-to-fp16
+ - aic_hw_version=ai100 -> -aic-hw-version=ai100
+ - aic_hw_version=ai200 -> -aic-hw-version=ai200

  ``QEFFAutoModelForCausalLM`` Args:
+
  :full_batch_size (int): Full batch size to allocate cache lines.
  :batch_size (int): Batch size to compile for. ``Defaults to 1``.
  :prefill_seq_len (int): Prefill sequence length to compile for. Prompt will be chunked according to this length.
@@ -311,8 +317,12 @@ def _compile(
  :qnn_config (str): Path of QNN Config parameters file. Any extra parameters for QNN compilation can be passed via this file. ``Defaults to None.``
  :compiler_options: Pass any compiler option as input.
  Any flag that is supported by `qaic-exec` can be passed. Params are converted to flags as below:
+
  - aic_num_cores=16 -> -aic-num-cores=16
  - convert_to_fp16=True -> -convert-to-fp16
+ - aic_hw_version=ai100 -> -aic-hw-version=ai100
+ - aic_hw_version=ai200 -> -aic-hw-version=ai200
+
  For QNN Compilation path, when enable_qnn is set to True, any parameter passed in compiler_options will be ignored.
  """
  if onnx_path is None and self.onnx_path is None:
@@ -344,7 +354,13 @@ def _compile(

      return self.qpc_path

- command = constants.COMPILER + [f"-m={onnx_path}"]
+ command = (
+     constants.COMPILER
+     + [
+         f"-aic-hw-version={compiler_options.pop('aic_hw_version', compiler_options.pop('aic-hw-version', constants.DEFAULT_AIC_HW_VERSION))}"
+     ]
+     + [f"-m={onnx_path}"]
+ )

  if mdp_ts_json_path := compiler_options.pop("mdp_load_partition_config", None):
      command.append(f"-mdp-load-partition-config={mdp_ts_json_path}")
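
The nested `pop` in the hunk above accepts both spellings of the option. A minimal sketch of how that expression resolves, isolated from the rest of `_compile` (the helper name `resolve_hw_version` is hypothetical; the default mirrors `DEFAULT_AIC_HW_VERSION` from this commit):

```python
DEFAULT_AIC_HW_VERSION = "ai100"  # mirrors the constant added in this commit


def resolve_hw_version(compiler_options: dict) -> str:
    """Hypothetical helper wrapping the same nested-pop expression as the diff."""
    # The inner pop is an argument of the outer call, so it runs first.
    # Both spellings are therefore removed from the dict, whichever one
    # ends up supplying the value.
    return compiler_options.pop(
        "aic_hw_version",
        compiler_options.pop("aic-hw-version", DEFAULT_AIC_HW_VERSION),
    )


# Both spellings present: the underscore key wins, and neither key is left behind.
both = {"aic_hw_version": "ai200", "aic-hw-version": "ai100", "aic_num_cores": 16}
from_both = resolve_hw_version(both)

# Only the hyphenated spelling present: it is used as the fallback.
from_hyphen = resolve_hw_version({"aic-hw-version": "ai200"})

# Neither present: the default applies.
from_default = resolve_hw_version({})
```

Removing both keys matters because the remaining `compiler_options` are later turned into flags; a leftover key would otherwise presumably produce a second `-aic-hw-version` flag.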

QEfficient/cloud/infer.py

Lines changed: 6 additions & 2 deletions
@@ -143,8 +143,12 @@ def main(
  :qnn_config (str): Path of QNN Config parameters file. ``Defaults to None.``
  :trust_remote_code (bool): Trust remote code execution. ``Defaults to False.``
  :kwargs: Pass any compiler option as input. Any flag that is supported by `qaic-exec` can be passed. Params are converted to flags as below:
- -allocator_dealloc_delay=1 -> -allocator-dealloc-delay=1
- -qpc_crc=True -> -qpc-crc
+
+ - `allocator_dealloc_delay=1` → `-allocator-dealloc-delay=1`
+ - `qpc_crc=True` → `-qpc-crc`
+ - `aic_hw_version=ai100` → `-aic-hw-version=ai100`
+ - `aic_hw_version=ai200` → `-aic-hw-version=ai200`
+

  .. code-block:: bash

QEfficient/compile/compile_helper.py

Lines changed: 6 additions & 1 deletion
@@ -13,6 +13,7 @@
  from typing import List, Optional, Tuple

  from QEfficient.compile.qnn_compiler import compile as qnn_compile
+ from QEfficient.utils import constants
  from QEfficient.utils._utils import load_json, load_yaml
  from QEfficient.utils.logging_utils import logger

@@ -77,7 +78,7 @@ def compile_kv_model_on_cloud_ai_100(
  "/opt/qti-aic/exec/qaic-exec",
  f"-m={onnx_path}",
  "-aic-hw",
- "-aic-hw-version=2.0",
+ f"-aic-hw-version={kwargs.pop('aic_hw_version', kwargs.pop('aic-hw-version', constants.DEFAULT_AIC_HW_VERSION))}",
  f"-network-specialization-config={specializations_json}",
  "-convert-to-fp16",
  "-retained-state",
@@ -167,6 +168,10 @@ def compile(
  :allow_mxint8_mdp_io (bool): Allows MXINT8 compression of MDP IO traffic ``Defaults to False.``
  :enable_qnn (bool): Enables QNN Compilation. ``Defaults to False.``
  :qnn_config (str): Path of QNN Config parameters file. ``Defaults to None.``
+ :kwargs: Pass any compiler option as input. Any flag that is supported by `qaic-exec` can be passed. Params are converted to flags as below:
+
+ - `aic_hw_version=ai100` → `-aic-hw-version=ai100`
+ - `aic_hw_version=ai200` → `-aic-hw-version=ai200`

  Returns:
  :str: Path to compiled ``qpc`` package.
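
The docstring above lists `ai100` and `ai200` as the values for `aic_hw_version`, and the hunks shown in this commit pass the value straight into the `qaic-exec` command without checking it. A caller who wants an early, readable error could add a guard along these lines. This is only a sketch and not part of the commit; `SUPPORTED_HW_VERSIONS` and `check_hw_version` are hypothetical names.

```python
# Hypothetical: the two values named in the commit message.
SUPPORTED_HW_VERSIONS = ("ai100", "ai200")


def check_hw_version(value: str) -> str:
    """Return the value unchanged if it is a listed hardware version, else raise."""
    if value not in SUPPORTED_HW_VERSIONS:
        raise ValueError(
            f"Unsupported aic_hw_version {value!r}; expected one of {SUPPORTED_HW_VERSIONS}"
        )
    return value


checked = check_hw_version("ai200")
```

With such a guard, the deprecated `"2.0"` value mentioned in the commit message would be rejected before the compiler is invoked.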

QEfficient/transformers/models/modeling_auto.py

Lines changed: 18 additions & 0 deletions
@@ -287,12 +287,20 @@ def compile(
  :num_cores (int): Number of cores used to compile the model.
  :mxfp6_matmul (bool, optional): Whether to use ``mxfp6`` compression for weights. ``Defaults to False``.
  :compiler_options (dict, optional): Additional compiler options.
+
  For QAIC Compiler: Extra arguments for qaic-exec can be passed.
  :aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
  :allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
+
+ Params are converted to flags as below:
+
+ - aic_hw_version=ai100 -> -aic-hw-version=ai100
+ - aic_hw_version=ai200 -> -aic-hw-version=ai200
+
  For QNN Compiler: Following arguments can be passed.
  :enable_qnn (bool): Enables QNN Compilation.
  :qnn_config (str): Path of QNN Config parameters file. Any extra parameters for QNN compilation can be passed via this file.
+
  Returns:
  :str: Path of the compiled ``qpc`` package.
  """
@@ -1701,13 +1709,19 @@ def compile(
  :num_speculative_tokens (int, optional): Number of speculative tokens to take as input for Speculative Decoding Target Language Model.
  :prefill_only (bool): if ``True`` compile for prefill only and if ``False`` compile for decode only. Defaults to None, which compiles for both ``prefill and ``decode``.
  :compiler_options (dict, optional): Additional compiler options. ``Defaults to None``.
+
  For QAIC Compiler: Extra arguments for qaic-exec can be passed.
  :mos (int, optional): Effort level to reduce on-chip memory. Defaults to -1, meaning no effort. ``Defaults to -1``.
  :aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
  :allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
+
  Params are converted to flags as below:
+
  - aic_num_cores=16 -> -aic-num-cores=16
  - convert_to_fp16=True -> -convert-to-fp16
+ - aic_hw_version=ai100 -> -aic-hw-version=ai100
+ - aic_hw_version=ai200 -> -aic-hw-version=ai200
+
  For QNN Compiler: Following arguments can be passed.
  :enable_qnn (bool): Enables QNN Compilation.
  :qnn_config (str): Path of QNN Config parameters file. Any extra parameters for QNN compilation can be passed via this file.
@@ -1960,6 +1974,10 @@ def compile(
  :num_cores (int): Number of cores used to compile the model.
  :mxfp6_matmul (bool, optional): Whether to use ``mxfp6`` compression for weights. ``Defaults to False``.
  :aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
+ :compiler_options (dict, optional): Additional compiler options. ``Defaults to None``.
+
+ - aic_hw_version=ai100 -> -aic-hw-version=ai100
+ - aic_hw_version=ai200 -> -aic-hw-version=ai200

  Other args are not yet implemented for AutoModelForSpeechSeq2Seq
  Returns:
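
The docstrings in this file describe the parameter-to-flag mapping only by example (`aic_num_cores=16 -> -aic-num-cores=16`, `convert_to_fp16=True -> -convert-to-fp16`). A minimal sketch of a conversion consistent with those examples follows. `options_to_flags` is a hypothetical helper, not the function QEfficient uses, and the handling of `False` (emit nothing) is an assumption the diff does not confirm.

```python
def options_to_flags(compiler_options: dict) -> list:
    """Hypothetical conversion of option names and values into qaic-exec style flags."""
    flags = []
    for key, value in compiler_options.items():
        # Underscores in the Python name become hyphens in the flag name.
        option = "-" + key.replace("_", "-")
        if isinstance(value, bool):
            # Assumption: True emits a bare flag, False emits nothing.
            if value:
                flags.append(option)
            continue
        flags.append(f"{option}={value}")
    return flags


flags = options_to_flags(
    {"aic_num_cores": 16, "convert_to_fp16": True, "aic_hw_version": "ai200"}
)
print(flags)
```

Under this sketch a key that already contains hyphens, such as `aic-hw-version`, maps to the same flag as its underscore form.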

QEfficient/utils/constants.py

Lines changed: 2 additions & 1 deletion
@@ -85,7 +85,8 @@ def get_models_dir():
  ONNX_EXPORT_EXAMPLE_MIN_PS = 0.99
  ONNX_EXPORT_OPSET = 13

- COMPILER = ["/opt/qti-aic/exec/qaic-exec", "-aic-hw", "-aic-hw-version=2.0"]
+ COMPILER = ["/opt/qti-aic/exec/qaic-exec", "-aic-hw"]
+ DEFAULT_AIC_HW_VERSION = "ai100"

  # InternVL constants
  # Fixing the feature size with reference to OpenGVLab/InternVL2_5-1B, OpenGVLab/InternVL2_5-38B and OpenGVLab/InternVL2_5-78B

docs/source/quick_start.md

Lines changed: 13 additions & 0 deletions
@@ -116,7 +116,20 @@ To disable MQ, just pass single soc like below, below step will compile the mode
  ```bash
  python -m QEfficient.cloud.infer --model_name gpt2 --batch_size 1 --prompt_len 32 --ctx_len 128 --mxfp6 --num_cores 16 --device-group [0] --prompt "My name is" --mos 1 --aic_enable_depth_first
  ```
+ ### Device Selection for Inference

+ You can choose which device to run your inference on. By default, it will run on **AI 100 Core**.
+
+ To specify a different device, use the `aic-hw-version` option:
+ ```
+ aic-hw-version = 'ai100' # Default
+ aic-hw-version = 'ai200' # To run on AI 200 Core
+ ```
+
+
+ ```bash
+ python -m QEfficient.cloud.infer --model_name gpt2 --batch_size 1 --prompt_len 32 --ctx_len 128 --mxfp6 --num_cores 16 --device_group [0] --prompt "My name is" --mos 1 --aic_enable_depth_first --aic-hw-version ai100
+ ```
  ### Continuous Batching

  Users can compile a model utilizing the continuous batching feature by specifying full_batch_size <full_batch_size_value> in the infer and compiler APIs. If full_batch_size is not provided, the model will be compiled in the regular way.
