Adding aic-hw-version Compile Options Support (#528)
This pull request adds support for passing compile-time options to the
`compile` API as keyword arguments (`kwargs`), including the
`aic-hw-version` parameter, which accepts `"ai100"` or `"ai200"`. If no
value is provided, the default is `"ai100"`, representing the AI100
hardware. This lets users tailor `compile` to their target hardware.
### Example Usage:
```python
from QEfficient import QEFFAutoModelForCausalLM
from transformers import AutoTokenizer
model_name = "gpt2"
model = QEFFAutoModelForCausalLM.from_pretrained(model_name, num_hidden_layers=2)
model.compile(prefill_seq_len=128, ctx_len=256, num_cores=16, num_devices=1, **{'aic-hw-version': 'ai100'})
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.generate(prompts=["Hi there!!"], tokenizer=tokenizer)
```
> **Note:** Previously, the default value for `aic-hw-version` was
> `"2.0"`, which implicitly referred to AI100. This value is now
> deprecated and replaced with the explicit `"ai100"` identifier.
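
The default and the deprecation described above can be illustrated with a small, self-contained sketch. It is not QEfficient code: `resolve_hw_version` and the three constants are hypothetical names, and mapping the old `"2.0"` value forward to `"ai100"` with a warning is an assumption, since the commit does not say whether the library still accepts `"2.0"`.

```python
import warnings

SUPPORTED_HW_VERSIONS = ("ai100", "ai200")  # values named in this commit
DEFAULT_HW_VERSION = "ai100"                # default stated in this commit
LEGACY_HW_VERSIONS = {"2.0": "ai100"}       # assumption: map the old default forward


def resolve_hw_version(compiler_options=None):
    """Hypothetical helper: pick the hardware version from compile kwargs."""
    options = dict(compiler_options or {})
    # Accept both spellings of the key, hyphenated and underscored.
    value = options.get(
        "aic-hw-version", options.get("aic_hw_version", DEFAULT_HW_VERSION)
    )
    if value in LEGACY_HW_VERSIONS:
        replacement = LEGACY_HW_VERSIONS[value]
        warnings.warn(
            f"aic-hw-version={value!r} is deprecated; use {replacement!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return replacement
    if value not in SUPPORTED_HW_VERSIONS:
        raise ValueError(f"unsupported aic-hw-version: {value!r}")
    return value
```

Under these assumptions, calling `resolve_hw_version()` with no options yields `"ai100"`, and an unknown value such as `"ai300"` raises `ValueError` instead of being passed on to the compiler.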
---------
Signed-off-by: Abukhoyer Shaik <[email protected]>
 For QAIC Compiler: Extra arguments for qaic-exec can be passed.
 :aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
 :allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
+
+Params are converted to flags as below:
+
+- aic_hw_version=ai100 -> -aic-hw-version=ai100
+- aic_hw_version=ai200 -> -aic-hw-version=ai200
+
 For QNN Compiler: Following arguments can be passed.
 :enable_qnn (bool): Enables QNN Compilation.
 :qnn_config (str): Path of QNN Config parameters file. Any extra parameters for QNN compilation can be passed via this file.
+
 Returns:
 :str: Path of the compiled ``qpc`` package.
 """
@@ -1701,13 +1709,19 @@ def compile(
 :num_speculative_tokens (int, optional): Number of speculative tokens to take as input for Speculative Decoding Target Language Model.
 :prefill_only (bool): if ``True`` compile for prefill only and if ``False`` compile for decode only. Defaults to None, which compiles for both ``prefill and ``decode``.
 :compiler_options (dict, optional): Additional compiler options. ``Defaults to None``.
+
 For QAIC Compiler: Extra arguments for qaic-exec can be passed.
 :mos (int, optional): Effort level to reduce on-chip memory. Defaults to -1, meaning no effort. ``Defaults to -1``.
 :aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
 :allow_mxint8_mdp_io (bool, optional): Allows MXINT8 compression of MDP IO traffic. ``Defaults to False.``
+
 Params are converted to flags as below:
+
 - aic_num_cores=16 -> -aic-num-cores=16
 - convert_to_fp16=True -> -convert-to-fp16
+- aic_hw_version=ai100 -> -aic-hw-version=ai100
+- aic_hw_version=ai200 -> -aic-hw-version=ai200
+
 For QNN Compiler: Following arguments can be passed.
 :enable_qnn (bool): Enables QNN Compilation.
 :qnn_config (str): Path of QNN Config parameters file. Any extra parameters for QNN compilation can be passed via this file.
@@ -1960,6 +1974,10 @@ def compile(
 :num_cores (int): Number of cores used to compile the model.
 :mxfp6_matmul (bool, optional): Whether to use ``mxfp6`` compression for weights. ``Defaults to False``.
 :aic_enable_depth_first (bool, optional): Enables DFS with default memory size. ``Defaults to False``.
+:compiler_options (dict, optional): Additional compiler options. ``Defaults to None``.
+
+- aic_hw_version=ai100 -> -aic-hw-version=ai100
+- aic_hw_version=ai200 -> -aic-hw-version=ai200

 Other args are not yet implemented for AutoModelForSpeechSeq2Seq
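
The parameter-to-flag mapping listed in the docstrings (`aic_num_cores=16 -> -aic-num-cores=16`, `convert_to_fp16=True -> -convert-to-fp16`, `aic_hw_version=ai100 -> -aic-hw-version=ai100`) can be sketched as a small standalone function. This is only an illustration of the rule as documented, not the library's implementation: `options_to_flags` is a hypothetical name, and dropping options whose value is `False` is an assumption.

```python
def options_to_flags(compiler_options):
    """Hypothetical helper: turn keyword options into qaic-exec style flags."""
    flags = []
    for key, value in compiler_options.items():
        # Underscores in the keyword become hyphens, with one leading hyphen.
        name = "-" + key.replace("_", "-")
        if isinstance(value, bool):
            # Boolean options become bare switches; False is omitted (assumption).
            if value:
                flags.append(name)
        else:
            flags.append(f"{name}={value}")
    return flags


print(options_to_flags({
    "aic_num_cores": 16,
    "convert_to_fp16": True,
    "aic_hw_version": "ai200",
}))
# ['-aic-num-cores=16', '-convert-to-fp16', '-aic-hw-version=ai200']
```

The boolean check comes before the generic branch because `bool` is a subclass of `int` in Python, so `True` would otherwise be rendered as `-convert-to-fp16=True`.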
Users can compile a model utilizing the continuous batching feature by specifying full_batch_size <full_batch_size_value> in the infer and compiler APIs. If full_batch_size is not provided, the model will be compiled in the regular way.