Commit 4d2a4d8

[Docs Update:] Auto Classes are Separated from Python API (#550)
This PR reformats the documentation for better structure: some content, indexing, and linking have been changed.
---------
Signed-off-by: Abukhoyer Shaik <[email protected]>
1 parent 6aaa75a commit 4d2a4d8

25 files changed (+2568, -1041 lines)

QEfficient/cloud/execute.py

Lines changed: 48 additions & 15 deletions
@@ -25,24 +25,57 @@ def main(
     full_batch_size: Optional[int] = None,
 ):
     """
-    Helper function used by execute CLI app to run the Model on ``Cloud AI 100`` Platform.
-
-    ``Mandatory`` Args:
-        :model_name (str): Hugging Face Model Card name, Example: ``gpt2``.
-        :qpc_path (str): Path to the generated binary after compilation.
-    ``Optional`` Args:
-        :device_group (List[int]): Device Ids to be used for compilation. if len(device_group) > 1. Multiple Card setup is enabled. ``Defaults to None.``
-        :local_model_dir (str): Path to custom model weights and config files. ``Defaults to None.``
-        :prompt (str): Sample prompt for the model text generation. ``Defaults to None.``
-        :prompts_txt_file_path (str): Path to txt file for multiple input prompts. ``Defaults to None.``
-        :generation_len (int): Number of tokens to be generated. ``Defaults to None.``
-        :cache_dir (str): Cache dir where downloaded HuggingFace files are stored. ``Defaults to Constants.CACHE_DIR.``
-        :hf_token (str): HuggingFace login token to access private repos. ``Defaults to None.``
-        :full_batch_size (int): Set full batch size to enable continuous batching mode. ``Defaults to None.``
+    Main function for the QEfficient execution CLI application.
+
+    This function serves as the entry point for running a compiled model
+    (QPC package) on the Cloud AI 100 Platform. It loads the necessary
+    tokenizer and then orchestrates the text generation inference.
+
+    Parameters
+    ----------
+    model_name : str
+        Hugging Face Model Card name (e.g., ``gpt2``) for loading the tokenizer.
+    qpc_path : str
+        Path to the generated binary (QPC package) after compilation.
+
+    Other Parameters
+    ----------------
+    device_group : List[int], optional
+        List of device IDs to be used for inference. If ``len(device_group) > 1``,
+        a multi-card setup is enabled. Default is None.
+    local_model_dir : str, optional
+        Path to custom model weights and config files, used if not loading the
+        tokenizer from Hugging Face Hub. Default is None.
+    prompt : str, optional
+        Sample prompt(s) for the model text generation. For batch size > 1,
+        pass multiple prompts separated by a pipe (``|``) symbol. Default is None.
+    prompts_txt_file_path : str, optional
+        Path to a text file containing multiple input prompts, one per line. Default is None.
+    generation_len : int, optional
+        Maximum number of tokens to be generated during inference. Default is None.
+    cache_dir : str, optional
+        Cache directory where downloaded HuggingFace files (like the tokenizer) are stored.
+        Default is None.
+    hf_token : str, optional
+        HuggingFace login token to access private repositories. Default is None.
+    full_batch_size : int, optional
+        Ignored in this context, as continuous batching is managed by the compiled QPC.
+        However, it might be passed through from CLI arguments. Default is None.
+
+    Example
+    -------
+    To execute a compiled model from the command line:
 
     .. code-block:: bash
 
-        python -m QEfficient.cloud.execute OPTIONS
+        python -m QEfficient.cloud.execute --model-name gpt2 --qpc-path /path/to/qpc/binaries --prompt "Hello world"
+
+    For multi-device inference:
+
+    .. code-block:: bash
+
+        python -m QEfficient.cloud.execute --model-name gpt2 --qpc-path /path/to/qpc/binaries --device-group "[0,1]" --prompt "Hello | Hi"
+
     """
     tokenizer = load_hf_tokenizer(
         pretrained_model_name_or_path=(local_model_dir if local_model_dir else model_name),
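
The documented parameters also map directly onto a programmatic call. A minimal sketch, not part of this commit, assuming the `main` signature shown in the diff; the QPC path and prompts are placeholders:

    # Minimal sketch (not from this commit): invoking the execute entry point
    # from Python with the parameters documented above. Paths are placeholders.
    from QEfficient.cloud.execute import main as execute_main

    execute_main(
        model_name="gpt2",                 # Hugging Face model card, used to load the tokenizer
        qpc_path="/path/to/qpc/binaries",  # compiled QPC package
        device_group=[0, 1],               # more than one device ID enables the multi-card setup
        prompt="Hello | Hi",               # pipe-separated prompts for batch size > 1
        generation_len=32,                 # cap on the number of generated tokens
    )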

QEfficient/cloud/export.py

Lines changed: 52 additions & 21 deletions
@@ -25,16 +25,32 @@ def get_onnx_model_path(
     local_model_dir: Optional[str] = None,
 ):
     """
-    exports the model to onnx if pre-exported file is not found and returns onnx_model_path
-
-    ``Mandatory`` Args:
-        :model_name (str): Hugging Face Model Card name, Example: ``gpt2``.
-    ``Optional`` Args:
-        :cache_dir (str): Cache dir where downloaded HuggingFace files are stored. ``Defaults to None.``
-        :tokenizer (Union[PreTrainedTokenizer, PreTrainedTokenizerFast]): Pass model tokenizer. ``Defaults to None.``
-        :hf_token (str): HuggingFace login token to access private repos. ``Defaults to None.``
-        :local_model_dir (str): Path to custom model weights and config files. ``Defaults to None.``
-        :full_batch_size (int): Set full batch size to enable continuous batching mode. ``Defaults to None.``
+    Exports the PyTorch model to ONNX format if a pre-exported file is not found,
+    and returns the path to the ONNX model.
+
+    This function loads a Hugging Face model via QEFFCommonLoader, then calls
+    its export method to generate the ONNX graph.
+
+    Parameters
+    ----------
+    model_name : str
+        Hugging Face Model Card name (e.g., ``gpt2``).
+
+    Other Parameters
+    ----------------
+    cache_dir : str, optional
+        Cache directory where downloaded HuggingFace files are stored. Default is None.
+    hf_token : str, optional
+        HuggingFace login token to access private repositories. Default is None.
+    full_batch_size : int, optional
+        Sets the full batch size to enable continuous batching mode. Default is None.
+    local_model_dir : str, optional
+        Path to custom model weights and config files. Default is None.
+
+    Returns
+    -------
+    str
+        Path of the generated ONNX graph file.
     """
     logger.info(f"Exporting Pytorch {model_name} model to ONNX...")
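
A minimal sketch, not part of this commit, of calling this helper directly, assuming only the signature and return value documented above; the model name and cache path are placeholders:

    # Minimal sketch (not from this commit): export a model to ONNX and
    # retrieve the path of the generated graph.
    from QEfficient.cloud.export import get_onnx_model_path

    onnx_model_path = get_onnx_model_path("gpt2", cache_dir="/path/to/cache")
    print(onnx_model_path)  # path of the generated ONNX graph file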

@@ -58,20 +74,35 @@ def main(
     full_batch_size: Optional[int] = None,
 ) -> None:
     """
-    Helper function used by export CLI app for exporting to ONNX Model.
-
-    ``Mandatory`` Args:
-        :model_name (str): Hugging Face Model Card name, Example: ``gpt2``.
-
-    ``Optional`` Args:
-        :cache_dir (str): Cache dir where downloaded HuggingFace files are stored. ``Defaults to None.``
-        :hf_token (str): HuggingFace login token to access private repos. ``Defaults to None.``
-        :local_model_dir (str): Path to custom model weights and config files. ``Defaults to None.``
-        :full_batch_size (int): Set full batch size to enable continuous batching mode. ``Defaults to None.``
+    Main function for the QEfficient ONNX export CLI application.
+
+    This function serves as the entry point for exporting a PyTorch model, loaded
+    via QEFFCommonLoader, to the ONNX format. It prepares the necessary
+    paths and calls `get_onnx_model_path`.
+
+    Parameters
+    ----------
+    model_name : str
+        Hugging Face Model Card name (e.g., ``gpt2``).
+
+    Other Parameters
+    ----------------
+    cache_dir : str, optional
+        Cache directory where downloaded HuggingFace files are stored. Default is None.
+    hf_token : str, optional
+        HuggingFace login token to access private repositories. Default is None.
+    local_model_dir : str, optional
+        Path to custom model weights and config files. Default is None.
+    full_batch_size : int, optional
+        Sets the full batch size to enable continuous batching mode. Default is None.
+
+    Example
+    -------
+    To export a model from the command line:
 
     .. code-block:: bash
 
-        python -m QEfficient.cloud.export OPTIONS
+        python -m QEfficient.cloud.export --model-name gpt2 --cache-dir /path/to/cache
 
     """
     cache_dir = check_and_assign_cache_dir(local_model_dir, cache_dir)
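
The same flow can be driven from Python rather than bash. A minimal sketch, not part of this commit, mirroring the CLI example above under the signature shown in the diff; the cache path is a placeholder:

    # Minimal sketch (not from this commit): the export CLI's main() called
    # programmatically, equivalent to the bash example in the docstring.
    from QEfficient.cloud.export import main as export_main

    export_main(model_name="gpt2", cache_dir="/path/to/cache")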
