feat: Fast model loading for inference #125
base: main
Conversation
I added a new feature that allows fast model loading for inference.
fms_mo/prep.py
@@ -535,6 +535,30 @@ def has_quantized_module(model):
    """Check if model is already quantized - do not want to quantize twice if so"""
    return any(isinstance(m, quantized_modules) for m in model.modules())

def swap_qbmm(model, qcfg):
Need to add a docstring and add data types to the function args.
done
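For context, a minimal sketch of what the requested signature could look like with a docstring and type hints added. This is illustrative only, not the actual implementation in fms_mo/prep.py; the qcfg key and the replacement step are assumptions.

```python
# Illustrative sketch only - the real swap_qbmm may differ; the qcfg key
# "qbmm_modules" and the replacement logic are assumptions.
import torch.nn as nn


def swap_qbmm(model: nn.Module, qcfg: dict) -> nn.Module:
    """Swap eligible submodules for quantized bmm (QBmm) replacements.

    Args:
        model: Model being prepared for quantized inference.
        qcfg:  Quantization config dict (e.g. loaded via qconfig_load).

    Returns:
        The model with QBmm replacements attached, ready for inference.
    """
    for name, _module in model.named_modules():
        # Hypothetical predicate: decide from qcfg whether this submodule
        # should receive a quantized bmm replacement.
        if name in qcfg.get("qbmm_modules", []):
            ...  # attach/replace with a QBmm instance here
    return model
```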
fms_mo/utils/qconfig_utils.py
@@ -623,7 +623,7 @@ def qconfig_save(
def qconfig_load(fname: str = "qcfg.json") -> dict:
    """Read config in json format, work together with qconfig_save"""
    config = get_recipe(fname)
Dead spacing here. Delete it.
corrected
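For reference, a hedged usage sketch of the load side of this save/load pair, based only on the signature shown in the diff. It assumes a qcfg.json previously written by qconfig_save exists in the working directory.

```python
# Assumed round-trip based on the signature shown above; not taken verbatim
# from the repo. Requires a qcfg.json produced earlier by qconfig_save.
from fms_mo.utils.qconfig_utils import qconfig_load

qcfg = qconfig_load("qcfg.json")  # reads the JSON recipe via get_recipe
print(type(qcfg))                 # -> dict, per the annotated return type
```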
A few more nitpicks.
Also, please run the following and fix anything that lint or spellcheck flags. "tox -e fix" will automatically change files; you just have to add and commit them. If multiple changes are needed, package them into one commit if possible.
tox -e fix
tox -e lint
tox -e spellcheck
Signed-off-by: omobayode.fagbohungbe <[email protected]>
Description of the change
This PR enables faster loading of a quantized model by calling only the functions/sub-functions needed to load the model, while skipping the functions needed to quantize it. An inference argument was added to the fms_mo arguments to activate this path.
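As an illustration, the intended call pattern might look like the sketch below. The entry point qmodel_prep and the exact keyword spelling are assumptions; the placeholder model and input exist only to make the snippet self-contained.

```python
# Hypothetical call pattern - entry point and keyword spelling are assumptions.
import torch
from fms_mo import qmodel_prep
from fms_mo.utils.qconfig_utils import qconfig_load

model = torch.nn.Linear(16, 16)      # placeholder model for illustration
sample_input = torch.randn(1, 16)    # placeholder example input
qcfg = qconfig_load("qcfg.json")     # previously saved quantization recipe

# With the new inference flag enabled, only the loading path runs; the
# quantization-preparation steps are skipped, so the model loads faster.
model = qmodel_prep(model, sample_input, qcfg, inference=True)
```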
Related issue number
None
How to verify the PR
The PR was validated by performing Direct Quantization with SmoothQuant and passing the inference argument along with the rest of the arguments. The validation was done with and without QBmm.
Was the PR tested