Optimized onnx transform class via multithreading #539

abhishek-singh591 · 2025-08-14T09:40:24Z

ONNX Transform Optimization with Thread Pool

Refactored the ONNX transform class to use a thread pool for parallelizing tensor operations, replacing the previous iterative loop. On LLaMA 3.1 8B this cut FP16Clip transform time by about 25% and split_tensor() time by about 10% (see the table below); split_tensor() may have room for further optimization.
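A minimal sketch of the loop-to-thread-pool refactor described above, assuming each tensor can be transformed independently of the others. The name-to-values dict, `clip_tensor`, and both wrapper functions are hypothetical stand-ins, not the PR's code. The per-tensor work here is pure Python, so the sketch shows the structure only; actual speedups depend on the real work releasing the GIL (NumPy kernels, file I/O).

```python
import os
from concurrent.futures import ThreadPoolExecutor

FP16_MAX = 65504.0  # largest finite float16 value
FP16_MIN = -FP16_MAX


def clip_tensor(values):
    """Hypothetical per-tensor step: clamp every element into the finite float16 range."""
    return [min(max(v, FP16_MIN), FP16_MAX) for v in values]


def clip_all_sequential(tensors):
    """The 'before' shape: one tensor at a time in a plain loop."""
    return {name: clip_tensor(vals) for name, vals in tensors.items()}


def clip_all_threaded(tensors, max_workers=None):
    """The 'after' shape: every tensor is handed to a thread pool."""
    if max_workers is None:
        # Mirrors the PR's os.cpu_count() * 4; cpu_count() may return None.
        max_workers = (os.cpu_count() or 1) * 4
    names = list(tensors)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip re-pairs them with names.
        results = pool.map(clip_tensor, (tensors[n] for n in names))
        return dict(zip(names, results))


if __name__ == "__main__":
    weights = {"w0": [1.0, 1e6, -1e6], "w1": [0.5, -70000.0]}
    print(clip_all_threaded(weights))
```

Because `pool.map` preserves input order, the threaded version returns the same mapping as the sequential loop.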

Performance (LLaMA 3.1 8B)

Operation	Without Thread Pool	With Thread Pool
FP16 Transform	51.94 sec	38.78 sec
Split Tensor	15.30 sec	13.73 sec
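For reference, the speedup and relative time saved implied by the table can be computed directly from the reported numbers; nothing here is a new measurement, and the helper names are illustrative.

```python
def speedup(before_s, after_s):
    """Ratio of old to new wall time (>1 means faster)."""
    return before_s / after_s


def time_saved_pct(before_s, after_s):
    """Percentage of the original wall time removed."""
    return 100.0 * (before_s - after_s) / before_s


# Reported LLaMA 3.1 8B timings in seconds: (without thread pool, with thread pool)
results_8b = {
    "FP16 Transform": (51.94, 38.78),
    "Split Tensor": (15.30, 13.73),
}

for op, (before, after) in results_8b.items():
    print(f"{op}: {speedup(before, after):.2f}x, {time_saved_pct(before, after):.1f}% less time")
```

This prints roughly 1.34x (25.3% less time) for the FP16 transform and 1.11x (10.3% less time) for split.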

The thread count is hardcoded to os.cpu_count() * 4; oversubscribing the cores helps because the workload is partly I/O-bound, and threads blocked on I/O would otherwise leave cores idle. Since performance depends on the machine's core count and threading behavior, results may not reproduce exactly across environments.
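The hardcoded multiplier could instead be computed with a fallback and an override, which would make runs easier to reproduce across machines. A minimal sketch; the environment variable name `ONNX_TRANSFORM_THREADS` and the helper `transform_worker_count` are hypothetical, not existing settings. For comparison, Python's own `ThreadPoolExecutor` defaults to `min(32, CPU count + 4)` workers in Python 3.8 and later.

```python
import os

DEFAULT_MULTIPLIER = 4  # the PR's os.cpu_count() * 4


def transform_worker_count(env=None, multiplier=DEFAULT_MULTIPLIER):
    """Pick a thread count for the transform pool.

    Order of precedence:
      1. a positive integer in the (hypothetical) ONNX_TRANSFORM_THREADS variable
      2. os.cpu_count() * multiplier, treating an unknown CPU count as 1
    """
    env = os.environ if env is None else env
    override = env.get("ONNX_TRANSFORM_THREADS")
    if override is not None:
        try:
            value = int(override)
        except ValueError:
            value = 0  # ignore malformed overrides
        if value > 0:
            return value
    return (os.cpu_count() or 1) * multiplier
```

Passing `env` explicitly keeps the function testable without touching the real process environment.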

Signed-off-by: abhishek-singh591 <[email protected]>

vbaddi · 2025-08-19T08:43:45Z

@abhishek-singh591 this is a good start. Can we verify on a larger model, say 70B?
Can you check with this model card: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct

abhishek-singh591 · 2025-08-19T08:55:28Z

@vbaddi

LLaMA 3.1 70B Performance Comparison

Without Thread Pooling

Execution time for f16: 419.0391 seconds
Execution time for Split: 194.3422 seconds

With Thread Pooling

Execution time for f16: 302.9838 seconds
Execution time for Split: 163.3579 seconds
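The numbers above are single wall-clock timings per transform. A minimal harness of that shape, assuming each transform can be invoked as a plain callable; `run_fp16` and `run_split` are hypothetical placeholders for the real transforms, not functions from the PR.

```python
import time


def timed(label, fn, *args, **kwargs):
    """Run fn once, print its wall time in the report's format, and return (result, seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock suited to benchmarking
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"Execution time for {label}: {elapsed:.4f} seconds")
    return result, elapsed


def run_fp16(n):
    """Hypothetical stand-in for the FP16 clip transform."""
    return sum(i * 0.5 for i in range(n))


def run_split(n):
    """Hypothetical stand-in for the split transform."""
    return [list(range(i, min(i + 1000, n))) for i in range(0, n, 1000)]


if __name__ == "__main__":
    timed("f16", run_fp16, 200_000)
    timed("Split", run_split, 200_000)
```

Single runs of a multi-minute transform are noisy; repeating each measurement a few times and comparing medians would make the with/without comparison easier to trust.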

abhishek-singh591 · 2025-08-19T14:05:22Z

@vbaddi
Updated with the LLaMA 3.1 70B results; hope this suffices for the scalability check.

Signed-off-by: abhishek-singh591 <[email protected]>

abhishek-singh591 requested review from quic-rishinr, ochougul, quic-hemagnih and quic-amitraj as code owners August 14, 2025 09:40

abhishek-singh591 added 4 commits August 14, 2025 10:26

Optimized onnx transform class via multithreading

406155b

Signed-off-by: abhishek-singh591 <[email protected]>

Reformated onnx_transforms.py for ruff check

b498685

Signed-off-by: abhishek-singh591 <[email protected]>

Reformated onnx_transform file

61945d9

Signed-off-by: abhishek-singh591 <[email protected]>

Reformated onnx_transform file

28b5279

Signed-off-by: abhishek-singh591 <[email protected]>

abhishek-singh591 force-pushed the optimized_eff branch from a505dc6 to 28b5279 August 14, 2025 10:26

vbaddi assigned abhishek-singh591 Aug 19, 2025

vbaddi added the enhancement New feature or request label Aug 19, 2025

abhishek-singh591 added 7 commits August 22, 2025 17:56

Merged fp16 and split in onnx tranform.

6f77425

Merge remote-tracking branch 'upstream/main' into optimized_eff

d1007ed

Merged fp16 and split in onnx tranform and resolved conflits

3d4f2ad

Signed-off-by: abhishek-singh591 <[email protected]>

Merged fp16 and split in onnx tranform and resolved conflits

ca313a4

Signed-off-by: abhishek-singh591 <[email protected]>

Merged fp16 and split in onnx tranform and resolved conflits

df8bde4

Signed-off-by: abhishek-singh591 <[email protected]>

merged fp16 and split in onnx transform

2837cd0

Signed-off-by: abhishek-singh591 <[email protected]>

merged fp16 and split in onnx transform

c78d072

Signed-off-by: abhishek-singh591 <[email protected]>

abhishek-singh591 force-pushed the optimized_eff branch 2 times, most recently from 3acff3c to ef9f188 August 23, 2025 08:36

merged fp16 and split in onnx transform

f8d9273

Signed-off-by: abhishek-singh591 <[email protected]>

abhishek-singh591 force-pushed the optimized_eff branch from ef9f188 to f8d9273 August 23, 2025 08:43
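The later commits merge the FP16 and split steps of the ONNX transform. One way to picture a merged pass, assuming "split" means assigning each initializer to an external-data file chunk by byte size (an assumption; the commits do not spell this out): do the order-independent per-tensor work once in the pool, then assign chunks in a sequential loop, because chunk boundaries depend on tensor order. All names, the float32 element size, and the toy chunk limit are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor

FP16_MAX = 65504.0  # largest finite float16 value
BYTES_PER_ELEM = 4  # assume float32 storage


def clip_and_measure(values):
    """Per-tensor work done once: clamp to the float16 range and report the byte size."""
    clipped = [min(max(v, -FP16_MAX), FP16_MAX) for v in values]
    return clipped, len(clipped) * BYTES_PER_ELEM


def merged_transform(tensors, chunk_limit_bytes, max_workers=None):
    """Single pass over all tensors: parallel clip, then ordered chunk assignment."""
    names = list(tensors)
    workers = max_workers or (os.cpu_count() or 1) * 4
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = list(pool.map(clip_and_measure, (tensors[n] for n in names)))

    out, chunk_of = {}, {}
    chunk, used = 0, 0
    for name, (clipped, size) in zip(names, processed):
        # Open a new chunk if this tensor would overflow a non-empty current chunk;
        # a tensor larger than the limit still gets a chunk of its own.
        if used and used + size > chunk_limit_bytes:
            chunk, used = chunk + 1, 0
        out[name] = clipped
        chunk_of[name] = chunk
        used += size
    return out, chunk_of
```

The design choice is that only order-independent work goes to the pool; anything whose result depends on earlier tensors stays in the sequential loop.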

abhishek-singh591 closed this Aug 23, 2025

This was referenced Aug 23, 2025

Optimized ONNX Transform with Merging and Thread Pooling #545 (Closed)

Optimized ONNX Transform via Class Merging and Thread Pooling #546 (Open)
