feat(ai): onnx runtime upgrade #594
Conversation
Hello @kallebysantos 😋 I'm currently testing this PR locally, but it seems the returned scores of these dot-product lines are quite different from the main branch.

This PR: (score output omitted)

(These values should be 1 or less, but it seems they aren't? 🧐)

Main branch: (score output omitted)
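For context on why scores above 1 look suspicious: a dot product of embeddings is only bounded by 1 when the vectors are L2-normalized first (it is then a cosine similarity, bounded by Cauchy-Schwarz). A minimal Rust illustration, not code from this PR:

```rust
// Dot product of two L2-normalized vectors lies in [-1, 1];
// scores above 1 hint that normalization was skipped somewhere.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
}

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let (mut a, mut b) = (vec![3.0_f32, 4.0], vec![6.0_f32, 8.0]);
    l2_normalize(&mut a);
    l2_normalize(&mut b);
    let score = dot(&a, &b);
    assert!(score <= 1.0 + f32::EPSILON); // exactly 1.0 here: parallel vectors
}
```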
Hi @nyannyacha 💚 Seems I did mis-write the …
Alright @kallebysantos, I ran integration tests on the latest commit locally, and it looks fine. 😄 Have a great day!
Benchmark

This PR: (k6 results omitted)

Main branch: (k6 results omitted)

Based on the benchmark, there appears to be a significant regression in the number of handled requests. And could you change the base of this PR from develop to main?
Force-pushed from 5f422f6 to 3bf1963
Hi Nya, thanks for your feedback 💚 Here you can see more about the …
Let's see what migration options they offered...

The options available to us appear to be either not upgrading to this version or choosing the first option, which introduces minimal regression. ...On the other hand, I also see comments like this:

Indeed, if we look at the commit that changed the signature of …

If we don't mind practicing black magic, there are ways to trick rustc into making the reference to the shared Session mutable and passing it around. 🙃 (Though it does seem quite risky, since the CUDA EP can also be used. 😅)

Maybe, based on this statement, we wrap it with a Mutex only when using the CUDA EP and fully accept the regression, while for the CPU EP we use black magic to bypass rustc's function-signature checks, completely resolving the regression. What do you think? 😋
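For illustration, a rough sketch of the two strategies being weighed; `Session`, `GuardedSession`, and `SharedSession` are hypothetical stand-ins, not code from this PR or from `ort` itself:

```rust
use std::cell::UnsafeCell;
use std::sync::Mutex;

// Stand-in for ort's session type; in rc-10 `run` takes `&mut self`.
struct Session;
impl Session {
    fn run(&mut self) { /* inference happens here */ }
}

// Option A (CUDA EP): serialize runs behind a Mutex and accept the
// throughput regression.
struct GuardedSession(Mutex<Session>);
impl GuardedSession {
    fn run(&self) {
        self.0.lock().unwrap().run();
    }
}

// Option B (CPU EP, "black magic"): hand out `&mut Session` from a
// shared reference via interior mutability.
// SAFETY: only defensible if concurrent runs are genuinely safe in the
// underlying C library; overlapping calls still violate Rust's aliasing
// rules -- hence "quite risky".
struct SharedSession(UnsafeCell<Session>);
unsafe impl Sync for SharedSession {}
impl SharedSession {
    fn run(&self) {
        unsafe { (*self.0.get()).run() }
    }
}

fn main() {
    let gpu = GuardedSession(Mutex::new(Session));
    gpu.run();
    let cpu = SharedSession(UnsafeCell::new(Session));
    cpu.run();
}
```

The `unsafe impl Sync` is exactly the rustc bypass being discussed: it asserts a thread-safety guarantee the compiler can no longer verify from the `&mut self` signature.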
Hi @nyannyacha thanks for helping 💚

I'm OK with that 🧙‍♂️🪄 - I'd just like to refer to this other comment:

To be honest, I'm not really sure if the CUDA support is still working. It became harder to test since I don't have easy access to a GPU machine, and I think I found some problems the last time I tried. In my opinion we should focus on CPU only, then add GPU support later based on demand.
What kind of change does this PR introduce?
Refactor, upgrade
What is the current behavior?

Currently the `ort` rust backend is using `ort rc-9` & `onnx v1.20.1`.
What is the new behavior?

This PR introduces:

- `ort`: library upgrade from `rc-9` to `rc-10`
- `onnx`: support from `1.20.1` to `1.22.0`
Additional context

This rc-10 version introduces the `Compiler` feature - I still haven't explored it yet, but it seems it would be possible to do AOT compilation during model caching, which could speed up cold starts.
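As a rough sketch of where that could slot in (assuming the existing flow already caches models to disk; `aot_compile` is a hypothetical placeholder, not an actual `ort` API):

```rust
use std::io;
use std::path::{Path, PathBuf};

// Hypothetical sketch only: `aot_compile` stands in for whatever the
// rc-10 `Compiler` feature exposes. The idea is to pay the compilation
// cost once, when the model is first cached, instead of on every cold
// start.
fn ensure_compiled(model: &Path, cache_dir: &Path) -> io::Result<PathBuf> {
    let compiled = cache_dir.join("model.compiled.onnx");
    if !compiled.exists() {
        aot_compile(model, &compiled)?; // one-time cost, at cache time
    }
    Ok(compiled) // subsequent cold starts load this artifact directly
}

fn aot_compile(_input: &Path, _output: &Path) -> io::Result<()> {
    unimplemented!("placeholder for ort rc-10's `Compiler` feature")
}
```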
Need help:

I would like to ask @nyannyacha 💚, if possible, to run `k6` tests comparing against the latest version.