Kernel fusion for Llama-v2 #538
TejaGollapudi
announced in
Ideas
Replies: 0 comments
Hi,
https://twitter.com/pommedeterre33/status/1681935636129873920?t=VaxYpkbwNLKxly7icie8kw&s=19
I came across this great thread showing that kernel fusion can speed up Llama-2 inference by up to 1.8x using OpenAI's Triton kernels (it may work with torch kernel fusion too).
Not sure if this would be beneficial for vLLM, but it might be worth taking a look at 😄
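For context, here is a minimal sketch of what kernel fusion looks like in Triton, using Llama's SwiGLU MLP activation as an example. This is illustrative only and not taken from the linked thread; the function names and block size are my own. The point is that `silu(gate) * up` runs as one kernel launch instead of three separate elementwise ops, so the intermediate tensors never round-trip through GPU global memory:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def fused_silu_mul_kernel(gate_ptr, up_ptr, out_ptr, n_elements,
                          BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    gate = tl.load(gate_ptr + offsets, mask=mask)
    up = tl.load(up_ptr + offsets, mask=mask)
    # silu(x) = x * sigmoid(x), fused with the multiply in registers
    # so the silu output is never written back to global memory.
    out = gate * tl.sigmoid(gate) * up
    tl.store(out_ptr + offsets, out, mask=mask)

def fused_silu_mul(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(gate)
    n = gate.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    fused_silu_mul_kernel[grid](gate, up, out, n, BLOCK_SIZE=1024)
    return out
```

For comparison, `torch.compile` can perform similar elementwise fusion automatically on an eager-mode reference implementation, which I assume is what the torch-based option in the thread refers to.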