Implement gemv numba dispatch #1418

Closed
jessegrabowski opened this issue May 24, 2025 · 2 comments · Fixed by #1426
Labels: linalg (Linear algebra), numba

Comments

@jessegrabowski
Member

Description

When working on #1416 I found that numba has terrible performance on matrix-vector multiplication:

[Benchmark image: matrix-vector multiplication timings, numba vs. C backend]

@ricardoV94 thinks this is because numba probably uses gemm for everything and never uses gemv. Since we already have a GEMV Op, it should be straightforward to follow the pattern I used in #1416 to write a dispatch for GEMV.

We might also have to add the GEMV rewrite to the numba mode -- I know the BLAS rewrites are disabled for jax, for example. I haven't checked for numba, but it's something to be aware of.
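For reference, a minimal NumPy sketch of the kernel such a dispatch would need to wrap. The argument order mirrors the CGemv inputs shown in the dprint below (output buffer, alpha, A, x, beta); in the real dispatch this would presumably be an njit-compiled function registered via numba_funcify, but the `gemv_kernel` name and plain-NumPy body here are just illustrative assumptions:

```python
import numpy as np

def gemv_kernel(y, alpha, A, x, beta):
    """Sketch of the GEMV computation: beta * y + alpha * (A @ x).

    Hypothetical helper; argument order follows PyTensor's CGemv Op
    (output buffer, alpha, A, x, beta). The actual numba dispatch would
    wrap an equivalent kernel with numba.njit.
    """
    return beta * y + alpha * (A @ x)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
x = rng.normal(size=(3,))
y = rng.normal(size=(4,))

# With alpha=1 and beta=0 this reduces to a plain matrix-vector product,
# which is the case produced by the A @ x graph below
np.testing.assert_allclose(gemv_kernel(y, 1.0, A, x, 0.0), A @ x)
```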

@jessegrabowski jessegrabowski added numba linalg Linear algebra labels May 24, 2025
@jessegrabowski jessegrabowski changed the title Implement gmbv numba dispatch Implement gemv numba dispatch May 24, 2025
@ricardoV94
Member

Yes, all BLAS rewrites are disabled in NUMBA mode; those are the rewrites that introduce Ops like GEMM, GEMV, and GER.

@ricardoV94
Member

Here is an MRE (minimal reproducible example; the `%timeit` lines assume an IPython session):

import numpy as np
import pytensor
import pytensor.tensor as pt

A = pt.matrix("A")
x = pt.vector("x")
out = A @ x

c_fn = pytensor.function([A, x], out, trust_input=True)
c_fn.dprint()
# CGemv{inplace} [id A] 2
#  ├─ AllocEmpty{dtype='float64'} [id B] 1
#  │  └─ Shape_i{0} [id C] 0
#  │     └─ A [id D]
#  ├─ 1.0 [id E]
#  ├─ A [id D]
#  ├─ x [id F]
#  └─ 0.0 [id G]

numba_fn = pytensor.function([A, x], out, mode="NUMBA", trust_input=True)
numba_fn.dprint()
# Squeeze{axis=1} [id A] 2
#  └─ dot [id B] 1
#     ├─ A [id C]
#     └─ ExpandDims{axis=1} [id D] 0
#        └─ x [id E]

rng = np.random.default_rng(1)
A_test = rng.normal(size=(1024, 512))
x_test = rng.normal(size=(512,))
np.testing.assert_allclose(c_fn(A_test, x_test), numba_fn(A_test, x_test))
%timeit c_fn(A_test, x_test) # 338 μs ± 8.23 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit numba_fn(A_test, x_test)  # 6.6 ms ± 978 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
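As a sanity check on the graph numba mode produces, the ExpandDims/Squeeze formulation is numerically equivalent to the direct matrix-vector product; it just routes a (512, 1) matrix through the general dot instead of gemv. A quick NumPy-only verification:

```python
import numpy as np

rng = np.random.default_rng(1)
A_test = rng.normal(size=(1024, 512))
x_test = rng.normal(size=(512,))

# What the numba graph computes: Squeeze(dot(A, ExpandDims(x, axis=1)), axis=1)
via_matmul = (A_test @ x_test[:, None]).squeeze(axis=1)
# The direct matrix-vector product that a gemv dispatch would compute
via_gemv = A_test @ x_test

np.testing.assert_allclose(via_matmul, via_gemv)
```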
