
Conversation

@windmaple (Collaborator) commented Nov 5, 2025:

This PR enables KV cache for miniGPT inference (inference time goes from ~9s to ~3s on my Cloudtop). It also updates the model to use the simpler sharding annotation from the newer NNX API.

Correctness is validated in https://colab.research.google.com/drive/1Fw2IQjH-UcGReOXw6ykqJaXKWUv3HN_O?resourcekey=0-lNpYdIeKUxoOMfG_KpZfAw&usp=sharing
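For context on what a KV cache buys here: at each decode step the model only needs to project keys and values for the newest token, then attend against all previously cached positions, instead of recomputing K/V for the whole prefix. Below is a minimal sketch of one such step in plain JAX; the function name, the (batch, seq, heads, head_dim) cache layout, and the preallocated-buffer approach are illustrative assumptions, not the PR's actual miniGPT code.

```python
import jax
import jax.numpy as jnp

def attend_with_cache(q, new_k, new_v, cache_k, cache_v, idx):
    """One decode step with a KV cache (illustrative sketch).

    q, new_k, new_v: (B, 1, H, D) -- projections for the newest token only.
    cache_k, cache_v: (B, T, H, D) -- preallocated buffers for T positions.
    idx: scalar int, the position currently being decoded.
    """
    # Write this step's key/value into slot `idx` of the cache.
    cache_k = jax.lax.dynamic_update_slice(cache_k, new_k, (0, idx, 0, 0))
    cache_v = jax.lax.dynamic_update_slice(cache_v, new_v, (0, idx, 0, 0))

    # Attend the single query against every cached position.
    scale = q.shape[-1] ** -0.5
    logits = jnp.einsum('bqhd,bkhd->bhqk', q * scale, cache_k)

    # Mask cache slots that haven't been written yet (positions > idx).
    valid = jnp.arange(cache_k.shape[1]) <= idx            # (T,)
    logits = jnp.where(valid[None, None, None, :], logits, -jnp.inf)

    weights = jax.nn.softmax(logits, axis=-1)
    out = jnp.einsum('bhqk,bkhd->bqhd', weights, cache_v)  # (B, 1, H, D)
    return out, cache_k, cache_v
```

The speedup reported above is what this pattern generally delivers: per-token work drops from reprocessing the entire prefix to projecting one token and attending a single query row against the cached K/V.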

@windmaple marked this pull request as ready for review November 5, 2025 06:58
@windmaple (Collaborator, Author) commented:

@emilyfertig This is a bit beyond the original revamping scope, but I think it's nice to have because 1) we need to adopt the newer sharding annotation at some point anyway, and 2) KV cache is very commonly used for LLM inference. This would be a good reference regardless of whether we can get vLLM TPU integration.

@salfaris left a comment:
I was just about to suggest the updated nnx.with_partitioning changes! This is great work :)
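For readers who haven't seen the newer annotation style, here is a minimal sketch of nnx.with_partitioning. The layer, dimension names, and the 'model' mesh axis are illustrative assumptions; the PR's actual diff is not reproduced in this thread.

```python
from flax import nnx

class FeedForward(nnx.Module):
    """Toy layer showing the newer NNX-style sharding annotation."""

    def __init__(self, d_in: int, d_out: int, *, rngs: nnx.Rngs):
        self.linear = nnx.Linear(
            d_in,
            d_out,
            # with_partitioning wraps the initializer so the kernel it creates
            # carries sharding metadata: replicate the input dim, shard the
            # output dim over a mesh axis named 'model' (name is illustrative).
            kernel_init=nnx.with_partitioning(
                nnx.initializers.lecun_normal(), (None, 'model')
            ),
            rngs=rngs,
        )

    def __call__(self, x):
        return self.linear(x)
```

Because the metadata is attached to the variable at creation time, it can later be recovered (e.g. via nnx.get_partition_spec) when placing the model state on a mesh, which is what makes this simpler than threading partitioning rules through the model separately.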

@emilyfertig (Collaborator) left a comment:

Thanks!

@emilyfertig merged commit bac0c90 into jax-ml:revamp-2025 on Nov 6, 2025
3 checks passed
@windmaple deleted the kvcache branch November 6, 2025 23:46
