
Conversation

huing4257

Add a GEMM kernel for int2 weights. Also fix scaling problems in the previous bitlinear kernel.

@huing4257
Author

@microsoft-github-policy-service agree

@huing4257 huing4257 changed the title Add support for int2 prefill Add support for GPU int2 prefill Jul 30, 2025
@huing4257
Author

Fixes #284:

Could you help me explain Python?

Of course! Python is a high-level, interpreted programming language known for its simplicity and readability. Here are some key points to help you understand Python:
...

// Extract and decompress the int2 values
int32_t compressed = B_compressed[compressed_block_idx * 32 + tile_idx];
int8_t decompressed[16];
decode_i2s_to_i8s(&compressed, decompressed);
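For reference, a plain-C++ sketch of what an int2-to-int8 decode like this does: unpack one 32-bit word into 16 signed 2-bit values. The packing order (lowest bits first) and the sign convention (two's-complement 2-bit, range [-2, 1]) are assumptions for illustration; the kernel's actual `decode_i2s_to_i8s` may use a different layout.

```cpp
#include <cstdint>

// Hypothetical scalar analogue of decode_i2s_to_i8s. Assumes the 16 int2
// values are packed little-endian (value i occupies bits [2i, 2i+1]) and
// are two's-complement 2-bit integers: 0b00 -> 0, 0b01 -> 1, 0b10 -> -2,
// 0b11 -> -1. Both assumptions are illustrative, not the PR's layout.
void unpack_i2_to_i8(uint32_t compressed, int8_t out[16]) {
    for (int i = 0; i < 16; ++i) {
        uint32_t bits = (compressed >> (2 * i)) & 0x3;  // extract 2 bits
        // Sign-extend the 2-bit field to int8: (bits ^ 2) - 2 maps
        // 0 -> 0, 1 -> 1, 2 -> -2, 3 -> -1.
        out[i] = (int8_t)((int)(bits ^ 0x2u) - 2);
    }
}
```

The GPU version does the same unpacking with bit tricks across a register instead of a scalar loop, but the value mapping is the part that has to agree with the quantizer.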
Collaborator

Many threads will dequantize i2s values from the same weights. Could we add a pre-processing step that caches the dequantized result in shared memory?

Author

Yes, blocks with the same blockN but different blockM dequantize the same weights. However, shared memory is only accessible to threads within the same thread block. So if we want to cache the dequantized result, I think we either have to dequantize all weights in global memory, or loop over M within each block to reuse the weights, which may require split-K to keep parallelism high. How do you think we should implement this optimization?
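Within a single thread block, the "loop over M" option could look like the sketch below: the block cooperatively decodes its weight tile into shared memory once, then reuses it for every M row it handles. Tile sizes, the `decode_i2s_to_i8s` signature, and the indexing are all hypothetical, not the PR's actual kernel.

```cuda
// Hypothetical sketch of the shared-memory staging discussed above.
// TILE_N/TILE_K and the offset math are illustrative assumptions.
#define TILE_N 32
#define TILE_K 64

__device__ void decode_i2s_to_i8s(const int32_t* compressed, int8_t* out);

__global__ void gemm_i2_staged(const int32_t* __restrict__ B_compressed
                               /* , const int8_t* A, int32_t* C, ... */) {
    __shared__ int8_t B_tile[TILE_N * TILE_K];  // decoded weights, shared by the block

    // Cooperative pre-pass: threads decode disjoint 32-bit words (16 values each),
    // so each compressed word is dequantized exactly once per block.
    const int words_per_tile = TILE_N * TILE_K / 16;
    const int tile_base = blockIdx.x * words_per_tile;  // this block's slice of B
    for (int w = threadIdx.x; w < words_per_tile; w += blockDim.x) {
        int32_t compressed = B_compressed[tile_base + w];
        decode_i2s_to_i8s(&compressed, &B_tile[w * 16]);
    }
    __syncthreads();  // decoded tile is visible to all threads before the main loop

    // ... main GEMM loop reads B_tile instead of re-decoding per thread,
    //     iterating over the M rows assigned to this block ...
}
```

This removes the redundant decode only within one block; sharing across blocks with different blockM would still require either a global-memory dequant pass or the split-K restructuring mentioned above.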
