[Feature request] Support strides for T.serial #991

@LyricZhao

In many CUDA kernels, we conventionally write a strided loop like this:

for (int i = thread_idx; i < numel; i += num_threads)
    out[i] = 0;

But in TileLang, the same loop has to be written as:

for i in T.serial(0, T.ceildiv(numel - thread_idx, num_threads)):
    j = thread_idx + i * num_threads
    out[j] = 0

Manually computing the trip count and the index transformation is tedious, and it also uses more registers and slows down index computation.

So I propose supporting strides in T.serial.
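
To make the equivalence concrete, here is a minimal plain-Python sketch (not TileLang code) that checks the two loop shapes visit the same indices. The helper names `ceildiv`, `strided_indices`, and `manual_indices` are made up for illustration, and `ceildiv` is only assumed to behave like `T.ceildiv` for a positive divisor.

```python
from typing import List


def ceildiv(a: int, b: int) -> int:
    # Integer ceiling division for b > 0 (assumed to match T.ceildiv).
    return -(-a // b)


def strided_indices(thread_idx: int, numel: int, num_threads: int) -> List[int]:
    # CUDA-style loop: start at thread_idx and advance by num_threads.
    return list(range(thread_idx, numel, num_threads))


def manual_indices(thread_idx: int, numel: int, num_threads: int) -> List[int]:
    # Current workaround: iterate a dense trip count, then rebuild the index.
    trip_count = ceildiv(numel - thread_idx, num_threads)
    return [thread_idx + i * num_threads for i in range(trip_count)]


if __name__ == "__main__":
    # Thread 1 of 4 over 10 elements touches the same indices either way.
    print(strided_indices(1, 10, 4))
    print(manual_indices(1, 10, 4))
```

With a strided loop (for example a hypothetical three-argument form such as `T.serial(thread_idx, numel, num_threads)`, which does not exist today as far as this issue states), the trip-count computation and the `j = thread_idx + i * num_threads` line would both disappear from user code.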
