Speed up fill for high dimensional arrays #591

Merged: 2 commits, Apr 25, 2025

GiggleLiu
Contributor

Fixes the following issue:
ArrogantGao/benchmark_tropical_tensornetwork#1

After the fix, `fill!` is 20x faster on high-dimensional arrays:

julia> @btime CUDA.@sync fill!($(CUDA.zeros(TropicalF32, fill(2, 20)...)), zero(TropicalF32));
  15.316 μs (57 allocations: 1.52 KiB)

Member

maleadt commented Apr 25, 2025

@vchuravy Any idea where this overhead comes from? Suboptimal launch configuration, or the kernel-side div?
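
The thread does not settle which of the two suspected causes applies. As background for the "kernel-side div" hypothesis only, here is a minimal CPU-only Julia sketch. It is not the code from this PR, and `linear_to_cartesian` is a hypothetical helper written for illustration. It shows that turning a linear index into a 20-dimensional Cartesian index costs one integer division per dimension, whereas filling through a flat view of the same memory needs no index conversion at all.

```julia
# Hypothetical illustration, CPU-only; not the implementation merged in this PR.
dims = ntuple(_ -> 2, 20)        # same shape as the benchmark: 2^20 elements
A = zeros(Float32, dims)

# Convert a 1-based linear index into a 1-based Cartesian index (column-major).
# Each dimension costs one divrem, so a 20-dimensional array needs 20 of them.
function linear_to_cartesian(i::Int, dims::NTuple{N,Int}) where {N}
    rest = i - 1                 # work with a zero-based offset
    idx = Vector{Int}(undef, N)
    for d in 1:N
        rest, r = divrem(rest, dims[d])
        idx[d] = r + 1
    end
    return Tuple(idx)
end

# The helper agrees with Julia's built-in conversion.
@assert linear_to_cartesian(5, dims) == Tuple(CartesianIndices(A)[5])

# Filling through a flat view touches the same memory with linear indices only.
fill!(vec(A), 1f0)
@assert all(==(1f0), A)
```

Whether this per-element division, rather than the launch configuration, explains the original slowdown is not confirmed anywhere in the conversation.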

maleadt merged commit 55a943e into JuliaGPU:master on Apr 25, 2025
13 of 16 checks passed