-
Notifications
You must be signed in to change notification settings - Fork 322
[Fix] Fix memory leak bug #1281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Warning Rate limit exceeded@kurisu6912 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 13 minutes and 8 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (2)
WalkthroughThe changes implement a closure-based wrapper pattern for DSL function mutation in Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant mutate
participant DSLMutator
participant Compiler
rect rgb(220, 230, 255)
Note over mutate,Compiler: OLD FLOW: In-place Mutation
User->>mutate: Call mutate(func)
mutate->>DSLMutator: Create DSLMutator()
DSLMutator->>DSLMutator: visit_FunctionDef<br/>(modifies func in-place)
DSLMutator-->>mutate: Modified func
mutate-->>User: Return mutated func
end
rect rgb(230, 255, 220)
Note over mutate,Compiler: NEW FLOW: Closure-based Construction
User->>mutate: Call mutate(func)
mutate->>mutate: Extract nonlocals<br/>from func context
mutate->>DSLMutator: Create DSLMutator<br/>(closure_names=nonlocals.keys())
DSLMutator->>DSLMutator: visit_FunctionDef<br/>(generates make_closure wrapper)
mutate->>Compiler: Compile transformed AST
Compiler-->>mutate: make_closure function
mutate->>mutate: Call make_closure<br/>(**nonlocals)
mutate-->>User: Return constructed func
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
👋 Hi! Thank you for contributing to the TileLang project. Please remember to run We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
testing/python/language/test_tilelang_language_frontend_v2.py (1)
148-204: Track the re-enablement of the commented test.The
test_torch_eqfunction has been disabled with a "not supported now" comment. Consider tracking this with a GitHub issue or TODO comment to ensure the test is either re-enabled (once support is restored) or permanently removed (if the feature is deprecated).Would you like me to help create an issue to track this, or add a more specific TODO comment with context?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
testing/python/language/test_tilelang_capture.py(1 hunks)testing/python/language/test_tilelang_language_frontend_v2.py(1 hunks)tilelang/language/tir/ir.pyi(1 hunks)tilelang/language/v2/ast.py(3 hunks)tilelang/language/v2/builder.py(4 hunks)tilelang/language/v2/dtypes.py(2 hunks)tilelang/language/v2/utils.py(3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
testing/python/language/test_tilelang_language_frontend_v2.pytesting/python/language/test_tilelang_capture.py
🧬 Code graph analysis (4)
tilelang/language/v2/ast.py (1)
tilelang/language/v2/utils.py (2)
get_func_nonlocals(33-54)get_compiled_object(92-109)
tilelang/language/v2/utils.py (1)
tilelang/utils/deprecated.py (1)
deprecated(14-41)
tilelang/language/v2/builder.py (1)
tilelang/language/v2/utils.py (1)
get_func_nonlocals(33-54)
testing/python/language/test_tilelang_capture.py (3)
src/tl_templates/cuda/reduce.h (1)
T(175-247)tilelang/transform/pass_config.py (1)
PassConfigKey(6-144)tilelang/language/v2/builder.py (2)
prim_func(152-156)prim_func(618-711)
🪛 Ruff (0.14.5)
testing/python/language/test_tilelang_capture.py
17-17: Undefined name float32
(F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Quick Lint
🔇 Additional comments (10)
tilelang/language/v2/utils.py (1)
57-76: LGTM - Deprecation strategy is appropriate.The function is correctly marked as deprecated with a clear warning about the memory leak danger. While the leak itself isn't fixed in this function, the deprecation approach allows for a gradual migration away from this dangerous pattern. The inline comment at line 71 provides additional context for maintainers.
tilelang/language/v2/dtypes.py (3)
39-39: Clever handling of NumPy dtype objects.The pattern
{np.dtype(k): v for k, v in _NUMPY_DTYPE_TO_STR.items()}ensures bothnp.int32andnp.dtype(np.int32)are recognized as valid keys, improving API flexibility.
67-88: Well-structured dtype mapping consolidation.The unified
_DTYPE_TO_STRmapping and the_STR_TO_TVM_DTYPE_CALLlookup table provide a clean separation between dtype-to-string conversion and string-to-FFI-call dispatch. The merge order (Python, NumPy, PyTorch) is appropriate as these represent progressively more specialized type systems.
91-127: Improved dtype dispatch with early lookup optimization.The refactored dispatch in
__dtype_call__efficiently handles common types through_STR_TO_TVM_DTYPE_CALLbefore falling back to dynamic construction. The error messages now reference the unified_DTYPE_TO_STRfor better user guidance.tilelang/language/tir/ir.pyi (1)
1-106: Comprehensive type stubs for TVM IR builder API.This new stub file provides extensive type hints for the TIR IR builder surface, covering arithmetic, transcendental, bitwise, comparison, and hardware-specific operations. The use of
TypeVar('_T')for polymorphic return types is appropriate for IR builder patterns.tilelang/language/v2/builder.py (1)
575-611: Memory leak fix: direct globals reference instead of copy.The key change at line 583 uses
func.__globals__directly rather than copying it with the spread operator. Combined with separate nonlocal extraction at line 584, this prevents the memory leak caused by creating a new globals dictionary that would retain references and prevent garbage collection.The enhanced string annotation handling (lines 593-610) maintains proper type resolution while avoiding the memory leak.
testing/python/language/test_tilelang_capture.py (2)
7-31: Well-designed GC verification test.The test correctly verifies that captured tensors are properly garbage collected after kernel construction. The use of weak references combined with explicit GC calls provides strong evidence that the memory leak has been fixed.
The static analysis warning about
'float32'on line 17 is a false positive—it's a string literal inside a type annotation, not a Python identifier.
33-35: Useful debugging code preserved for future use.The commented
objgraphdebugging code provides a helpful reference for investigating memory retention issues in the future, without affecting test execution.tilelang/language/v2/ast.py (2)
251-253: Explicit closure wrapper prevents memory leak.The new
make_closurewrapper pattern cleanly separates closure variables from the global namespace. By accepting closure names as explicit parameters (line 497) and returning a nested function, this approach avoids the need to copyfunc.__globals__, which was the root cause of the memory leak.Also applies to: 497-504
600-624: Excellent documentation and implementation of the memory leak fix.The detailed comment block (lines 602-615) provides crucial context about why the
make_closurepattern is necessary and how the previous approach with{**func.__globals__}caused memory leaks. The implementation correctly:
- Extracts nonlocals separately (line 600)
- Generates a closure wrapper via
DSLMutator(line 616)- Uses
func.__globals__directly without copying (line 622)- Binds nonlocals explicitly via
make_closure(**nonlocals)(line 624)This is the core fix that resolves the memory leak described in the PR.
|
LGTM, let's rebase with new tvm-ffi changes and fix lint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
testing/python/language/test_tilelang_capture.py (1)
1-40: Clarify that the regression test actually exercises the leak scenarioThis test is intended to guard against retaining globals/closures after JIT/mutation, but
ais never referenced insideget_dummy_kernelordummy_kernel, so it isn’t obviously part of any closure or global namespace involved in the transformation. That means the test may already pass on the pre‑fix code and not truly detect the{**func.__globals__}leak it’s meant to cover.Consider updating the test so that a large tensor (like
a) is actually captured by the JIT’d function (e.g., referenced in the body or via a nonlocal) before building the kernel, and then verify that it’s no longer strongly referenced after construction. You may also want to guard the unconditionaltorch.cuda.empty_cache()with a CUDA‑availability check if these tests are ever run on CPU‑only environments.tilelang/language/v2/ast.py (1)
14-15: Two-stepmake_closureconstruction cleanly fixes the globals-copy leakThe updated mutate path and
DSLMutatorintegration look sound:
mutatenow collects nonlocals viautils.get_func_nonlocals(func)and passes onlynonlocals.keys()intoDSLMutator, so closure names are explicit.DSLMutator.__init__storingclosure_namesand using them invisit_FunctionDefto generate amake_closure(<closure_names...>)wrapper cleanly separates closure values from the global namespace.utils.get_compiled_object(..., globals=func.__globals__)compiles against the original module globals without constructing a{**func.__globals__}copy, avoiding the leak described in the comment.- The resulting structure (
fn = make_closure(**nonlocals), thenfn(builder)returning the inner function) matches theIRGenerator.gen: Callable[[BaseBuilder], Callable[_P, _T]]contract and keeps closure capture limited to what’s actually needed (nonlocals + builder), rather than retaining an expanded globals dict.Minor polish you could consider (optional):
- Widen the
closure_namestype annotation inDSLMutator.__init__toIterable[str]or similar, since you’re passingnonlocals.keys().- Fix the small typo in the explanatory comment in
mutate(“form” → “from”).Functionally, this rework is coherent and aligns well with the memory‑leak fix objective.
Also applies to: 251-254, 480-505, 599-625
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
testing/python/language/test_tilelang_capture.py(1 hunks)tilelang/language/v2/ast.py(3 hunks)tilelang/language/v2/builder.py(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.
Applied to files:
testing/python/language/test_tilelang_capture.py
🧬 Code graph analysis (3)
testing/python/language/test_tilelang_capture.py (3)
src/tl_templates/cuda/reduce.h (1)
T(178-250)tilelang/transform/pass_config.py (1)
PassConfigKey(6-144)tilelang/language/v2/builder.py (2)
prim_func(151-155)prim_func(632-725)
tilelang/language/v2/ast.py (1)
tilelang/language/v2/utils.py (2)
get_func_nonlocals(33-54)get_compiled_object(92-109)
tilelang/language/v2/builder.py (1)
tilelang/language/v2/utils.py (1)
get_func_nonlocals(33-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
- GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
🔇 Additional comments (1)
tilelang/language/v2/builder.py (1)
21-22: Closure-awareget_type_hintsand globals handling look correctUsing
globalns = func.__globals__andlocalns = utils.get_func_nonlocals(func)avoids constructing a new globals dict (and thus the{**func.__globals__}leak) while still matching howtyping.get_type_hintsresolves names for nested functions. The dtype string fast‑path viadt._all_dtypes+eval(..., globalns, localns)is constrained, and falling back toForwardRef+_eval_typefor other strings keeps general annotations working. Non‑string annotations being preserved as-is is consistent with the later_is_static_annotchecks. Overall this change is aligned with the memory‑leak fix and should keep type resolution behavior intact.Also applies to: 597-625
This pr fix a memory leak bug and add a test in DSLMutator, it use a expression of {**func.globals}, which make the globals unable to free
Summary by CodeRabbit
Tests
Refactor