
@kurisu6912
Collaborator

@kurisu6912 kurisu6912 commented Nov 19, 2025

This PR fixes a memory leak in DSLMutator and adds a test for it. DSLMutator used the expression {**func.__globals__}, which prevented the globals from being freed.
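As a rough illustration of why a merged copy of the globals can pin objects in memory, the sketch below rebuilds a function against `{**func.__globals__, **nonlocals}`. The names `Big`, `build_with_copied_globals`, and `demo` are hypothetical and not part of TileLang; the real code path in DSLMutator may differ in detail.

```python
import gc
import weakref


class Big:
    """Hypothetical stand-in for a large tensor."""


def build_with_copied_globals(func, nonlocals):
    # Merge module globals and captured values into one NEW dict.
    # Every captured value is now strongly referenced by this dict.
    ns = {**func.__globals__, **nonlocals}
    # The rebuilt function uses `ns` as its __globals__, so `ns`
    # (and everything in it) lives as long as the function does.
    exec("def rebuilt():\n    return 0", ns)
    return ns["rebuilt"]


def demo():
    big = Big()
    ref = weakref.ref(big)

    def kernel():
        return 0

    rebuilt = build_with_copied_globals(kernel, {"big": big})
    del big
    gc.collect()
    # `ref()` is still alive here because rebuilt.__globals__ holds it.
    return rebuilt, ref
```

In this sketch the object stays reachable through `rebuilt.__globals__` even though the rebuilt function never uses it.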

Summary by CodeRabbit

  • Tests

    • Added test coverage for tilelang capture functionality.
  • Refactor

    • Improved internal handling of closure variable capture and type hint resolution.
    • Marked internal utility function as deprecated; users relying on it should prepare for future removal.

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025


📥 Commits

Reviewing files that changed from the base of the PR and between 29ef843 and 82e425b.

📒 Files selected for processing (2)
  • tilelang/language/v2/builder.py (4 hunks)
  • tilelang/language/v2/utils.py (0 hunks)

Walkthrough

The changes implement a closure-based wrapper pattern for DSL function mutation in ast.py, refactor type hint resolution in builder.py to incorporate captured nonlocals, deprecate utility functions in utils.py, and add a test verifying tensor garbage collection behavior during kernel construction.

Changes

| Cohort | File(s) | Change Summary |
|---|---|---|
| DSL Mutation Refactoring | tilelang/language/v2/ast.py | DSLMutator now requires closure_names constructor parameter; visit_FunctionDef generates a make_closure wrapper that accepts closure variables; mutate method shifts from in-place mutation to two-step closure construction via make_closure(**nonlocals). |
| Type Hints and Namespace Resolution | tilelang/language/v2/builder.py | Refactored get_type_hints to use func.__globals__ and utils.get_func_nonlocals(func) for namespace assembly; added string annotation handling with ForwardRef resolution and dtype matching; preserves non-string annotations as-is. |
| Utility Deprecation | tilelang/language/v2/utils.py | Added @deprecated decorator to inspect_function_capture; included memory-leak warning comment for func.__globals__ usage. |
| Garbage Collection Test | testing/python/language/test_tilelang_capture.py | New test file with test_tilelang_capture() function; verifies tensor garbage collection via weak references within JIT-compiled kernel construction; includes commented debugging hooks. |
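The table refers to utils.get_func_nonlocals(func) without showing it. A minimal sketch of how closure variables can be read from a Python function follows; `sketch_func_nonlocals` is a hypothetical name, and this is an assumption about the general technique, not the actual TileLang helper.

```python
def sketch_func_nonlocals(func):
    """Return {name: value} for the variables `func` closes over.

    Hypothetical illustration only; the real helper may behave differently.
    """
    names = func.__code__.co_freevars   # names of captured variables
    cells = func.__closure__ or ()      # None when nothing is captured
    result = {}
    for name, cell in zip(names, cells):
        try:
            result[name] = cell.cell_contents
        except ValueError:
            # The cell is empty: the variable has not been assigned yet.
            pass
    return result
```

Only the names a function actually references appear in `co_freevars`, so the resulting dict stays small compared with a full copy of the module globals.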

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant mutate
    participant DSLMutator
    participant Compiler
    
    rect rgb(220, 230, 255)
    Note over mutate,Compiler: OLD FLOW: In-place Mutation
    User->>mutate: Call mutate(func)
    mutate->>DSLMutator: Create DSLMutator()
    DSLMutator->>DSLMutator: visit_FunctionDef<br/>(modifies func in-place)
    DSLMutator-->>mutate: Modified func
    mutate-->>User: Return mutated func
    end
    
    rect rgb(230, 255, 220)
    Note over mutate,Compiler: NEW FLOW: Closure-based Construction
    User->>mutate: Call mutate(func)
    mutate->>mutate: Extract nonlocals<br/>from func context
    mutate->>DSLMutator: Create DSLMutator<br/>(closure_names=nonlocals.keys())
    DSLMutator->>DSLMutator: visit_FunctionDef<br/>(generates make_closure wrapper)
    mutate->>Compiler: Compile transformed AST
    Compiler-->>mutate: make_closure function
    mutate->>mutate: Call make_closure<br/>(**nonlocals)
    mutate-->>User: Return constructed func
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • ast.py: Fundamental refactoring of mutation flow from in-place to closure-based construction; requires careful verification that wrapper generation and closure capture semantics are correct across edge cases
  • builder.py: Moderate logic density; string annotation handling and ForwardRef resolution logic needs validation for dtype matching and eval namespace correctness
  • Interaction between ast.py and builder.py: Verify that nonlocal extraction and type hint resolution work correctly together in the new closure-based flow

Poem

🐰 A rabbit's ode to closures spun,
Where wrapper functions catch and run,
No garbage left when kernels dance,
Weak references give truth a glance,
Mutations wrapped in types refined,
Nonlocals bound, no trace behind! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 2.59% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title '[Fix] Fix memory leak bug' is vague and doesn't clearly specify which memory leak or which component is affected. While it relates to the changeset, it lacks specificity about the root cause (closure variable capture in DSLMutator) or the fix approach. |


@github-actions

👋 Hi! Thank you for contributing to the TileLang project.

Please remember to run pre-commit run --all-files in the root directory of the project to ensure your changes are properly linted and formatted. This will help ensure your contribution passes the format check.

We appreciate you taking this step! Our team will review your contribution, and we look forward to your awesome work! 🚀

@SiriusNEO SiriusNEO self-requested a review November 19, 2025 05:12
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
testing/python/language/test_tilelang_language_frontend_v2.py (1)

148-204: Track the re-enablement of the commented test.

The test_torch_eq function has been disabled with a "not supported now" comment. Consider tracking this with a GitHub issue or TODO comment to ensure the test is either re-enabled (once support is restored) or permanently removed (if the feature is deprecated).

Would you like me to help create an issue to track this, or add a more specific TODO comment with context?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 74da369 and 7c7a3ac.

📒 Files selected for processing (7)
  • testing/python/language/test_tilelang_capture.py (1 hunks)
  • testing/python/language/test_tilelang_language_frontend_v2.py (1 hunks)
  • tilelang/language/tir/ir.pyi (1 hunks)
  • tilelang/language/v2/ast.py (3 hunks)
  • tilelang/language/v2/builder.py (4 hunks)
  • tilelang/language/v2/dtypes.py (2 hunks)
  • tilelang/language/v2/utils.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-11-14T07:56:11.098Z
Learnt from: lucifer1004
Repo: tile-ai/tilelang PR: 1256
File: testing/python/jit/test_tilelang_jit_gemm_nvrtc.py:55-115
Timestamp: 2025-11-14T07:56:11.098Z
Learning: In `testing/python/jit/test_tilelang_jit_gemm_nvrtc.py`, the global function `tilelang_callback_cuda_postproc` registered via `tvm.register_global_func(..., override=True)` is intentionally not restored after the test completes, as the persistent behavior is expected.

Applied to files:

  • testing/python/language/test_tilelang_language_frontend_v2.py
  • testing/python/language/test_tilelang_capture.py
🧬 Code graph analysis (4)
tilelang/language/v2/ast.py (1)
tilelang/language/v2/utils.py (2)
  • get_func_nonlocals (33-54)
  • get_compiled_object (92-109)
tilelang/language/v2/utils.py (1)
tilelang/utils/deprecated.py (1)
  • deprecated (14-41)
tilelang/language/v2/builder.py (1)
tilelang/language/v2/utils.py (1)
  • get_func_nonlocals (33-54)
testing/python/language/test_tilelang_capture.py (3)
src/tl_templates/cuda/reduce.h (1)
  • T (175-247)
tilelang/transform/pass_config.py (1)
  • PassConfigKey (6-144)
tilelang/language/v2/builder.py (2)
  • prim_func (152-156)
  • prim_func (618-711)
🪛 Ruff (0.14.5)
testing/python/language/test_tilelang_capture.py

17-17: Undefined name float32

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Quick Lint
🔇 Additional comments (10)
tilelang/language/v2/utils.py (1)

57-76: LGTM - Deprecation strategy is appropriate.

The function is correctly marked as deprecated with a clear warning about the memory leak danger. While the leak itself isn't fixed in this function, the deprecation approach allows for a gradual migration away from this dangerous pattern. The inline comment at line 71 provides additional context for maintainers.

tilelang/language/v2/dtypes.py (3)

39-39: Clever handling of NumPy dtype objects.

The pattern {np.dtype(k): v for k, v in _NUMPY_DTYPE_TO_STR.items()} ensures both np.int32 and np.dtype(np.int32) are recognized as valid keys, improving API flexibility.


67-88: Well-structured dtype mapping consolidation.

The unified _DTYPE_TO_STR mapping and the _STR_TO_TVM_DTYPE_CALL lookup table provide a clean separation between dtype-to-string conversion and string-to-FFI-call dispatch. The merge order (Python, NumPy, PyTorch) is appropriate as these represent progressively more specialized type systems.


91-127: Improved dtype dispatch with early lookup optimization.

The refactored dispatch in __dtype_call__ efficiently handles common types through _STR_TO_TVM_DTYPE_CALL before falling back to dynamic construction. The error messages now reference the unified _DTYPE_TO_STR for better user guidance.

tilelang/language/tir/ir.pyi (1)

1-106: Comprehensive type stubs for TVM IR builder API.

This new stub file provides extensive type hints for the TIR IR builder surface, covering arithmetic, transcendental, bitwise, comparison, and hardware-specific operations. The use of TypeVar('_T') for polymorphic return types is appropriate for IR builder patterns.

tilelang/language/v2/builder.py (1)

575-611: Memory leak fix: direct globals reference instead of copy.

The key change at line 583 uses func.__globals__ directly rather than copying it with the spread operator. Combined with separate nonlocal extraction at line 584, this prevents the memory leak caused by creating a new globals dictionary that would retain references and prevent garbage collection.

The enhanced string annotation handling (lines 593-610) maintains proper type resolution while avoiding the memory leak.

testing/python/language/test_tilelang_capture.py (2)

7-31: Well-designed GC verification test.

The test correctly verifies that captured tensors are properly garbage collected after kernel construction. The use of weak references combined with explicit GC calls provides strong evidence that the memory leak has been fixed.

The static analysis warning about 'float32' on line 17 is a false positive—it's a string literal inside a type annotation, not a Python identifier.


33-35: Useful debugging code preserved for future use.

The commented objgraph debugging code provides a helpful reference for investigating memory retention issues in the future, without affecting test execution.

tilelang/language/v2/ast.py (2)

251-253: Explicit closure wrapper prevents memory leak.

The new make_closure wrapper pattern cleanly separates closure variables from the global namespace. By accepting closure names as explicit parameters (line 497) and returning a nested function, this approach avoids the need to copy func.__globals__, which was the root cause of the memory leak.

Also applies to: 497-504


600-624: Excellent documentation and implementation of the memory leak fix.

The detailed comment block (lines 602-615) provides crucial context about why the make_closure pattern is necessary and how the previous approach with {**func.__globals__} caused memory leaks. The implementation correctly:

  1. Extracts nonlocals separately (line 600)
  2. Generates a closure wrapper via DSLMutator (line 616)
  3. Uses func.__globals__ directly without copying (line 622)
  4. Binds nonlocals explicitly via make_closure(**nonlocals) (line 624)

This is the core fix that resolves the memory leak described in the PR.
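The four steps above can be sketched in plain Python. This is a simplified illustration that works on source text instead of an AST; `rebuild_with_closure` and its `inner_src` parameter are hypothetical, and only the make_closure name comes from the review.

```python
import textwrap


def rebuild_with_closure(func, inner_src, nonlocals):
    """Rebuild `func` from `inner_src`, binding `nonlocals` as closure variables.

    Hypothetical sketch: the real code transforms an AST via DSLMutator.
    """
    params = ", ".join(nonlocals)
    # Wrap the function in make_closure(<closure names>) so captured
    # values arrive as parameters instead of as entries in the globals.
    wrapper_src = (
        f"def make_closure({params}):\n"
        f"{textwrap.indent(inner_src, '    ')}\n"
        f"    return {func.__name__}\n"
    )
    scratch = {}  # receives `make_closure`; func.__globals__ is not modified
    code = compile(wrapper_src, "<make_closure>", "exec")
    # Compile against the ORIGINAL globals dict, without copying it.
    exec(code, func.__globals__, scratch)
    return scratch["make_closure"](**nonlocals)
```

The rebuilt function shares `func.__globals__` with the original and holds the captured values only through its closure cells, so no merged globals dict is created.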

@SiriusNEO
Collaborator

LGTM, let's rebase with new tvm-ffi changes and fix lint

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
testing/python/language/test_tilelang_capture.py (1)

1-40: Clarify that the regression test actually exercises the leak scenario

This test is intended to guard against retaining globals/closures after JIT/mutation, but a is never referenced inside get_dummy_kernel or dummy_kernel, so it isn’t obviously part of any closure or global namespace involved in the transformation. That means the test may already pass on the pre‑fix code and not truly detect the {**func.__globals__} leak it’s meant to cover.

Consider updating the test so that a large tensor (like a) is actually captured by the JIT’d function (e.g., referenced in the body or via a nonlocal) before building the kernel, and then verify that it’s no longer strongly referenced after construction. You may also want to guard the unconditional torch.cuda.empty_cache() with a CUDA‑availability check if these tests are ever run on CPU‑only environments.
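One way to shape such a test is sketched below, using a plain object instead of a tensor so it runs without a GPU. `FakeTensor`, `good_build`, `leaky_build`, and `captured_object_is_freed` are hypothetical names; `build` stands in for whatever constructs the kernel.

```python
import gc
import weakref


class FakeTensor:
    """Hypothetical stand-in for a large torch tensor."""


def good_build(func):
    # Keeps no reference to `func` after returning.
    return func.__name__


_registry = []


def leaky_build(func):
    # Simulates a leak: the function (and its closure) is retained.
    _registry.append(func)
    return func.__name__


def captured_object_is_freed(build):
    def scope():
        a = FakeTensor()
        ref = weakref.ref(a)

        def kernel():
            return a  # `a` is really captured by the function under test

        build(kernel)
        return ref

    ref = scope()
    gc.collect()
    return ref() is None
```

Because `kernel` references `a`, any builder that keeps the function or its namespace alive makes the check fail, which is the property the regression test is meant to detect.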

tilelang/language/v2/ast.py (1)

14-15: Two-step make_closure construction cleanly fixes the globals-copy leak

The updated mutate path and DSLMutator integration look sound:

  • mutate now collects nonlocals via utils.get_func_nonlocals(func) and passes only nonlocals.keys() into DSLMutator, so closure names are explicit.
  • DSLMutator.__init__ storing closure_names and using them in visit_FunctionDef to generate a make_closure(<closure_names...>) wrapper cleanly separates closure values from the global namespace.
  • utils.get_compiled_object(..., globals=func.__globals__) compiles against the original module globals without constructing a {**func.__globals__} copy, avoiding the leak described in the comment.
  • The resulting structure (fn = make_closure(**nonlocals), then fn(builder) returning the inner function) matches the IRGenerator.gen: Callable[[BaseBuilder], Callable[_P, _T]] contract and keeps closure capture limited to what’s actually needed (nonlocals + builder), rather than retaining an expanded globals dict.

Minor polish you could consider (optional):

  • Widen the closure_names type annotation in DSLMutator.__init__ to Iterable[str] or similar, since you’re passing nonlocals.keys().
  • Fix the small typo in the explanatory comment in mutate (“form” → “from”).

Functionally, this rework is coherent and aligns well with the memory‑leak fix objective.

Also applies to: 251-254, 480-505, 599-625

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c7a3ac and 29ef843.

📒 Files selected for processing (3)
  • testing/python/language/test_tilelang_capture.py (1 hunks)
  • tilelang/language/v2/ast.py (3 hunks)
  • tilelang/language/v2/builder.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
testing/python/language/test_tilelang_capture.py (3)
src/tl_templates/cuda/reduce.h (1)
  • T (178-250)
tilelang/transform/pass_config.py (1)
  • PassConfigKey (6-144)
tilelang/language/v2/builder.py (2)
  • prim_func (151-155)
  • prim_func (632-725)
tilelang/language/v2/ast.py (1)
tilelang/language/v2/utils.py (2)
  • get_func_nonlocals (33-54)
  • get_compiled_object (92-109)
tilelang/language/v2/builder.py (1)
tilelang/language/v2/utils.py (1)
  • get_func_nonlocals (33-54)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test for Python 3.12 with Metal (on macos-latest)
  • GitHub Check: Test for Python 3.12 with ROCm-6.3 (on self-hosted-amd)
🔇 Additional comments (1)
tilelang/language/v2/builder.py (1)

21-22: Closure-aware get_type_hints and globals handling look correct

Using globalns = func.__globals__ and localns = utils.get_func_nonlocals(func) avoids constructing a new globals dict (and thus the {**func.__globals__} leak) while still matching how typing.get_type_hints resolves names for nested functions. The dtype string fast‑path via dt._all_dtypes + eval(..., globalns, localns) is constrained, and falling back to ForwardRef + _eval_type for other strings keeps general annotations working. Non‑string annotations being preserved as-is is consistent with the later _is_static_annot checks. Overall this change is aligned with the memory‑leak fix and should keep type resolution behavior intact.

Also applies to: 597-625
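The namespace handling described here can be approximated as follows. This sketch resolves string annotations with a plain eval and skips the dtype fast path and the ForwardRef fallback; `closure_vars` and `resolve_annotations` are hypothetical names, not the functions in builder.py.

```python
def closure_vars(func):
    """Map each captured variable name to its current value."""
    cells = func.__closure__ or ()
    return {
        name: cell.cell_contents
        for name, cell in zip(func.__code__.co_freevars, cells)
    }


def resolve_annotations(func):
    globalns = func.__globals__    # used directly, never copied
    localns = closure_vars(func)   # nonlocals act as the local namespace
    resolved = {}
    for name, annot in func.__annotations__.items():
        if isinstance(annot, str):
            # String annotation: evaluate it against both namespaces.
            resolved[name] = eval(annot, globalns, localns)
        else:
            # Non-string annotations are kept as-is.
            resolved[name] = annot
    return resolved
```

Passing the nonlocals as a separate local namespace lets string annotations refer to closure variables without merging them into a new globals dict.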
