
Conversation

@b8zhong commented Oct 18, 2025

Motivation

(Screenshots attached: profiling captures "Screenshot 2025-10-17 at 8 24 17 PM" and "Screenshot 2025-10-17 at 8 25 29 PM")

Technical Details

It seems to be spending non-trivial time in CUDA device queries, and the result generally should be cached. We assume up to 8 devices.

Submission Checklist

Copilot AI review requested due to automatic review settings October 18, 2025 03:28

Contributor

Copilot AI left a comment


Pull Request Overview

Cache results of device/arch detection to avoid repeated and costly runtime queries.

  • Add functools.lru_cache to get_arch and get_device
  • Import lru_cache and configure cache size to 8
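The change described above can be sketched as follows. The function names `get_device` and `get_arch` and the cache size of 8 come from the PR; the body of the underlying query is a stand-in, since the real CUDA lookup isn't shown in this conversation:

```python
from functools import lru_cache


def _query_device_properties(device_id: int) -> str:
    # Stand-in for the costly CUDA runtime query the PR caches;
    # the real implementation is not shown in this PR conversation.
    return f"device-{device_id}"


@lru_cache(maxsize=8)  # at most 8 devices assumed, so 8 entries suffice
def get_device(device_id: int = 0) -> str:
    return _query_device_properties(device_id)


@lru_cache(maxsize=8)
def get_arch(device_id: int = 0) -> str:
    # Derives the architecture from the (now cached) device lookup;
    # the "arch-of-" naming is illustrative only.
    return f"arch-of-{get_device(device_id)}"
```

With `lru_cache`, repeated calls with the same `device_id` return the memoized result instead of re-entering the CUDA query, which is why the profiling cost disappears after the first call per device.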


Collaborator

@valarLip left a comment

LGTM

@valarLip requested a review from rahulbatra85 October 19, 2025 04:07
@b8zhong commented Oct 19, 2025

Let me fix the lint and the Copilot suggestion; I think it's actually correct.

@b8zhong force-pushed the remove-device-check branch from 31a3a27 to ec0f7d0 on October 20, 2025 18:54

@b8zhong commented Oct 20, 2025

@valarLip Thanks for reviewing.

I fixed the lint.
