⚡️ Speed up function `masks2poly` by 69% in PR #1586 (`tune-mask2polygon`) #1587
⚡️ This pull request contains optimizations for PR #1586
If you approve this dependent PR, these changes will be merged into the original PR branch `tune-mask2polygon`.
📄 69% (0.69x) speedup for `masks2poly` in `inference/core/utils/postprocess.py`
⏱️ Runtime: 18.1 milliseconds → 10.7 milliseconds (best of 234 runs)
📝 Explanation and details
The optimized code achieves a 69% speedup through three key performance improvements:
1. Faster empty mask detection: Replaces the `np.any(m_uint8)` check with `np.count_nonzero(m_uint8) == 0`. The profiler shows this reduces the most expensive line from 36.6% to 15.1% of total time. `count_nonzero` is significantly faster on dense binary arrays, including the common case of empty masks.
2. Optimized contour selection: Instead of creating a temporary array `np.array([len(x) for x in contours])` and calling `argmax()`, the code uses a simple loop to track the largest contour directly. This eliminates array allocation overhead and is particularly effective when there's only one contour (the common case). The share of total time spent in `mask2poly` rises slightly, from 54.8% to 56.5%, but per-hit performance is better.
3. Minor loop optimizations:
- Caches `segments.append` as a local variable to avoid repeated attribute lookups
- Reads `mask.dtype` once to avoid repeated property access
- Uses `astype(np.uint8, copy=False)`
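The contour-selection loop and the cached `append` from points 2 and 3 can be sketched roughly as follows. This is an illustrative, pure-Python sketch and not the code from the PR: `largest_contour_index` and `pick_largest` are hypothetical names, and contours are modeled as plain lists of points rather than the arrays OpenCV returns.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def largest_contour_index(contours: Sequence[Sequence[Point]]) -> int:
    """Return the index of the contour with the most points.

    Hypothetical helper. It mirrors argmax over the contour lengths:
    the strict '>' keeps the first contour when lengths tie.
    """
    best_idx = 0
    best_len = -1
    for idx, contour in enumerate(contours):
        n = len(contour)
        if n > best_len:
            best_idx, best_len = idx, n
    return best_idx


def pick_largest(all_contours: Sequence[Sequence[Sequence[Point]]]) -> List[Sequence[Point]]:
    """Pick the largest contour per mask (hypothetical driver loop)."""
    segments: List[Sequence[Point]] = []
    append = segments.append  # bind the method once, outside the loop
    for contours in all_contours:
        if not contours:
            append([])  # no contour found for this mask
            continue
        append(contours[largest_contour_index(contours)])
    return segments
```

With a single contour the loop body runs once and no temporary array is allocated, which is the case the explanation calls out as most common.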
The optimizations are most effective for:
These improvements compound especially well in typical computer vision workflows where many masks are empty or contain simple shapes.
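As a rough illustration of the empty-mask check and the dtype handling described above, the sketch below shows that the two emptiness tests agree and that `astype(np.uint8, copy=False)` returns the input array unchanged when it is already `uint8`. The helper names (`is_empty_any`, `is_empty_count`, `to_uint8`) are hypothetical and not taken from the PR; relative timings depend on the NumPy version and mask size, so none are claimed here.

```python
import numpy as np


def is_empty_any(mask: np.ndarray) -> bool:
    """Emptiness test based on np.any (the check named as the original)."""
    return not np.any(mask)


def is_empty_count(mask: np.ndarray) -> bool:
    """Emptiness test based on np.count_nonzero (the check named as the replacement)."""
    return np.count_nonzero(mask) == 0


def to_uint8(mask: np.ndarray) -> np.ndarray:
    """Convert to uint8, skipping the copy when the dtype already matches."""
    return mask.astype(np.uint8, copy=False)


empty = np.zeros((4, 4), dtype=np.uint8)
filled = empty.copy()
filled[1, 2] = 1

# Both checks classify the masks identically.
assert is_empty_any(empty) and is_empty_count(empty)
assert not is_empty_any(filled) and not is_empty_count(filled)

# An array that is already uint8 is returned as-is; a bool array is converted.
assert to_uint8(filled) is filled
assert to_uint8(filled.astype(bool)).dtype == np.uint8
```

To compare the two checks on a particular setup, `timeit.timeit(lambda: is_empty_count(empty), number=10000)` and the equivalent call for `is_empty_any` give a quick measurement.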
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-pr1586-2025-09-24T18.40.57` and push.