Skip to content

Conversation

aymuos15
Copy link
Collaborator

Kicking off the voronoi matcher.

I found it to be the easiest through a new matching scheme (which will be easier then to recursively put on more matching schemes on top of it).

Obviously open to change and discussion based on what everyone feels is right.

(apologies for the extra code, I branched off the wrong way -> They should disappear once the cupy PR is merged! Hope that is okay)

@Copilot Copilot AI review requested due to automatic review settings September 16, 2025 08:31
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a new Voronoi-based region matching algorithm for instance matching, along with CuPy GPU backend support for connected components analysis. The implementation adds RegionBasedMatching as a new matching scheme that uses spatial distance rather than traditional overlap-based metrics.

  • Implements RegionBasedMatching algorithm using distance transforms and connected components
  • Adds CuPy GPU backend support for accelerated connected components analysis
  • Updates result handling to support NaN values for region-based matching metrics

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
unit_tests/test_region_matching.py Basic test for RegionBasedMatching functionality
unit_tests/test_region_integration.py Integration test with full panoptic evaluation pipeline
unit_tests/test_region_comprehensive.py Comprehensive test suite covering various matching scenarios
unit_tests/test_cupy_connected_components.py Complete test suite for CuPy backend functionality
unit_tests/test_config.py Updates config tests to include CuPy backend
pyproject.toml Adds CuPy as optional GPU dependency with CUDA variants
panoptica/utils/constants.py Adds CuPy backend to CCABackend enum
panoptica/panoptica_result.py Updates type annotations and metric calculations for NaN support
panoptica/panoptica_evaluator.py Adds region-based matching detection and NaN handling
panoptica/instance_matcher.py Implements RegionBasedMatching algorithm
panoptica/_functionals.py Adds CuPy backend support to _connected_components
panoptica/init.py Exports RegionBasedMatching class
benchmark/benchmark.py Adds CuPy benchmarking functions
.github/workflows/tests.yml Adds separate CUDA testing workflow

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

region_mask = (labeled_array == region_label)

# Compute distance transform
distance = distance_transform_edt(~region_mask)
Copy link

Copilot AI Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Computing distance transforms for all regions sequentially can be inefficient for large numbers of regions. Consider vectorizing this operation or using parallel processing for better performance with many ground truth regions.

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would move this to a functionals function and import that here.
Otherwise, I agree with copilot, this can be made faster I think.

_connected_components(mask, CCABackend.cupy)

panoptica_cupy_time = timeit.timeit(label_panoptica_cupy, number=10)

Copy link

Copilot AI Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark_panoptica_cupy function lacks GPU memory cleanup unlike benchmark_cupy. Consider adding memory pool cleanup: cp.get_default_memory_pool().free_all_blocks() after the timing to ensure consistent memory usage across benchmarks.

Suggested change
# Clean up GPU memory
cp.get_default_memory_pool().free_all_blocks()

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Comment on lines +67 to +74
except (ImportError, Exception) as e:
# CuPy not available or CUDA issues, skip this test
if "CUDA" in str(e) or "cupy" in str(e).lower():
self.skipTest(
f"CuPy/CUDA not available for connected components test: {e}"
)
else:
raise
Copy link

Copilot AI Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching broad 'Exception' and then checking error message strings is unreliable. Consider catching specific exceptions like CuPy's runtime errors (e.g., cupy.cuda.runtime.CUDARuntimeError) for more precise error handling.

Copilot uses AI. Check for mistakes.

@aymuos15 aymuos15 changed the title Voronoimatching [Feature] New Matcher Class: Voronoi Sep 16, 2025
@neuronflow
Copy link
Contributor

@MarcelRosier #231 seems to have crashed our tests 🙈

@MarcelRosier
Copy link
Collaborator

@MarcelRosier #231 seems to have crashed our tests 🙈

I think the lockfile is simply outdated in this branch.

@aymuos15 Can you try to run poetry lock and commit the updated file?

@aymuos15
Copy link
Collaborator Author

Ah okay. I will do this GMT evening. Unfortunately do not have access to my machine atm. Thanks for pointing it out.

@aymuos15
Copy link
Collaborator Author

@MarcelRosier sorry for the delay but does this work?

Comment on lines 13 to 22
else:
torch = None

if TYPE_CHECKING:
import torch


def load_torch_image(image_path: str | Path) -> torch.Tensor:
def load_torch_image(image_path: Union[str, Path]):
if torch is None:
raise ImportError("torch is not available. Please install torch to use this functionality.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that an issue? The input checker already governs package availability and should not even call this if it is not available, no?

pyproject.toml Outdated
Comment on lines 35 to 41
SimpleITK = "^2.2.2"
nibabel = "^5.1.0"
pynrrd = "^1.1.3"
joblib = "^1.3.2"
tqdm = ">=4.62.3"
auxiliary = ">=0.1.0"
pytest = ">=8.1.1"
Copy link
Collaborator

@Hendrik-code Hendrik-code Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't want people to install every package we support by default, do we?

Copy link
Collaborator Author

@aymuos15 aymuos15 Sep 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mandatory for the tests, thats why I added this. Auxiliary I think is gonna go away #217

Otherwise these are very lightweight anyways right? My theory is that people need not install anything themselves in order to run tests. Obviously happy to revert.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call nibabel, simpleITK, pynrrd, joblib, and torch together lightweight^^
I believe @neuronflow and my idea was to keep panoptica's requirement to a minimum. And if people develop it and require the unittests, that's the point where they install the dev dependencies, right? That's what they are for. Am I mistaken?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry about the torch one, that was wobbly on my part. I am fixing that now. I do think the others are okay.

but yes, I guess I do agree need not install everything for the tests, I will revert.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @Hendrik-code

Comment on lines 13 to 22
from importlib.util import find_spec

# Optional torch import
_spec = find_spec("torch")
if _spec is not None:
import torch
HAS_TORCH = True
else:
torch = None
HAS_TORCH = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is why torch is in the dev dependancies, and thus this should not be a problem, is it for you?

@aymuos15
Copy link
Collaborator Author

Sorry about the mess but I think things finally work now in terms of CI? If this is okay< I will port the same to #228

Comment on lines +445 to +449
# For region-based matching, set TP to NaN since it doesn't use traditional counting
tp_value = processing_pair.tp
if instance_matcher.__class__.__name__ == "RegionBasedMatching":
tp_value = np.nan

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh I see. you create more labelmap entries and thus indirectly create the region-based loop we discussed.
Although that is fine, I really don't like something hacky like this thing.

I much rather would say the matcher just takes care of the rest by calling the remainder of the pipeline itself (no need to create labelamp entries, that all is practically useless computation) and then the matcher itself can set those values to nan after the corresponding result objects have been created. You agree?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This seemed the most minimal approach in hindsight and thats why I went ahead with it. How do you feel about this then:

git diff --no-color panoptica/instance_matcher.py panoptica/utils/processing_pair.py panoptica/instance_evaluator.py panoptica/panoptica_evaluator.py | cat
diff --git a/panoptica/instance_evaluator.py b/panoptica/instance_evaluator.py
index 8b5727b..fb5707d 100644
--- a/panoptica/instance_evaluator.py
+++ b/panoptica/instance_evaluator.py
@@ -30,7 +30,11 @@ def evaluate_matched_instance(
         ], "decision metric not contained in eval_metrics"
         assert decision_threshold is not None, "decision metric set but no threshold"
     # Initialize variables for True Positives (tp)
-    tp = len(matched_instance_pair.matched_instances)
+    # For non-traditional counting (e.g., region-based matching), tp should be NaN
+    if hasattr(matched_instance_pair, 'use_traditional_counting') and not matched_instance_pair.use_traditional_counting:
+        tp = float('nan')
+    else:
+        tp = len(matched_instance_pair.matched_instances)
     score_dict: dict[Metric, list[float]] = {m: [] for m in eval_metrics}
 
     reference_arr, prediction_arr = (
diff --git a/panoptica/instance_matcher.py b/panoptica/instance_matcher.py
index 0eb4afc..8df679a 100644
--- a/panoptica/instance_matcher.py
+++ b/panoptica/instance_matcher.py
@@ -118,7 +118,7 @@ class InstanceMatchingAlgorithm(SupportsConfig, metaclass=ABCMeta):
             **kwargs,
         )
 
-        return map_instance_labels(unmatched_instance_pair.copy(), instance_labelmap)
+        return map_instance_labels(unmatched_instance_pair.copy(), instance_labelmap, use_traditional_counting=True)
 
     def _calculate_matching_metric_pairs(
         self,
@@ -164,7 +164,9 @@ class InstanceMatchingAlgorithm(SupportsConfig, metaclass=ABCMeta):
 
 
 def map_instance_labels(
-    processing_pair: UnmatchedInstancePair, labelmap: InstanceLabelMap
+    processing_pair: UnmatchedInstancePair, 
+    labelmap: InstanceLabelMap,
+    use_traditional_counting: bool = True
 ) -> MatchedInstancePair:
     """
     Map instance labels based on the provided labelmap and create a MatchedInstancePair.
@@ -198,6 +200,7 @@ def map_instance_labels(
     matched_instance_pair = MatchedInstancePair(
         prediction_arr=prediction_arr_relabeled,
         reference_arr=processing_pair.reference_arr,
+        use_traditional_counting=use_traditional_counting,
     )
     return matched_instance_pair
 
@@ -524,6 +527,39 @@ class RegionBasedMatching(InstanceMatchingAlgorithm):
             cca_backend (CCABackend): Backend for connected component analysis.
         """
         self._cca_backend = cca_backend
+        
+    def match_instances(
+        self,
+        unmatched_instance_pair: UnmatchedInstancePair,
+        label_group=None,
+        num_ref_labels=None,
+        processing_pair_orig_shape=None,
+        **kwargs,
+    ) -> MatchedInstancePair:
+        """
+        Override to set use_traditional_counting=False for region-based matching.
+        """
+        # Create context if needed
+        context = None
+        if (
+            label_group is not None
+            or num_ref_labels is not None
+            or processing_pair_orig_shape is not None
+        ):
+            context = MatchingContext(
+                label_group=label_group,
+                num_ref_labels=num_ref_labels,
+                processing_pair_orig_shape=processing_pair_orig_shape,
+            )
+
+        instance_labelmap = self._match_instances(
+            unmatched_instance_pair,
+            context,
+            **kwargs,
+        )
+
+        # Use use_traditional_counting=False since region-based matching doesn't use traditional TP semantics
+        return map_instance_labels(unmatched_instance_pair.copy(), instance_labelmap, use_traditional_counting=False)
 
     def _get_gt_regions(self, gt: np.ndarray) -> Tuple[np.ndarray, int]:
         """
diff --git a/panoptica/panoptica_evaluator.py b/panoptica/panoptica_evaluator.py
index 6b2e2cf..2c8feee 100644
--- a/panoptica/panoptica_evaluator.py
+++ b/panoptica/panoptica_evaluator.py
@@ -442,10 +442,8 @@ def panoptic_evaluate(
             else instance_metadata["original_num_refs"]
         )
 
-        # For region-based matching, set TP to NaN since it doesn't use traditional counting
+        # Use tp from processing_pair (already handles NaN for non-traditional counting methods)
         tp_value = processing_pair.tp
-        if instance_matcher.__class__.__name__ == "RegionBasedMatching":
-            tp_value = np.nan
 
         processing_pair = PanopticaResult(
             reference_arr=processing_pair.reference_arr,
diff --git a/panoptica/utils/processing_pair.py b/panoptica/utils/processing_pair.py
index fed0b99..a061269 100644
--- a/panoptica/utils/processing_pair.py
+++ b/panoptica/utils/processing_pair.py
@@ -322,11 +322,13 @@ class MatchedInstancePair(_ProcessingPairInstanced):
         missed_reference_labels (list[int]): Reference labels with no matching prediction.
         missed_prediction_labels (list[int]): Prediction labels with no matching reference.
         matched_instances (list[int]): Labels matched between prediction and reference arrays.
+        use_traditional_counting (bool): Whether this matching uses traditional TP/FP/FN semantics.
     """
 
     missed_reference_labels: list[int]
     missed_prediction_labels: list[int]
     matched_instances: list[int]
+    use_traditional_counting: bool
 
     def __init__(
         self,
@@ -337,6 +339,7 @@ class MatchedInstancePair(_ProcessingPairInstanced):
         matched_instances: list[int] | None = None,
         n_prediction_instance: int | None = None,
         n_reference_instance: int | None = None,
+        use_traditional_counting: bool = True,
     ) -> None:
         """Initializes a MatchedInstancePair
 
@@ -348,6 +351,7 @@ class MatchedInstancePair(_ProcessingPairInstanced):
             matched_instances (int | None, optional): matched instances labels, i.e. unique matched labels in both maps. Defaults to None.
             n_prediction_instance (int | None, optional): Number of prediction instances. Defaults to None.
             n_reference_instance (int | None, optional): Number of reference instances. Defaults to None.
+            use_traditional_counting (bool, optional): Whether this matching uses traditional TP/FP/FN semantics. Defaults to True.
 
             For each argument: If none, will calculate on initialization.
         """
@@ -360,6 +364,7 @@ class MatchedInstancePair(_ProcessingPairInstanced):
         if matched_instances is None:
             matched_instances = [i for i in self.pred_labels if i in self.ref_labels]
         self.matched_instances = matched_instances
+        self.use_traditional_counting = use_traditional_counting
 
         if missed_reference_labels is None:
             missed_reference_labels = list(
@@ -389,6 +394,7 @@ class MatchedInstancePair(_ProcessingPairInstanced):
             missed_reference_labels=self.missed_reference_labels,
             missed_prediction_labels=self.missed_prediction_labels,
             matched_instances=self.matched_instances,
+            use_traditional_counting=self.use_traditional_counting,
         )

Comment on lines +776 to +795
if np.isnan(res.tp):
return np.nan
return res.num_pred_instances - res.tp


def fn(res: PanopticaResult):
if np.isnan(res.tp):
return np.nan
return res.num_ref_instances - res.tp


def prec(res: PanopticaResult):
if np.isnan(res.tp):
return np.nan
return res.tp / (res.tp + res.fp)


def rec(res: PanopticaResult):
if np.isnan(res.tp):
return np.nan
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a better solution for this? I feel like putting these if-clauses everywhere is not the best solution (especially when we have to think about long-term here)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diff --git a/panoptica/panoptica_result.py b/panoptica/panoptica_result.py
index aaaf560..b080ee7 100644
--- a/panoptica/panoptica_result.py
+++ b/panoptica/panoptica_result.py
@@ -105,6 +105,10 @@ class PanopticaResult(object):
             default_value=tp,
             was_calculated=True,
         )
+        
+        # Check if tp is NaN and handle dependent metrics centrally
+        self._tp_is_nan = np.isnan(tp) if isinstance(tp, float) else False
+        
         # endregion
         #
         # region Basic
@@ -112,36 +116,46 @@ class PanopticaResult(object):
         self._add_metric(
             "fp",
             MetricType.MATCHING,
-            fp,
+            fp if not self._tp_is_nan else None,
             long_name="False Positives",
+            default_value=np.nan if self._tp_is_nan else None,
+            was_calculated=self._tp_is_nan,
         )
         self.fn: int | float  # Allow NaN for region-based matching
         self._add_metric(
             "fn",
             MetricType.MATCHING,
-            fn,
+            fn if not self._tp_is_nan else None,
             long_name="False Negatives",
+            default_value=np.nan if self._tp_is_nan else None,
+            was_calculated=self._tp_is_nan,
         )
         self.prec: int | float  # Allow NaN for region-based matching
         self._add_metric(
             "prec",
             MetricType.NO_PRINT,
-            prec,
+            prec if not self._tp_is_nan else None,
             long_name="Precision (positive predictive value)",
+            default_value=np.nan if self._tp_is_nan else None,
+            was_calculated=self._tp_is_nan,
         )
         self.rec: int | float  # Allow NaN for region-based matching
         self._add_metric(
             "rec",
             MetricType.NO_PRINT,
-            rec,
+            rec if not self._tp_is_nan else None,
             long_name="Recall (sensitivity)",
+            default_value=np.nan if self._tp_is_nan else None,
+            was_calculated=self._tp_is_nan,
         )
         self.rq: float
         self._add_metric(
             "rq",
             MetricType.MATCHING,
-            rq,
+            rq if not self._tp_is_nan else None,
             long_name="Recognition Quality / F1-Score",
+            default_value=np.nan if self._tp_is_nan else None,
+            was_calculated=self._tp_is_nan,
         )
         # endregion
         #
@@ -773,26 +787,18 @@ class PanopticaResult(object):
 
 # region Basic
 def fp(res: PanopticaResult):
-    if np.isnan(res.tp):
-        return np.nan
     return res.num_pred_instances - res.tp
 
 
 def fn(res: PanopticaResult):
-    if np.isnan(res.tp):
-        return np.nan
     return res.num_ref_instances - res.tp
 
 
 def prec(res: PanopticaResult):
-    if np.isnan(res.tp):
-        return np.nan
     return res.tp / (res.tp + res.fp)
 
 
 def rec(res: PanopticaResult):
-    if np.isnan(res.tp):
-        return np.nan
     return res.tp / (res.tp + res.fn)
 
 
@@ -803,8 +809,6 @@ def rq(res: PanopticaResult):
     Returns:
         float: Recognition Quality (RQ).
     """
-    if np.isnan(res.tp):
-        return np.nan
     if res.tp == 0:
         return 0.0 if res.num_pred_instances + res.num_ref_instances > 0 else np.nan
     return res.tp / (res.tp + 0.5 * res.fp + 0.5 * res.fn)

I defintely agree with that. Should've thought of that myself. How does this feel?

@Hendrik-code
Copy link
Collaborator

Great work here. I think we need to adress some points (see comments), but otherwise this is nice! @aymuos15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] voronoi metrics [QUESTION] Comparison to CC-Metrics [FEATURE] voronoi-based analysis

4 participants