Change: Updates to aggregation #300

jiajic · 2025-05-02T17:32:21Z

Bumps {GiottoClass} to v0.5.0

Changes

harmonization of params in calculateOverlap() and overlapToMatrix()

New

aggregateFeatures() wrapper for running calculateOverlap() and overlapToMatrix()
overlapPointDT() and overlapIntensityDT() classes to store overlaps relationships efficiently and help with aggregation pipeline

Enhancements

aggregateFeatures() now has a method param to switch between "raster" and "vector" overlapping methods. The default method remains as "raster" for now, but may change in the future.

bug fixes

overlaps() will now properly find image overlaps

- Also adds some tests for aggregation and overlaps objects

Summary by CodeRabbit

New Features
- Introduced the aggregateFeatures function for streamlined spatial feature aggregation.
- Added new data classes for overlap information: overlapPointDT and overlapIntensityDT.
- New methods to convert overlap objects to data frames and matrices, and to display and combine them.
Improvements
- Harmonized and updated parameters across overlap calculation and aggregation functions for consistency.
- Enhanced subsetting, extraction, and display capabilities for overlap data.
- Improved documentation for new and updated functions and methods.
Bug Fixes
- Corrected overlap detection in image overlaps.
Tests
- Added comprehensive tests for aggregation and overlap functionalities.
Chores
- Updated package version and minimum R version requirement.

hotfix: createGiottoPolygons() and CI changes

feat: composable data processing framework

fix: terra rast() noflip

bugfixes

- new: `aggregateFeatures()` gobject function - param harmonizations: - `spatial_info` -> `spat_info` in `calculateOverlap()` - `poly_info` -> `spat_info` in `overlapToMatrix()` - `feat_subset_ids` -> `feat_subset_values` in `calculateOverlap()` - `count_info_column` -> `feat_count_column` in `calculateOverlap()` and `overlapToMatrix()` - `aggr_function` -> `fun` in `overlapToMatrtix()` - Deprecated functions: - `calculateOverlapRaster()` - `overlapImageToMatrix()`

also should support qptiff

…oObject()

…tch_agg_feats

github-advanced-security

lintr found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

…ntation

This reverts commit 47e2247, reversing changes made to a8bbfb6.

This reverts commit e18de85.

…tch_agg_feats

coderabbitai · 2025-06-16T15:14:09Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This update introduces a new aggregation workflow for spatial features in the Giotto framework. It adds new classes (overlapPointDT, overlapIntensityDT), a high-level function (aggregateFeatures), and supporting methods for overlap calculation, subsetting, coercion, and display. Documentation, tests, and method signatures are updated for consistency and clarity.

Changes

Files/Group	Change Summary
DESCRIPTION	Version bump to 0.5.0; R dependency raised to 4.4.1.
NAMESPACE	Registers new S3 method `as.data.frame` for `overlapPointDT`; exports `aggregateFeatures`.
NEWS.md	Documents new features: parameter harmonization, new wrapper, new classes, and bug fix.
R/aggregate.R, man/aggregateFeatures.Rd	Adds `aggregateFeatures` function and documentation; refactors overlap and aggregation workflow; standardizes parameters; deprecates old methods.
R/classes.R	Adds new S4 classes for overlap info; updates object validation and conversion logic.
R/methods-coerce.R, man/as.data.table.Rd, man/as.matrix.Rd	Adds coercion methods for new classes (`as.data.frame`, `as.matrix`); documents new methods.
R/methods-dims.R, man/dims-generic.Rd	Adds `dim` methods for new overlap classes; updates documentation.
R/methods-extract.R	Adds and refactors subsetting methods for overlap classes and helper functions.
R/methods-initialize.R	Removes legacy feature/spatial info validation from initialization.
R/methods-overlaps.R	Refines overlap retrieval logic for polygons.
R/methods-rbind.R, man/rbind-generic.Rd	Adds `rbind2` and `rbind` methods for `overlapPointDT`; documents new method.
R/methods-show.R	Adds `show` methods for new overlap classes.
R/save_load.R	Removes saving/loading of overlaps for v0.5.0+ objects.
R/slot_show.R, man/dot-abbrev_mat.Rd	Simplifies internal documentation for matrix abbreviation; removes Rd file.
R/subset.R	Refactors overlap subsetting using helper functions.
man/calculateOverlap.Rd, man/calculateOverlapRaster.Rd	Updates documentation for parameter renaming and deprecation.
man/overlapToMatrix.Rd	Documents new methods for overlap classes; standardizes argument names.
tests/testthat/test-aggregate.R	Adds comprehensive tests for new aggregation and overlap features.
tests/testthat/test-create_mini_vizgen.R	Updates tests for new argument names and output classes.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Giotto
    participant aggregateFeatures
    participant calculateOverlap
    participant overlapToMatrix

    User->>Giotto: Call aggregateFeatures(...)
    Giotto->>aggregateFeatures: aggregateFeatures(...)
    aggregateFeatures->>calculateOverlap: Calculate overlaps (points or images)
    calculateOverlap-->>aggregateFeatures: Overlap object (overlapPointDT/overlapIntensityDT)
    aggregateFeatures->>overlapToMatrix: Summarize overlaps (as matrix/exprObj)
    overlapToMatrix-->>aggregateFeatures: Aggregated matrix or exprObj
    aggregateFeatures-->>Giotto: Return updated Giotto object or exprObj

Poem

In a patchwork field of spatial delight,
New overlaps hop into view, crisp and bright.
With features and polygons, points and intensity,
Aggregation now flows with elegant consistency.
The rabbit leaps—code refactored, tests anew—
0.5.0 hops forward, bringing clarity through and through!
🐇✨

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jiajic · 2025-06-16T16:19:56Z

@coderabbitai review

coderabbitai · 2025-06-16T16:20:02Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 20

♻️ Duplicate comments (1)

man/overlapToMatrix.Rd (1)

66-72: Same inconsistency for overlapIntensityDT

Mirror the fix/proviso applied to overlapPointDT.

🧹 Nitpick comments (23)

man/as.data.table.Rd (1)

8-25: Mixed documentation for as.data.table() vs as.data.frame() can confuse users

This Rd file is titled/documented for as.data.table, yet the new alias/usage block registers as.data.frame.overlapPointDT.

If the intent is to export both coercions, create a separate Rd topic for as.data.frame so each generic is discoverable via ?as.data.frame.

At minimum, add the matching alias & usage for as.data.table.overlapPointDT (and the symmetric overlapIntensityDT) to keep the file internally consistent.
NEWS.md (1)
3-14: Minor repetition & style tweaks for the changelog

Static analysis flagged a double “wrapper for” wording. While harmless, a quick edit smooths the prose:
- - `aggregateFeatures()` wrapper for running `calculateOverlap()` and `overlapToMatrix()`
+ - `aggregateFeatures()` – a wrapper around `calculateOverlap()` and `overlapToMatrix()`
R/methods-rbind.R (2)
118-120: Index created only for poly

setindex(x@data, "poly") is helpful, but look-ups on feat_id_index are equally common. Adding a secondary index keeps both access paths O(log n) with negligible cost.
data.table::setindex(x@data, c("poly", "feat_id_index"))
146-153: Variadic rbind leaks in-place mutation to first argument

The variadic wrapper ultimately calls rbind2(xs[[1L]], …) which mutates and returns the first object, surprising callers who expect a new object. Consider cloning the first operand (data.table::copy() or methods::copy) before passing it to rbind2.
R/methods-show.R (1)
670-678: features count should derive from feat_ids, not nfeats

nfeats is mutable during rbind2 and may be inaccurate (see previous comment).
Displaying length(object@feat_ids) avoids stale numbers.
-    cat(sprintf("* features : %d\n", object@nfeats))
+    cat(sprintf("* features : %d\n", length(object@feat_ids)))
man/as.matrix.Rd (1)

26-29: Please add default value & example for the new feat_count_column argument

The parameter is introduced in the docs, but:

The default value (NULL) is not explicitly mentioned in the \usage{} block.

No example shows how the argument affects the output.

That makes it harder for users to discover how to use the feature.
A short example similar to the tests in test-aggregate.R would fix this quickly.
tests/testthat/test-aggregate.R (1)
134-135: Floating-point comparison should set a tolerance

expect_equal(m[1,10], 536529.94) may fail on alternate BLAS / OS combinations due to tiny FP drift.
- expect_equal(m[1,10], 536529.94)
+ expect_equal(m[1,10], 536529.94, tolerance = 1e-6)
R/combine_metadata.R (1)
490-492: Inefficient filter converts the whole points set to data.table first

getFeatureInfo() returns a SpatVector → converted to DT via .spatvector_to_dt(pts) after sub-setting would avoid materialising unnecessary rows/columns.

Consider:
idx <- pts$feat_ID_uniq %in% feat_overlap@data$feat
feat_overlap_info <- .spatvector_to_dt(pts[idx, ])
R/methods-coerce.R (1)

153-161: as.data.frame.overlapPointDT can silently mis-label columns

poly_ID/feat_ID are generated with vector indexed look-ups. If any index is NA or out-of-range you get NA without warning.
Add basic bounds checking or at least stopifnot(all(!is.na(poly_ID), …)) to avoid hard-to-trace downstream errors.

man/aggregateFeatures.Rd (1)

44-45: Typo: “expresssion”

expresssion has three “s”.
Replace with expression to avoid grep/build warnings.

man/calculateOverlap.Rd (1)

17-18: spat_info argument undocumented

spat_info was added to the usage but is missing from the \arguments section (only deprecated spatial_info is listed).
Please add a proper description and mark spatial_info as deprecated to keep roxygen checks happy.

R/classes.R (1)

1647-1654: Potential memory blow-up when converting SpatVector

terra::as.data.frame(x) materialises the whole vector; for large datasets this can explode RAM.
Consider streaming rows or using terra::values() with selected columns instead.

R/methods-extract.R (4)

1122-1135: Potential implicit recycling in poly subsetting

x@overlaps[[feat]][i, ids = FALSE] relies on gIndex dispatch, but no guard exists for negative/zero indices or out-of-range values.
Consider validating i with checkmate::assert_integerish(i, lower = 1, any.missing = FALSE) (after resolving logical/character) to catch user mistakes early.

1178-1191: Logical recycling may give unexpected duplicates

x@data <- x@data[i, ] after expanding a recycled logical vector can duplicate rows if i is longer than nrow(x@data). Usually users expect simple recycling, not row replication. Consider using i <- rep_len(i, nr)[seq_len(nr)] to avoid this.

1283-1325: API surprise: ids = TRUE returns vectors not objects

Overloading [ so that ids = TRUE changes the return type from object → vector violates the usual S4 subsetting contract and will trip up users (class(x[i]) now depends on a flag). Strongly advise introducing a dedicated accessor (e.g., polys() / features()) or exposing this through separate functions to keep [ for structural subsetting only.

1327-1364: Helper could skip extra allocation

.select_overlap_point_dt_i/j() repeatedly uses match() and builds temporary vectors. These are on the hot path during aggregation. Caching a lookup table or using keyed joins (setkey) would cut runtime for large datasets.

R/aggregate.R (1)

84-93: Unimported helper operators & setters trigger R CMD check warnings

Functions/operators like %null%, spatUnit<-, setGiotto, prov<-, featType<-, and deprecated() are used but not imported in this file, generating “no visible global function definition” notes.
Declare them in NAMESPACE via @importFrom or @import to keep the package CRAN-clean.
man/overlapToMatrix.Rd (6)
9-10: Alias section should reflect full S4 method signatures

The new overlapPointDT and overlapIntensityDT aliases are correctly added.
Double-check that their corresponding roxygen blocks include @aliases overlapToMatrix,overlapPointDT-method overlapToMatrix,overlapIntensityDT-method; otherwise the next roxygen2::roxygenise() run will drop these lines.

16-26: Parameter order / deprecated args doc drift

spat_info, feat_count_column, and fun were inserted before the ... placeholder, while the three deprecated arguments remain after it.
For consistency with the rest of the package (and to avoid mismatched usage examples), consider moving all deprecated arguments to the very end of the signature and adding @deprecated tags in the roxygen source, e.g.:
- aggr_function = deprecated(),
- poly_info = deprecated(),
- count_info_column = deprecated(),
...
+ ...,
+ aggr_function = deprecated(),
+ poly_info     = deprecated(),
+ count_info_column = deprecated()
This keeps the public interface clean and reduces the risk that users think the old names are still first-class.

75-82: Argument description for x omits the two new classes

The description still reads “giotto object or SpatVector points or data.table”.
Please append overlapPointDT and overlapIntensityDT so help users discover the new workflow.

88-90: Clarify that fun can be a function object

Current text: “character. Function to aggregate …”.
In practice, fun can also be an actual function (e.g. mean). Spell this out to prevent misuse:

Function or single-character name of a function used to aggregate …

95-100: Mark deprecated parameters with @keyword internal (or similar)

Listing deprecated args inline clutters the rendered help.
Consider adding @keywords internal or a separate “Deprecated Arguments” section so they’re still searchable but visually separated.

110-112: sort description could note performance impact

Mixed sorting on large matrices can be costly. Add a short note such as “may be slower for very large matrices” so users understand the trade-off.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0ff9c8 and 01e16ab.

📒 Files selected for processing (28)

DESCRIPTION (2 hunks)
NAMESPACE (2 hunks)
NEWS.md (1 hunks)
R/aggregate.R (34 hunks)
R/classes.R (4 hunks)
R/combine_metadata.R (1 hunks)
R/methods-coerce.R (3 hunks)
R/methods-crop.R (1 hunks)
R/methods-dims.R (1 hunks)
R/methods-extract.R (2 hunks)
R/methods-initialize.R (0 hunks)
R/methods-overlaps.R (1 hunks)
R/methods-rbind.R (2 hunks)
R/methods-show.R (1 hunks)
R/save_load.R (1 hunks)
R/slot_show.R (1 hunks)
R/subset.R (3 hunks)
man/aggregateFeatures.Rd (1 hunks)
man/as.data.table.Rd (2 hunks)
man/as.matrix.Rd (2 hunks)
man/calculateOverlap.Rd (8 hunks)
man/calculateOverlapRaster.Rd (2 hunks)
man/dims-generic.Rd (2 hunks)
man/dot-abbrev_mat.Rd (0 hunks)
man/overlapToMatrix.Rd (2 hunks)
man/rbind-generic.Rd (2 hunks)
tests/testthat/test-aggregate.R (1 hunks)
tests/testthat/test-create_mini_vizgen.R (3 hunks)

💤 Files with no reviewable changes (2)

man/dot-abbrev_mat.Rd
R/methods-initialize.R

🧰 Additional context used

🪛 LanguageTool

NEWS.md

[duplication] ~9-~9: Possible typo: you repeated a word.
Context: ...rapper for running calculateOverlap() and overlapToMatrix() - overlapPointDT() and overlapIntensityDT() classes to store...

(ENGLISH_WORD_REPEAT_RULE)

🪛 GitHub Check: lintr

R/aggregate.R

[notice] 50-50:
Lines should not be more than 80 characters. This line is 81 characters.

[warning] 102-102:
no visible global function definition for '%null%'

[warning] 105-105:
no visible global function definition for '%null%'

[warning] 181-181:
no visible global function definition for 'spatUnit<-'

[warning] 183-183:
no visible global function definition for 'setGiotto'

[warning] 337-337:
no visible global function definition for 'deprecated'

[warning] 513-513:
no visible global function definition for 'deprecated'

[warning] 541-541:
no visible global function definition for 'prov<-'

[warning] 541-541:
no visible global function definition for 'spatUnit'

[warning] 542-542:
no visible global function definition for 'spatUnit<-'

[warning] 542-542:
no visible global function definition for 'spatUnit'

[warning] 543-543:
no visible global function definition for 'featType<-'

[warning] 595-595:
no visible global function definition for 'affine'

[warning] 639-639:
no visible global function definition for 'prov<-'

[warning] 639-639:
no visible global function definition for 'spatUnit'

[warning] 640-640:
no visible global function definition for 'spatUnit<-'

[warning] 640-640:
no visible global function definition for 'spatUnit'

[warning] 641-641:
no visible global function definition for 'featType<-'

[warning] 739-739:
no visible global function definition for 'deprecated'

[warning] 855-855:
no visible global function definition for 'deprecated'

[warning] 857-857:
no visible global function definition for 'deprecate_warn'

[warning] 912-912:
Use && in conditional expressions.

[notice] 955-955:
Hanging indent should be 37 spaces but is 8 spaces.

[warning] 982-982:
no visible global function definition for ':='

[warning] 985-985:
no visible global function definition for ':='

[warning] 989-989:
no visible global function definition for ':='

[warning] 992-992:
no visible global function definition for ':='

🔇 Additional comments (15)

DESCRIPTION (1)

3-3: Verify necessity of tighter version constraints

Bumping the package to 0.5.0 while simultaneously tightening the minimal R requirement from 4.4.0 → 4.4.1 may exclude users on the latest 4.4-series stable release (4.4.0) without an obvious technical reason.

Please double-check that (a) the new code genuinely needs ≥ 4.4.1, and (b) the NEWS / changelog clearly explains the rationale so downstream maintainers aren’t surprised.

Also applies to: 29-29

NAMESPACE (1)

32-32: Export confirmation for aggregateFeatures

Export looks good and avoids name collisions with existing generics.

man/rbind-generic.Rd (1)

9-10: Documentation addition is consistent

The new aliases correctly expose the rbind2,overlapPointDT,… method. No further action needed.

Also applies to: 20-21

R/slot_show.R (1)

1107-1110: Internal helper now documented inline – OK

Replacing the roxygen header with simple comments is fine for a non-exported utility; it prevents an unnecessary Rd file without impacting users.

man/dims-generic.Rd (1)

27-28: Alias & usage entries look consistent for the new overlap classes

dim() is now documented for both overlapPointDT and overlapIntensityDT, and the usage stubs are present. No further action required.

Also applies to: 75-77

R/methods-crop.R (1)

229-237: Crop now delegates overlap filtering to .subset_overlaps_poly – validate side-effects

Good move to centralise overlap subsetting. Please double-check that .subset_overlaps_poly()

updates both point- and intensity-based overlap objects for the removed polygons,

preserves the class (overlapPointDT, overlapIntensityDT) after subsetting so downstream coercion & dim() methods keep working.

No code change required; this is a verification reminder.

R/methods-rbind.R (1)

93-122: Missing duplicate-ID guard

Unlike the existing cellMetaObj/spatLocsObj methods, this method doesn’t call .check_id_dups().
If two overlapPointDT objects contain the same poly/feat combination the result will silently contain duplicates. Consider re-using .check_id_dups() (or an equivalent) before merging.

man/calculateOverlapRaster.Rd (1)

14-20: Documentation looks coherent with code changes

Parameter renaming and deprecation notes are consistent; no issues spotted.

Also applies to: 36-40, 45-47

tests/testthat/test-create_mini_vizgen.R (3)

144-145: Argument rename reflected correctly

Switch from largeImages to images aligns with the API; good catch.

162-176: calculateOverlap() calls: missing return_gobject?

The earlier calculateOverlapRaster() defaulted to return_gobject = TRUE.
If the new function preserved that default this is fine; otherwise the test will fail. Please verify.

203-213: Parameter rename in overlapToMatrix() verified

The switch from poly_info to spat_info is correctly reflected in the test code.
R/methods-extract.R (2)

1062-1064: Good refactor – but double-check class coercion

Calling .subset_overlaps_poly() instead of hand-rolled filtering greatly clarifies intent.
Just verify that every legacy object arriving in x@overlaps is updated to the new overlap*DT classes before this call; otherwise dispatch on [ may fall back to the default method and silently drop the ids = FALSE flag.

1067-1112: NULL-safety for intensity branch

names(x@overlaps$intensity) is accessed unguarded.
When x@overlaps$intensity is NULL (not just length-0 list) this will raise an error.
Add an early return or wrap with is.null() to avoid surprising failures.
-    is_int_feat <- feat_type %in% names(x@overlaps$intensity)
+    is_int_feat <- !is.null(x@overlaps$intensity) &&
+        feat_type %in% names(x@overlaps$intensity)
Likely an incorrect or invalid review comment.
R/aggregate.R (1)

2075-2085: Missing namespace for mixedsort

.mixedsort_rowcols() calls mixedsort() without qualification; unless gtools (or equivalent) is attached this will
error at runtime. Add an explicit import or use gtools::mixedsort.
- match(mixedsort(mat_r_names), mat_r_names),
+ match(gtools::mixedsort(mat_r_names), mat_r_names),
and add @importFrom gtools mixedsort to the roxygen header or NAMESPACE.
man/overlapToMatrix.Rd (1)

50-55: data.table method: missing feat_count_column & sort args

The data.table S4 method now exposes fun, but does not expose feat_count_column (added elsewhere) or sort.
If these arguments are intentionally unsupported for plain data.tables, please document that explicitly; otherwise add them to avoid an inconsistent API surface.

coderabbitai · 2025-06-16T16:24:47Z

NAMESPACE

 S3method(.DollarNames,spatEnrObj)
 S3method(.DollarNames,spatLocsObj)
 S3method(.DollarNames,terraVectData)
+S3method(as.data.frame,overlapPointDT)


🛠️ Refactor suggestion

Missing symmetrical as.data.frame for overlapIntensityDT

You registered as.data.frame.overlapPointDT, but its sibling class overlapIntensityDT did not get the same treatment. Down-stream code that relies on as.data.frame() for both overlap classes will break.

+S3method(as.data.frame,overlapIntensityDT)

Add the roxygen tag to the method implementation and re-document.

🤖 Prompt for AI Agents

In the NAMESPACE file at line 11, you registered the S3 method as.data.frame for overlapPointDT but missed doing the same for overlapIntensityDT. To fix this, add a corresponding S3method(as.data.frame, overlapIntensityDT) line to the NAMESPACE file. Also, in the R script where the as.data.frame.overlapIntensityDT method is defined, add the appropriate roxygen2 @method tag for as.data.frame and overlapIntensityDT, then re-document the package to update the NAMESPACE accordingly.

coderabbitai · 2025-06-16T16:24:47Z

R/methods-dims.R

+#' @rdname dims-generic
+#' @export
+setMethod("dim", signature("overlapPointDT"), function(x) dim(x@data))
+
+#' @rdname dims-generic
+#' @export
+setMethod("dim", signature("overlapIntensityDT"), function(x) dim(x@data))


🛠️ Refactor suggestion

Add nrow/ncol for the new overlap classes for API parity

You introduced dim() methods but left out nrow() and ncol(). Many client-side calls use the simpler helpers and will now fall back to dim() + indexing, which is less efficient and could surprise users.

+#' @rdname dims-generic +#' @export +setMethod("nrow", signature("overlapPointDT"), function(x) nrow(x@data)) + +#' @rdname dims-generic +#' @export +setMethod("ncol", signature("overlapPointDT"), function(x) ncol(x@data)) + +#' @rdname dims-generic +#' @export +setMethod("nrow", signature("overlapIntensityDT"), function(x) nrow(x@data)) + +#' @rdname dims-generic +#' @export +setMethod("ncol", signature("overlapIntensityDT"), function(x) ncol(x@data))

Implementing these keeps the public surface consistent with other Giotto classes.

🤖 Prompt for AI Agents

In R/methods-dims.R around lines 149 to 155, you defined dim() methods for overlapPointDT and overlapIntensityDT classes but did not implement corresponding nrow() and ncol() methods. To maintain API consistency and improve efficiency, add nrow() and ncol() methods for these classes that directly return the number of rows and columns of the underlying data slot (x@data), similar to how dim() is implemented.

R/methods-overlaps.R

coderabbitai · 2025-06-16T16:24:47Z

R/methods-rbind.R

+        x_spat_map <- match(x@spat_ids, comb_spat)
+        y_spat_map <- match(y@spat_ids, comb_spat)
+        x_feat_map <- match(x@feat_ids, comb_feat)
+        y_feat_map <- match(y@feat_ids, comb_feat)
+
+        # replace id dictionaries for x (output object)
+        x@spat_ids <- comb_spat
+        x@feat_ids <- comb_feat
+
+        # remap indices
+        x@data[, poly := x_spat_map[poly]]
+        y@data[, poly := y_spat_map[poly]]
+        x@data[, feat_id_index := x_feat_map[feat_id_index]]
+        y@data[, feat_id_index := y_feat_map[feat_id_index]]
+


🛠️ Refactor suggestion

feat column/key is left inconsistent after re-indexing

You remap feat_id_index but leave the character column feat untouched, then set a data-table key on it:

data.table::setkeyv(x@data, "feat")

If the two objects had different ordering this key will no longer match the new indices, breaking joins and look-ups.
Either update the feat column with comb_feat[feat_id_index] after re-mapping, or set the key on feat_id_index.

🤖 Prompt for AI Agents

In R/methods-rbind.R around lines 100 to 114, after remapping the feat_id_index column, the feat character column remains unchanged, causing inconsistency when setting a data.table key on feat. To fix this, update the feat column by assigning comb_feat indexed by the updated feat_id_index (i.e., feat := comb_feat[feat_id_index]) before setting the key, ensuring the feat column aligns with the new indices and maintains correct joins and look-ups.

coderabbitai · 2025-06-16T16:24:48Z

R/methods-rbind.R

+#' @rdname rbind-generic
+#' @export
+setMethod("rbind2", signature("overlapPointDT", "overlapPointDT"),
+    function(x, y, ...) {
+        comb_spat <- unique(c(x@spat_ids, y@spat_ids))
+        comb_feat <- unique(c(x@feat_ids, y@feat_ids))
+
+        x_spat_map <- match(x@spat_ids, comb_spat)
+        y_spat_map <- match(y@spat_ids, comb_spat)
+        x_feat_map <- match(x@feat_ids, comb_feat)
+        y_feat_map <- match(y@feat_ids, comb_feat)
+
+        # replace id dictionaries for x (output object)
+        x@spat_ids <- comb_spat
+        x@feat_ids <- comb_feat
+
+        # remap indices
+        x@data[, poly := x_spat_map[poly]]
+        y@data[, poly := y_spat_map[poly]]
+        x@data[, feat_id_index := x_feat_map[feat_id_index]]
+        y@data[, feat_id_index := y_feat_map[feat_id_index]]
+
+        x@data <- rbind(x@data, y@data)
+        x@nfeats <- x@nfeats + y@nfeats
+
+        data.table::setkeyv(x@data, "feat")
+        data.table::setindex(x@data, "poly")
+        x
+    }
+)


⚠️ Potential issue

nfeats is incremented instead of recomputed ‒ causes double-counting when feature sets overlap

x@nfeats <- x@nfeats + y@nfeats assumes that the two objects have disjoint feature dictionaries.
When x and y share features the count is inflated (e.g. 3 + 3 = 6 while the merged set has only 5).
Recompute from the merged dictionary instead:

- x@nfeats <- x@nfeats + y@nfeats + x@nfeats <- length(comb_feat)

🤖 Prompt for AI Agents

In R/methods-rbind.R around lines 93 to 122, the code increments x@nfeats by y@nfeats, which causes double-counting when feature sets overlap. Instead of adding, recompute x@nfeats as the length of the combined feature dictionary (comb_feat) after merging to accurately reflect the unique feature count.

coderabbitai · 2025-06-16T16:24:49Z

R/methods-extract.R

+.subset_overlap_point_dt_i <- function(x, i) {
+    if (is.numeric(i) || is.logical(i)) {
+        i <- x@spat_ids[i]
+    }
+
+    poly <- NULL # NSE vars
+    idx <- match(i, x@spat_ids) # poly indices to keep
+    idx <- idx[!is.na(idx)] # drop unmatched NAs
+    x@spat_ids <- x@spat_ids[x@spat_ids %in% i] # replace spatial ids
+
+    x@data <- x@data[poly %in% idx]
+    x@data[, poly := match(poly, idx)]
+    data.table::setkeyv(x@data, "feat")
+    data.table::setindex(x@data, "poly")
+    x
+}
+
+.subset_overlap_point_dt_j <- function(x, j) {
+    # ---- convert j to numerical index ---- #
+    if (is.logical(j)) {
+        if (length(j) != length(x@feat_ids)) {
+            # recycle logical if needed
+            j <- rep(j, length.out = length(x@feat_ids))
+        }
+        j <- which(j)
+    }
+    if (is.character(j)) {
+        j <- match(j, x@feat_ids)
+    }
+
+    x@feat_ids <- x@feat_ids[j] # replace feature ids
+
+    # subset on feat_id_index matches
+    x@data <- x@data[feat_id_index %in% j]
+    x@data[, feat_id_index := match(feat_id_index, j)]
+    data.table::setkeyv(x@data, "feat")
+    data.table::setindex(x@data, "poly")
+    x
+}


⚠️ Potential issue

Negative index semantics are lost

.subset_overlap_point_dt_i() treats numeric i as positive positions only; negative indices (common in R to drop rows) will silently mis-select. Either:

Explicitly forbid negatives with an assertion, or

Implement the standard R rule (setdiff(seq_len(n), abs(i))).

Fail-fast behaviour is preferable to silent mis-subset.

🤖 Prompt for AI Agents

In R/methods-extract.R around lines 1138 to 1176, the function .subset_overlap_point_dt_i() does not handle negative numeric indices correctly, which in R are used to exclude elements. To fix this, add a check for negative numeric indices in the input i. If negatives are present, either raise an error to forbid them or convert them to positive indices by excluding the absolute values from the full sequence of x@spat_ids indices using setdiff(seq_along(x@spat_ids), abs(i)). This ensures the function respects standard R negative indexing semantics and avoids silent mis-selection.

coderabbitai · 2025-06-16T16:24:49Z

R/methods-extract.R

+.subset_overlap_intensity_dt_j <- function(x, j) {
+    # convert j to char col reference
+    if (is.numeric(j)) {
+        j <- j + 1L
+    } else if (is.logical(j)) {
+        j <- c(TRUE, j)
+    }
+    if (!is.character(j)) {
+        j <- colnames(x@data)[j]
+    }
+    j <- unique(c("poly_ID", j))
+    x@data <- x@data[, .SD, .SDcols = j]
+    x
+}


🛠️ Refactor suggestion

Numeric column index offset is fragile

Shifting numeric j by +1L assumes poly_ID is always the first column. If column order changes (e.g., after setcolorder) this will select the wrong data. Safer: look up columns by name once and compute the offset dynamically.

-if (is.numeric(j)) { - j <- j + 1L +if (is.numeric(j)) { + all_cols <- colnames(x@data) + j <- j[ j > 0 & j <= length(all_cols) ] + j <- unique(c(match("poly_ID", all_cols), j)) }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

.subset_overlap_intensity_dt_j <- function(x, j) {

# convert j to char col reference

if (is.numeric(j)) {

j <- j + 1L

} else if (is.logical(j)) {

j <- c(TRUE, j)

}

if (!is.character(j)) {

j <- colnames(x@data)[j]

}

j <- unique(c("poly_ID", j))

x@data <- x@data[, .SD, .SDcols = j]

x

}

.subset_overlap_intensity_dt_j <- function(x, j) {

# convert j to char col reference

if (is.numeric(j)) {

all_cols <- colnames(x@data)

j <- j[j > 0 & j <= length(all_cols)]

j <- unique(c(match("poly_ID", all_cols), j))

} else if (is.logical(j)) {

j <- c(TRUE, j)

}

if (!is.character(j)) {

j <- colnames(x@data)[j]

}

j <- unique(c("poly_ID", j))

x@data <- x@data[, .SD, .SDcols = j]

x

}

🤖 Prompt for AI Agents

In R/methods-extract.R around lines 1193 to 1206, the code shifts numeric column index j by +1L assuming "poly_ID" is always the first column, which is fragile if column order changes. To fix this, dynamically find the position of "poly_ID" in colnames(x@data) and adjust j accordingly instead of hardcoding +1L. This ensures correct column selection regardless of column order changes.

coderabbitai · 2025-06-16T16:24:50Z

R/aggregate.R

+.calculate_overlap_vector <- function(spatvec, pointvec, keep = NULL) {
+    checkmate::assert_character(keep, null.ok = TRUE)
+    res <- terra::extract(spatvec, pointvec)
+    cn <- colnames(res)
+    if (all(c("id.y", "poly_ID") %in% cn)) {
+        res_keep <- c("id.y", "poly_ID")
+    } else {
+        res_keep <- cn[c(1L, 2L)]
+    }
+    res <- res[!is.na(res[[2]]), res_keep] # drop NAs (sparsify) + col select
+    if (!is.null(keep)) {


🛠️ Refactor suggestion

Fragile column selection in .calculate_overlap_vector

The code picks res_keep <- c("id.y","poly_ID") by position and falls back to the first two columns if names do not match.
terra::extract() does not guarantee column order across versions; a safer pattern is to subset by names and error-out when expected columns are absent, e.g.:

expected <- c("id", "poly_ID") if (!all(expected %in% names(res))) stop(".calculate_overlap_vector: expected columns not found") res <- res[!is.na(res$poly_ID), expected]

This avoids silent mis-alignment when terra changes.

🤖 Prompt for AI Agents

In R/aggregate.R around lines 794 to 804, the current code selects columns by position which is fragile because terra::extract() does not guarantee column order. To fix this, explicitly check for the presence of expected columns by name (e.g., "id.y" and "poly_ID"), stop with an error if they are missing, and subset the data frame by these column names instead of by position. This ensures stable and correct column selection regardless of terra version changes.

coderabbitai · 2025-06-16T16:24:50Z

R/aggregate.R

+        res <- switch(method,
+            "raster" = .calculate_overlap_raster(
+                spatvec = x,
+                pointvec = y,
+                count_info_column = feat_count_column,
+                verbose = verbose
+            ),
+            "vector" = .calculate_overlap_vector(
+                spatvec = x,
+                pointvec = y,
+                keep = feat_count_column
+            )
        )
+
+        .create_overlap_point_dt(x, y, res, feat_ids = feat_ids)
    }


⚠️ Potential issue

feat_count_column silently dropped

calculateOverlap(SpatVector,SpatVector) adds feat_count_column to calculate_overlap_params, but when control reaches
.create_overlap_point_dt() the argument keep is left NULL, so the count column (and any other extras) is lost.

- .create_overlap_point_dt(x, y, res, feat_ids = feat_ids) + .create_overlap_point_dt( + x, y, + res, + keep = feat_count_column, # preserve requested meta + feat_ids = feat_ids + )

Without this, later aggregation in overlapToMatrix() cannot use the counts.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

res <- switch(method,

"raster" = .calculate_overlap_raster(

spatvec = x,

pointvec = y,

count_info_column = feat_count_column,

verbose = verbose

),

"vector" = .calculate_overlap_vector(

spatvec = x,

pointvec = y,

keep = feat_count_column

)

)

.create_overlap_point_dt(x, y, res, feat_ids = feat_ids)

}

res <- switch(method,

"raster" = .calculate_overlap_raster(

spatvec = x,

pointvec = y,

count_info_column = feat_count_column,

verbose = verbose

),

"vector" = .calculate_overlap_vector(

spatvec = x,

pointvec = y,

keep = feat_count_column

)

)

.create_overlap_point_dt(

x, y,

res,

keep = feat_count_column, # preserve requested meta

feat_ids = feat_ids

)

}

🤖 Prompt for AI Agents

In R/aggregate.R around lines 770 to 785, the argument feat_count_column is passed to the raster and vector overlap calculation functions but is not forwarded to the .create_overlap_point_dt() call, causing the count column to be lost. To fix this, pass feat_count_column as the keep argument in the .create_overlap_point_dt() function call so that the count column and any additional columns are preserved for later aggregation.

coderabbitai · 2025-06-16T16:24:50Z

man/overlapToMatrix.Rd

+\S4method{overlapToMatrix}{overlapPointDT}(
+  x,
+  name = "raw",
+  sort = TRUE,
+  feat_count_column = NULL,
+  output = c("Matrix", "exprObj"),
+  ...
+)
+


🛠️ Refactor suggestion

overlapPointDT method: fun parameter absent but documented above

The generic argument list documents fun, yet the overlapPointDT signature omits it.
Either add fun here or note in the docs that aggregation is fixed for this class to avoid user confusion.

-\S4method{overlapToMatrix}{overlapPointDT}( - x, - name = "raw", - sort = TRUE, - feat_count_column = NULL, - output = c("Matrix", "exprObj"), - ... -) +\S4method{overlapToMatrix}{overlapPointDT}( + x, + name = "raw", + sort = TRUE, + feat_count_column = NULL, + fun = "sum", + output = c("Matrix", "exprObj"), + ... +)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

\S4method{overlapToMatrix}{overlapPointDT}(

x,

name = "raw",

sort = TRUE,

feat_count_column = NULL,

output = c("Matrix", "exprObj"),

...

)

\S4method{overlapToMatrix}{overlapPointDT}(

x,

name = "raw",

sort = TRUE,

feat_count_column = NULL,

fun = "sum",

output = c("Matrix", "exprObj"),

...

)

🤖 Prompt for AI Agents

In man/overlapToMatrix.Rd around lines 57 to 65, the overlapPointDT method signature is missing the 'fun' parameter that is documented in the generic argument list. To fix this, either add the 'fun' parameter to the overlapPointDT method signature to match the documentation or update the documentation to clarify that aggregation is fixed for this class and 'fun' is not applicable, preventing user confusion.

jiajic added 20 commits February 20, 2025 12:17

Merge pull request drieslab#279 from drieslab/dev

e71931c

hotfix: createGiottoPolygons() and CI changes

Merge branch 'dev' into staging

2c92e3d

chore: drop remotes and raise R req on staging

e1cbb43

chore: remotes are still needed on main for now

793307c

Merge branch 'dev' into staging

a89623c

Merge pull request drieslab#285 from drieslab/dev

9161cac

feat: composable data processing framework

Merge pull request drieslab#289 from drieslab/dev

6759bf1

fix: terra rast() noflip

Merge pull request drieslab#290 from drieslab/dev

3bab5ed

bugfixes

enh: improve exprObj show() methods

21f2f78

fix: revert to previous flip handling for images

44cb004

chore: remove unused flip image internal

87ee8ba

enh: tif tool updates

749c7b6

also should support qptiff

feat: aggregation changes

a7aee31

chore: docs edits

c0b072b

chore: move gpoints and gpoly updates in init giotto into updateGiott…

eab9899

…oObject()

update combineFeatureOverlapData() to work with new overlap structure

12bd132

chore: catch up dev

4423677

chore: docs

fae4613

Merge branch 'dev' of https://github.com/drieslab/GiottoClass into pa…

d6eea2c

…tch_agg_feats

github-advanced-security bot found potential problems May 2, 2025

View reviewed changes

jiajic added 9 commits May 2, 2025 13:55

fix: scoping on local seed in tests

e6f82e1

fix: as.matrix method for points overlaps

f9e0017

enh: make subsetting point overlaps more robust

0a7823d

enh: dim() and rbind() for point overlaps

9e75f3e

fix: aggregateStacksPolygonOverlaps() for new point overlap impleme…

bd6c56b

…ntation

fix: remove now unecessary handling for overlaps saving

c303535

fix: incorrect overlap point feat_id tracking

9f654dc

update: test for new param names

2630dbb

chore: update test expected based on changes

b1be26f

jiajic added 19 commits May 6, 2025 22:57

Merge branch 'dev' into staging

6fc4cf0

Merge branch 'dev' into staging

606e82a

Merge branch 'dev' into staging

6236271

Merge branch 'dev' into staging

f968ba8

Merge branch 'dev' into staging

d1d5a46

Merge branch 'dev' into staging

d88e096

Merge branch 'dev' into staging

f2688a0

Merge branch 'dev' into staging

fe77c15

Merge branch 'dev' into staging

4457dca

Merge branch 'dev' into staging

c39ce2a

Merge branch 'dev' into staging

b37a9ad

chore: catch up to staging

47e2247

Revert "chore: catch up to staging"

e18de85

This reverts commit 47e2247, reversing changes made to a8bbfb6.

Reapply "chore: catch up to staging"

beffc3d

This reverts commit e18de85.

chore: catch up to dev

7b5b01f

chore: remove unused line

20562b8

Merge branch 'dev' of https://github.com/drieslab/GiottoClass into pa…

98402e3

…tch_agg_feats

chore: docs

0670056

ci: update

01e16ab

coderabbitai bot reviewed Jun 16, 2025

View reviewed changes

jiajic added 7 commits June 16, 2025 12:53

enh: ncol and nrow for new overlap objects

b786fbe

chore: conflict

e5cdacc

enh: df coercion for overlapIntesityDT

43c6e56

chore: improve docs for utility class

70ef2ab

Merge branch 'dev' into patch_agg_feats

bb68a1e

Merge branch 'dev' into patch_agg_feats

a2dd359

chore: catch up to dev

386c6b8

Change: Updates to aggregation #300

Are you sure you want to change the base?

Change: Updates to aggregation #300

Uh oh!

Conversation

jiajic commented May 2, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes

New

Enhancements

bug fixes

Summary by CodeRabbit

Uh oh!

github-advanced-security bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot commented Jun 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Sequence Diagram(s)

Poem

Uh oh!

jiajic commented Jun 16, 2025

Uh oh!

coderabbitai bot commented Jun 16, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jun 16, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jiajic commented May 2, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jun 16, 2025 •

edited

Loading