Change: Updates to aggregation #300
base: dev
hotfix: createGiottoPolygons() and CI changes
feat: composable data processing framework
fix: terra rast() noflip
- new: `aggregateFeatures()` gobject function
- param harmonizations:
- `spatial_info` -> `spat_info` in `calculateOverlap()`
- `poly_info` -> `spat_info` in `overlapToMatrix()`
- `feat_subset_ids` -> `feat_subset_values` in `calculateOverlap()`
- `count_info_column` -> `feat_count_column` in `calculateOverlap()` and `overlapToMatrix()`
- `aggr_function` -> `fun` in `overlapToMatrix()`
- Deprecated functions:
- `calculateOverlapRaster()`
- `overlapImageToMatrix()`
also should support qptiff
lintr found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
This reverts commit e18de85.
Important: Review skipped
Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI.
Walkthrough
This update introduces a new aggregation workflow for spatial features in the Giotto framework and adds new classes.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Giotto
participant aggregateFeatures
participant calculateOverlap
participant overlapToMatrix
User->>Giotto: Call aggregateFeatures(...)
Giotto->>aggregateFeatures: aggregateFeatures(...)
aggregateFeatures->>calculateOverlap: Calculate overlaps (points or images)
calculateOverlap-->>aggregateFeatures: Overlap object (overlapPointDT/overlapIntensityDT)
aggregateFeatures->>overlapToMatrix: Summarize overlaps (as matrix/exprObj)
overlapToMatrix-->>aggregateFeatures: Aggregated matrix or exprObj
aggregateFeatures-->>Giotto: Return updated Giotto object or exprObj
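The call order in the diagram can be sketched with toy stand-ins. This is only an illustration under stated assumptions: the `toy_*` functions are hypothetical, use 1D intervals in place of polygons, and do not reproduce the Giotto signatures.

```r
# Toy stand-ins (hypothetical, NOT the Giotto functions): they only mirror the
# two-step flow "calculate overlaps" -> "summarise overlaps as a matrix".
toy_calculate_overlap <- function(polys, points) {
  # logical matrix: rows are points, columns are polygons (1D intervals here)
  inside <- outer(points$x, polys$xmin, ">=") & outer(points$x, polys$xmax, "<")
  idx <- which(inside, arr.ind = TRUE)
  data.frame(
    poly_ID = polys$poly_ID[idx[, "col"]],
    feat_ID = points$feat_ID[idx[, "row"]]
  )
}

toy_overlap_to_matrix <- function(overlap) {
  # count matrix with features in rows and polygons in columns
  unclass(table(overlap$feat_ID, overlap$poly_ID))
}

# the wrapper simply chains the two steps
toy_aggregate_features <- function(polys, points) {
  toy_overlap_to_matrix(toy_calculate_overlap(polys, points))
}

polys <- data.frame(poly_ID = c("a", "b"), xmin = c(0, 10), xmax = c(10, 20))
points <- data.frame(feat_ID = c("g1", "g1", "g2", "g2"), x = c(1, 12, 3, 25))
m <- toy_aggregate_features(polys, points)
```

The point at `x = 25` falls in no interval and is dropped at the overlap step, so it never reaches the matrix.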
@coderabbitai review
✅ Actions performed: Review triggered.
Actionable comments posted: 20
♻️ Duplicate comments (1)
man/overlapToMatrix.Rd (1)
66-72: Same inconsistency for `overlapIntensityDT`
Mirror the fix/proviso applied to `overlapPointDT`.
🧹 Nitpick comments (23)
man/as.data.table.Rd (1)
8-25: Mixed documentation for `as.data.table()` vs `as.data.frame()` can confuse users
This Rd file is titled/documented for `as.data.table`, yet the new alias/usage block registers `as.data.frame.overlapPointDT`.
- If the intent is to export both coercions, create a separate Rd topic for `as.data.frame` so each generic is discoverable via `?as.data.frame`.
- At minimum, add the matching alias & usage for `as.data.table.overlapPointDT` (and the symmetric `overlapIntensityDT`) to keep the file internally consistent.
NEWS.md (1)
3-14: Minor repetition & style tweaks for the changelog
Static analysis flagged a double “wrapper for” wording. While harmless, a quick edit smooths the prose:
- - `aggregateFeatures()` wrapper for running `calculateOverlap()` and `overlapToMatrix()`
+ - `aggregateFeatures()` – a wrapper around `calculateOverlap()` and `overlapToMatrix()`
R/methods-rbind.R (2)
118-120: Index created only for `poly`
`setindex(x@data, "poly")` is helpful, but look-ups on `feat_id_index` are equally common. Adding a secondary index keeps both access paths O(log n) with negligible cost.
data.table::setindex(x@data, c("poly", "feat_id_index"))
146-153: Variadic `rbind` leaks in-place mutation to first argument
The variadic wrapper ultimately calls `rbind2(xs[[1L]], …)`, which mutates and returns the first object, surprising callers who expect a new object. Consider cloning the first operand (`data.table::copy()` or `methods::copy`) before passing it to `rbind2`.
R/methods-show.R (1)
670-678: `features` count should derive from `feat_ids`, not `nfeats`
`nfeats` is mutable during `rbind2` and may be inaccurate (see previous comment).
Displaying `length(object@feat_ids)` avoids stale numbers.
- cat(sprintf("* features : %d\n", object@nfeats))
+ cat(sprintf("* features : %d\n", length(object@feat_ids)))
man/as.matrix.Rd (1)
26-29: Please add default value & example for the new `feat_count_column` argument
The parameter is introduced in the docs, but:
- The default value (`NULL`) is not explicitly mentioned in the `\usage{}` block.
- No example shows how the argument affects the output.
That makes it harder for users to discover how to use the feature.
A short example similar to the tests in `test-aggregate.R` would fix this quickly.
tests/testthat/test-aggregate.R (1)
134-135: Floating-point comparison should set a tolerance
`expect_equal(m[1,10], 536529.94)` may fail on alternate BLAS / OS combinations due to tiny FP drift.
- expect_equal(m[1,10], 536529.94)
+ expect_equal(m[1,10], 536529.94, tolerance = 1e-6)
R/combine_metadata.R (1)
490-492: Inefficient filter converts the whole points set to data.table first
`getFeatureInfo()` returns a SpatVector, which is then converted to a data.table via `.spatvector_to_dt(pts)`. Converting after sub-setting would avoid materialising unnecessary rows/columns.
Consider:
idx <- pts$feat_ID_uniq %in% feat_overlap@data$feat
feat_overlap_info <- .spatvector_to_dt(pts[idx, ])
R/methods-coerce.R (1)
153-161: `as.data.frame.overlapPointDT` can silently mis-label columns
`poly_ID`/`feat_ID` are generated with vector indexed look-ups. If any index is `NA` or out-of-range you get `NA` without warning.
Add basic bounds checking or at least `stopifnot(all(!is.na(poly_ID), …))` to avoid hard-to-trace downstream errors.
man/aggregateFeatures.Rd (1)
44-45: Typo: “expresssion”
`expresssion` has three “s”.
Replace with `expression` to avoid grep/build warnings.
man/calculateOverlap.Rd (1)
17-18: `spat_info` argument undocumented
`spat_info` was added to the usage but is missing from the \arguments section (only deprecated `spatial_info` is listed).
Please add a proper description and mark `spatial_info` as deprecated to keep roxygen checks happy.
R/classes.R (1)
1647-1654: Potential memory blow-up when converting `SpatVector`
`terra::as.data.frame(x)` materialises the whole vector; for large datasets this can explode RAM.
Consider streaming rows or using `terra::values()` with selected columns instead.
R/methods-extract.R (4)
1122-1135: Potential implicit recycling in poly subsetting
`x@overlaps[[feat]][i, ids = FALSE]` relies on gIndex dispatch, but no guard exists for negative/zero indices or out-of-range values.
Consider validating `i` with `checkmate::assert_integerish(i, lower = 1, any.missing = FALSE)` (after resolving logical/character) to catch user mistakes early.
1178-1191: Logical recycling may give unexpected duplicates
`x@data <- x@data[i, ]` after expanding a recycled logical vector can duplicate rows if `i` is longer than `nrow(x@data)`. Usually users expect simple recycling, not row replication. Consider using `i <- rep_len(i, nr)[seq_len(nr)]` to avoid this.
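A small base R check of the normalisation suggested above. Since `rep_len(i, nr)` both recycles and truncates to exactly `nr` elements, the trailing `[seq_len(nr)]` should not be needed. The helper name `normalize_logical_index()` is hypothetical and only used for illustration.

```r
# Hypothetical helper: force a logical index to exactly n elements.
# rep_len() recycles short vectors and truncates long ones.
normalize_logical_index <- function(i, n) {
  stopifnot(is.logical(i))
  rep_len(i, n)
}

nr <- 4L
i_short <- c(TRUE, FALSE)                          # shorter than nr: recycled
i_long <- c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE)   # longer than nr: truncated

sel_short <- which(normalize_logical_index(i_short, nr))
sel_long <- which(normalize_logical_index(i_long, nr))
```

After normalisation no selected position can exceed `nr`, so the subset cannot pick up rows that do not exist.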
1283-1325: API surprise: `ids = TRUE` returns vectors not objects
Overloading `[` so that `ids = TRUE` changes the return type from object → vector violates the usual S4 subsetting contract and will trip up users (`class(x[i])` now depends on a flag). Strongly advise introducing a dedicated accessor (e.g., `polys()`/`features()`) or exposing this through separate functions to keep `[` for structural subsetting only.
1327-1364: Helper could skip extra allocation
`.select_overlap_point_dt_i/j()` repeatedly uses `match()` and builds temporary vectors. These are on the hot path during aggregation. Caching a lookup table or using keyed joins (`setkey`) would cut runtime for large datasets.
R/aggregate.R (1)
84-93: Unimported helper operators & setters trigger R CMD check warnings
Functions/operators like `%null%`, `spatUnit<-`, `setGiotto`, `prov<-`, `featType<-`, and `deprecated()` are used but not imported in this file, generating “no visible global function definition” notes.
Declare them in NAMESPACE via `@importFrom` or `@import` to keep the package CRAN-clean.
man/overlapToMatrix.Rd (6)
9-10: Alias section should reflect full S4 method signatures
The new `overlapPointDT` and `overlapIntensityDT` aliases are correctly added.
Double-check that their corresponding roxygen blocks include `@aliases overlapToMatrix,overlapPointDT-method overlapToMatrix,overlapIntensityDT-method`; otherwise the next `roxygen2::roxygenise()` run will drop these lines.
16-26: Parameter order / deprecated args doc drift
`spat_info`, `feat_count_column`, and `fun` were inserted before the `...` placeholder, while the three deprecated arguments remain after it.
For consistency with the rest of the package (and to avoid mismatched usage examples), consider moving all deprecated arguments to the very end of the signature and adding `@deprecated` tags in the roxygen source, e.g.:
- aggr_function = deprecated(),
- poly_info = deprecated(),
- count_info_column = deprecated(),
  ...
+ ...,
+ aggr_function = deprecated(),
+ poly_info = deprecated(),
+ count_info_column = deprecated()
This keeps the public interface clean and reduces the risk that users think the old names are still first-class.
75-82: Argument description for `x` omits the two new classes
The description still reads “giotto object or SpatVector points or data.table”.
Please append `overlapPointDT` and `overlapIntensityDT` to help users discover the new workflow.
88-90: Clarify that `fun` can be a function object
Current text: “character. Function to aggregate …”.
In practice, `fun` can also be an actual function (e.g. `mean`). Spell this out to prevent misuse:
Function or single-character name of a function used to aggregate …
95-100: Mark deprecated parameters with `@keywords internal` (or similar)
Listing deprecated args inline clutters the rendered help.
Consider adding `@keywords internal` or a separate “Deprecated Arguments” section so they’re still searchable but visually separated.
110-112: `sort` description could note performance impact
Mixed sorting on large matrices can be costly. Add a short note such as “may be slower for very large matrices” so users understand the trade-off.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
- DESCRIPTION (2 hunks)
- NAMESPACE (2 hunks)
- NEWS.md (1 hunks)
- R/aggregate.R (34 hunks)
- R/classes.R (4 hunks)
- R/combine_metadata.R (1 hunks)
- R/methods-coerce.R (3 hunks)
- R/methods-crop.R (1 hunks)
- R/methods-dims.R (1 hunks)
- R/methods-extract.R (2 hunks)
- R/methods-initialize.R (0 hunks)
- R/methods-overlaps.R (1 hunks)
- R/methods-rbind.R (2 hunks)
- R/methods-show.R (1 hunks)
- R/save_load.R (1 hunks)
- R/slot_show.R (1 hunks)
- R/subset.R (3 hunks)
- man/aggregateFeatures.Rd (1 hunks)
- man/as.data.table.Rd (2 hunks)
- man/as.matrix.Rd (2 hunks)
- man/calculateOverlap.Rd (8 hunks)
- man/calculateOverlapRaster.Rd (2 hunks)
- man/dims-generic.Rd (2 hunks)
- man/dot-abbrev_mat.Rd (0 hunks)
- man/overlapToMatrix.Rd (2 hunks)
- man/rbind-generic.Rd (2 hunks)
- tests/testthat/test-aggregate.R (1 hunks)
- tests/testthat/test-create_mini_vizgen.R (3 hunks)
💤 Files with no reviewable changes (2)
- man/dot-abbrev_mat.Rd
- R/methods-initialize.R
🧰 Additional context used
🪛 LanguageTool
NEWS.md
[duplication] ~9-~9: Possible typo: you repeated a word.
Context: ...rapper for running calculateOverlap() and overlapToMatrix() - overlapPointDT() and overlapIntensityDT() classes to store...
(ENGLISH_WORD_REPEAT_RULE)
🪛 GitHub Check: lintr
R/aggregate.R
[notice] 50-50:
Lines should not be more than 80 characters. This line is 81 characters.
[warning] 102-102:
no visible global function definition for '%null%'
[warning] 105-105:
no visible global function definition for '%null%'
[warning] 181-181:
no visible global function definition for 'spatUnit<-'
[warning] 183-183:
no visible global function definition for 'setGiotto'
[warning] 337-337:
no visible global function definition for 'deprecated'
[warning] 513-513:
no visible global function definition for 'deprecated'
[warning] 513-513:
no visible global function definition for 'deprecated'
[warning] 541-541:
no visible global function definition for 'prov<-'
[warning] 541-541:
no visible global function definition for 'spatUnit'
[warning] 542-542:
no visible global function definition for 'spatUnit<-'
[warning] 542-542:
no visible global function definition for 'spatUnit'
[warning] 543-543:
no visible global function definition for 'featType<-'
[warning] 595-595:
no visible global function definition for 'affine'
[warning] 639-639:
no visible global function definition for 'prov<-'
[warning] 639-639:
no visible global function definition for 'spatUnit'
[warning] 640-640:
no visible global function definition for 'spatUnit<-'
[warning] 640-640:
no visible global function definition for 'spatUnit'
[warning] 641-641:
no visible global function definition for 'featType<-'
[warning] 739-739:
no visible global function definition for 'deprecated'
[warning] 739-739:
no visible global function definition for 'deprecated'
[warning] 855-855:
no visible global function definition for 'deprecated'
[warning] 855-855:
no visible global function definition for 'deprecated'
[warning] 857-857:
no visible global function definition for 'deprecate_warn'
[warning] 912-912:
Use && in conditional expressions.
[notice] 955-955:
Hanging indent should be 37 spaces but is 8 spaces.
[warning] 982-982:
no visible global function definition for ':='
[warning] 985-985:
no visible global function definition for ':='
[warning] 989-989:
no visible global function definition for ':='
[warning] 992-992:
no visible global function definition for ':='
🔇 Additional comments (15)
DESCRIPTION (1)
3-3: Verify necessity of tighter version constraints
Bumping the package to 0.5.0 while simultaneously tightening the minimal R requirement from 4.4.0 → 4.4.1 may exclude users on the latest 4.4-series stable release (4.4.0) without an obvious technical reason.
Please double-check that (a) the new code genuinely needs ≥ 4.4.1, and (b) the NEWS / changelog clearly explains the rationale so downstream maintainers aren’t surprised.
Also applies to: 29-29
NAMESPACE (1)
32-32: Export confirmation for `aggregateFeatures`
Export looks good and avoids name collisions with existing generics.
man/rbind-generic.Rd (1)
9-10: Documentation addition is consistent
The new aliases correctly expose the `rbind2,overlapPointDT,…` method. No further action needed.
Also applies to: 20-21
R/slot_show.R (1)
1107-1110: Internal helper now documented inline – OK
Replacing the roxygen header with simple comments is fine for a non-exported utility; it prevents an unnecessary Rd file without impacting users.
man/dims-generic.Rd (1)
27-28: Alias & usage entries look consistent for the new overlap classes
`dim()` is now documented for both `overlapPointDT` and `overlapIntensityDT`, and the usage stubs are present. No further action required.
Also applies to: 75-77
R/methods-crop.R (1)
229-237: Crop now delegates overlap filtering to `.subset_overlaps_poly` – validate side-effects
Good move to centralise overlap subsetting. Please double-check that `.subset_overlaps_poly()`
- updates both point- and intensity-based overlap objects for the removed polygons,
- preserves the class (`overlapPointDT`, `overlapIntensityDT`) after subsetting so downstream coercion & `dim()` methods keep working.
No code change required; this is a verification reminder.
R/methods-rbind.R (1)
93-122: Missing duplicate-ID guard
Unlike the existing `cellMetaObj`/`spatLocsObj` methods, this method doesn’t call `.check_id_dups()`.
If two `overlapPointDT` objects contain the same `poly`/`feat` combination the result will silently contain duplicates. Consider re-using `.check_id_dups()` (or an equivalent) before merging.
man/calculateOverlapRaster.Rd (1)
14-20: Documentation looks coherent with code changes
Parameter renaming and deprecation notes are consistent; no issues spotted.
Also applies to: 36-40, 45-47
tests/testthat/test-create_mini_vizgen.R (3)
144-145: Argument rename reflected correctly
Switch from `largeImages` to `images` aligns with the API; good catch.
162-176: `calculateOverlap()` calls: missing `return_gobject`?
The earlier `calculateOverlapRaster()` defaulted to `return_gobject = TRUE`.
If the new function preserved that default this is fine; otherwise the test will fail. Please verify.
203-213: Parameter rename in `overlapToMatrix()` verified
The switch from `poly_info` to `spat_info` is correctly reflected in the test code.
R/methods-extract.R (2)
1062-1064: Good refactor – but double-check class coercion
Calling `.subset_overlaps_poly()` instead of hand-rolled filtering greatly clarifies intent.
Just verify that every legacy object arriving in `x@overlaps` is updated to the new `overlap*DT` classes before this call; otherwise dispatch on `[` may fall back to the default method and silently drop the `ids = FALSE` flag.
1067-1112: NULL-safety for `intensity` branch
`names(x@overlaps$intensity)` is accessed unguarded.
When `x@overlaps$intensity` is `NULL` (not just length-0 list) this will raise an error.
Add an early return or wrap with `is.null()` to avoid surprising failures.
- is_int_feat <- feat_type %in% names(x@overlaps$intensity)
+ is_int_feat <- !is.null(x@overlaps$intensity) &&
+     feat_type %in% names(x@overlaps$intensity)
Likely an incorrect or invalid review comment.
R/aggregate.R (1)
2075-2085: Missing namespace for `mixedsort`
`.mixedsort_rowcols()` calls `mixedsort()` without qualification; unless `gtools` (or equivalent) is attached this will error at runtime. Add an explicit import or use `gtools::mixedsort`.
- match(mixedsort(mat_r_names), mat_r_names),
+ match(gtools::mixedsort(mat_r_names), mat_r_names),
and add `@importFrom gtools mixedsort` to the roxygen header or NAMESPACE.
man/overlapToMatrix.Rd (1)
50-55: data.table method: missing `feat_count_column` & `sort` args
The data.table S4 method now exposes `fun`, but does not expose `feat_count_column` (added elsewhere) or `sort`.
If these arguments are intentionally unsupported for plain data.tables, please document that explicitly; otherwise add them to avoid an inconsistent API surface.
S3method(.DollarNames,spatEnrObj)
S3method(.DollarNames,spatLocsObj)
S3method(.DollarNames,terraVectData)
S3method(as.data.frame,overlapPointDT)
🛠️ Refactor suggestion
Missing symmetrical as.data.frame for overlapIntensityDT
You registered as.data.frame.overlapPointDT, but its sibling class overlapIntensityDT did not get the same treatment. Down-stream code that relies on as.data.frame() for both overlap classes will break.
+S3method(as.data.frame,overlapIntensityDT)
Add the roxygen tag to the method implementation and re-document.
🤖 Prompt for AI Agents
In the NAMESPACE file at line 11, you registered the S3 method as.data.frame for
overlapPointDT but missed doing the same for overlapIntensityDT. To fix this,
add a corresponding S3method(as.data.frame, overlapIntensityDT) line to the
NAMESPACE file. Also, in the R script where the as.data.frame.overlapIntensityDT
method is defined, add the appropriate roxygen2 @method tag for as.data.frame
and overlapIntensityDT, then re-document the package to update the NAMESPACE
accordingly.
#' @rdname dims-generic
#' @export
setMethod("dim", signature("overlapPointDT"), function(x) dim(x@data))

#' @rdname dims-generic
#' @export
setMethod("dim", signature("overlapIntensityDT"), function(x) dim(x@data))
🛠️ Refactor suggestion
Add nrow/ncol for the new overlap classes for API parity
You introduced dim() methods but left out nrow() and ncol(). Many client-side calls use the simpler helpers and will now fall back to dim() + indexing, which is less efficient and could surprise users.
+#' @rdname dims-generic
+#' @export
+setMethod("nrow", signature("overlapPointDT"), function(x) nrow(x@data))
+
+#' @rdname dims-generic
+#' @export
+setMethod("ncol", signature("overlapPointDT"), function(x) ncol(x@data))
+
+#' @rdname dims-generic
+#' @export
+setMethod("nrow", signature("overlapIntensityDT"), function(x) nrow(x@data))
+
+#' @rdname dims-generic
+#' @export
+setMethod("ncol", signature("overlapIntensityDT"), function(x) ncol(x@data))
Implementing these keeps the public surface consistent with other Giotto classes.
🤖 Prompt for AI Agents
In R/methods-dims.R around lines 149 to 155, you defined dim() methods for
overlapPointDT and overlapIntensityDT classes but did not implement
corresponding nrow() and ncol() methods. To maintain API consistency and improve
efficiency, add nrow() and ncol() methods for these classes that directly return
the number of rows and columns of the underlying data slot (x@data), similar to
how dim() is implemented.
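The fallback mentioned in this comment can be checked on a toy class. Base R defines `nrow()` and `ncol()` in terms of `dim()`, so a `dim()` method alone already makes them return the right values; explicit methods are then mainly a matter of API parity. The `toyOverlap` class below is hypothetical and only stands in for the overlap classes.

```r
library(methods)

# Hypothetical stand-in for an overlap class with a tabular `data` slot
setClass("toyOverlap", representation(data = "data.frame"))

# Only dim() is defined; base nrow()/ncol() call dim(x)[1L] / dim(x)[2L]
setMethod("dim", "toyOverlap", function(x) dim(x@data))

obj <- new(
  "toyOverlap",
  data = data.frame(poly_ID = c("a", "b", "c"), value = 1:3)
)

n_rows <- nrow(obj)
n_cols <- ncol(obj)
```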
x_spat_map <- match(x@spat_ids, comb_spat)
y_spat_map <- match(y@spat_ids, comb_spat)
x_feat_map <- match(x@feat_ids, comb_feat)
y_feat_map <- match(y@feat_ids, comb_feat)

# replace id dictionaries for x (output object)
x@spat_ids <- comb_spat
x@feat_ids <- comb_feat

# remap indices
x@data[, poly := x_spat_map[poly]]
y@data[, poly := y_spat_map[poly]]
x@data[, feat_id_index := x_feat_map[feat_id_index]]
y@data[, feat_id_index := y_feat_map[feat_id_index]]
🛠️ Refactor suggestion
feat column/key is left inconsistent after re-indexing
You remap feat_id_index but leave the character column feat untouched, then set a data-table key on it:
data.table::setkeyv(x@data, "feat")
If the two objects had different ordering this key will no longer match the new indices, breaking joins and look-ups.
Either update the feat column with comb_feat[feat_id_index] after re-mapping, or set the key on feat_id_index.
🤖 Prompt for AI Agents
In R/methods-rbind.R around lines 100 to 114, after remapping the feat_id_index
column, the feat character column remains unchanged, causing inconsistency when
setting a data.table key on feat. To fix this, update the feat column by
assigning comb_feat indexed by the updated feat_id_index (i.e., feat :=
comb_feat[feat_id_index]) before setting the key, ensuring the feat column
aligns with the new indices and maintains correct joins and look-ups.
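The remapping step and the proposed consistency fix can be sketched in base R with plain vectors. This is a minimal illustration, not the package code: the variable names are made up, and integer vectors stand in for the `feat_id_index` column.

```r
# Two id dictionaries with a shared entry, plus index vectors into each
x_ids <- c("geneA", "geneB")
y_ids <- c("geneB", "geneC")
x_idx <- c(1L, 2L, 2L)  # geneA, geneB, geneB
y_idx <- c(1L, 2L)      # geneB, geneC

# Merged dictionary and per-object maps from old to new positions
comb <- unique(c(x_ids, y_ids))
x_map <- match(x_ids, comb)
y_map <- match(y_ids, comb)

# Remap the indices, then derive labels from the remapped indices so the
# label column cannot drift from the index column
merged_idx <- c(x_map[x_idx], y_map[y_idx])
merged_feat <- comb[merged_idx]
```

Deriving the labels from the merged dictionary after remapping is the same idea as the `feat := comb_feat[feat_id_index]` suggestion above.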
#' @rdname rbind-generic
#' @export
setMethod("rbind2", signature("overlapPointDT", "overlapPointDT"),
    function(x, y, ...) {
        comb_spat <- unique(c(x@spat_ids, y@spat_ids))
        comb_feat <- unique(c(x@feat_ids, y@feat_ids))

        x_spat_map <- match(x@spat_ids, comb_spat)
        y_spat_map <- match(y@spat_ids, comb_spat)
        x_feat_map <- match(x@feat_ids, comb_feat)
        y_feat_map <- match(y@feat_ids, comb_feat)

        # replace id dictionaries for x (output object)
        x@spat_ids <- comb_spat
        x@feat_ids <- comb_feat

        # remap indices
        x@data[, poly := x_spat_map[poly]]
        y@data[, poly := y_spat_map[poly]]
        x@data[, feat_id_index := x_feat_map[feat_id_index]]
        y@data[, feat_id_index := y_feat_map[feat_id_index]]

        x@data <- rbind(x@data, y@data)
        x@nfeats <- x@nfeats + y@nfeats

        data.table::setkeyv(x@data, "feat")
        data.table::setindex(x@data, "poly")
        x
    }
)
nfeats is incremented instead of recomputed ‒ causes double-counting when feature sets overlap
x@nfeats <- x@nfeats + y@nfeats assumes that the two objects have disjoint feature dictionaries.
When x and y share features the count is inflated (e.g. 3 + 3 = 6 while the merged set has only 5).
Recompute from the merged dictionary instead:
- x@nfeats <- x@nfeats + y@nfeats
+ x@nfeats <- length(comb_feat)
🤖 Prompt for AI Agents
In R/methods-rbind.R around lines 93 to 122, the code increments x@nfeats by
y@nfeats, which causes double-counting when feature sets overlap. Instead of
adding, recompute x@nfeats as the length of the combined feature dictionary
(comb_feat) after merging to accurately reflect the unique feature count.
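The 3 + 3 = 6 versus 5 case from this comment, reproduced with plain character vectors (illustrative names only):

```r
x_feat_ids <- c("g1", "g2", "g3")
y_feat_ids <- c("g3", "g4", "g5")  # "g3" is shared with x

# Summing the two counts counts the shared feature twice
summed <- length(x_feat_ids) + length(y_feat_ids)

# Recomputing from the merged dictionary counts each feature once
comb_feat <- unique(c(x_feat_ids, y_feat_ids))
recomputed <- length(comb_feat)
```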
.subset_overlap_point_dt_i <- function(x, i) {
    if (is.numeric(i) || is.logical(i)) {
        i <- x@spat_ids[i]
    }

    poly <- NULL # NSE vars
    idx <- match(i, x@spat_ids) # poly indices to keep
    idx <- idx[!is.na(idx)] # drop unmatched NAs
    x@spat_ids <- x@spat_ids[x@spat_ids %in% i] # replace spatial ids

    x@data <- x@data[poly %in% idx]
    x@data[, poly := match(poly, idx)]
    data.table::setkeyv(x@data, "feat")
    data.table::setindex(x@data, "poly")
    x
}

.subset_overlap_point_dt_j <- function(x, j) {
    # ---- convert j to numerical index ---- #
    if (is.logical(j)) {
        if (length(j) != length(x@feat_ids)) {
            # recycle logical if needed
            j <- rep(j, length.out = length(x@feat_ids))
        }
        j <- which(j)
    }
    if (is.character(j)) {
        j <- match(j, x@feat_ids)
    }

    x@feat_ids <- x@feat_ids[j] # replace feature ids

    # subset on feat_id_index matches
    x@data <- x@data[feat_id_index %in% j]
    x@data[, feat_id_index := match(feat_id_index, j)]
    data.table::setkeyv(x@data, "feat")
    data.table::setindex(x@data, "poly")
    x
}
Negative index semantics are lost
`.subset_overlap_point_dt_i()` treats numeric `i` as positive positions only; negative indices (common in R to drop rows) will silently mis-select. Either:
- Explicitly forbid negatives with an assertion, or
- Implement the standard R rule (`setdiff(seq_len(n), abs(i))`).
Fail-fast behaviour is preferable to silent mis-subset.
🤖 Prompt for AI Agents
In R/methods-extract.R around lines 1138 to 1176, the function
.subset_overlap_point_dt_i() does not handle negative numeric indices correctly,
which in R are used to exclude elements. To fix this, add a check for negative
numeric indices in the input i. If negatives are present, either raise an error
to forbid them or convert them to positive indices by excluding the absolute
values from the full sequence of x@spat_ids indices using
setdiff(seq_along(x@spat_ids), abs(i)). This ensures the function respects
standard R negative indexing semantics and avoids silent mis-selection.
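One way the second option might look, as a standalone base R sketch. The helper `resolve_spat_index()` is hypothetical and not part of the PR; it resolves character, logical, and numeric `i` (including negatives) to positive positions and fails fast on mixed signs or out-of-range values.

```r
# Hypothetical helper: resolve `i` to positive integer positions in `ids`
resolve_spat_index <- function(i, ids) {
  n <- length(ids)
  if (is.character(i)) {
    idx <- match(i, ids)
    return(idx[!is.na(idx)])      # drop unmatched ids
  }
  if (is.logical(i)) {
    return(which(rep_len(i, n)))  # recycle or truncate to length n
  }
  stopifnot(is.numeric(i), !anyNA(i))
  if (any(i < 0)) {
    if (any(i > 0)) stop("cannot mix positive and negative indices")
    return(setdiff(seq_len(n), abs(i)))  # negatives drop positions
  }
  i <- i[i != 0]                  # zeros select nothing, as in base R
  if (any(i > n)) stop("index out of range")
  as.integer(i)
}

ids <- c("a", "b", "c")
dropped <- resolve_spat_index(-2, ids)
picked <- resolve_spat_index(c(3, 1), ids)
by_name <- resolve_spat_index("c", ids)
mixed <- try(resolve_spat_index(c(-1, 2), ids), silent = TRUE)
```

Resolving everything to positive positions up front means the later `match()` and `%in%` steps only ever see one index type.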
.subset_overlap_intensity_dt_j <- function(x, j) {
    # convert j to char col reference
    if (is.numeric(j)) {
        j <- j + 1L
    } else if (is.logical(j)) {
        j <- c(TRUE, j)
    }
    if (!is.character(j)) {
        j <- colnames(x@data)[j]
    }
    j <- unique(c("poly_ID", j))
    x@data <- x@data[, .SD, .SDcols = j]
    x
}
🛠️ Refactor suggestion
Numeric column index offset is fragile
Shifting numeric j by +1L assumes poly_ID is always the first column. If column order changes (e.g., after setcolorder) this will select the wrong data. Safer: look up columns by name once and compute the offset dynamically.
-if (is.numeric(j)) {
- j <- j + 1L
+if (is.numeric(j)) {
+ all_cols <- colnames(x@data)
+ j <- j[ j > 0 & j <= length(all_cols) ]
+ j <- unique(c(match("poly_ID", all_cols), j))
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- .subset_overlap_intensity_dt_j <- function(x, j) {
-     # convert j to char col reference
-     if (is.numeric(j)) {
-         j <- j + 1L
-     } else if (is.logical(j)) {
-         j <- c(TRUE, j)
-     }
-     if (!is.character(j)) {
-         j <- colnames(x@data)[j]
-     }
-     j <- unique(c("poly_ID", j))
-     x@data <- x@data[, .SD, .SDcols = j]
-     x
- }
+ .subset_overlap_intensity_dt_j <- function(x, j) {
+     # convert j to char col reference
+     if (is.numeric(j)) {
+         all_cols <- colnames(x@data)
+         j <- j[j > 0 & j <= length(all_cols)]
+         j <- unique(c(match("poly_ID", all_cols), j))
+     } else if (is.logical(j)) {
+         j <- c(TRUE, j)
+     }
+     if (!is.character(j)) {
+         j <- colnames(x@data)[j]
+     }
+     j <- unique(c("poly_ID", j))
+     x@data <- x@data[, .SD, .SDcols = j]
+     x
+ }
🤖 Prompt for AI Agents
In R/methods-extract.R around lines 1193 to 1206, the code shifts numeric column
index j by +1L assuming "poly_ID" is always the first column, which is fragile
if column order changes. To fix this, dynamically find the position of "poly_ID"
in colnames(x@data) and adjust j accordingly instead of hardcoding +1L. This
ensures correct column selection regardless of column order changes.
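A sketch of name-based selection on a plain data.frame. It assumes that numeric or logical `j` indexes the value columns only (everything except the ID column), which is what the original `+ 1L` offset implies. The helper `select_intensity_cols()` and the column names are hypothetical.

```r
# Hypothetical helper: select value columns by position, flag, or name,
# always keeping the ID column, without assuming where the ID column sits.
select_intensity_cols <- function(dt, j, id_col = "poly_ID") {
  stopifnot(id_col %in% colnames(dt))
  value_cols <- setdiff(colnames(dt), id_col)
  if (is.numeric(j) || is.logical(j)) {
    j <- value_cols[j]  # positions/flags refer to value columns only
  }
  if (anyNA(j) || !all(j %in% value_cols)) {
    stop("requested columns not found")
  }
  dt[, unique(c(id_col, j)), drop = FALSE]
}

# The ID column deliberately sits in the middle, not first
dt <- data.frame(
  channel1 = c(0.1, 0.2),
  poly_ID = c("a", "b"),
  channel2 = c(5, 6)
)

by_pos <- select_intensity_cols(dt, 2)
by_lgl <- select_intensity_cols(dt, c(TRUE, FALSE))
```

Because the value columns are looked up by name first, reordering the table (for example with `setcolorder`) does not change which columns a given `j` selects.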
.calculate_overlap_vector <- function(spatvec, pointvec, keep = NULL) {
    checkmate::assert_character(keep, null.ok = TRUE)
    res <- terra::extract(spatvec, pointvec)
    cn <- colnames(res)
    if (all(c("id.y", "poly_ID") %in% cn)) {
        res_keep <- c("id.y", "poly_ID")
    } else {
        res_keep <- cn[c(1L, 2L)]
    }
    res <- res[!is.na(res[[2]]), res_keep] # drop NAs (sparsify) + col select
    if (!is.null(keep)) {
🛠️ Refactor suggestion
Fragile column selection in .calculate_overlap_vector
The code uses `res_keep <- c("id.y", "poly_ID")` when both names are present and otherwise falls back to the first two columns by position; the NA filter also relies on `res[[2]]`.
`terra::extract()` does not guarantee column order across versions; a safer pattern is to subset by names and error-out when expected columns are absent, e.g.:
expected <- c("id.y", "poly_ID")
if (!all(expected %in% names(res)))
    stop(".calculate_overlap_vector: expected columns not found")
res <- res[!is.na(res$poly_ID), expected]
This avoids silent mis-alignment when terra changes.
🤖 Prompt for AI Agents
In R/aggregate.R around lines 794 to 804, the current code selects columns by
position which is fragile because terra::extract() does not guarantee column
order. To fix this, explicitly check for the presence of expected columns by
name (e.g., "id.y" and "poly_ID"), stop with an error if they are missing, and
subset the data frame by these column names instead of by position. This ensures
stable and correct column selection regardless of terra version changes.
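The fail-fast pattern can be tried on a mock data.frame standing in for the `terra::extract()` result. No terra call is made here, and the helper `keep_overlap_columns()` and the mock columns are assumptions for illustration.

```r
# Hypothetical helper: keep the expected columns by name, or stop loudly
keep_overlap_columns <- function(res, expected = c("id.y", "poly_ID")) {
  missing_cols <- setdiff(expected, names(res))
  if (length(missing_cols) > 0L) {
    stop("expected columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # drop rows without a polygon hit, then select by name
  res[!is.na(res$poly_ID), expected, drop = FALSE]
}

# Mock result with the expected columns in an unexpected order
mock_res <- data.frame(
  poly_ID = c("a", NA, "b"),
  extra = 1:3,
  id.y = c(1L, 2L, 3L)
)

kept <- keep_overlap_columns(mock_res)
failed <- try(keep_overlap_columns(mock_res[, c("poly_ID", "extra")]), silent = TRUE)
```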
res <- switch(method,
    "raster" = .calculate_overlap_raster(
        spatvec = x,
        pointvec = y,
        count_info_column = feat_count_column,
        verbose = verbose
    ),
    "vector" = .calculate_overlap_vector(
        spatvec = x,
        pointvec = y,
        keep = feat_count_column
    )
)

.create_overlap_point_dt(x, y, res, feat_ids = feat_ids)
}
feat_count_column silently dropped
calculateOverlap(SpatVector,SpatVector) adds feat_count_column to calculate_overlap_params, but when control reaches
.create_overlap_point_dt() the argument keep is left NULL, so the count column (and any other extras) is lost.
- .create_overlap_point_dt(x, y, res, feat_ids = feat_ids)
+ .create_overlap_point_dt(
+ x, y,
+ res,
+ keep = feat_count_column, # preserve requested meta
+ feat_ids = feat_ids
+ )
Without this, later aggregation in overlapToMatrix() cannot use the counts.
🤖 Prompt for AI Agents
In R/aggregate.R around lines 770 to 785, the argument feat_count_column is
passed to the raster and vector overlap calculation functions but is not
forwarded to the .create_overlap_point_dt() call, causing the count column to be
lost. To fix this, pass feat_count_column as the keep argument in the
.create_overlap_point_dt() function call so that the count column and any
additional columns are preserved for later aggregation.
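The effect described in this comment can be illustrated with a toy example in base R. `build_overlap_table()` is a hypothetical stand-in, not the Giotto internal `.create_overlap_point_dt()`, and its arguments are assumptions made only for this sketch:

```r
# Hypothetical stand-in for an internal table builder; NOT the Giotto function.
build_overlap_table <- function(points, hits, keep = NULL) {
    out <- data.frame(
        poly_ID = hits$poly_ID,
        feat_ID = points$feat_ID[hits$point_idx]
    )
    # extra columns are carried along only when explicitly requested
    for (col in keep) {
        out[[col]] <- points[[col]][hits$point_idx]
    }
    out
}

points <- data.frame(feat_ID = c("g1", "g2", "g1"), count = c(5L, 2L, 1L))
hits <- data.frame(point_idx = c(1L, 3L), poly_ID = c("cell_a", "cell_b"))

dropped <- build_overlap_table(points, hits)              # keep left as NULL
kept <- build_overlap_table(points, hits, keep = "count") # keep forwarded
```

In the first call the `count` column never reaches the output, so any later step that expects it has nothing to aggregate. Forwarding the column name is what preserves it.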
| \S4method{overlapToMatrix}{overlapPointDT}( | ||
| x, | ||
| name = "raw", | ||
| sort = TRUE, | ||
| feat_count_column = NULL, | ||
| output = c("Matrix", "exprObj"), | ||
| ... | ||
| ) | ||
|
|
🛠️ Refactor suggestion
overlapPointDT method: fun parameter absent but documented above
The generic argument list documents fun, yet the overlapPointDT signature omits it.
Either add fun here or note in the docs that aggregation is fixed for this class to avoid user confusion.
-\S4method{overlapToMatrix}{overlapPointDT}(
- x,
- name = "raw",
- sort = TRUE,
- feat_count_column = NULL,
- output = c("Matrix", "exprObj"),
- ...
-)
+\S4method{overlapToMatrix}{overlapPointDT}(
+ x,
+ name = "raw",
+ sort = TRUE,
+ feat_count_column = NULL,
+ fun = "sum",
+ output = c("Matrix", "exprObj"),
+ ...
+)
🤖 Prompt for AI Agents
In man/overlapToMatrix.Rd around lines 57 to 65, the overlapPointDT method
signature is missing the 'fun' parameter that is documented in the generic
argument list. To fix this, either add the 'fun' parameter to the overlapPointDT
method signature to match the documentation or update the documentation to
clarify that aggregation is fixed for this class and 'fun' is not applicable,
preventing user confusion.
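As a rough illustration of the first option, the sketch below shows how an S4 method can expose a `fun` argument when the generic's formals include `...`. The class `toyOverlap` and the generic `toyToMatrix()` are hypothetical and only mirror the shape of the discussion; they are not the Giotto classes or the real `overlapToMatrix()` implementation:

```r
library(methods)

# Toy class and generic; names are hypothetical.
setClass("toyOverlap", representation(data = "data.frame"))

setGeneric("toyToMatrix", function(x, ...) standardGeneric("toyToMatrix"))

setMethod(
    "toyToMatrix", "toyOverlap",
    function(x, feat_count_column = NULL, fun = "sum", ...) {
        d <- x@data
        # without a count column every overlap contributes 1
        value <- if (is.null(feat_count_column)) {
            rep(1, nrow(d))
        } else {
            d[[feat_count_column]]
        }
        # tapply over two groupings returns a feature x polygon matrix;
        # combinations with no overlaps come back as NA
        m <- tapply(value, list(d$feat_ID, d$poly_ID), match.fun(fun))
        m[is.na(m)] <- 0
        m
    }
)

ov <- new("toyOverlap", data = data.frame(
    poly_ID = c("cell_a", "cell_a", "cell_b"),
    feat_ID = c("g1", "g1", "g2"),
    count = c(5, 1, 2)
))
m_sum <- toyToMatrix(ov, feat_count_column = "count")
m_max <- toyToMatrix(ov, feat_count_column = "count", fun = "max")
```

If aggregation is meant to be fixed for this class, the alternative is to leave the signature as is and state that in the documentation, as the comment suggests.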
Changes
- calculateOverlap() and overlapToMatrix()
New
- aggregateFeatures() wrapper for running calculateOverlap() and overlapToMatrix()
- overlapPointDT() and overlapIntensityDT() classes to store overlap relationships efficiently and help with the aggregation pipeline
Enhancements
- aggregateFeatures() now has a method param to switch between "raster" and "vector" overlapping methods. The default method remains "raster" for now, but may change in the future.
Bug fixes
- overlaps() will now properly find image overlaps
- Also adds some tests for aggregation and overlaps objects
Summary by CodeRabbit
New Features
- aggregateFeatures function for streamlined spatial feature aggregation.
- overlapPointDT and overlapIntensityDT.
Improvements
Bug Fixes
Tests
Chores