RasterRegion should produce LazyMultibandTile #199

@echeipesh

Currently RasterRegion.raster triggers a read when the raster is requested:

require(bounds.intersects(source.gridBounds), s"The given bounds: $bounds must intersect the given source: $source")
@transient lazy val raster: Option[Raster[MultibandTile]] =
  for {
    intersection <- source.gridBounds.intersection(bounds)
    raster <- source.read(intersection)
  } yield {
    if (raster.tile.cols == cols && raster.tile.rows == rows)
      raster
    else {
      val colOffset = math.abs(bounds.colMin - intersection.colMin)
      val rowOffset = math.abs(bounds.rowMin - intersection.rowMin)
      require(colOffset <= Int.MaxValue && rowOffset <= Int.MaxValue, "Computed offsets are outside of RasterBounds")
      raster.mapTile { _.mapBands { (_, band) => PaddedTile(band, colOffset.toInt, rowOffset.toInt, cols, rows) } }
    }
  }

This is a particular problem when writing tiles sourced from the RasterSource API with a GeoTrellis LayerWriter, because the first action taken is to groupBy the records by their index:

https://github.com/locationtech/geotrellis/blob/474ed9019b1281ce9e134167e7f7f3b0fc3e2eae/s3-spark/src/main/scala/geotrellis/spark/store/s3/S3RDDWriter.scala#L81

Since the read is triggered before this groupBy, the result is a shuffle of all of the raster pixels, which is quite expensive.

It would be preferable to have an instance of MultibandTile that contains a RasterRegion but does not read the pixels until they're explicitly requested by one of its functions. This would allow the groupBy to be performed on metadata only, which should greatly improve the performance of all ingests.
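To make the idea concrete, here is a minimal sketch of a tile whose dimensions come from the bounds alone and whose pixels are read on first access. It deliberately avoids the GeoTrellis types: `Bounds`, `CountingSource`, and `LazyTileSketch` are hypothetical stand-ins invented for illustration, not the proposed LazyMultibandTile.

```scala
// Hypothetical stand-in for grid bounds (inclusive); NOT the GeoTrellis type.
final case class Bounds(colMin: Long, rowMin: Long, colMax: Long, rowMax: Long) {
  def cols: Int = (colMax - colMin + 1).toInt
  def rows: Int = (rowMax - rowMin + 1).toInt
}

// Hypothetical source that counts reads so the deferral is observable.
final class CountingSource {
  var reads: Int = 0
  def read(bounds: Bounds): Array[Int] = {
    reads += 1
    Array.fill(bounds.cols * bounds.rows)(1)
  }
}

// Dimensions are known from metadata; pixels are read once, on first access.
final class LazyTileSketch(source: CountingSource, bounds: Bounds) {
  val cols: Int = bounds.cols
  val rows: Int = bounds.rows
  private lazy val pixels: Array[Int] = source.read(bounds)
  def get(col: Int, row: Int): Int = pixels(row * cols + col)
}
```

Asking for `cols` or `rows` does not touch the source; only `get` forces the read, and the result is cached afterwards. A real implementation used from Spark would presumably also need to be serializable, with the cached pixels marked `@transient` as the existing `raster` field is.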

This behavior would also help in similar situations where the tiles need to be sorted, filtered, or joined before they're actually used.
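The effect on filtering and grouping can be sketched with plain Scala collections (no Spark, no GeoTrellis). `DeferredTile`, `fakeRead`, and `LazyGroupingDemo` are hypothetical names used only for this illustration; the point is that operations on the key never trigger a read.

```scala
// Hypothetical record: a spatial key plus a deferred pixel read.
final class DeferredTile(val key: (Int, Int), load: () => Array[Int]) {
  lazy val pixels: Array[Int] = load()
}

object LazyGroupingDemo {
  var reads: Int = 0

  // Stand-in for an expensive read; counts how often it is called.
  def fakeRead(): Array[Int] = { reads += 1; Array(1, 2, 3, 4) }

  // A 3x3 grid of records, none of which has read its pixels yet.
  val records: IndexedSeq[DeferredTile] =
    for (c <- 0 until 3; r <- 0 until 3)
      yield new DeferredTile((c, r), () => fakeRead())

  // Filter and group on the key only; no pixels are touched here.
  val grouped: Map[Int, IndexedSeq[DeferredTile]] =
    records.filter(_.key._1 > 0).groupBy(_.key._1)
  val readsAfterGrouping: Int = reads

  // Pixels are read only for the records that are actually consumed.
  val sums: IndexedSeq[Int] = grouped(1).map(_.pixels.sum)
}
```

In this sketch the six records that survive the filter are grouped without any reads, and only the three records in column 1 are ever loaded. In a Spark job the same shape would mean the shuffle carries keys and read instructions rather than pixel arrays, which is the assumption the benchmark task below is meant to check.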

I'm not sure if this should be the default behavior (probably?) or if we should provide both behaviors as part of the RasterRegion interface: eagerRaster and lazyRaster.
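If both behaviors were exposed, the interface might look roughly like the sketch below. Only the names `eagerRaster` and `lazyRaster` come from this issue; `Pixels`, `Deferred`, `RegionSketch`, and `readPixels` are hypothetical and stand in for the real raster types and the read against the source.

```scala
// Hypothetical stand-in for a pixel payload; not a GeoTrellis type.
final case class Pixels(values: Vector[Int])

// A deferred handle: nothing is computed until `force` is accessed,
// and the result is cached after the first access.
final class Deferred[A](compute: () => A) {
  lazy val force: A = compute()
}

trait RegionSketch {
  // Hypothetical hook for the actual read against the source.
  protected def readPixels(): Option[Pixels]

  // Returns a handle immediately; the read happens on `force`.
  def lazyRaster: Deferred[Option[Pixels]] = new Deferred(() => readPixels())

  // Reads right away, similar in spirit to the current behavior.
  def eagerRaster: Option[Pixels] = readPixels()
}
```

Keeping both accessors would let existing callers opt in gradually, at the cost of a larger interface; making the lazy form the default would avoid that but changes when read failures surface.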

  • Implement LazyMultibandTile
  • RasterRegion produces LazyMultibandTile
  • Benchmark a sample ingest with eager vs lazy tile read to validate the assumption and document the results
