Description
Currently RasterRegion.raster triggers a read when the raster is requested:
geotrellis-contrib/vlm/src/main/scala/geotrellis/contrib/vlm/RasterRegion.scala
Lines 41 to 55 in aca902c
    require(bounds.intersects(source.gridBounds), s"The given bounds: $bounds must intersect the given source: $source")
    @transient lazy val raster: Option[Raster[MultibandTile]] =
      for {
        intersection <- source.gridBounds.intersection(bounds)
        raster <- source.read(intersection)
      } yield {
        if (raster.tile.cols == cols && raster.tile.rows == rows)
          raster
        else {
          val colOffset = math.abs(bounds.colMin - intersection.colMin)
          val rowOffset = math.abs(bounds.rowMin - intersection.rowMin)
          require(colOffset <= Int.MaxValue && rowOffset <= Int.MaxValue, "Computed offsets are outside of RasterBounds")
          raster.mapTile { _.mapBands { (_, band) => PaddedTile(band, colOffset.toInt, rowOffset.toInt, cols, rows) } }
        }
      }
This is a particular problem when writing tile sources from the RasterSource API using the GeoTrellis LayerWriter, because the first action the writer takes is to groupBy the records by their index.
Since the read is triggered before this groupBy, every raster pixel is shuffled, which is quite expensive.
It would be preferable to have an implementation of MultibandTile that holds a RasterRegion but does not read the pixels until one of its functions explicitly requests them. This would allow the groupBy to be performed on metadata only, which should greatly improve the performance of all ingests.
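A minimal sketch of the deferred-read idea, using simplified stand-in types rather than the GeoTrellis API: `Bounds`, `CountingSource`, and `LazyTile` are all hypothetical names, and pixels are plain `Int` values. Dimensions come from the bounds alone, and the read happens only on first pixel access.

```scala
// Hypothetical, simplified stand-ins; these are NOT GeoTrellis types.
final case class Bounds(colMin: Int, rowMin: Int, colMax: Int, rowMax: Int) {
  def cols: Int = colMax - colMin + 1
  def rows: Int = rowMax - rowMin + 1
}

// A fake source that counts reads so the deferral is observable.
final class CountingSource(val cols: Int, val rows: Int) {
  var reads: Int = 0

  // Returns row-major pixel values; each value encodes its source position.
  def read(b: Bounds): Array[Int] = {
    reads += 1
    Array.tabulate(b.cols * b.rows) { i =>
      val col = b.colMin + i % b.cols
      val row = b.rowMin + i / b.cols
      row * cols + col
    }
  }
}

// Metadata (cols, rows, bounds) is available without touching the source.
final class LazyTile(source: CountingSource, val bounds: Bounds) {
  def cols: Int = bounds.cols
  def rows: Int = bounds.rows

  // The read is deferred until a pixel is first requested, then cached.
  private lazy val pixels: Array[Int] = source.read(bounds)

  def get(col: Int, row: Int): Int = pixels(row * cols + col)
}
```

In a Spark setting the cached pixels would presumably need to be marked `@transient`, as the existing `raster` field is, so that only the region description is serialized.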
This behavior would also help in similar situations where the tiles need to be sorted, filtered, or joined before they are actually used.
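To illustrate, here is a small sketch on plain Scala collections (not Spark RDDs) in which records are filtered, sorted, and grouped by index without any pixel read taking place. `KeyedRegion`, `MetadataOnlyDemo`, and the fake `load` function are hypothetical and exist only for this example.

```scala
// Hypothetical record: an index plus a deferred pixel payload.
final class KeyedRegion(val index: Long, load: () => Array[Int]) {
  lazy val pixels: Array[Int] = load()
}

object MetadataOnlyDemo {
  // Counts how many times pixel data was actually loaded.
  var reads: Int = 0

  private def region(index: Long): KeyedRegion =
    new KeyedRegion(index, () => { reads += 1; Array.fill(4)(index.toInt) })

  val regions: Seq[KeyedRegion] = Seq(region(7L), region(3L), region(7L), region(12L))

  // Filtering, sorting and grouping only look at the index.
  val grouped: Map[Long, Seq[KeyedRegion]] =
    regions.filter(_.index < 10L).sortBy(_.index).groupBy(_.index)

  // No payload has been loaded at this point.
  val readsAfterGrouping: Int = reads

  // Pixels are loaded only here, and only for the records that survived the filter.
  val sums: Map[Long, Int] =
    grouped.map { case (k, rs) => k -> rs.map(_.pixels.sum).sum }
}
```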
I'm not sure if this should be default behavior (probably?) or if we should provide both behaviors as part of the RasterRegion interface: eagerRaster and lazyRaster.
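One possible shape for exposing both behaviors, sketched with simplified types. The names `eagerRaster` and `lazyRaster` come from the proposal above; `RegionReads`, `LazyPixels`, `DemoRegion`, and `readPixels` are hypothetical, and a "raster" here is just an `Array[Int]`.

```scala
// Hypothetical wrapper that defers the read until `pixels` is first accessed.
final class LazyPixels(load: () => Array[Int]) {
  lazy val pixels: Array[Int] = load()
}

// Hypothetical interface offering both behaviors side by side.
trait RegionReads {
  protected def readPixels(): Array[Int]

  // Triggers a read immediately when called.
  def eagerRaster: Array[Int] = readPixels()

  // Returns a handle; the read happens on first pixel access and is then cached.
  def lazyRaster: LazyPixels = new LazyPixels(() => readPixels())
}

// Demo implementation that counts reads.
final class DemoRegion extends RegionReads {
  var reads: Int = 0
  protected def readPixels(): Array[Int] = { reads += 1; Array(1, 2, 3) }
}
```

Keeping both methods lets callers opt in explicitly while the default is being decided; the trade-off is a wider interface to maintain.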
- Implement LazyMultibandTile
- RasterRegion produces LazyMultibandTile
- Benchmark a sample ingest with eager vs lazy tile read to validate the assumption, and document the results