How to do more efficient focal statistics on huge dataset #10404
chenyangkang asked this question in Q&A · Unanswered
Hi community,
I'm not sure if a similar question has been posted before (I didn't find one), but I'm trying to figure out the most efficient way to do focal statistics for 37 land cover types on a global 30 m land cover dataset.
Objectives:
I'm trying to use orthogonal indexing to get a 30 m circular focal-statistics summary of the proportion of each of the 37 land cover types, given a batch of query longitudes, latitudes, and years. The data on disk is chunked into lon-lat tiles, with each tile containing 23 years of data.
It is hard to even load one whole tile into RAM, so xarray's lazy loading is super helpful here.
The code is like this:
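Roughly, it is a minimal sketch along these lines (the zarr store path, variable name, number of classes, and window size below are placeholders, not my exact setup):

```python
import numpy as np
import xarray as xr

# Lazily open the tiled 30 m land cover mosaic (placeholder path and variable
# name). open_zarr keeps the data as dask arrays, so nothing is read yet.
lc = xr.open_zarr("landcover_mosaic.zarr")["landcover"]  # dims: (year, lat, lon)

categories = np.arange(1, 38)  # the 37 land cover classes
window = 3                     # neighbourhood size in pixels (placeholder)

# One lazy focal-proportion layer per class: the fraction of pixels in the
# moving window equal to that class, stacked along a new "category" dimension.
full_mosaic_focal_stats = xr.concat(
    [
        (lc == cat)
        .rolling(lat=window, lon=window, center=True, min_periods=1)
        .mean()
        for cat in categories
    ],
    dim=xr.DataArray(categories, dims="category", name="category"),
)
```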
I then query `full_mosaic_focal_stats` using my data points (a rough sketch of that query follows this paragraph). Now the problem is that xarray seems to load each category separately: after it goes through the whole dataset to calculate the focal stats for category 1 for all points, it starts again to calculate the focal stats for category 2. I think this is not ideal in terms of time and I/O, especially with 37 categories. I noticed this when looking at the dask dashboard. I think it should load the data only once and compute all needed values.
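The query step looks roughly like this (a sketch using pointwise DataArray indexers rather than a full orthogonal cross-product; `lons`, `lats`, and `years` stand in for my real query batch, and the exact indexing call in my code may differ):

```python
import numpy as np
import xarray as xr

# Placeholder query batch; in practice these come from my point dataset.
lons = np.array([10.1, 10.2, 120.5])
lats = np.array([45.0, 45.1, -33.7])
years = np.array([2005, 2012, 2020])

# Pointwise selection: one profile over the 37 categories per query point.
points = full_mosaic_focal_stats.sel(
    lon=xr.DataArray(lons, dims="points"),
    lat=xr.DataArray(lats, dims="points"),
    year=xr.DataArray(years, dims="points"),
    method="nearest",
)

# Triggering compute here is where the dask dashboard shows the dataset being
# walked once per category instead of each chunk being read once and all 37
# classes evaluated from it.
result = points.compute()
```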
Does anyone have a better way to do this?
Thanks!