-
Notifications
You must be signed in to change notification settings - Fork 20
Toniof 523 allow temporal resampling #542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build fails, please fix first then request new review.
CHANGES.md
Outdated
'first', 'last', 'min', 'max', 'sum', 'prod', 'mean', 'median', 'std', | ||
'var', 'percentile_<p>']`, to sample up to a finer resolution, use any of | ||
`['asfreq', 'ffill', 'bfill', 'pad', 'nearest', 'nearest-up', 'zero', | ||
'slinear', 'quadratic', 'cubic', 'previous', 'next']`. (#523) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temporal_resampling
should be an object that inludes parameters that control the resampling. Split up- and down-sampling explicitly, therefore include upsampling_method
, downsampling_method
as well as as optional upsampling_params
, downsampling_params
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principal, this sounds like a great idea that I would like to support, but it yields a larger problem: The temporal resolution of a dataset may be hard to determine. At times, it can be irregular and the decision whether downsampling or upsampling must be applied may not be unambiguous.
Keeping this in mind, I'll try to come up with a solution for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have implemented that it is checked whether downsampling or upsampling should be applied.
xcube/core/gen2/local/resamplert.py
Outdated
class CubeResamplerT(CubeTransformer): | ||
|
||
def __init__(self, | ||
cube_config: CubeConfig): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove. I don't see a need to do this in the constructor. cube_config
is passed to transform_cube()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove this from the constructor, the parameter 'time_range' will most likely have been consumed by the CubeOpener and be unavailable to the CubeResamplerT. In consequence, the resampler would have no information about the requested start and ending time.
I see the following alternatives:
(a) ignore the time range in the resampler
(b) introduce a new parameter 'time_resample_time_range' so it would not be consumed earlier
(c) adapt the generator framework so that it does not remove parameters from a config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a) If this is possible, then why do we have this discussion? Let's go for it.
(b) That doesn't make sense, because we have a parameter temporal_resampling
to which we can time_range
which could be by default the request's time_range
.
(c) It MUST be possible to consume parameters, otherwise we don't know what the generator has to do. This is the idea of the current design.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also the simple solution to
(d) just pass the request's time range
into the ctor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now implemented this solution: The CubeConfig is not passed to the constructor anymore.
If transform_cube is called with a cubeconfig including time_range, it is used. Otherwise the time range is derived from the cube.
xcube/core/resampling/temporal.py
Outdated
return 29 | ||
if year % 100 == 0: | ||
return 28 | ||
return 28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pls explain why all this code is needed? Why can't we use pandas/xarray here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas does not offer timedeltas in monthly, yearly or quarterly resolution, hence it is not possible to determine the bounds or adjust the value of a time step. However, this is necessary as pandas / xarray set the time value to the very start or end of the requested time period and thereby create confusion about the actual extent of the period.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas does not offer timedeltas in monthly, yearly or quarterly resolution
They changed this recently for very good reasons. Instead of "1M" you should say "4W" which is precise. Calendar month averaging can still be done. I don't see why we need all this code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here http://xarray.pydata.org/en/stable/examples/monthly-means.html
DatasetResample and DataArrayResample both return GroupBy objects, so basically we are already using this. The problem is not the grouping itself but the determination of the exact times of the center of the sampling period and the bounds.
I have changed the code so that these are now determined using xarray's groupby.
Codecov Report
@@ Coverage Diff @@
## master #542 +/- ##
==========================================
- Coverage 91.54% 91.43% -0.12%
==========================================
Files 292 293 +1
Lines 26537 26990 +453
==========================================
+ Hits 24294 24679 +385
- Misses 2243 2311 +68
Continue to review full report at Codecov.
|
CHANGES.md
Outdated
To sample down to a broader temporal resolution, use any of `['count', | ||
'first', 'last', 'min', 'max', 'sum', 'prod', 'mean', 'median', 'std', | ||
'var', 'percentile']`, to sample up to a finer resolution, use any of | ||
`['asfreq', 'ffill', 'bfill', 'pad', 'nearest', 'interpolate']`. (#523) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not up to date with current impl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated this, but it has become quite long. The best solution would be to explain this parameter elsewhere and link there.
cube_config = CubeConfig(time_range=('2010-08-01', '2010-11-30'), | ||
time_period='2M', | ||
temporal_resampling=dict( | ||
downsampling=('min', {})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to be more pythonic here and allow for the no-args case also
downsampling=('min', {})) | |
downsampling='min' |
xcube/core/gen2/local/resamplert.py
Outdated
class CubeResamplerT(CubeTransformer): | ||
|
||
def __init__(self, | ||
cube_config: CubeConfig): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(a) If this is possible, then why do we have this discussion? Let's go for it.
(b) That doesn't make sense, because we have a parameter temporal_resampling
to which we can time_range
which could be by default the request's time_range
.
(c) It MUST be possible to consume parameters, otherwise we don't know what the generator has to do. This is the idea of the current design.
xcube/core/gen2/local/resamplert.py
Outdated
class CubeResamplerT(CubeTransformer): | ||
|
||
def __init__(self, | ||
cube_config: CubeConfig): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is also the simple solution to
(d) just pass the request's time range
into the ctor.
xcube/core/resampling/temporal.py
Outdated
return 29 | ||
if year % 100 == 0: | ||
return 28 | ||
return 28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas does not offer timedeltas in monthly, yearly or quarterly resolution
They changed this recently for very good reasons. Instead of "1M" you should say "4W" which is precise. Calendar month averaging can still be done. I don't see why we need all this code here.
# Conflicts: # CHANGES.md
Solves #523 . xcube gen2 has now a parameter
temporal_resampling
to which any of the following parameters may be set for downsampling:'count', first', 'last', 'min', 'max', 'sum', 'prod', 'mean', 'median', 'std', var', or 'percentile_<p>'
and for upsampling:'asfreq', 'ffill', 'bfill', 'pad', 'nearest', 'nearest-up', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', 'next'
.Checklist:
docs/source/*
CHANGES.md
Remember to close associated issues after merge!