Skip to content

Conversation

TonioF
Copy link
Contributor

@TonioF TonioF commented Oct 8, 2021

Solves #523 . xcube gen2 has now a parameter temporal_resampling to which any of the following parameters may be set for downsampling: 'count', first', 'last', 'min', 'max', 'sum', 'prod', 'mean', 'median', 'std', var', or 'percentile_<p>' and for upsampling: 'asfreq', 'ffill', 'bfill', 'pad', 'nearest', 'nearest-up', 'zero', 'slinear', 'quadratic', 'cubic', 'previous', 'next'.

Checklist:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/source/*
  • Changes documented in CHANGES.md
  • AppVeyor CI passes
  • Test coverage remains or increases (target 100%)

Remember to close associated issues after merge!

@TonioF TonioF added the enhancement New feature or request label Oct 8, 2021
@TonioF TonioF requested a review from forman October 8, 2021 09:52
Copy link
Member

@forman forman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build fails, please fix first then request new review.

@forman forman self-requested a review October 13, 2021 09:58
CHANGES.md Outdated
'first', 'last', 'min', 'max', 'sum', 'prod', 'mean', 'median', 'std',
'var', 'percentile_<p>']`, to sample up to a finer resolution, use any of
`['asfreq', 'ffill', 'bfill', 'pad', 'nearest', 'nearest-up', 'zero',
'slinear', 'quadratic', 'cubic', 'previous', 'next']`. (#523)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

temporal_resampling should be an object that inludes parameters that control the resampling. Split up- and down-sampling explicitly, therefore include upsampling_method, downsampling_method as well as as optional upsampling_params, downsampling_params.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principal, this sounds like a great idea that I would like to support, but it yields a larger problem: The temporal resolution of a dataset may be hard to determine. At times, it can be irregular and the decision whether downsampling or upsampling must be applied may not be unambiguous.
Keeping this in mind, I'll try to come up with a solution for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have implemented that it is checked whether downsampling or upsampling should be applied.

class CubeResamplerT(CubeTransformer):

def __init__(self,
cube_config: CubeConfig):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove. I don't see a need to do this in the constructor. cube_config is passed to transform_cube().

Copy link
Contributor Author

@TonioF TonioF Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove this from the constructor, the parameter 'time_range' will most likely have been consumed by the CubeOpener and be unavailable to the CubeResamplerT. In consequence, the resampler would have no information about the requested start and ending time.
I see the following alternatives:
(a) ignore the time range in the resampler
(b) introduce a new parameter 'time_resample_time_range' so it would not be consumed earlier
(c) adapt the generator framework so that it does not remove parameters from a config

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a) If this is possible, then why do we have this discussion? Let's go for it.
(b) That doesn't make sense, because we have a parameter temporal_resampling to which we can time_range which could be by default the request's time_range.
(c) It MUST be possible to consume parameters, otherwise we don't know what the generator has to do. This is the idea of the current design.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the simple solution to

(d) just pass the request's time range into the ctor.

Copy link
Contributor Author

@TonioF TonioF Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now implemented this solution: The CubeConfig is not passed to the constructor anymore.
If transform_cube is called with a cubeconfig including time_range, it is used. Otherwise the time range is derived from the cube.

return 29
if year % 100 == 0:
return 28
return 28
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pls explain why all this code is needed? Why can't we use pandas/xarray here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas does not offer timedeltas in monthly, yearly or quarterly resolution, hence it is not possible to determine the bounds or adjust the value of a time step. However, this is necessary as pandas / xarray set the time value to the very start or end of the requested time period and thereby create confusion about the actual extent of the period.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas does not offer timedeltas in monthly, yearly or quarterly resolution

They changed this recently for very good reasons. Instead of "1M" you should say "4W" which is precise. Calendar month averaging can still be done. I don't see why we need all this code here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here http://xarray.pydata.org/en/stable/examples/monthly-means.html

DatasetResample and DataArrayResample both return GroupBy objects, so basically we are already using this. The problem is not the grouping itself but the determination of the exact times of the center of the sampling period and the bounds.
I have changed the code so that these are now determined using xarray's groupby.

@codecov-commenter
Copy link

codecov-commenter commented Oct 15, 2021

Codecov Report

Merging #542 (b45051a) into master (f333bce) will decrease coverage by 0.11%.
The diff coverage is 85.38%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #542      +/-   ##
==========================================
- Coverage   91.54%   91.43%   -0.12%     
==========================================
  Files         292      293       +1     
  Lines       26537    26990     +453     
==========================================
+ Hits        24294    24679     +385     
- Misses       2243     2311      +68     
Impacted Files Coverage Δ
xcube/core/gen2/local/resamplert.py 64.84% <63.29%> (-35.16%) ⬇️
xcube/core/resampling/temporal.py 93.41% <90.43%> (-4.93%) ⬇️
test/core/gen2/local/test_resamplert.py 100.00% <100.00%> (ø)
test/core/gen2/test_config.py 100.00% <100.00%> (ø)
test/core/gen2/test_request.py 100.00% <100.00%> (ø)
test/core/resampling/test_temporal.py 100.00% <100.00%> (ø)
test/util/test_jsonschema.py 100.00% <100.00%> (ø)
xcube/cli/resample.py 90.80% <100.00%> (ø)
xcube/core/gen2/config.py 93.84% <100.00%> (+0.74%) ⬆️
xcube/cli/io.py 91.05% <0.00%> (-0.53%) ⬇️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f333bce...b45051a. Read the comment docs.

@TonioF TonioF requested a review from forman October 18, 2021 09:00
CHANGES.md Outdated
Comment on lines 12 to 15
To sample down to a broader temporal resolution, use any of `['count',
'first', 'last', 'min', 'max', 'sum', 'prod', 'mean', 'median', 'std',
'var', 'percentile']`, to sample up to a finer resolution, use any of
`['asfreq', 'ffill', 'bfill', 'pad', 'nearest', 'interpolate']`. (#523)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not up to date with current impl.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated this, but it has become quite long. The best solution would be to explain this parameter elsewhere and link there.

cube_config = CubeConfig(time_range=('2010-08-01', '2010-11-30'),
time_period='2M',
temporal_resampling=dict(
downsampling=('min', {}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to be more pythonic here and allow for the no-args case also

Suggested change
downsampling=('min', {}))
downsampling='min'

class CubeResamplerT(CubeTransformer):

def __init__(self,
cube_config: CubeConfig):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(a) If this is possible, then why do we have this discussion? Let's go for it.
(b) That doesn't make sense, because we have a parameter temporal_resampling to which we can time_range which could be by default the request's time_range.
(c) It MUST be possible to consume parameters, otherwise we don't know what the generator has to do. This is the idea of the current design.

class CubeResamplerT(CubeTransformer):

def __init__(self,
cube_config: CubeConfig):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also the simple solution to

(d) just pass the request's time range into the ctor.

return 29
if year % 100 == 0:
return 28
return 28
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas does not offer timedeltas in monthly, yearly or quarterly resolution

They changed this recently for very good reasons. Instead of "1M" you should say "4W" which is precise. Calendar month averaging can still be done. I don't see why we need all this code here.

@TonioF TonioF requested a review from forman January 14, 2022 16:29
@TonioF TonioF self-assigned this Feb 22, 2022
@TonioF TonioF linked an issue Feb 22, 2022 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for temporal resampling in cube generator
3 participants