Conversation

PranavBhatP
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Small change in the code to provide a 3D output tensor for point predictions, with the third dimension set to 1. With this change, the output contract for TimeXer is (a shape sketch follows the list):

  • Point predictions: (batch_size, prediction_length, 1), where the third dimension indicates a single target.
  • Quantile predictions: (batch_size, prediction_length, num_quantiles), where the third dimension indicates the number of quantiles for which the output is generated.
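A minimal sketch of the contract in terms of tensor shapes (illustrative sizes only, using torch just to show the shapes):

    import torch

    # Illustrative sizes; the model determines the actual values.
    batch_size, prediction_length = 32, 24

    # Point prediction: trailing dimension of size 1 marks a single target.
    point_prediction = torch.randn(batch_size, prediction_length, 1)
    assert point_prediction.shape == (batch_size, prediction_length, 1)

    # Quantile prediction: trailing dimension equals the number of quantiles.
    quantiles = [0.1, 0.5, 0.9]
    quantile_prediction = torch.randn(batch_size, prediction_length, len(quantiles))
    assert quantile_prediction.shape == (batch_size, prediction_length, len(quantiles))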

What should a reviewer concentrate their feedback on?

PR checklist

  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
  • Added/modified tests
  • Used pre-commit hooks when committing to ensure that code is compliant with hooks. Install hooks with pre-commit install.
    To run hooks independent of commit, execute pre-commit run --all-files

@PranavBhatP PranavBhatP changed the title [ENH] Consistent 3D output for single-target point predictions in TimeXer v1 and predict [ENH] Consistent 3D output for single-target point predictions in TimeXer v1. Aug 1, 2025

codecov bot commented Aug 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@1956bc8).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1936   +/-   ##
=======================================
  Coverage        ?   87.07%           
=======================================
  Files           ?      136           
  Lines           ?     8612           
  Branches        ?        0           
=======================================
  Hits            ?     7499           
  Misses          ?     1113           
  Partials        ?        0           
Flag     Coverage Δ
cpu      87.07% <100.00%> (?)
pytest   87.07% <100.00%> (?)

Flags with carried forward coverage won't be shown.

@fkiraly fkiraly moved this to PR in progress in May - Sep 2025 mentee projects Aug 1, 2025
@fkiraly fkiraly moved this from PR in progress to PR under review in May - Sep 2025 mentee projects Aug 1, 2025
@fkiraly fkiraly added the enhancement New feature or request label Aug 2, 2025
Collaborator

@fkiraly fkiraly left a comment

I am worried that we are removing tests but not adding tests.

If we are removing an API stepout, then we should remove a stepout in the tests to match, no?

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 2, 2025

It was an unintended removal that happened by mistake, sorry. The code stays the same in test_timexer.py.

@PranavBhatP PranavBhatP requested a review from fkiraly August 4, 2025 12:34
Collaborator

@fkiraly fkiraly left a comment

ok, so it was a bug! Good that we do reviews.

Related question: is there a stepout we can remove now?

@PranavBhatP PranavBhatP requested a review from fkiraly August 9, 2025 17:17
@PranavBhatP
Contributor Author

is there a stepout we can remove now?

With that question in mind, I tried tweaking a few of the tensor shapes and was able to reduce the number of stepouts. Now the default return is a 3D tensor, (batch_size, prediction_length, n_quantiles). For point predictions, n_quantiles = 1 by default, and for explicit quantiles we set n_quantiles = len(loss.quantiles), where loss has to be QuantileLoss.
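Roughly, the sizing logic is now the following (a sketch, not the exact model code; the helper name is made up):

    from pytorch_forecasting.metrics import QuantileLoss

    def _infer_n_quantiles(loss) -> int:
        # Quantile losses expand the last output dimension to one slot per
        # quantile; any other (point) loss keeps a trailing dimension of 1.
        if isinstance(loss, QuantileLoss):
            return len(loss.quantiles)
        return 1

    # the network output is then reshaped to
    # (batch_size, prediction_length, n_quantiles)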

Collaborator

@fkiraly fkiraly left a comment

You mention you reduced the number of stepouts, but I do not see a change in tests - did you forget to push them?

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 9, 2025

No, I meant stepouts in the actual model code. We were handling quantile and non-quantile predictions separately inside the model, but with this PR I have unified the handling into a single format for all types of metrics used (quantile or point).

The code for the tests ultimately remains the same, since we are testing the output shape of the predict API for TimeXer, which is not the concern of this PR.

If there is a misunderstanding, please let me know.

@PranavBhatP PranavBhatP requested a review from fkiraly August 10, 2025 13:50
@fkiraly
Collaborator

fkiraly commented Aug 10, 2025

Yes, I meant tests.

I think that one of the following must be true:

  • there was a stepout in the tests that we can now remove
  • the tests were not complete or were faulty (e.g., they do not cover the API that this was a stepout from)

@agobbifbk

Seems that all the shapes are well managed. Here you are using self.n_quantiles as an indicator of the number of outputs per channel. What if I want to use a distribution loss here? Should we rename it in a more general way?

Collaborator

@fkiraly fkiraly left a comment

Repeating my review request:

  • if there was a stepout for the forecaster in tests, please remove the stepout
  • if there was no stepout, please review the tests to see why there was no stepout and/or why they still passed

@PranavBhatP
Contributor Author

I am confused why do we need to add these tests again?

I see what you mean. This PR's changes are solely a patch on the existing set of tests specific to test_timexer.py. There is overlap with test_all_estimators.py, so I think it's an open question whether we need to perform individual model testing on certain aspects when they are already covered by test_all_estimators.py.

@PranavBhatP
Contributor Author

I would also add the test in a separate PR, to be merged before this.

So the workflow would be: add the missing test for the old version (in a separate PR) -> replace with tests for the new output format for TimeXer (in this PR)?

@phoeenniixx phoeenniixx moved this from PR under review to PR in progress in May - Sep 2025 mentee projects Aug 18, 2025
@fkiraly
Collaborator

fkiraly commented Aug 19, 2025

I would also add the test in a separate PR, to be merged before this.

So the workflow would be: add the missing test for the old version (in a separate PR) -> replace with tests for the new output format for TimeXer (in this PR)?

I would say, add the test for the post-1936 state. Then, hopefully, only TimeXer will fail, and you can skip it. Merge the test first, then merge main into 1936, and in this PR you just remove the skip.

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 20, 2025

Thanks for adding the new test!
I would make it a separate test in the class, could you split it?
I think there should be a function that determines expected output shape. I would do this as follows:
give every (non-multitarget) loss a tag loss_dims: int, which tells you the third dimension of the tensor
in the _expected_fwd_shape or similar, call that; and if it is multitarget, then iterate over the components
I would also add the test in a separate PR, to be merged before this.

I was going through this change request and trying to implement it with the workflow you just suggested, but I don't understand which class you are referring to. I am not editing the tests in test_all_estimators.py, if that was what you meant; I am only editing tests in the individual model tests, i.e. test_timexer.py.

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 21, 2025

give every (non-multitarget) loss a tag loss_dims: int, which tells you the third dimension of the tensor

Is this change relevant to #1936 and #1951? I think it's a whole new PR in itself, since it involves changes to the individual metrics classes.
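For illustration, such a tag could look roughly like the sketch below (hypothetical class and helper names, not implemented in any metrics class):

    class DummyPointLoss:
        loss_dims: int = 1  # point losses: one value per time step

    class DummyQuantileLoss:
        def __init__(self, quantiles):
            self.quantiles = quantiles

        @property
        def loss_dims(self) -> int:
            return len(self.quantiles)  # one value per quantile

    def _expected_fwd_shape(batch_size, prediction_length, loss):
        # helper in the spirit of the suggestion quoted above
        return (batch_size, prediction_length, loss.loss_dims)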

@fkiraly
Collaborator

fkiraly commented Aug 23, 2025

I am not editing the tests in test_all_estimators.py if that was what you meant

That is indeed what I meant: there should be a test that checks the expected output of every class - TimeXer is expected to fail currently.

@phoeenniixx
Member

I am not editing the tests in test_all_estimators.py if that was what you meant

That is indeed what I meant: there should be a test that checks the expected output of every class - TimeXer is expected to fail currently.

It can't fail, as right now the test framework checks for both contracts, the 2D and the 3D output format.

@fkiraly
Collaborator

fkiraly commented Aug 23, 2025

@phoeenniixx, that's exactly what I mean by "removing the stepout".

Going forward, the stepout for 2D should be removed, no?

@phoeenniixx
Member

phoeenniixx commented Aug 23, 2025

@phoeenniixx, that's exactly what I mean by "removing the stepout".

Going forward, the stepout for 2D should be removed, no?

For that to happen, we need to fix the returns for NBEATS as well; it also has a 2D return, right? Before that we can't really remove this stepout, I guess.
Or we can add skips for TimeXer and NBEATS.

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 23, 2025

Again, the definition of "non-conformance" that we have applied to TimeXer will then extend to NBeats also. So, @phoeenniixx, we might need a separate PR for that as well; does this sound right?

@fkiraly
Collaborator

fkiraly commented Aug 23, 2025

So then my proposal would be to add the test in #1951, and skip both TimeXer and NBEATS. We merge it before this PR, and in this PR we remove the skip for TimeXer.

@phoeenniixx
Member

phoeenniixx commented Aug 23, 2025

So then my proposal would be to add the test in #1951, and skip both TimeXer and NBEATS. We merge it before this PR, and in this PR we remove the skip for TimeXer.

Yeah, we can do this. It will require just a one-line change in _integration (removing the 2 from the list) and also a check to skip the models in _integration (as all the models are tested through _integration). The output formats are currently tested in _integration, and you don't want to skip the whole _integration test, rather just the output check. So I think just one check before the output format test, to skip only TimeXer and NBeats, would be enough. We won't require this kind of skip in future anyway, so just an if check should suffice:

if not isinstance(model, (TimeXer, NBeats)):  # `model` is the estimator instance under test
   # perform the output test for 3D

And afterwards we fix NBeats (in some other PR, as @PranavBhatP mentioned), remove this skip, and then all models will be API conformant.

@fkiraly
Collaborator

fkiraly commented Aug 23, 2025

@phoeenniixx, that would work!

Although I do not like skips per model inside tests, these are easy to forget there. This indicates that maybe there is too much being tested in a single test method.

@phoeenniixx
Member

Although I do not like skips per model inside tests, these are easy to forget there. This indicates that maybe there is too much being tested in a single test method.

I agree, we need some refactoring. I think we should raise an issue as a checklist, so that we don't forget anything. In v2 as well, there are a lot of things in the test framework that can be refactored. But these things, imo, should have a lower priority: we should first achieve a stable (and complete) API for v2, just noting along the way the things that need refactoring, and we can do them afterwards. We would anyway need a full refactoring round after the first complete v2, e.g. connecting v1 and v2, etc. We can do these things at that time?

@phoeenniixx
Member

For now, we should focus on API conformance for v1 and design and implementation for v2...

@fkiraly
Collaborator

fkiraly commented Aug 24, 2025

agreed - what is, in your opinion, @phoeenniixx, the quickest, simplest way to ensure that:

  • the inconsistency in TimeXer is covered by TestAllEstimators tests
  • we use the "skip then fix" pattern

@phoeenniixx
Member

phoeenniixx commented Aug 25, 2025

I could think of two ways:

  • A more "intuitive" way similar to sktime:
    Add a test skip in the tags for TimeXer, but this will skip the whole _integration. But we know it passes the rest of the tests (like model training etc.), so we could do this.
  • Hardcode way:
    Just add a check before doing the output format test, something like this (a combined sketch follows below the quoted check):
    if not isinstance(model, (TimeXer, NBeats)):  # `model` is the estimator instance under test
        # perform the output test for 3D
    and remove the tests for 2D format from the list here:
    assert n_dims in [2, 3], (
        f"Prediction output must be 2D or 3D, but got {n_dims}D tensor "
        f"with shape {output.shape}"
    )
    if n_dims == 2:
        batch_size, prediction_length = output.shape
        assert batch_size > 0, f"Batch size must be positive, got {batch_size}"
        assert (
            prediction_length > 0
        ), f"Prediction length must be positive, got {prediction_length}"

IMO, the best way for now is the hardcode way, because this "non-conformance" is a special case that happened due to the lack of a concrete contract in the past, but from now on we have one concrete contract: the output format should be 3D for a single target. And we are already aware of the non-conformant models; we should just raise an issue for them (mainly for NBeats, as TimeXer is already being fixed by @PranavBhatP). This would not happen with future models, and we will also be refactoring the test framework, making it more modular and adding new tests if required. So the easiest and quickest way would be this hardcode way, I think.

But this is mainly because we have a lot on our plate right now; we should avoid such things in future, once we have a stable API for v2.

@phoeenniixx
Member

What do you think about this, @fkiraly, @PranavBhatP?

@fkiraly
Collaborator

fkiraly commented Aug 25, 2025

Ok, @PranavBhatP, I agree with the arguments that we can cut corners, because the skips will only apply to models from the past.

For this, I think we should open an issue for NBeats and explicitly mention that the test-internal skip should be removed.

The first option would imply changes to the framework or tests first - they need refactoring, and I think this would be the better option, if we could make a wish - but it would cause scope creep, and feels like a distraction at the moment.

What do you think, @PranavBhatP?

@PranavBhatP
Contributor Author

What do you think, @PranavBhatP?

Yes, this approach makes sense as a temporary fix; I will try changing the code for this. In this PR, do I skip tests only for TimeXer, or do I do it for both of the models in question?

@phoeenniixx
Member

So, from what I can understand, you are solving the issue for TimeXer here. I think the changes to _integration should happen in another PR, where you first remove the tests for the 2D output format and also add skips for TimeXer and NBeats, and then you remove the skip for TimeXer here and make TimeXer API-conformant. The same should be done for NBeats in some other PR, no?

@fkiraly
Collaborator

fkiraly commented Aug 25, 2025

yes, @phoeenniixx, that is what I had understood

@PranavBhatP
Contributor Author

PranavBhatP commented Aug 26, 2025

Then, in that case, PR #1951 handles the changes specific to the TimeXer tests (test_timexer.py), we can have a separate PR to make the changes to _integration in test_all_estimators.py, and once that is done we can merge this PR. Does this sound right? I am not 100% sure about the expected workflow.

@phoeenniixx
Member

Yeah, I think this makes sense, right @fkiraly?

@fkiraly
Collaborator

fkiraly commented Aug 26, 2025

yes, sounds like a good plan to me too

fkiraly pushed a commit that referenced this pull request Aug 31, 2025
…1936 (#1951)

#### What does this implement/fix? Explain your changes.

Adds the test for the output shapes of the forward method of `TimeXer`. The test checks the output contract for the model, which is being implemented in #1936. It is currently skipped under `test_timexer.py` due to an obvious cause of failure, i.e. the current version of the model on main is not yet updated to the required output contract.
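For context, a minimal sketch of what such a shape test could look like (hypothetical fixture and output-key names; the actual test is the one added to test_timexer.py):

    import pytest

    @pytest.mark.skip(reason="TimeXer on main does not yet follow the 3D output contract")
    def test_forward_output_shape(model, dataloader):
        # Hypothetical sketch: check the forward output against the contract
        # (batch_size, prediction_length, n_outputs); the skip goes once #1936 merges.
        x, _ = next(iter(dataloader))
        out = model(x)["prediction"]
        assert out.ndim == 3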