Skip to content

Add support for inserting events in a EventDataset #421

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

lodevt
Copy link
Contributor

@lodevt lodevt commented Jan 29, 2025

An initial effort to implement the functionality from Issue 415.

To insert events based on a set of constraint, I opted to use a scoring function. The scoring function assigns a value to each existing event that indicates how suitable that position is for the new event to be inserted (higher is more suitable).

So for example, when you want to insert a pressing event into a sequence where the opponent team is in possession, as close as possible to a given timestamp, your scoring function could look like this

def scoring_function(event: Event, dataset: EventDataset):
    if event.state["sequence"].team == event_to_insert.team:
        return 0
    if event.period != event_to_insert.period:
        return 0
    return 1 / abs(
        event.timestamp.total_seconds()
        - event_to_insert.timestamp.total_seconds()
    )

@probberechts
Copy link
Contributor

This is a very nice solution!

I only had to add some code to update the pointers to the previous / next event. My implementation is a bit of a hack, though. I would rather use the set_refs method, but I didn't immediately see how to make it work for both the filter (you probably don't want to update the refs) and the insert (you want to update the refs) methods. Does anyone have a a better solution?

One other thing I just realized now is that it will be unclear to the user whether the new event will be inserted before or after the event that gives the max score. I guess it will be sufficient to specify this in the docstring.

@lodevt
Copy link
Contributor Author

lodevt commented Feb 4, 2025

I also don't immediately see a very clean solution. One alternative could be adding a force_update parameter to the set_refs method, setting it to Truein this case. I don't think this is necessarily better than your implementation, but it offers an alternative and allows to reuse the set_refsmethod.

    def set_refs(
        self,
        dataset: "Dataset",
        prev: Optional["DataRecord"],
        next_: Optional["DataRecord"],
        force_update: bool = False,
    ):
        if hasattr(self, "dataset") and not force_update:
            # TODO: determine if next/prev record should be affected
            # by Dataset.filter
            return

        self.dataset = dataset
        self.prev_record = prev
        self.next_record = next_

@probberechts probberechts requested a review from koenvo March 1, 2025 12:31
@koenvo
Copy link
Contributor

koenvo commented Mar 2, 2025

I will check the PR in more detail, but was already thinking about how this works.

from a workflow perspective I believe the insert should happen right after loading, and generates an better/improved/corrected truth of what happened during the game.

After the dataset is corrected people can start processing like filter, transform etc. Small note: we should put a flag on a dataset to indicate it’s filtered?

From that perspective I believe correcting actions, like inserts, should update the refs. The original was incorrect and so are the refs.

Curious to your opinions!

@probberechts
Copy link
Contributor

Yes, good point: it's generally not a good idea to insert events into a filtered dataset. Adding a flag is a great solution!

However, there may be cases where transforming the data first makes sense before inserting events. For example, if you're manually creating event annotations using a specific coordinate system, you’d want to transform the dataset to that coordinate system before adding the events to ensure consistency.

On a related note, the coordinates of inserted events must be consistent with the dataset’s coordinate system. I guess there is currently no way to easily verify or enforce this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants