Sequences design document #13
The idea and proposal behind groups of events and sequences.
A subset (or a full) hierarchy of events can be thought of as a sequence of events.
A sequence groups related events via parent-child hierarchies. Each root event that matches the creation criteria creates its own sequence. Descendant events (via `parent_id`) are automatically added to the same sequence. Cost and revenue are aggregated on each sequence.
Descendant events (via parent_id) are automatically added to the same sequence.
Recursively?
I see you answered that below.
Recursively, in the sense that:
- Event A is ingested: it is added to `event_sequence`.
- Event B with `parent_id` A is ingested: it is added to `event_sequence` because A is already in it.
- Event C with `parent_id` B is ingested: it is added to `event_sequence` because B is already in it.
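The ingestion order described above can be sketched roughly as follows. This is only an illustration, not the project's implementation: the `Event` class, the `SequenceIndex` class, the `is_root_match` predicate, the `seq-` id format, and the `llm_call` / `tool_call` event names are all hypothetical. It also assumes that a parent is always ingested before its children and that parent membership is checked before the creation criteria.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Event:
    id: str
    name: str
    parent_id: Optional[str] = None


class SequenceIndex:
    """Hypothetical in-memory stand-in for the event_sequence mapping."""

    def __init__(self, is_root_match: Callable[[Event], bool]):
        self.is_root_match = is_root_match
        self.sequence_of: Dict[str, str] = {}  # event id -> sequence id

    def ingest(self, event: Event) -> Optional[str]:
        parent_seq = self.sequence_of.get(event.parent_id) if event.parent_id else None
        if parent_seq is not None:
            # The parent is already in a sequence, so the child joins it.
            seq_id = parent_seq
        elif self.is_root_match(event):
            # The event matches the creation criteria and starts its own sequence.
            seq_id = f"seq-{event.id}"
        else:
            # Neither a descendant of a sequenced event nor a matching root.
            return None
        self.sequence_of[event.id] = seq_id
        return seq_id


index = SequenceIndex(is_root_match=lambda e: e.name == "workout_tracking")
a = index.ingest(Event("A", "workout_tracking"))
b = index.ingest(Event("B", "llm_call", parent_id="A"))
c = index.ingest(Event("C", "tool_call", parent_id="B"))
unrelated = index.ingest(Event("D", "llm_call"))
```

No explicit recursion is needed here: because each event only looks at its direct parent, C ends up in A's sequence through B. Out-of-order ingestion (a child arriving before its parent) would need separate handling that this sketch does not attempt.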
Cost and revenue are aggregated on each sequence
I'd leave revenue out of it for now to simplify the mental model.
This is the first time we make cost a first-class citizen versus `metadata._cost`. Should we maybe be a bit more flexible in how we name and store this, so that we can also store other aggregates if we explore that later (e.g. total input/output token counts from `metadata._llm`)?
Absolutely. My thinking is that we want to highlight that any value on the events could be aggregated onto the sequence. We could call out cost plus others, or just say that it's possible to aggregate values into the sequence?
"Aggregate anything": sure, yes. I was talking specifically about not designing the data model around a `total_cost` column.
Ah, yes, sorry, that's what I meant as well. I updated the data model in the entity graph but it's a bit fuzzy in this text.
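One way to read the "no `total_cost` column" point is to store aggregates as named values, each pointing at a path inside the event metadata. A minimal sketch under stated assumptions: the `AGGREGATES` table, the helper names, and the field names inside `_cost` and `_llm` (`total`, `input_tokens`, `output_tokens`) are hypothetical, since this thread does not specify them.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping, Tuple

# Hypothetical aggregate definitions: aggregate name -> path into event metadata.
AGGREGATES: Dict[str, Tuple[str, ...]] = {
    "cost": ("_cost", "total"),
    "input_tokens": ("_llm", "input_tokens"),
    "output_tokens": ("_llm", "output_tokens"),
}


def lookup(metadata: Mapping[str, Any], path: Tuple[str, ...]) -> float:
    """Follow a path into nested metadata; a missing key contributes 0."""
    value: Any = metadata
    for key in path:
        if not isinstance(value, Mapping) or key not in value:
            return 0.0
        value = value[key]
    return float(value)


def aggregate_sequence(events_metadata: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    """Sum every configured aggregate over the events of one sequence."""
    totals: Dict[str, float] = defaultdict(float)
    for metadata in events_metadata:
        for name, path in AGGREGATES.items():
            totals[name] += lookup(metadata, path)
    return dict(totals)


totals = aggregate_sequence([
    {"_cost": {"total": 0.5}, "_llm": {"input_tokens": 100, "output_tokens": 20}},
    {"_cost": {"total": 0.25}},
    {},
])
```

Adding a new aggregate then means adding an entry to the definitions rather than a new column, which is the flexibility asked for above.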
This simplifies the creation and understanding of what a single sequence (or sequence definition) is.
It does not solve the comparison between multiple sequences with different definitions.
True, but I'm having trouble thinking of a good reason to do this.
Why would you want to compare e.g. a single sequence "workout_tracking" with "nutrition_tracking" in a fitness app? I could see how globally you want to know the average cost of tracking a workout and tracking food intake but you'd never compare an instance of each side by side.
In other words you want to compare apples to apples, oranges to oranges, and additionally are interested in both the average apple and average orange.
I was thinking about when you want to track e.g.
- nutrition_tracking (outcome: ate balanced)
- nutrition_tracking (outcome: ate too much candy)
How often do users eat balanced vs. too much candy? If you have multiple sequences that track the same events, you might want to start comparing the outcomes of those sequences.
When it comes to fully disparate sequences, I fully agree that it does not make sense to compare them.
- Nutrition Tracked (no success)
- Nutrition Tracked: Balanced (success: ate_balanced)
- Nutrition Tracked: Cheat Day (success: too_much_candy)
But yes, I get what you're after: eventually you'll want to run the numbers for balanced vs. cheat day, maybe.
But that's solvable outside the scope of this design, no? It's more dashboarding than anything else.