137 changes: 137 additions & 0 deletions engineering/design-documents/event-sequences.mdx
@@ -0,0 +1,137 @@
<Info>
**Status**: Active
**Created**: October 2025
**Last Updated**: November 2025
</Info>

## Summary

Extend the scope and functionality of Polar events so that multiple events can be tied together and aggregated, with the ultimate aim of acting on the success or failure of a product flow in a meter.

## Goals

* Make it possible for Polar to give more actionable insights based on how customers are using a product.
* Allow someone to experiment and determine costs and ideal pricing for different parts of a product.

## Events

### Data model

| id | name | external_id`*` | parent_id`*` | ...metadata |
| ------------------ | ------------- | ------------------ | ---------------- | ------------- |
| \<Polar uuid-1> | Start event | $user\ specified$ |`null` | \{... anything \}|
| \<Polar uuid-2> | Nested event | $user\ specified$ | \<Polar uuid-1> | \{... anything \}|
| \<Polar uuid-3> | Nested event | $user\ specified$ | \<Polar uuid-2> | \{... anything \}|

`*`: New field.

By adding an `external_id` to `events` we gain an idempotency key on ingested events, which makes it safe to re-ingest the same event multiple times. The `external_id` can then also serve as the user-facing identifier, both for the event itself and for referring to its parent.

Internally we don't want to store the relationship between two events via a user-specified ID. Instead, we validate the specified `parent_id` when an event is ingested and translate it to the parent's Polar ID, so that the relationship is always stored by Polar IDs.

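The ingestion behaviour described above could look roughly like the following. This is a minimal in-memory sketch, not the Polar implementation: `EventStore`, `ingest`, and `parent_external_id` are hypothetical names, the scoping of `external_id` per organization is left out, and rejecting an unknown parent is an assumption that the design does not spell out.

```python
from dataclasses import dataclass, field
from typing import Any, Optional
from uuid import UUID, uuid4


@dataclass
class Event:
    id: UUID                     # Polar-generated ID
    name: str
    external_id: Optional[str]   # user-specified idempotency key
    parent_id: Optional[UUID]    # always a Polar ID, never an external one
    metadata: dict[str, Any] = field(default_factory=dict)


class EventStore:
    """In-memory stand-in for the events table (hypothetical)."""

    def __init__(self) -> None:
        self._by_external_id: dict[str, Event] = {}
        self.events: list[Event] = []

    def ingest(
        self,
        name: str,
        external_id: Optional[str] = None,
        parent_external_id: Optional[str] = None,
        metadata: Optional[dict[str, Any]] = None,
    ) -> Event:
        # Idempotency: re-ingesting a known external_id returns the stored event.
        if external_id is not None and external_id in self._by_external_id:
            return self._by_external_id[external_id]

        # Translate the user-specified parent reference into a Polar ID.
        parent_id: Optional[UUID] = None
        if parent_external_id is not None:
            parent = self._by_external_id.get(parent_external_id)
            if parent is None:
                # Assumption: unknown parents are rejected at ingestion time.
                raise ValueError(f"unknown parent: {parent_external_id!r}")
            parent_id = parent.id

        event = Event(uuid4(), name, external_id, parent_id, metadata or {})
        self.events.append(event)
        if external_id is not None:
            self._by_external_id[external_id] = event
        return event
```

Ingesting `"parrot-1"` twice yields a single stored event, and a child ingested with `parent_external_id="parrot-1"` ends up with the root's Polar UUID in `parent_id`.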
### Flowchart

```mermaid
sequenceDiagram
participant Parrot
participant Polar SDK
participant Polar API
%% participant Events

Parrot->>Polar SDK: withSpan(externalId: 'parrot-internal-id')
Polar SDK->>Polar API: POST /events/ingest

Polar SDK->>Polar SDK: Mark instance as parentEventId = parrot-internal-id
Parrot->>Polar SDK: sendEvent(name, metadata)
Polar SDK->>Polar API: POST /events/ingest {name, metadata, parentEventId = Parrot internal id}
note over Polar SDK, Polar API: Lookup externalId to get Polar ID and set Polar ID on parent_id
```
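To make the SDK side of the diagram concrete, here is a rough Python transliteration of the `withSpan` / `sendEvent` flow. Everything in it is an assumption for illustration: `SpanClient`, `with_span`, `send_event`, the `name` argument on the span, and the payload keys are hypothetical, and the `post` callable stands in for `POST /events/ingest` so that the sketch does no network I/O.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class SpanClient:
    """Hypothetical client that tags events with the active span's external ID."""

    def __init__(self, post: Callable[[dict[str, Any]], None]) -> None:
        self._post = post  # stand-in for POST /events/ingest
        self._parent_external_id: Optional[str] = None

    def _emit(self, payload: dict[str, Any]) -> None:
        # Attach the active span (if any) as the parent reference.
        if self._parent_external_id is not None:
            payload["parent_id"] = self._parent_external_id
        self._post(payload)

    @contextmanager
    def with_span(self, name: str, external_id: str) -> Iterator[None]:
        # Ingest the span's own event first, then mark it as the parent
        # for every event sent while the span is open.
        self._emit({"name": name, "external_id": external_id})
        previous = self._parent_external_id
        self._parent_external_id = external_id
        try:
            yield
        finally:
            self._parent_external_id = previous

    def send_event(self, name: str, metadata: dict[str, Any]) -> None:
        self._emit({"name": name, "metadata": metadata})
```

The client only ever sends the external ID; resolving it to a Polar ID stays on the API side, as the note in the diagram describes.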

## Sequences

A full hierarchy of events, or a subset of one, can be thought of as a sequence of events.

A sequence groups related events via parent-child hierarchies. Each root event that matches the creation criteria creates its own sequence. Descendant events (via `parent_id`) are automatically added to the same sequence. Cost and revenue are aggregated on each sequence.

Contributor

> Descendant events (via parent_id) are automatically added to the same sequence.

Recursively?

Contributor

I see you answered that below.

Author

Recursively in the sense that:

* Event A is ingested: added to the event_sequence.
* Event B with parent_id A is ingested: added to the event_sequence on the basis that A was added.
* Event C with parent_id B is ingested: added to the event_sequence on the basis that B was added.

Contributor

> Cost and revenue are aggregated on each sequence

I'd leave revenue out of it for now to simplify the mental model.

This is the first time we make cost a first-class citizen versus metadata._cost. Should we maybe be a bit more flexible in how we name and store this, so that we can also store other aggregates if we were to explore that (e.g. total input / output token count from metadata._llm)?

Author

Absolutely, I'm thinking more that we would want to highlight that we could aggregate anything in the events into the sequence aggregation. We could highlight costs + other, or just say that it's possible to aggregate values into the sequence?

Contributor

"aggregate anything" sure, yes, I was talking specifically about not designing the data model as having a "total_cost" column.

Author

Ah, yes, sorry, that's what I meant as well. I updated the data model in the entity graph but it's a bit fuzzy in this text.


Sequences that share the same definition (creation criteria) can then be aggregated or compared to each other, to answer questions such as:

* How does this sequence compare to the average sequence?
* How does this customer compare to the average customer in terms of:
  * Usage
  * Cost

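As a rough illustration of the comparison questions above, the sketch below averages the aggregates of all sequences that share a definition and reports how one sequence deviates from that average. The function names and the flat `dict` shape of the aggregates are assumptions made for the example, as is treating a missing key as zero.

```python
from statistics import mean


def average_aggregates(sequences: list[dict[str, float]]) -> dict[str, float]:
    """Average every aggregate key across sequences of the same definition."""
    keys = {key for seq in sequences for key in seq}
    # Assumption: a key missing from a sequence counts as 0 for that sequence.
    return {key: mean(seq.get(key, 0.0) for seq in sequences) for key in sorted(keys)}


def compare_to_average(
    sequence: dict[str, float], sequences: list[dict[str, float]]
) -> dict[str, float]:
    """Return the difference between one sequence and the average, per key."""
    average = average_aggregates(sequences)
    return {key: sequence.get(key, 0.0) - value for key, value in average.items()}
```

The same two functions would cover the per-customer question if the input is first grouped by customer instead of by sequence.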
### Data model

```mermaid
erDiagram
direction LR
EventSequenceDefinition ||--o{ EventSequence : "creates instances"
EventSequence ||--o{ EventToSequence : "contains"
Event ||--o{ EventToSequence : "belongs to"
Event ||--o| Event : "parent_id"
Organization ||--o{ EventSequenceDefinition : "owns"

EventSequenceDefinition {
uuid id PK
string name
Filter creation_criteria
jsonb config
uuid organization_id FK
}

EventSequence {
uuid id PK
uuid event_sequence_definition_id FK
jsonb aggregated_data
string label
timestamp first_event_at
timestamp last_event_at
}

EventToSequence {
uuid event_id PK,FK
uuid event_sequence_id PK,FK
boolean is_root
}

Event {
uuid id PK
string name
uuid parent_id FK
uuid customer_id FK
jsonb user_metadata
timestamp timestamp
}
```
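The `aggregated_data` column in the diagram is a `jsonb` blob rather than a dedicated cost column, which leaves room for aggregating other values as well. A minimal sketch of how ingested events might be folded into it follows. The `paths` mapping, the dotted-path lookup, and the inner keys `amount` and `input_tokens` are all hypothetical, since the document only names `metadata._cost` and `metadata._llm` and not their structure.

```python
from typing import Any, Optional


def lookup(metadata: dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as "_cost.amount" through nested dicts."""
    current: Any = metadata
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def fold_event(
    aggregated: dict[str, float],
    metadata: dict[str, Any],
    paths: dict[str, str],
) -> dict[str, float]:
    """Return a new aggregate with each configured metadata value added in."""
    result = dict(aggregated)
    for key, path in paths.items():
        value = lookup(metadata, path)
        # Only numeric values are summed; booleans and missing values are skipped.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            result[key] = result.get(key, 0) + value
    return result
```

In this sketch the `paths` mapping plays the role that the `config` column on `EventSequenceDefinition` might play, though the design does not say so explicitly.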


#### Root vs Descendant Events

- **Root events**: Events that match the `creation_criteria` in `EventSequenceDefinition` and create a new sequence
- **Descendant events**: Child/grandchild events found via `parent_id` traversal
- Both are stored in `EventToSequence` and distinguished by the `is_root` flag, which makes it easy to query only the root events for a listing.

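The root/descendant rule can be sketched as follows. This is an illustration under stated assumptions rather than the planned implementation: `SequenceIndex` and its `links` mapping are hypothetical stand-ins for `EventToSequence`, parents are assumed to be ingested before their children, and each event is assumed to join at most one sequence for a given definition.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    id: str
    name: str
    parent_id: Optional[str] = None


class SequenceIndex:
    """Hypothetical in-memory stand-in for EventToSequence, for one definition."""

    def __init__(self, matches_criteria: Callable[[Event], bool]) -> None:
        self._matches = matches_criteria
        # event_id -> (sequence_id, is_root)
        self.links: dict[str, tuple[str, bool]] = {}
        self._count = 0

    def add(self, event: Event) -> Optional[str]:
        # A root event matches the creation criteria and opens a new sequence.
        if self._matches(event):
            self._count += 1
            sequence_id = f"seq-{self._count}"
            self.links[event.id] = (sequence_id, True)
            return sequence_id

        # A descendant joins whichever sequence its parent already belongs to.
        parent_link = self.links.get(event.parent_id) if event.parent_id else None
        if parent_link is None:
            return None  # not part of any sequence for this definition
        self.links[event.id] = (parent_link[0], False)
        return parent_link[0]
```

Because each descendant only looks at its direct parent, grandchildren end up in the right sequence without any explicit tree traversal at ingestion time.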
#### Sequence Instances

Each root event creates its **own** EventSequence instance:

```
Event A: support_request.created (+ 4 descendants)
→ EventSequence X

Event B: support_request.created (+ 2 descendants)
→ EventSequence Y

Event C: support_request.created (+ 1 descendant)
→ EventSequence Z

Three separate, unrelated support request sequences
```

<OpenQuestion>
The proposal is to let a sequence have only a single outcome defined. If a hierarchy of events can have multiple outcomes, we would prefer that the user set up multiple sequences with the same creation criteria.

This simplifies the creation and understanding of what a single sequence (or sequence definition) is.

It does not solve the problem of comparing sequences that have different definitions.

Contributor

True, but I'm having trouble thinking of a reasonable reason to do this.

Why would you want to compare e.g. a single sequence "workout_tracking" with "nutrition_tracking" in a fitness app? I could see how globally you want to know the average cost of tracking a workout and tracking food intake but you'd never compare an instance of each side by side.

In other words you want to compare apples to apples, oranges to oranges, and additionally are interested in both the average apple and average orange.

Author

I was thinking about when you want to track e.g.

  • nutrition_tracking (outcome: ate balanced)
  • nutrition_tracking (outcome: ate too much candy)

How often do users eat balanced vs. too much candy? If you have multiple sequences that are tracking the same events, you might want to start comparing the outcomes of those sequences.

When it comes to fully disparate sequences, I fully agree that it does not make sense to want to compare them.

Contributor

  • Nutrition Tracked (no success)
  • Nutrition Tracked: Balanced (success: ate_balanced)
  • Nutrition Tracked: Cheat Day (success: too_much_candy)

But yes, I get what you're after, eventually you'll want to run numbers balanced <> cheat day, maybe.

But that's solvable outside of the scope of this design, no, it's more dashboarding than anything else.

</OpenQuestion>