
Conversation

@n-h-diaz
Contributor

@n-h-diaz n-h-diaz commented Oct 17, 2025

Timestamps are read from the IngestionHistory table, but cached in memory for 5 seconds to avoid a large increase in traffic. I can adjust the expiry as needed, depending on how frequently we expect the incremental data to be refreshed and how fresh we want the data to be.

Since the running Spanner instance (dc_graph_2025_09_15) doesn't have incremental ingestion, the timestamp will eventually go stale (it's manually set), so I added a temporary fallback to strong reads. Once we switch to incremental ingestion, we should make this check stronger and revisit whether we want these queries to actually fail, since a stale timestamp indicates that something went wrong in ingestion. (I set the version retention to the maximum of 7 days.)
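
For context, the read path roughly follows the sketch below. This is a simplified illustration rather than the exact diff: GetStalenessTimestampBound and the strong-read fallback come from this PR, while the package name, the stand-in SpannerClient type, and the placeholder body are assumptions made so the example is self-contained.

// sketch.go — illustrative only, not the PR's exact code.
package sketch

import (
	"context"

	"cloud.google.com/go/spanner"
)

// SpannerClient stands in for the repo's client wrapper.
type SpannerClient struct {
	client *spanner.Client
}

// GetStalenessTimestampBound is defined in this PR; the signature is assumed here.
// It returns a bound built from the cached CompletionTimestamp (see getCompletionTimestamp in the diff).
func (sc *SpannerClient) GetStalenessTimestampBound(ctx context.Context) (spanner.TimestampBound, error) {
	return spanner.StrongRead(), nil // placeholder body
}

// queryWithStaleness shows the intended flow: a stale read at the last ingestion's
// CompletionTimestamp, with a temporary fallback to a strong read if the bound
// can't be resolved.
func (sc *SpannerClient) queryWithStaleness(ctx context.Context, stmt spanner.Statement) *spanner.RowIterator {
	timestampBound, err := sc.GetStalenessTimestampBound(ctx)
	if err != nil {
		return sc.client.Single().Query(ctx, stmt)
	}
	return sc.client.Single().WithTimestampBound(timestampBound).Query(ctx, stmt)
}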

@n-h-diaz n-h-diaz marked this pull request as ready for review October 17, 2025 20:58
@n-h-diaz n-h-diaz requested review from hqpho, keyurva and vish-cs October 17, 2025 20:59
@n-h-diaz n-h-diaz mentioned this pull request Oct 21, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 21, 2025
Separating out some fixes from #1639 to unblock while the stale reads are still in discussion.

(The failing tests were making it harder to evaluate performance, so it'd be helpful to get the fixes in sooner.)

Includes:
* Sorting observation query test results for determinism in tests
* Adding "distinct" to all chaining queries to avoid duplicates. "Match any" is not sufficient if there are multiple paths between nodes (see the illustrative query after this list)
@n-h-diaz n-h-diaz marked this pull request as draft October 21, 2025 18:16
@n-h-diaz
Contributor Author

Going to rethink this a bit after some discussion with Vishal

tl;dr - having an in-memory cache could cause inconsistency across shards

Contributor

@vish-cs vish-cs left a comment


Thanks for making the changes!


const (
	// CACHE_DURATION defines how long the CompletionTimestamp is kept in memory before being refetched.
	CACHE_DURATION = 5 * time.Second

nit: can we rename this to TIMESTAMP_CACHE_DURATION, just to be explicit?

	withStruct func(interface{}),
) error {
	iter := sc.client.Single().Query(ctx, stmt)
	timestampBound, err := sc.GetStalenessTimestampBound(ctx)

Do we want a feature flag to disable stale reads?
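
(One possible shape for such a flag, purely as a sketch; the flag name and wiring are hypothetical and not part of this PR. Assumes the standard library "flag" package plus the types from the sketch earlier in the thread.)

// Hypothetical flag, not in this PR: allow disabling stale reads at runtime.
var enableStaleReads = flag.Bool("spanner_stale_reads", true,
	"If false, always use strong reads instead of bounded-staleness reads.")

// readBound picks the timestamp bound, gated on the flag.
func (sc *SpannerClient) readBound(ctx context.Context) spanner.TimestampBound {
	if !*enableStaleReads {
		return spanner.StrongRead()
	}
	if tb, err := sc.GetStalenessTimestampBound(ctx); err == nil {
		return tb
	}
	return spanner.StrongRead() // temporary fallback already described in the PR
}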

		return nil, err
	}

	timestampBound := spanner.ReadTimestamp(*completionTimestamp)

Curious: what's the difference between the completion timestamp and timestampBound?
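
(For context on the Spanner Go client: completionTimestamp is the plain time.Time read from IngestionHistory, while spanner.ReadTimestamp wraps it into a TimestampBound that tells the read-only transaction to execute at exactly that timestamp. Roughly:)

// completionTimestamp: when the last ingestion finished (a *time.Time from IngestionHistory).
// timestampBound: the staleness setting handed to Spanner, built from that time.
timestampBound := spanner.ReadTimestamp(*completionTimestamp)
// Usage sketch; the PR wires the bound into the existing query path.
iter := sc.client.Single().WithTimestampBound(timestampBound).Query(ctx, stmt)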


	err = sc.processRows(iter, newStruct, withStruct)

	// Check if the error is due to an expired timestamp (FAILED_PRECONDITION).

Curious when that can happen... after the max 7-day retention?
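
(For context: Spanner rejects reads at a timestamp older than the database's version_retention_period, at most 7 days, with FAILED_PRECONDITION, which is the case this check handles once the manually-set timestamp goes stale. A minimal sketch of that check, assuming the google.golang.org/grpc/codes package:)

// Sketch: detect an expired read timestamp and retry with a strong read.
if spanner.ErrCode(err) == codes.FailedPrecondition {
	// The cached CompletionTimestamp fell outside the version retention window;
	// fall back to a strong read so the query still succeeds.
	iter = sc.client.Single().Query(ctx, stmt)
	err = sc.processRows(iter, newStruct, withStruct)
}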

// It prioritizes returning a value from an in-memory cache to reduce Spanner traffic.
func (sc *SpannerClient) getCompletionTimestamp(ctx context.Context) (*time.Time, error) {
	// Check cache
	sc.cacheMutex.RLock()

As discussed, we need to think through how consistency would be ensured across caches in different mixer instances.

