Conversation

Contributor

@muhamadazmy muhamadazmy commented Nov 12, 2025

@muhamadazmy muhamadazmy force-pushed the pr3973 branch 2 times, most recently from c093ea2 to 2a7342c on November 12, 2025 09:57
@muhamadazmy muhamadazmy marked this pull request as ready for review November 12, 2025 10:52
@muhamadazmy muhamadazmy force-pushed the pr3973 branch 6 times, most recently from 8b8db7f to 953b39a on November 13, 2025 11:27
@muhamadazmy muhamadazmy force-pushed the pr3973 branch 5 times, most recently from 966834c to 8ae259e on November 17, 2025 11:40
@tillrohrmann tillrohrmann linked an issue Nov 17, 2025 that may be closed by this pull request
@muhamadazmy muhamadazmy force-pushed the pr3973 branch 2 times, most recently from fc8efad to 5b6ea28 on November 25, 2025 12:54
Contributor

@tillrohrmann tillrohrmann left a comment

Thanks a lot for creating this PR @muhamadazmy. The changes make a lot of sense to me. I left a few minor comments.

My main question was about the impact on the partition processor event loop if we are ingesting a lot of entries (potentially also large entries). Did you observe any negative effects?

It would be great to address the problem of deserializing the Envelope records in a follow-up PR since this is really unnecessary work that the system now needs to do.

Comment on lines +167 to +169
// sender
// .enqueue_many(records)
// .await
// .map_err(|_| Error::SelfProposer)?;
Contributor

Minor optimization: we might be able to use sender.try_enqueue_many and only fall back to enqueuing individual records if the try variant fails.

In the future we might have a LogSender::enqueue_many_optimized which checks for the available capacity and obtains as many permits as possible in a batch. But this is probably premature optimization.
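
For illustration, a rough sketch of that try-then-fallback pattern, written against a plain tokio `mpsc::Sender` as a stand-in for `LogSender` (the real `LogSender`/`try_enqueue_many` signatures are not shown in this thread, so everything below is an assumption):

```rust
use tokio::sync::mpsc::{Sender, error::TrySendError};

/// Push as many records as possible without awaiting; once the channel is
/// full, fall back to awaiting capacity for the remaining records one by one.
async fn enqueue_batch(tx: &Sender<Vec<u8>>, records: Vec<Vec<u8>>) -> Result<(), ()> {
    let mut remaining = records.into_iter();
    // Fast path: non-blocking sends while there is free capacity.
    for record in remaining.by_ref() {
        match tx.try_send(record) {
            Ok(()) => continue,
            Err(TrySendError::Full(record)) => {
                // No capacity left: await a permit for this record, then
                // switch to the slow path for the rest of the batch.
                tx.send(record).await.map_err(|_| ())?;
                break;
            }
            Err(TrySendError::Closed(_)) => return Err(()),
        }
    }
    // Slow path: await capacity for everything still pending.
    for record in remaining {
        tx.send(record).await.map_err(|_| ())?;
    }
    Ok(())
}
```

The same shape would also cover the enqueue_many_optimized idea if it reserved a whole batch of permits up front instead of one per record.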

// is a way to pass the raw encoded data directly to the appender
let envelope = StorageCodec::decode(&mut record.record)?;
Contributor

This is indeed a pity. Let's do it as a follow-up. I think there isn't a lot missing, because Record already supports carrying Bytes; we only need a way to create a Record from the (Keys, Bytes) we already have available here. Decoding the Envelope at this point is literally wasted work.
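
Purely as a sketch of that follow-up (the types and the `from_encoded` constructor below are hypothetical stand-ins, not the existing Record API):

```rust
use bytes::Bytes;

/// Hypothetical stand-ins, only to illustrate carrying the already-encoded
/// envelope bytes straight through instead of decoding and re-encoding them.
pub struct Keys(pub Vec<u64>);

pub struct Record {
    keys: Keys,
    body: Bytes,
}

impl Record {
    /// Build a record directly from the keys and the raw encoded payload,
    /// skipping the StorageCodec::decode call shown above entirely.
    pub fn from_encoded(keys: Keys, body: Bytes) -> Self {
        Self { keys, body }
    }
}
```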

Contributor Author

Indeed

Comment on lines +669 to +670
.propose_many_with_callback(records, callback)
.await;
Contributor

Can this await become a problem for the partition processor event loop if there are too many records to ingest? Should we maybe think about a throttling mechanism to not starve the other select branches?

Contributor Author

Good point! I will look into it.

Contributor Author

This is now mainly controlled by the ingestion client's "batch_size" setting (in bytes), which puts an upper limit on the size of a single ingest request. The default is 50 kB.
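
For illustration, size-based batching along those lines could look roughly like this (the function and parameter names are assumptions, not the actual ingestion-client code; 50_000 would correspond to the 50 kB default):

```rust
/// Split a stream of encoded records into batches whose total payload stays
/// under `batch_size_bytes`; each inner vector would go out as one ingest request.
fn split_into_batches(records: Vec<Vec<u8>>, batch_size_bytes: usize) -> Vec<Vec<Vec<u8>>> {
    let mut batches = Vec::new();
    let mut current: Vec<Vec<u8>> = Vec::new();
    let mut current_size = 0usize;

    for record in records {
        // Start a new batch if adding this record would blow the byte budget;
        // a single oversized record still goes out as its own batch.
        if !current.is_empty() && current_size + record.len() > batch_size_bytes {
            batches.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current_size += record.len();
        current.push(record);
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```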

@muhamadazmy muhamadazmy force-pushed the pr3973 branch 6 times, most recently from 9dd6b0e to 2554660 on November 28, 2025 11:51
@muhamadazmy muhamadazmy changed the title from [Ingress] Handle IngestRequest message to [PP] Handle IngestRequest message on Nov 28, 2025
@muhamadazmy muhamadazmy force-pushed the pr3973 branch 5 times, most recently from e5ee450 to ac216d6 on December 1, 2025 12:50
@muhamadazmy muhamadazmy force-pushed the pr3973 branch 12 times, most recently from 0f3f506 to a16b71e on December 4, 2025 10:57

- `ingestion-client` implements the runtime layer that receives WAL envelopes, fans them out to the correct partitions, and tracks completion. It exposes:
  - `IngestionClient`, which enforces inflight budgets and resolves partition IDs before sending work downstream.
  - The session subsystem, which batches `IngestRecords`, retries connections, and reports commit status to callers.
- `ingestion-client` only ingests records and notifies the caller once a record is "committed" to Bifrost by the PP (see the sketch below). This makes it useful for implementing Kafka ingress and other external ingestion sources.

Summary:
Handle the incoming `IngestRequest` messages sent by the `ingestion-client`.
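
As a purely conceptual sketch of the commit notification described above (none of these names are the real `ingestion-client` API; a tokio oneshot channel stands in for the session's completion tracking):

```rust
use tokio::sync::oneshot;

/// Returned to the caller for an ingested batch; resolves once the partition
/// processor reports the records as committed to Bifrost.
pub struct IngestHandle {
    committed: oneshot::Receiver<Result<(), String>>,
}

impl IngestHandle {
    pub async fn wait_committed(self) -> Result<(), String> {
        // If the session is dropped before the commit status arrives, surface
        // that as an error instead of hanging forever.
        self.committed
            .await
            .unwrap_or_else(|_| Err("ingestion session closed".to_string()))
    }
}

/// The session side would resolve the handle roughly like this once Bifrost
/// reports the records as committed.
pub fn notify_committed(tx: oneshot::Sender<Result<(), String>>) {
    let _ = tx.send(Ok(()));
}
```
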
Contributor

@tillrohrmann tillrohrmann left a comment

Thanks for updating this PR @muhamadazmy. I think we shouldn't map ServiceStopped to LostLeadership as this has a slightly different semantic meaning than before. Mapping it to NotLeader instead should be fine, I believe. Apart from this, +1 for merging :-)

It's really nice that we no longer need to do the deserialization/serialization steps when handling the Envelope 👏

RpcReplyError::LoadShedding => Self::Busy,
RpcReplyError::ServiceNotReady => Self::Busy,
- RpcReplyError::ServiceStopped => Self::Stopping,
+ RpcReplyError::ServiceStopped => Self::LostLeadership,
Contributor

I think ServiceStopped should rather be NotLeader. The difference is whether the partition processor has potentially processed this message or not. With LostLeadership it is not possible to safely retry an RPC, because the processor might already have processed it (e.g. written it to Bifrost).
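
For reference, the mapping suggested here would look roughly as follows (self-contained sketch; the enums mimic the variants visible in the diff, but the real PartitionProcessorRpcError variants carry an id that is omitted here):

```rust
enum RpcReplyError {
    LoadShedding,
    ServiceNotReady,
    ServiceStopped,
}

enum PartitionProcessorRpcError {
    Busy,
    NotLeader,
    LostLeadership,
}

impl From<RpcReplyError> for PartitionProcessorRpcError {
    fn from(err: RpcReplyError) -> Self {
        match err {
            RpcReplyError::LoadShedding | RpcReplyError::ServiceNotReady => Self::Busy,
            // ServiceStopped means the RPC never reached a running partition
            // processor, so it has definitely not been processed and can be
            // retried safely: NotLeader rather than LostLeadership.
            RpcReplyError::ServiceStopped => Self::NotLeader,
        }
    }
}
```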

Comment on lines +666 to +667
PartitionProcessorRpcError::NotLeader(id)
| PartitionProcessorRpcError::LostLeadership(id) => {
Contributor

Just want to highlight that NotLeader means the message has not been processed by the PP, while LostLeadership means we don't know. If this distinction makes any difference for the IngestionClient, then we probably need two different return values. If not, then ignore my comment.

Development

Successfully merging this pull request may close these issues.

New ingress API
