Skip to content

Conversation

kanurag94
Copy link
Contributor

@kanurag94 kanurag94 commented Sep 30, 2025

Description

With this change envoy will send one GOAWAY frame with a last_stream_id set to 2^31-1, as described in RFC 9113. After a reasonable interval (defaults to 1s), envoy will send a followup GOAWAY frame with last_stream_id set to the highest received stream ID at that point in time.
This interval is configurable using graceful_goaway_timeout field.


Commit Message: http2: send graceful GoAway on close
Additional Description: According to RFC 9113, envoy should send a goaway with last_stream_id set to 2^31-1 before sending a final goaway with the highest received stream ID.
Risk Level: Low
Testing: Tests added. Manually verified.
Docs Changes: NA
Release Notes: Changelog added
Platform Specific Features:
Runtime guard: envoy.reloadable_features.http2_graceful_goaway
Fixes #39876
[Optional API Considerations:]

Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #41284 was opened by kanurag94.

see: more, trace.

Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #41284 was opened by kanurag94.

see: more, trace.

Copy link

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #41284 was synchronize by kanurag94.

see: more, trace.

@kanurag94 kanurag94 marked this pull request as ready for review September 30, 2025 18:23
@kanurag94
Copy link
Contributor Author

/retest

@kanurag94
Copy link
Contributor Author

/retest

Signed-off-by: Anurag Aggarwal <[email protected]>
@antoniovleonti
Copy link
Contributor

My understanding is that a graceful goaway is sent at the beginning of the drain sequence via Connection::shutdownNotice() (check code here) That eventually calls this adapter function which in both nghttp2 and oghttp2 sends the goaway frame with the max stream ID as you have done here.

Can you explain why this is not sufficient?

@kanurag94
Copy link
Contributor Author

My understanding is that a graceful goaway is sent at the beginning of the drain sequence via Connection::shutdownNotice() (check code here) That eventually calls this adapter function which in both nghttp2 and oghttp2 sends the goaway frame with the max stream ID as you have done here.

Can you explain why this is not sufficient?

On connection drain the shutdown notice in both nghttp2/oghttp2 adapter sends goaways correctly. If you look at the original issue for this PR #39876, an important part is filters triggering GoAways immediately. This PR solves that part.
For nghttp2/oghttp2 connection drain and subsequent shutdown, this goaway will not be called AFAICS.

Copy link
Contributor

@antoniovleonti antoniovleonti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry! I missed the issue in the description. Thank you for the explanation.

EXPECT_EQ(1, server_stats_store_.counter("http2.goaway_sent").value());
}

TEST_P(Http2CodecImplTest, GracefulGoAwayCausesOutboundFlood) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be prudent to add some tests showing how this new feature and the existing http connection manager drain phase interact.

Could you also add a couple integration tests?

Copy link
Contributor Author

@kanurag94 kanurag94 Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!, I added some integration tests. Please let me know if they are okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for the added testing!

I think what worries me the most is if it's possible for the drain timer to fire after the graceful shutdown timer or vice versa.

Will that crash the envoy? (e.g. the second timer fires and tries calling a function on a now null object).

Could that potentially cause us to send two final goaways (with the second having a higher stream_id-- which is not allowed according to the spec)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand the code (which is little to be honest) -- the destructor on ConnectionImpl should destroy the timers -> cancelling any pending callbacks -> preventing null object access.

Actually the behavior hasn't changed much. I reverted the change on calling gracefulGoAway on onDrainTimeout. Now only filter or load shedding invoked goaways are changed to graceful. Both goaway paths are separate and non overlapping now. It just changes the manual goaway to be more RFC compliant.

However, I will keep attempting adding test.

1. only call custom gracefulGoAway when triggered from LoadSheddding or Filter
2. Add UTs
3. Add integration tests showing interaction of connection drain and filter invoked

Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
@kanurag94
Copy link
Contributor Author

While trying to fix loadShedding integration tests, I see that envoy allows new streams to be created when they shouldn't be during the GOAWAY period. See ServerConnectionImpl::onBeginHeaders(int32_t stream_id). This violates RFC 7540.
Please correct me if my understanding is wrong.

cc @birenroy, @antoniovleonti, @yanavlasov

Signed-off-by: Anurag Aggarwal <[email protected]>
Signed-off-by: Anurag Aggarwal <[email protected]>
@antoniovleonti
Copy link
Contributor

While trying to fix loadShedding integration tests, I see that envoy allows new streams to be created when they shouldn't be during the GOAWAY period. See ServerConnectionImpl::onBeginHeaders(int32_t stream_id). This violates RFC 7540. Please correct me if my understanding is wrong.

cc @birenroy, @antoniovleonti, @yanavlasov

https://github.com/envoyproxy/envoy/blob/main/envoy/http/codec.h#L599 This comment seems to imply this is working as intended, but I agree the RFC seems to say this shouldn't be allowed.

@kanurag94
Copy link
Contributor Author

While trying to fix loadShedding integration tests, I see that envoy allows new streams to be created when they shouldn't be during the GOAWAY period. See ServerConnectionImpl::onBeginHeaders(int32_t stream_id). This violates RFC 7540. Please correct me if my understanding is wrong.
cc @birenroy, @antoniovleonti, @yanavlasov

https://github.com/envoyproxy/envoy/blob/main/envoy/http/codec.h#L599 This comment seems to imply this is working as intended, but I agree the RFC seems to say this shouldn't be allowed.

Thanks for the reference. I see the behavior with nghttp2 vs oghttp2 different.

  • nghttp2 resets any new streams as soon as it goaway.
  • oghttp2 seems to accept new streams.

I think this makes the behaviour already inconsistent irrespective what we expect according to the comment, orz.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Envoy does not use graceful shutdown when sending HTTP/2 GOAWAYs
4 participants