Skip to content

Conversation

@efilipowicz
Copy link

Summary

Fix: Prevent HTTP connection leak on network exceptions

Problem:

The SDK experiences cascading SocketTimeoutException errors under load. The frequency of these timeouts increases the longer the application is running. This behavior points to a resource leak in the underlying Apache HttpClient's connection pool.

The root cause was identified in the DefaultHttpClient.request() method. Several catch blocks for network-related exceptions (e.g., IOException, ClientProtocolException, NoHttpResponseException) would log the error and re-throw it without releasing the underlying HTTP connection. This meant that any time a request failed with a socket timeout or other network issue, a connection was permanently leased from the pool and never returned. Over time, this exhausts the connection pool, causing all subsequent requests to hang until they also time out.

Solution:

This simply releases the connections where necessary to ensure the pool doesn't get exhausted.

Checklist

  • Read the contribution guide
  • Successfully build your changes locally - Run ./gradlew clean build
  • Include a title that clearly describes your changes and references relevant issues / backlog items
  • Include a summary of your changes
  • Include tests for your changes where possible

Copy link
Contributor

@apederson94 apederson94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me

@ronreynolds
Copy link
Collaborator

ronreynolds commented Jun 11, 2025

why not just move them all into:

} finally {
   this.releaseConnection();
}

?

actually this comment raises some questions:

                // moving this to finally causes issues because socket is closed (which means response stream is closed)
                this.releaseConnection();

why not add a check if the socket is closed before calling release? and how is this not a problem in all the catch blocks? (or are we, by some luck, missing the socket close when all the exceptions are thrown? seems a bit sloppy).

according to git-blame the comment was written by Tim Wells 7 years ago.

@efilipowicz efilipowicz reopened this Jun 11, 2025
@efilipowicz
Copy link
Author

why not just move them all into:

} finally {
   this.releaseConnection();
}

?

actually this comment raises some questions:

                // moving this to finally causes issues because socket is closed (which means response stream is closed)
                this.releaseConnection();

why not add a check if the socket is closed before calling release? and how is this not a problem in all the catch blocks? (or are we, by some luck, missing the socket close when all the exceptions are thrown? seems a bit sloppy).

according to git-blame the comment was written by Tim Wells 7 years ago.

In the current structure, if you put the retry in the finally block, you woudn't close the connection when performing the retries, which then opens another connection, which is the heart of the issue.

That said, this entire method could be refactored to simplify the error-handling.

@ronreynolds
Copy link
Collaborator

In the current structure, if you put the retry in the finally block, you woudn't close the connection when performing the retries, which then opens another connection, which is the heart of the issue.

That said, this entire method could be refactored to simplify the error-handling.

reading the code more closely i see that now;

while (true) {
   ... 
   try {
      apacheHttpResponse = this.httpClient.execute(apacheHttpRequest, context); // that which is released
      ...
      if (statusCode == 200) break;  // doesn't reset
      ...
      if (!shouldRetry(...)) break; // doesn't reset
      ...
      this.releaseConnection(); // only resets if this.apacheHttpResponse != null
   } catch (...)   

yeah, not confusing at all. 👎

@efilipowicz
Copy link
Author

In the current structure, if you put the retry in the finally block, you woudn't close the connection when performing the retries, which then opens another connection, which is the heart of the issue.
That said, this entire method could be refactored to simplify the error-handling.

reading the code more closely i see that now;

while (true) {
   ... 
   try {
      apacheHttpResponse = this.httpClient.execute(apacheHttpRequest, context); // that which is released
      ...
      if (statusCode == 200) break;  // doesn't reset
      ...
      if (!shouldRetry(...)) break; // doesn't reset
      ...
      this.releaseConnection(); // only resets if this.apacheHttpResponse != null
   } catch (...)   

yeah, not confusing at all. 👎

We seem to handle it with the 200 success path in the AbstractResources.java file as a finally block, but I'd argue it should be handled in the httpClient class directly. I'd be happy to take a swing at refactoring the method a bit, but for now, this issue is causing problems for some deployed solutions.

The fix we currently have is to simply rebuild the Smartsheet client on nearly every execution, so we never hit pool exhaustion. This, in turn, results in some sloppy code.

@ronreynolds
Copy link
Collaborator

ronreynolds commented Jun 12, 2025

you can always switch to my OpenAPI-generated client. ;-)

thinking about this further are we trying to duplicate functionality of something like resilience4j with this retry loop?

also i makes me want to investigate adding resilienc4j to openapi-codegen code. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants