
Excessive memory usage on multithreading #1670

@jbvsmo


I have been trying to debug a "memory leak" in my application after upgrading it from the original boto (2.49) to boto3.

My application starts a pool of 100 threads, and every request is queued and dispatched to one of them. Memory usage over the application's lifetime used to be about 1GB, with peaks of 1.5GB depending on the operation.
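For context, this is roughly the shape of a queue-fed pool like the one described above. It is a minimal standard-library sketch, not the application's actual code; `handle_request` and `run_pool` are hypothetical names, and the per-request work is a placeholder.

```python
import queue
import threading

NUM_WORKERS = 100  # the pool size described above


def handle_request(request):
    # Hypothetical placeholder for the real per-request work.
    return request * 2


def worker(tasks, results, lock):
    """Pull requests off the shared queue until a None sentinel arrives."""
    while True:
        request = tasks.get()
        try:
            if request is None:  # sentinel: shut this worker down
                return
            outcome = handle_request(request)
            with lock:
                results.append(outcome)
        finally:
            tasks.task_done()


def run_pool(requests, num_workers=NUM_WORKERS):
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(tasks, results, lock), daemon=True)
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for r in requests:
        tasks.put(r)  # each request goes to whichever worker is free
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results  # order depends on thread scheduling
```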

After the upgrade, I create one boto3.Session per thread and access multiple resources and clients from that session, reusing them throughout the code. Previously I had one boto connection of each kind per thread (I use several services: S3, DynamoDB, SES, SQS, MTurk, SimpleDB), so the setup is essentially the same.
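One way to get "one Session per thread" is lazy creation through `threading.local`. The sketch below assumes that pattern; the `PerThread` helper is hypothetical and boto3 is only referenced in comments, so the block runs without it.

```python
import threading


class PerThread:
    """Lazily create one object per thread and reuse it within that thread.

    In the scenario above, ``factory`` would be ``boto3.Session``;
    any zero-argument callable works.
    """

    def __init__(self, factory):
        self._factory = factory
        self._local = threading.local()  # attributes are private to each thread

    def get(self):
        obj = getattr(self._local, "obj", None)
        if obj is None:
            obj = self._factory()  # first call in this thread: build it
            self._local.obj = obj
        return obj


# Usage with boto3 (not executed here):
#   sessions = PerThread(boto3.Session)
#   session = sessions.get()       # same Session for every call in this thread
#   s3 = session.resource("s3")    # create once, then reuse
```

With 100 worker threads this yields 100 independent Session objects, which is the configuration whose memory cost is measured below.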

The difference is that each boto3.Session by itself increases memory usage enormously, and my application now runs at 3GB of memory instead.

How do I know it is the boto3 Session, you ask? I wrote two demo programs with the same 100 threads; the only difference is that one uses boto3 and the other does not.

Program 1: https://pastebin.com/Urkh3TDU
Program 2: https://pastebin.com/eDWPcS8C (the same program with the 5 boto-related lines commented out)
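The pastebin programs are not reproduced here. As a hypothetical reconstruction of the experiment's shape (not the original code), a harness could start N threads running some per-thread `work` and print resident memory every few seconds. This sketch assumes Linux, since it reads `VmRSS` from `/proc/self/status`; `run_experiment` and `work` are illustrative names.

```python
import threading
import time


def rss_mb() -> float:
    """Current resident set size in MB. Linux-only: parses /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # VmRSS is reported in kB
    raise RuntimeError("VmRSS not found in /proc/self/status")


def run_experiment(work, num_threads=100, interval=5.0, samples=12):
    """Start `work` in `num_threads` daemon threads and sample memory.

    `work` is a hypothetical per-thread callable: in the boto3 variant it
    would create a Session/resource and issue requests in a loop; in the
    control variant it would skip the AWS calls.
    """
    readings = []
    for _ in range(num_threads):
        threading.Thread(target=work, daemon=True).start()
    for _ in range(samples):
        mb = rss_mb()
        readings.append(mb)
        print(f"Process Memory: {mb:.1f} MB")
        time.sleep(interval)
    return readings
```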

Output of program 1 (one line printed every 5 seconds):

Process Memory: 39.4 MB
Process Memory: 261.7 MB
Process Memory: 518.7 MB
Process Memory: 788.2 MB
Process Memory: 944.5 MB
Process Memory: 940.1 MB
Process Memory: 944.4 MB
Process Memory: 948.7 MB
Process Memory: 959.1 MB
Process Memory: 957.4 MB
Process Memory: 958.0 MB
Process Memory: 959.5 MB

Now the same program with plain threads and no AWS access.
Output of program 2 (one line printed every 5 seconds):

Process Memory: 23.5 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB
Process Memory: 58.7 MB

The boto3 session objects alone retain about 9 to 10MB per thread, roughly 0.9GB in total. That is not acceptable for an object that should do little more than send requests to the AWS servers. It means the Session is holding on to a lot of data it does not need.
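The per-thread figure follows from the final readings of the two outputs above: subtract the thread-only baseline from the boto3 total and divide by the thread count.

```python
# Final readings from each program's output above (MB).
with_boto3 = 959.5
without_boto3 = 58.7
num_threads = 100

overhead_total = with_boto3 - without_boto3         # about 900.8 MB
overhead_per_thread = overhead_total / num_threads  # about 9.0 MB

print(f"total overhead: {overhead_total:.1f} MB")
print(f"per thread:     {overhead_per_thread:.1f} MB")
```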

You might wonder whether it is the resource, rather than the session, that keeps the memory alive. If you move the resource creation inside the for loop, the program still reaches about 1GB within exactly the same 15 to 20 seconds.

At first I tried forcing garbage collection to clean up cyclic references, but it was futile: memory dropped by only a couple of megabytes.
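A check like the one described can be sketched with the standard `gc` module: read RSS, run a full collection, and read RSS again. As before, this assumes Linux for the `/proc` read, and `collect_and_report` is a hypothetical helper name.

```python
import gc


def rss_mb() -> float:
    """Current resident set size in MB. Linux-only: parses /proc/self/status."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # VmRSS is reported in kB
    raise RuntimeError("VmRSS not found in /proc/self/status")


def collect_and_report():
    before = rss_mb()
    unreachable = gc.collect()  # full collection of all generations
    after = rss_mb()
    print(f"gc.collect() found {unreachable} unreachable objects; "
          f"RSS {before:.1f} MB -> {after:.1f} MB")
    return unreachable, before, after
```

Note that RSS measures what the process holds from the operating system, so memory freed by Python is not necessarily reflected as an immediate RSS drop.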

I've seen people report something similar (or maybe not!) on the botocore project, so it might be a shared issue: boto/botocore#805

Labels: automation-exempt, feature-request, p2, response-requested
