Need advice on Solid Queue's memory usage #330
Hey @yjchieng, thanks for opening this issue! 🙏 I think it depends a lot on your app. A brand new Rails app uses around 74.6 MB of memory for me after booting (without Solid Queue, just running Puma). Since you're measuring free memory before and after starting the supervisor, and the supervisor forks more processes, I think the consumption you're seeing is from all the processes together and not just the supervisor. Are you running multiple workers or just one? I think reducing the number of workers there would help. Another thing that might help is using |
There might also be something else going on, because the only changes from version 0.7.0 to 0.8.2 were to the installation part of Solid Queue; nothing was changed besides the initial installation, so the memory footprint shouldn't have changed. I imagine there is other stuff running on your AWS instance at the same time that might be consuming memory as well. |
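For reference, a pared-down queue.yml along those lines, with a single worker process and a low thread count, might look something like this (a minimal sketch only; queue names and numbers are illustrative, not tuned recommendations):

# Hypothetical minimal configuration: one worker process and few threads,
# so there are fewer forked processes consuming memory.
default: &default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 1
      processes: 1
      polling_interval: 1

production:
  <<: *default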
Up 🆙🔥 I have huge memory issues in production (Rails 7.2 + Active Job + Solid Queue). Everything works just fine in dev mode, but in production there seems to be a memory leak. After restarting my production server, I get to roughly 75% RAM usage. Very quickly (talking in minutes...) I get to ~100%. And if I let the app run for the weekend and come back on Monday (like today), I get to... 288% RAM usage... I tried removing all the lines in my code related to Solid Queue, and I can confirm that this is what's causing the memory issue in production. The exact error codes I'm getting, causing my app to crash in production (Heroku), are R14 and R15. Any advice/suggestions would be very much appreciated, fellow devs. Have an amazing day! |
@Focus-me34, what version of Solid Queue are you running? And when you say you're removing anything related to Solid Queue, what Active Job adapter are you using instead? |
@rosa I'm using Solid Queue version 1.0.0. I checked all sub-dependency versions; they all satisfy the prerequisites. Here's some of our setup code:
Do you see anything weird? |
No, that looks good to me re: configuration. You said:
So, if you don't get any memory issues when not using Solid Queue, is that because you're not running any jobs at all (since you're not using another adapter)? Because that would point to the jobs having a memory leak, not Solid Queue. |
Hey Rosa, sorry for the delayed reply! I've been very busy at work. Here's where we're at: I've been working on getting our company's code running smoothly with Rails 7.2 and Solid Queue (in production on Heroku). As I mentioned earlier, it's been a huge challenge, and unfortunately, we haven't had much success with it. My colleague and I decided to take a closer look at our code to see if the problem was on our end. Since my last comment here, we've implemented tests, and I can confirm that the code is behaving exactly as expected. Our next step, after troubleshooting the high memory usage on Heroku, was to switch away from Solid Queue and try a different job adapter (as you suggested). I've set up Sidekiq as the adapter, and we saw a drastic improvement: memory usage dropped from around 170% of our 512 MB quota to a range of 25%-70%. This leads me to believe that there might be a memory leak in production when using Solid Queue. From our observations, it seems that after the initial job execution completes, instance variables at the top of the method (which should reset to nil at the start of each job) are retaining the values from the previous iteration. We suspect this might be preventing the garbage collector from clearing memory properly between jobs. Let me know if there's any more information I can provide to help you investigate. We're really looking forward to moving back to using the built-in Solid Queue functionality once this issue is resolved. [Edit: The job we're running involves two main dependencies. We scrape an RSS feed using Nokogiri and fetch a URL for each entry using HTTParty.] |
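To make the suspected pattern concrete, here is a hypothetical sketch (class and method names are made up, and this is not the actual job code): a job that memoizes parsed data in instance variables keeps those objects reachable for as long as the job object itself is referenced anywhere, so explicitly clearing them after perform is a cheap way to test that hypothesis.

# Hypothetical sketch of the suspected pattern, not the real job.
class FeedScrapeJob < ApplicationJob
  queue_as :default

  def perform(feed_url)
    # Large parsed structures held in instance variables stay reachable for as
    # long as this job object is referenced (by the adapter, retries, etc.).
    @entries = fetch_entries(feed_url)
    @entries.each { |entry| process_entry(entry) }
  ensure
    # Defensive reset so the references are dropped even if the job object lives on.
    @entries = nil
  end

  private

  def fetch_entries(feed_url)
    # Placeholder for the Nokogiri + HTTParty work mentioned above.
    []
  end

  def process_entry(entry)
    # Placeholder for per-entry processing.
  end
end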
I am observing high memory usage when Solid Queue is used on Heroku as well. Have there been any solutions to fix this issue? Is there a temporary fix that can be used for now? |
Solid Queue with the Puma plugin is using stupidly high memory. |
Hey @rajeevriitm, no, no real solutions. I had an async adapter that would run the supervisor, workers and dispatchers and everything together in the same process, which would save memory. However, this was scrapped from the 1.0 version. I need to push for that again. In the meantime, you can try using @arikarim, thanks for your very helpful and useful comment 😒 |
Haha, sorry, I was so tired of this. |
I still think there are some strange issues: with Puma concurrency greater than 0, the memory goes up 😢 |
@rosa I have a simple configuration. The queue runs on a single-server config. I have a continuously running job that runs every 30 minutes. queue.yml
puma.rb
|
@rosa There seems to be a problem with newer versions of Solid Queue. I downgraded my solid_queue gem to 1.1.0 and the memory issue is fixed. 😄 |
+1, I'm running the latest version. I also tried switching from bin/jobs to a Rake task, but that didn't help either. Note: I don't run any jobs.
|
Hi 👋 I'm experiencing the same issue. Ruby: 3.4.2. Running on a DigitalOcean droplet instance with 1 GB of memory. I'll try to downgrade the solid_queue gem version to see if it improves. |
Hey all, so sorry about this! I've been swamped with other stuff at work, but I'm going to look into this on Monday. |
Thanks for addressing the issue. Hope it's resolved soon.
|
@IvanPakhomov99, @rajeevriitm, could you try downgrading to version v1.1.2 and let me know if the issue persists? Also, what version of Ruby are you using? |
Hi @rosa, Thanks for your prompt response. I tested four different Solid Queue versions (1.1.0, 1.1.1, 1.1.2, and 1.1.4) but encountered the same issue across all of them. Environment:
Let me know if you need any additional details. |
@rosa I tried downgrading. The issue exists in 1.1.2 as well.
Happy to help. |
Hey @rosa, were you able to identify the issue causing the memory leak? |
@rajeevriitm, I'm afraid I wasn't 😞 I reviewed all the code from v1.1.0 and didn't identify anything that could leak memory. This was before @IvanPakhomov99 shared that testing 1.1.0, 1.1.1, 1.1.2, and 1.1.4 made no difference. Then, I tried running jobs of different kinds (recurring jobs of different types being enqueued every 5 seconds, long-running jobs, etc.) over a couple of days and couldn't reproduce any memory leaks. I think whatever is happening depends on what your jobs are doing or what you're loading in your app. I also wonder if this is not a memory leak, but simply high memory usage. Solid Queue runs a different process for each worker, plus a process for the dispatcher, another for the scheduler, and another for the supervisor. All of those processes load your app, and even though the fork happens after the app is loaded, so copy-on-write should allow memory sharing, it's still not the same as running a single process. I don't have a better idea for reducing memory usage other than providing a single-process execution mode (what I've called "async" mode). |
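To see that process breakdown in practice, a rough snapshot like the one below (the process-name match is an assumption based on how the processes show up in ps, e.g. solid-queue-supervisor and solid-queue-worker as mentioned later in this thread) prints the resident memory of each Solid Queue process, which makes it easier to separate total usage across the whole family from the usage of any single process:

# Rough sketch, Linux/macOS only: list RSS per Solid Queue process via ps.
`ps -eo pid,rss,command`.each_line do |line|
  pid, rss_kb, *cmd = line.split
  command = cmd.join(" ")
  next unless command.include?("solid-queue")

  puts format("%-8s %7.1f MB  %s", pid, rss_kb.to_i / 1024.0, command[0, 80])
end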
@rosa Thanks for taking the time to look into this! The main caveat here is that it's a brand-new service and I am not running any jobs yet. I also confirmed that the service pod memory is stable. That said, I actually have an update: I downgraded to version 1.0.2, and memory usage looks stable now. I'll stick with this version for now. |
@rosa Thank you for your attentiveness to this issue. I just downgraded |
Oh! Thanks a lot for confirming that! I had only looked down to version 1.1.0. I'm on-call this week so a bit short on time but will try to figure out why the memory increased from that version to 1.1.0. |
@rosa I'm not sure how relevant this is, but it might be worth taking another look at this commit: a152f26. It looks like […]. Even though the block uses […]. Sorry if that's not the case; I might not be seeing the full picture. I'm just reviewing the changes between versions 1.1.0 and 1.0.2. |
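For anyone following along, here is a deliberately simplified, hypothetical illustration of the general failure mode being discussed; it is not Solid Queue's actual code. If a Concurrent::Promises.future is created on every poll cycle and something keeps a reference to it, then the block it wraps, and everything the block closes over (including self), stays reachable and can never be collected.

require "concurrent"

# Hypothetical illustration only: the leak comes from retaining every future
# ever created, which also retains everything the blocks close over.
class LeakyPoller
  def initialize
    @pending = [] # bookkeeping that is never pruned
  end

  def interruptible_sleep(seconds)
    future = Concurrent::Promises.future { sleep(seconds) }
    @pending << future # retained forever, so the block and its captures leak
    future.value       # block until the sleep finishes
  end
end

poller = LeakyPoller.new
3.times { poller.interruptible_sleep(0.1) } # @pending now holds 3 resolved futures

A leak inside the library itself is another possibility, along the lines of the concurrent-ruby issue referenced later in this thread.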
Just to follow up on this: it seemed to have worked in the short term, but unfortunately I'm back to where I started... still seeing those R14/memory issues consistently. |
We are experiencing the same unbounded memory growth issues with solid_queue, which results in OOMs. We're on solid-queue 1.1.3 and Ruby 3.3.7. We can recreate this issue by running […].

solid-queue-worker (1.1.3)
Here's what the memory size of the solid-queue worker process looks like over time (chart omitted). It's being sampled every minute for about 75 minutes and grows from about 141 MB to 840 MB in that time. This was run in development with YJIT on and eager loading off. We have also run it with YJIT off and eager loading off, and the same issue persists.

solid-queue-supervisor (1.1.3)
Here's what the memory size of the solid-queue supervisor process looks like over time (chart omitted). Memory here also seems to grow, but at a much slower rate.

Memory growth
This issue persists in all environments regardless of YJIT, eager loading, or seemingly any other environmental factor. If left unchecked, solid-queue will eventually hit OOM errors, causing the pod/container it is running in to be killed. The memory growth for the worker never seems to plateau. We have tried increasing memory, and Solid Queue will just run until it consumes all of it. Because the worker grows at a much faster rate than the supervisor, it's unclear whether the supervisor will also experience unbounded memory growth. I suspect it will, given that the growth occurs when there are no jobs present.

No jobs during this benchmark
Just wanting to call out that there are no jobs being enqueued at all. It's an empty database.

solid-queue 1.1.4 behavior is up next
I'm now running with solid-queue 1.1.4, and so far the issue appears to exist there as well. |
Alright, I'm going to revert the change in #417 since that's looking like the most likely culprit as @IvanPakhomov99 said. I hope that fixes this problem. Thanks a lot for the tests @zdennis and for writing this up 🙏 |
Thanks, @rosa . 🙇 Let us know when/how to test, and I'm happy to pull it down and re-run the same tests. |
…rruptible This is looking like the most likely culprit of a memory leak pointed out by multiple people on #330
…rruptible For all Rubies, and not just Ruby < 3.2. This is looking like the most likely culprit of a memory leak pointed out by multiple people on #330
Thanks a lot @zdennis, really appreciate it 🙏 I've just pushed version 1.1.5 with that change reverted, I hope this is it 🤞 |
solid_queue 1.1.5 is looking much better. After about 80 minutes so far, memory isn't growing. 🥳

Worker memory 🟢
Plateaus right around 149 MB for us.

Supervisor memory 🟢
And the supervisor also plateaus, right around 159 MB.

Concurrent::Promises.future issue
I wonder if this is related to the memory leak reported in usage of Concurrent::Promises.future in ruby-concurrency/concurrent-ruby#960.

Thanks for pushing out 1.1.5 🙇
Thanks for quickly pushing out 1.1.5, @rosa! It is looking like it may be the ticket here. I'll keep this test running overnight (because I'm curious) and will post back with those results as well. |
Here is the memory usage on the worker and supervisor processes over 15 hours. Again, this is with no jobs being processed. Still looks great compared to prior releases. We'll bump and test this out in a live environment. I'll share results later this week or next week after it's had some time running in the wild.

Worker memory
Worker memory is still very low at 152 MB.

Supervisor memory
Supervisor memory is up to about 165.7 MB. It very slowly grew from about 156 MB to 160 MB and then jumped up to 165 MB after about 14 hours. I haven't been benchmarking/profiling GC major/minor runs, and we're not manually invoking GC.start, so it's not clear if this is normal growth by Ruby or if there is still a small leak in the supervisor process. Either way, 1.1.5 continues to look good. |
@zdennis any surprises so far? |
We've had it running in production since late Monday and we haven't had an OOM since. Here's a chart (omitted) showing how, over a similar timeframe (from Saturday to Monday), we did see an OOM with solid_queue 1.1.3. And then over the same number of days, Monday evening to today, we haven't seen an OOM and memory looks much better. We did notice that memory usage grows when polling (and when there are no active jobs). We have some recurring jobs and then we have past jobs. We are not invoking […]. I notice solid_queue has a usage of […]. I'll post back again early next week after it's been up for a week, but so far it is still looking good. 🤞 |
I've instrumented the code between the original (and now current) implementation of Interruptible.rb vs. the change I made in [a152f26], and they result in exactly the same number of calls to poll. This strongly suggests the Interruptible implementation is not changing SQ behavior in terms of the number of iterations of the poll loop. I've built a test harness to allow for isolating various configurations (Ruby versions, Rails versions, database configurations, and SolidQueue configurations). I've tested various implementations of Concurrent::Promises.future, including the Interruptible implementation in the PR that I submitted, and can also confirm that it does not leak. While testing without any jobs (just worker polling), what I'm seeing is a lot of garbage being generated in between major GCs, which can be exacerbated via a shorter polling_interval, and which results in constant and significant memory growth. This is forcing Ruby to allocate more slabs to handle the object allocations. While the objects are GC'able, the slabs are forever (trackable via GC.stat and GC.stat_heap). When I switch to jemalloc and MALLOC_ARENA_MAX=2, which is how my Heroku and local dev are configured, I don't see the memory growth issues at all. More digging forthcoming. |
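As a concrete version of the kind of instrumentation described above (a sketch, not the actual harness), a background sampler that logs heap pages and GC counts next to the process RSS makes the pattern visible: garbage generated between major GCs shows up as RSS and heap pages climbing while live slots stay roughly flat.

# Minimal sampler sketch: log RSS alongside standard GC.stat counters once a minute.
Thread.new do
  loop do
    stat   = GC.stat
    rss_mb = `ps -o rss= -p #{Process.pid}`.to_i / 1024.0
    puts format(
      "rss=%.1fMB heap_pages=%d live_slots=%d major_gc=%d minor_gc=%d",
      rss_mb,
      stat[:heap_allocated_pages],
      stat[:heap_live_slots],
      stat[:major_gc_count],
      stat[:minor_gc_count]
    )
    sleep 60
  end
end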
@zdennis Any updates on how it's been running in […]? @rosa, I hate to rain on the parade, but I'm still having the memory growth issue even after upgrading to […].
Thoughts
I have 5 recurring jobs specified in my recurring.yml. I've included the content of my config files below. Perhaps I have something incorrectly configured? I'm open to thoughts or suggestions. 🙏 Should I try adding a recurring job that forces garbage collection to run, or something like that, as a workaround? (A rough sketch of that idea follows, before my configs.)
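One hedged way to try that idea (the class name, schedule, and the whole approach are an illustrative workaround, not a Solid Queue feature) is a tiny recurring job that forces a full GC and, where supported, compacts the heap:

# Hypothetical workaround sketch: force a major GC (and heap compaction) on a schedule.
class GcCompactJob < ApplicationJob
  queue_as :default

  def perform
    GC.start(full_mark: true, immediate_sweep: true)
    GC.compact if GC.respond_to?(:compact) # available since Ruby 2.7
  end
end

It could be scheduled from recurring.yml like the jobs below, e.g. with class: "GcCompactJob" and schedule: "every 10 minutes". Note that this only treats the symptom (collectable garbage and fragmentation), not whatever is doing the allocating.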
Solid Queue Config
# config/queue.yml
default: &default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: "*"
      threads: 2
      processes: <%= ENV.fetch("JOB_CONCURRENCY", 1) %>
      polling_interval: 1

development:
  <<: *default
  database: app_development

test:
  <<: *default
  database: app_test

production:
  <<: *default
  url: <%= ENV["DATABASE_URL"] %>

SQ Recurring Jobs Config
# config/recurring.yml
recurring_maintenance: &recurring_maintenance
  job_one:
    class: "JobOne"
    priority: 0
    schedule: "every 60 seconds"
  job_two:
    class: "JobTwo"
    priority: 1
    schedule: "every 5 minutes"
  job_three:
    class: "JobThree"
    priority: 1
    schedule: "every 30 minutes"
  job_four:
    class: "JobFour"
    priority: 1
    schedule: "every weekday at 9am America/Detroit"
  job_five:
    class: "JobFive"
    priority: 1
    schedule: "30 15 15 * * 1-5 America/Detroit"

development:
  <<: *recurring_maintenance

production:
  <<: *recurring_maintenance

5/14 Next Morning Update |
@jrodden1 wrote:
Our staging and production environments are still looking good for us. This isn't to say there's not another issue going on. Memory growth still occurs, but it's no longer unbounded and it gets reclaimed. We have more memory available in our clusters than it appears you have in your Heroku app, so it could be that if we had a lower limit, we might be seeing some OOM issues too. I am eagerly awaiting @hms's continued investigation, as I think he may be on to something. I'd love to dive in deeper, but I don't have bandwidth right now. Have you tried setting […]? |
I have to admit I reprioritized my personal work, given that it was assumed my patch was the root cause; since my code was removed, I moved on. I can resume my digging, but I think I need some help. There are too many permutations between Ruby versions, Rails versions, and databases (and database versions). The community needs to identify a small subset of configurations to be our test beds, as it's a ton of work for me to manage all of the versions that have been mentioned in this thread. Secondly, I need better info on whether people are seeing the leak with SQ simply idling, running standard jobs, and/or running recurring jobs. That's also a lot of permutations to manage. Even worse, idling and standard jobs run on the Worker and recurring jobs run on the Scheduler, making monitoring harder. What I've found so far is as follows:
One test that I haven't run yet, but that has me curious, is around Interruptible sleep. In the 1.1.5 release, Rosa kept (by accident???) one change that I made that removed a lot of unnecessary worker polling. If the problem is in some manner tied to the poll loop, then putting back the original high-volume poll loop should make the problem a little easier to spot. For those of you suffering from OOM issues, here is what I'm doing these days, and it has been working well for months:
This approach has some pros and cons:
Despite the above, it works well enough that I have abandoned a PR to implement worker restarts as a native feature of SQ. It should be noted that this works much more cleanly with a dedicated queue / single-threaded worker and known jobs that generate memory pressure. If you kill a worker with multiple threads, there is the risk of killing an active job, and with the current SQ design, that job can take a while to be rerun or, even worse, can get caught in a run/die loop. |
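Since the concrete steps above didn't survive into this thread, here is a purely hypothetical sketch of that style of workaround; the threshold, the TERM signal, and the assumption that the supervisor replaces a terminated worker are all mine, so verify them against your own setup. The idea: find solid-queue worker processes whose RSS exceeds a limit and terminate them, ideally on a dedicated single-threaded queue so no unrelated in-flight job gets killed.

# Hypothetical sketch: TERM any solid-queue worker whose RSS exceeds a threshold,
# relying on the supervisor to start a replacement. Run from cron or a recurring job.
RSS_LIMIT_MB = 300 # illustrative threshold; tune for your dyno/container size

`ps -eo pid,rss,command`.each_line do |line|
  pid, rss_kb, *cmd = line.split
  next unless cmd.join(" ").include?("solid-queue-worker")
  next unless rss_kb.to_i / 1024.0 > RSS_LIMIT_MB

  Process.kill("TERM", pid.to_i) # ask the worker to shut down; the supervisor should respawn it
end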
I've done some further troubleshooting today. I decided to just toggle off all the recurring jobs (I just commented them all out of recurring.yml). And... the memory usage stayed fairly consistent! So it doesn't look like it's growing like crazy while just sitting idle. Since I didn't have any recurring jobs specified, the Scheduler process wasn't running.
This seems to narrow it down to something that happens when recurring jobs are being used... I also found out that Heroku (for new apps since 2019) sets MALLOC_ARENA_MAX=2 by default. I have tried setting both versions of the ENV var mentioned previously and didn't notice any difference in memory usage. Also, I am using Heroku Postgres as my DB. |
Huzzah! Success! I turned my recurring jobs back on toward the end of the work day yesterday and also changed my […]. With this setup, this is the memory graph I'm getting now (chart omitted). While I don't have a lot of headroom (only around 100 MB), I'm happy to see that the usage leveled off. I'll be curious to see if this changes if/when I add additional recurring jobs, but for now, I think I'm good! |
Ruby: 3.3.4
Rails: 7.2.1
Solid Queue: 0.7.0, 0.8.2
I run a Rails app on an AWS EC2 instance with 1 GB of memory.
I noticed the Solid Queue process takes up 15-20% of the instance's memory, making it the single largest process by memory usage.
What I checked:
(I use supervisord to manage my Solid Queue process.)
stop supervisorctl - free memory 276 MB
start supervisorctl - free memory 117 MB
Trying to see if this is something related to the supervisor:
before solid_queue:start - free memory 252 MB
after solid_queue:start - free memory 109 MB
I upgraded to 0.8.2 (was 0.7.0):
stop supervisorctl - free memory 220 MB
start supervisorctl - free memory 38 MB
I need some advice:
And, thanks a lot for making this wonderful gem. :)