Skip to content

Fix potential deadlock in CacheState::lock #15698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 23, 2025

Conversation

smoelius
Copy link
Contributor

@smoelius smoelius commented Jun 23, 2025

This PR fixes a potential source of deadlock in the CacheState::lock function (here), as explained below.

I ran into this deadlock while testing Dylint. For example, in this GitHub run, two jobs were killed after running for six hours. This fix seems to resolve the deadlock (e.g., see this run, which uses the fix).

Until this fix (or a similar one) appears in rustup-installable Cargo, is there an easy workaround?


A CacheState struct holds two recursive locks: mutate_lock and cache_lock.

When MutateExclusive is passed to CacheState::lock, it tries to acquire both locks. First, it tries to acquire mutate_lock, then it tries to acquire cache_lock.

The problematic case is when it acquires the first, but not the second.

Note that if the second cannot be acquired because of an error, the mutate_lock recursive lock is decremented:

Err(e) => {
self.mutate_lock.decrement();
return Err(e);
}

However, if the second would simply block, LockingResult::WouldBlock is returned.

CacheState::lock is called from two places. One of those locations is in CacheLocker::try_lock:1

if state.lock(gctx, mode, NonBlocking)? == LockAcquired {
Ok(Some(CacheLock { mode, locker: self }))
} else {
Ok(None)
}

Note that CacheLocker::try_lock creates a CacheLock if and only if LockingResult::LockAcquired is returned.

Furthermore, when a CacheLock is dropped, it decrements both mutate_lock and cache_lock:

MutateExclusive => {
state.cache_lock.decrement();
state.mutate_lock.decrement();
}

A scan of cache_lock.rs shows that there are only three places2 where mutate_lock.decrement() is called: the error location in CacheState::lock (referenced above), and two places in CacheLock::drop.

Thus, if LockingResult::WouldBlock is returned from CacheState::lock, mutate_lock is never decremented.

Footnotes

  1. The other location is in CacheLocker::lock, which calls CacheState::lock with BlockingMode::Blocking. For that reason, CacheLocker::lock should not return WouldBlock when called from this location.

  2. https://github.com/rust-lang/cargo/blob/84709f085062cbf3c51fa507527c1b2334015178/src/cargo/util/cache_lock.rs#L413, https://github.com/rust-lang/cargo/blob/84709f085062cbf3c51fa507527c1b2334015178/src/cargo/util/cache_lock.rs#L438, and https://github.com/rust-lang/cargo/blob/84709f085062cbf3c51fa507527c1b2334015178/src/cargo/util/cache_lock.rs#L445

@rustbot
Copy link
Collaborator

rustbot commented Jun 23, 2025

r? @epage

rustbot has assigned @epage.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jun 23, 2025
Copy link
Contributor

@epage epage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@epage epage added this pull request to the merge queue Jun 23, 2025
Merged via the queue into rust-lang:master with commit 409fed7 Jun 23, 2025
24 checks passed
@smoelius
Copy link
Contributor Author

Is there any chance this fix could make it into Thursday's release? (Sorry if this is a crazy or unreasonable ask.)

@smoelius smoelius deleted the fix-cache-state-lock branch June 23, 2025 16:39
@epage
Copy link
Contributor

epage commented Jun 23, 2025

Despite the forge saying 1.89 beta will be branched on the 20th, it doesn't look like its been created yet, so this should at least make that without a beta backport.

We have 3 days before 1.88 is released. We just did a submodule update for rust-lang/rust (rust-lang/rust#142898). So to do this we'd need to feel comfortable enough with this PR with little testing, get a beta backport merged, and update the submodule. I'm not even sure what the exact cut off for a change like this is. This problem was introduced in #12706 which was released in 1.75. I feel like we can live for 6 more weeks with this but I'll bring it up with the team.

@smoelius
Copy link
Contributor Author

Thanks very much for the explanation.

@epage
Copy link
Contributor

epage commented Jun 23, 2025

Looks like we did miss the 1.89 window so I backported this in #15699.

@smoelius
Copy link
Contributor Author

Thanks for keeping me informed, and thanks for going to all of this trouble. 🙏

ehuss added a commit that referenced this pull request Jun 23, 2025
Beta backports
- 0b362e3 (#15698)

In order to make CI pass, the following PRs are also cherry-picked:

---

Problem has been around since 1.75 but significant enough for us to aim
for 6 weeks of testing instead of 12.
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jun 25, 2025
Update cargo

2 commits in 84709f085062cbf3c51fa507527c1b2334015178..409fed7dc1553d49cb9a8c0637d12d65571346ce
2025-06-22 23:58:39 +0000 to 2025-06-23 15:55:04 +0000
- Fix potential deadlock in `CacheState::lock` (rust-lang/cargo#15698)
- feat(toml): Parse support for multiple build scripts (rust-lang/cargo#15630)
workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Jun 25, 2025
Update cargo

2 commits in 84709f085062cbf3c51fa507527c1b2334015178..409fed7dc1553d49cb9a8c0637d12d65571346ce
2025-06-22 23:58:39 +0000 to 2025-06-23 15:55:04 +0000
- Fix potential deadlock in `CacheState::lock` (rust-lang/cargo#15698)
- feat(toml): Parse support for multiple build scripts (rust-lang/cargo#15630)
rust-timer added a commit to rust-lang/rust that referenced this pull request Jun 25, 2025
Rollup merge of #142993 - ehuss:update-cargo, r=ehuss

Update cargo

2 commits in 84709f085062cbf3c51fa507527c1b2334015178..409fed7dc1553d49cb9a8c0637d12d65571346ce
2025-06-22 23:58:39 +0000 to 2025-06-23 15:55:04 +0000
- Fix potential deadlock in `CacheState::lock` (rust-lang/cargo#15698)
- feat(toml): Parse support for multiple build scripts (rust-lang/cargo#15630)
@rustbot rustbot added this to the 1.90.0 milestone Jun 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants