Skip to content

runtime: heap mspan limit is set too late, causing data race between span allocation and conservative scanning #74288

Closed
@mknyszek

Description

@mknyszek

Conservative scanning filters out invalid pointers by relying on the span lookup mechanics (spanOf), and double-checking that the pointer points within that span's bounds, since the span table can contain stale entries.

Unfortunately, the upper bound on the span is set very late for heap spans. It is not set in allocSpan before the atomic write to the span state, thereby publishing it to the garbage collector, but is instead written far later, either in mcache.allocLarge or mcentral.grow. It's also not cleared by mspan.init which would also be fine.

If any one of these things were true, the bug would have been avoided:

  1. mspan.init sets or clears limit.
  2. mspan.allocSpan sets limit explicitly in the heap span case, just like the other cases.
  3. On span free, we clear the span table.

I propose we do (1) and (2). My proposed semantics are that mspan.init always sets limit to a safe value, but this value is later adjusted down by allocSpan and allocLarge respectively. mcentral.grow should not be setting limit because it's way too late. mspan.init does not have enough information on its own to set right size, but for small object spans we need to set the limit before returning from allocSpan, since part of the span is not valid space for objects, but is instead only for heap metadata. However, we do need to allow allocLarge to adjust the value, because that's the only place we have the true size of the object. Luckily, it's fairly safe if a lookup tries to check the 'dead' part of a large object span.

Once upon a time I had a change for (3), which was necessary for some experimental performance changes, and it didn't seem to cost much extra to clear the span table entries, if anything at all. But perhaps let's leave that alone for now, since there is a performance risk.

AFAICT, this bug has existed since Go 1.14 when conservative scanning was introduced. mspan.limit was essentially always just used for debug checks before then, but in Go 1.14 it became load-bearing.

This bug is filed on behalf of a crash found within Google.

Metadata

Metadata

Assignees

No one assigned

    Labels

    NeedsFixThe path to resolution is known, but the work has not been done.compiler/runtimeIssues related to the Go compiler and/or runtime.

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions