fix(rp2350): add software spinlocks #5034
Writing to the UART takes time and that may not be a good idea inside an interrupt, but it is essential for debugging sometimes (especially since USB-CDC typically doesn't work inside an interrupt). This fixes UART support in interrupts for the RP2040 at least. You can test it with `-serial=uart` and connecting a USB-UART adapter to the right pins.
…he first error is returned In some cases, e.g nothing connected on the bus, repeated resume-stop sequences can lead to the bus never reaching the stop state, hanging Tx. This change ensures the resume-stop sequence is submitted once on error. It also moves the error code read to before the sequence to ensure it's valid. Fixes: tinygo-org#4998
Oh, whoops. My go fmt extension has been flaking out on me. Will have the missing rp2040 imports updated in a moment. Here's the lock/unlock disassembled output with inlining disabled:
I support the switch to atomic instructions, but if you need something that works right away, RP2350-E2 mentions that some spinlocks are not affected:
The following SIO spinlocks can be used normally because they don’t alias with writable registers: 5, 6, 7, 10, 11, and 18 through 31. Some of the other lock addresses may be used safely depending on which of the high-addressed SIO registers are in use.
Locks 18 through 24 alias with some read-only TMDS encoder registers, which is safe as only writes are mis-decoded.
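To make the list in that erratum text concrete, here is a small sketch of a check for the lock numbers it names as unaffected. The helper name `safeHWSpinlock` is made up for illustration and is not part of TinyGo or pico-sdk; the numbers come only from the quoted passage.

```go
package main

import "fmt"

// safeHWSpinlock reports whether the given SIO hardware spinlock number is
// in the set the quoted RP2350-E2 text lists as usable normally:
// 5, 6, 7, 10, 11, and 18 through 31.
// Hypothetical helper, not an existing TinyGo or pico-sdk function.
func safeHWSpinlock(id uint8) bool {
	switch {
	case id >= 5 && id <= 7:
		return true
	case id == 10 || id == 11:
		return true
	case id >= 18 && id <= 31:
		return true
	}
	return false
}

func main() {
	// Print the verdict for all 32 lock numbers.
	for id := uint8(0); id < 32; id++ {
		fmt.Printf("spinlock %2d usable normally: %v\n", id, safeHWSpinlock(id))
	}
}
```

By this check, lock numbers 0 through 3 fall outside the unaffected set, while 20 through 23 fall inside it.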
src/runtime/runtime_rp2350.go
Outdated
// r0 is automatically filled with the pointer value "l" here.
// We create a variable to permit access to the state byte (l.state) and
// avoid a memory fault when accessing it in assembly.
state := &l.state
_ = state
Hoping `state` ends up in `r0` seems brittle to me, and I'm surprised the compiler doesn't optimize it away. Are you sure you can't bind `state` to an asm register a better way? https://tinygo.org/docs/concepts/compiler-internals/inline-assembly/ mentions that Cgo assembly is more full-featured and also inlined.
Yeah, don't do this. This will break eventually. The compiler is free to put it in any register it likes, store it on the stack, whatever.
Also, all the assembly instructions below are independent from a compiler POV so the compiler is free to modify registers between them if it wants to (it probably won't, but it would be allowed to).
A much better way would be to use atomic operations directly, and with that I mean sync/atomic. See the section I posted before:
tinygo/src/runtime/runtime_tinygoriscv_qemu.go
Lines 360 to 375 in 3869f76
func (l *spinLock) Lock() {
	// Try to replace 0 with 1. Once we succeed, the lock has been acquired.
	for !l.Uint32.CompareAndSwap(0, 1) {
		spinLoopWait()
	}
}

func (l *spinLock) Unlock() {
	// Safety check: the spinlock should have been locked.
	if schedulerAsserts && l.Uint32.Load() != 1 {
		runtimePanic("unlock of unlocked spinlock")
	}
	// Unlock the lock. Simply write 0, because we already know it is locked.
	l.Uint32.Store(0)
}
This should result in similar assembly, and if it doesn't we'd have to investigate why.
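For anyone who wants to try the compare-and-swap approach outside the runtime, here is a minimal standalone sketch that runs on a regular Go toolchain. It assumes `runtime.Gosched()` as a stand-in for `spinLoopWait()`, which is internal to the TinyGo runtime, and it leaves out the `schedulerAsserts` check. `countWithLock` is a made-up helper that only exists to exercise the lock.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinLock has the same shape as the snippet above: 0 is unlocked, 1 is locked.
type spinLock struct {
	atomic.Uint32
}

func (l *spinLock) Lock() {
	// Try to replace 0 with 1. Once that succeeds, the lock is held.
	for !l.Uint32.CompareAndSwap(0, 1) {
		// Assumption: yield to other goroutines instead of spinLoopWait().
		runtime.Gosched()
	}
}

func (l *spinLock) Unlock() {
	// Write 0 to release the lock.
	l.Uint32.Store(0)
}

// countWithLock (hypothetical) increments a plain int from several
// goroutines, holding the spinlock around every increment.
func countWithLock(workers, perWorker int) int {
	var (
		lock  spinLock
		wg    sync.WaitGroup
		total int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				lock.Lock()
				total++
				lock.Unlock()
			}
		}()
	}
	wg.Wait()
	return total
}

func main() {
	fmt.Println("total:", countWithLock(4, 1000))
}
```

If the lock works, no increment is lost and the total equals workers times perWorker.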
I agree it's questionable, but because the Go receiver variable is always passed as the first parameter and r0 is where the first parameter will always be per the AAPCS this should always be consistent. I won't die on that hill if atomics can produce a similar enough result though. I don't recall what method I was testing, but the atomic lock method I was initially looking at disassembled to about 4x as long as this, hence the hacky setup. Looking at that one though, it's only ~2x the size which seems reasonable to me.
Hacky version:
0x10001178 <(*runtime.spinLock).Lock+0>: cbz r0, 0x10001192 <(*runtime.spinLock).Lock+26>
0x1000117a <(*runtime.spinLock).Lock+2>: ldaexb r2, [r0]
0x1000117e <(*runtime.spinLock).Lock+6>: movs r1, #1
0x10001180 <(*runtime.spinLock).Lock+8>: cmp r2, #0
0x10001182 <(*runtime.spinLock).Lock+10>: bne.n 0x1000117a <(*runtime.spinLock).Lock+2>
0x10001184 <(*runtime.spinLock).Lock+12>: strexb r2, r1, [r0]
0x10001188 <(*runtime.spinLock).Lock+16>: cmp r2, #0
0x1000118a <(*runtime.spinLock).Lock+18>: bne.n 0x1000117a <(*runtime.spinLock).Lock+2>
0x1000118c <(*runtime.spinLock).Lock+20>: dmb sy
0x10001190 <(*runtime.spinLock).Lock+24>: bx lr
0x10001192 <(*runtime.spinLock).Lock+26>: bl 0x100013f4 <runtime.nilPanic>
Extensive version:
0x10001178 <(*runtime.spinLock).Lock+0>: cbz r0, 0x100011d8 <(*runtime.spinLock).Lock+96>
0x1000117a <(*runtime.spinLock).Lock+2>: adds r0, #4
0x1000117c <(*runtime.spinLock).Lock+4>: movs r1, #1
0x1000117e <(*runtime.spinLock).Lock+6>: nop
0x10001180 <(*runtime.spinLock).Lock+8>: ldaex r2, [r0]
0x10001184 <(*runtime.spinLock).Lock+12>: cbnz r2, 0x10001190 <(*runtime.spinLock).Lock+24>
0x10001186 <(*runtime.spinLock).Lock+14>: stlex r2, r1, [r0]
0x1000118a <(*runtime.spinLock).Lock+18>: cmp r2, #0
0x1000118c <(*runtime.spinLock).Lock+20>: bne.n 0x10001180 <(*runtime.spinLock).Lock+8>
0x1000118e <(*runtime.spinLock).Lock+22>: b.n 0x100011d2 <(*runtime.spinLock).Lock+90>
0x10001190 <(*runtime.spinLock).Lock+24>: clrex
0x10001194 <(*runtime.spinLock).Lock+28>: ldaex r2, [r0]
0x10001198 <(*runtime.spinLock).Lock+32>: cbnz r2, 0x100011a4 <(*runtime.spinLock).Lock+44>
0x1000119a <(*runtime.spinLock).Lock+34>: stlex r2, r1, [r0]
0x1000119e <(*runtime.spinLock).Lock+38>: cmp r2, #0
0x100011a0 <(*runtime.spinLock).Lock+40>: bne.n 0x10001194 <(*runtime.spinLock).Lock+28>
0x100011a2 <(*runtime.spinLock).Lock+42>: b.n 0x100011d2 <(*runtime.spinLock).Lock+90>
0x100011a4 <(*runtime.spinLock).Lock+44>: clrex
0x100011a8 <(*runtime.spinLock).Lock+48>: ldaex r2, [r0]
0x100011ac <(*runtime.spinLock).Lock+52>: cbnz r2, 0x100011b8 <(*runtime.spinLock).Lock+64>
0x100011ae <(*runtime.spinLock).Lock+54>: stlex r2, r1, [r0]
0x100011b2 <(*runtime.spinLock).Lock+58>: cmp r2, #0
0x100011b4 <(*runtime.spinLock).Lock+60>: bne.n 0x100011a8 <(*runtime.spinLock).Lock+48>
0x100011b6 <(*runtime.spinLock).Lock+62>: b.n 0x100011d2 <(*runtime.spinLock).Lock+90>
0x100011b8 <(*runtime.spinLock).Lock+64>: clrex
0x100011bc <(*runtime.spinLock).Lock+68>: ldaex r2, [r0]
0x100011c0 <(*runtime.spinLock).Lock+72>: cbnz r2, 0x100011cc <(*runtime.spinLock).Lock+84>
0x100011c2 <(*runtime.spinLock).Lock+74>: stlex r2, r1, [r0]
0x100011c6 <(*runtime.spinLock).Lock+78>: cmp r2, #0
0x100011c8 <(*runtime.spinLock).Lock+80>: bne.n 0x100011bc <(*runtime.spinLock).Lock+68>
0x100011ca <(*runtime.spinLock).Lock+82>: b.n 0x100011d2 <(*runtime.spinLock).Lock+90>
0x100011cc <(*runtime.spinLock).Lock+84>: clrex
0x100011d0 <(*runtime.spinLock).Lock+88>: b.n 0x10001180 <(*runtime.spinLock).Lock+8>
0x100011d2 <(*runtime.spinLock).Lock+90>: dmb sy
0x100011d6 <(*runtime.spinLock).Lock+94>: bx lr
0x100011d8 <(*runtime.spinLock).Lock+96>: bl 0x10001438 <runtime.nilPanic>
That version:
0x10001178 <(*runtime.spinLock).Lock+0>: cbz r0, 0x100011ae <(*runtime.spinLock).Lock+54>
0x1000117a <(*runtime.spinLock).Lock+2>: adds r0, #4
0x1000117c <(*runtime.spinLock).Lock+4>: movs r1, #1
0x1000117e <(*runtime.spinLock).Lock+6>: nop
0x10001180 <(*runtime.spinLock).Lock+8>: ldaex r2, [r0]
0x10001184 <(*runtime.spinLock).Lock+12>: cbnz r2, 0x10001192 <(*runtime.spinLock).Lock+26>
0x10001186 <(*runtime.spinLock).Lock+14>: stlex r2, r1, [r0]
0x1000118a <(*runtime.spinLock).Lock+18>: cmp r2, #0
0x1000118c <(*runtime.spinLock).Lock+20>: it eq
0x1000118e <(*runtime.spinLock).Lock+22>: bxeq lr
0x10001190 <(*runtime.spinLock).Lock+24>: b.n 0x10001180 <(*runtime.spinLock).Lock+8>
0x10001192 <(*runtime.spinLock).Lock+26>: movs r1, #1
0x10001194 <(*runtime.spinLock).Lock+28>: clrex
0x10001198 <(*runtime.spinLock).Lock+32>: wfe
0x1000119a <(*runtime.spinLock).Lock+34>: nop
0x1000119c <(*runtime.spinLock).Lock+36>: ldaex r2, [r0]
0x100011a0 <(*runtime.spinLock).Lock+40>: cmp r2, #0
0x100011a2 <(*runtime.spinLock).Lock+42>: bne.n 0x10001194 <(*runtime.spinLock).Lock+28>
0x100011a4 <(*runtime.spinLock).Lock+44>: stlex r2, r1, [r0]
0x100011a8 <(*runtime.spinLock).Lock+48>: cmp r2, #0
0x100011aa <(*runtime.spinLock).Lock+50>: bne.n 0x1000119c <(*runtime.spinLock).Lock+36>
0x100011ac <(*runtime.spinLock).Lock+52>: bx lr
0x100011ae <(*runtime.spinLock).Lock+54>: bl 0x10001414 <runtime.nilPanic>
I just noticed this one isn't doing a memory barrier at the end. I assume we'll want to add that to the rp2350 atomics, but I'm not sure where this implementation is coming from exactly. Is this generated from the arm assembly in the mainstream sync/atomic package?
I agree it's questionable, but because the Go receiver variable is always passed as the first parameter and r0 is where the first parameter will always be per the AAPCS this should always be consistent.
The AAPCS only applies to non-inlined, externally available functions. That doesn't apply here. The compiler is free to inline these anywhere and use any register.
There are a few cases where you can rely on the calling convention, but this is not one of them.
Ah yeah, I didn't consider inlining, that could have been problematic.
Even without inlining there is no reason the compiler would be required to keep `state` in any particular register. It might have picked any register; you're lucky it picked r0 in this case. In fact I had expected it would have optimized it out entirely.
src/runtime/runtime_rp2350.go
Outdated
arm.Asm("1:")
// Exclusively load (lock) the state byte and put its value in r2.
arm.Asm("ldaexb r2, [r0]")
// Set the r1 register to '1' for later use.
arm.Asm("movs r1, #1")
// Check if the lock was already taken (r2 != 0).
arm.Asm("cmp r2, #0")
// Jump back to the loop start ("1:") if the lock is already held.
arm.Asm("bne 1b")

// Attempt to store '1' into the lock state byte.
// The return code (0 for success, 1 for failure) is placed in r2.
arm.Asm("strexb r2, r1, [r0]")
// Check if the result was successful (r2 == 0).
arm.Asm("cmp r2, #0")
// Jump back to the loop start ("1:") if the lock was not acquired.
arm.Asm("bne 1b")
With register binding through Cgo assembly, it seems to me that the assembly can be cut down to just the special instructions (`ldaexb` and `strexb`) and the rest kept in Go.
Are you talking about the `arm.AsmFull()` functions? I originally tried that, but it doesn't allow passing through pointer types for some reason (it has a note about having been removed in v0.23.0).
If you mean something else I'm curious though. Initial Google results aren't bringing up much.
I'm talking about "Inline assembly using CGo": https://tinygo.org/docs/concepts/compiler-internals/inline-assembly/
Inline assembly through CGo is indeed an option, and can make the code slightly more efficient. I recommend reading this page to get an understanding of how it works: http://www.ethernut.de/en/documents/arm-inline-asm.html
Other than that, it's just standard CGo. You can make the function `static` and put it directly in the Go file like so:
// static void spinlock_lock(unsigned *lock) { ... }
import "C"
See my comment below.
Also,
Another thing I noticed in that section, they also always disable interrupts when the spinlocks are being held, we may want to do the same: [...]
The runtime does this in various places if needed. The spinlock implementation doesn't need to disable interrupts too.
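A rough sketch of that split of responsibilities, runnable on a regular Go toolchain: the caller masks interrupts around the critical section and the spinlock itself only does the atomic swap. `disableInterrupts`, `restoreInterrupts`, `interruptsEnabled`, `queueLock` and `push` are hypothetical stand-ins that simulate the mask with a plain variable; they are not the runtime's actual code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spinLock only handles mutual exclusion; it does not touch interrupts.
type spinLock struct {
	atomic.Uint32
}

func (l *spinLock) Lock() {
	for !l.Uint32.CompareAndSwap(0, 1) {
	}
}

func (l *spinLock) Unlock() { l.Uint32.Store(0) }

// interruptsEnabled simulates the CPU interrupt mask (assumption: real code
// would use the target's interrupt facilities instead of a variable).
var interruptsEnabled = true

// disableInterrupts masks interrupts and returns the previous state.
func disableInterrupts() bool {
	was := interruptsEnabled
	interruptsEnabled = false
	return was
}

// restoreInterrupts puts the mask back to the saved state.
func restoreInterrupts(was bool) { interruptsEnabled = was }

var (
	queueLock spinLock
	queue     []int
)

// push is the caller that decides interrupts must be off while the lock is
// held; the lock type itself stays unaware of interrupts.
func push(v int) {
	state := disableInterrupts()
	queueLock.Lock()
	queue = append(queue, v)
	queueLock.Unlock()
	restoreInterrupts(state)
}

func main() {
	push(1)
	push(2)
	fmt.Println(queue, interruptsEnabled)
}
```

Keeping the masking in the caller means code paths that can never race with an interrupt handler do not pay for it.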
Thank you for tracking those down. I figured there would probably be some, but I couldn't find them.
…tests that use Chromium headless browser to avoid use of older incompatible version. Signed-off-by: deadprogram <[email protected]>
…wser used for running the wasm tests. Also add favicon link to avoid extra fetches during test runs. Signed-off-by: deadprogram <[email protected]>
With these flags, the TinyGo binary gets 18.8MB (11.6%) smaller. That seems like a quite useful win for such a small change! This is only for Linux for now. MacOS and Windows can be tested later, the flags for those probably need to be modified. Originally inspired by: https://discourse.llvm.org/t/state-of-the-art-for-reducing-executable-size-with-heavily-optimized-program/87952/18 There are some other flags like -Wl,--pack-dyn-relocs=relr that did not shrink binary size in my testing, so I've left them out. This also switches the linker to prefer mold or lld over the default linker, since the system linker is usually ld.bfd which is very slow. (Also, for some reason mold produces smaller binaries than lld).
The revive command seems to have had a syntax error in the file input glob. It appears to have been broken in a way that did not result in a return code being set. This change uses 'find' to build the input to the linter. Note that it is expected to fail the CI script, because it is uncovering some existing lint issues that were not being caught.
Signed-off-by: deadprogram <[email protected]>
This switches the Espressif fork from LLVM 19 to LLVM 20, so we can use the improvements made between those LLVM versions. It also better aligns with the system-LLVM build method, which currently also defaults to LLVM 20. Note that this disables the machine outliner for RISC-V. It appears there's a bug in there somewhere, with the machine outliner enabled the crypto/elliptic package tests fail with -target=riscv-qemu. This should ideally be investigated and reported upstream.
Other than a nit, this looks good to me. @aykevl WDYT?
src/runtime/runtime_rp2.go
Outdated
	printLock     = spinLock{id: 0}
	schedulerLock = spinLock{id: 1}
	atomicsLock   = spinLock{id: 2}
	futexLock     = spinLock{id: 3}
`id` is an implementation detail of rp2040 spinlocks, whereas on rp2350 `id`s have some meaning but are unused. I suggest moving the spinlock variables to the rpXXXX.go files and avoiding the `id` field on rp2350.
src/runtime/runtime_rp2350.go
Outdated
type spinLock struct {
	atomic.Uint32
	id uint8
Delete field.
That one's still needed for compatibility with how the rp2040 hardware spinlocks are initialized here:
tinygo/src/runtime/runtime_rp2.go
Lines 295 to 298 in 109e076
	printLock     = spinLock{id: 20}
	schedulerLock = spinLock{id: 21}
	atomicsLock   = spinLock{id: 22}
	futexLock     = spinLock{id: 23}
Indeed. Above, I suggested moving that `var ()` block to the runtime_rp2xxx.go files to get rid of that dependency.
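To illustrate the suggestion, an rp2350-specific file could declare the locks itself with no `id` field, since the zero value of an atomic-backed lock is already unlocked. This is only a sketch of the idea as I understand it, not the code from this PR; a real runtime file would also carry a build constraint for the chip, which is left out so the example runs anywhere.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spinLock for the software (atomic) implementation: no id field is needed
// because nothing maps the lock to a hardware spinlock number.
type spinLock struct {
	atomic.Uint32
}

func (l *spinLock) Lock() {
	for !l.Uint32.CompareAndSwap(0, 1) {
	}
}

func (l *spinLock) Unlock() { l.Uint32.Store(0) }

// The lock variables live next to the type in the chip-specific file, so
// the shared file no longer has to initialize them with ids.
var (
	printLock     spinLock
	schedulerLock spinLock
	atomicsLock   spinLock
	futexLock     spinLock
)

func main() {
	// The zero value is unlocked; no initializer is required.
	printLock.Lock()
	fmt.Println("printLock:", printLock.Load(), "schedulerLock:", schedulerLock.Load())
	printLock.Unlock()
}
```

The rp2040 file would keep its own `var ()` block with the `id` values it needs for the hardware spinlocks.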
After doing some testing, I think this might not actually be locking properly. Going to do some more thorough testing.
- do not install cmake, instead use the version already installed
- add macOS 15 to the CI builds
- update a couple of GH actions to latest release
- update Go version being used to Go 1.25.1
Signed-off-by: deadprogram <[email protected]>
GitHub org. The git server for git.musl-libc.org is having troubles, and it also seems like a safer bet to have our own mirror just in case. Signed-off-by: deadprogram <[email protected]>
…d by the test-macos-homebrew job, and it conflicts with the actual build that we want, which is macOS 14 for backwards-compatibility. Signed-off-by: deadprogram <[email protected]>
Signed-off-by: Piotr Bocheński <[email protected]>
Yes, this looks good. @mikesmitty can you apply the changes proposed by @eliasnaur? Also, you may want to rebase on the dev branch to hopefully fix the assert-test-linux CI failure.
1f2642c to cee4537
Sure, here's the rebase. I haven't had time to test or diagnose it yet, as I've been stuck working on a project that's consumed all my free time, but I'm pretty sure it's not functioning properly, as confirmed by that CI failure.
Hmm, no I guess it is actually locking. I'm not sure what's happening with that CI test though.
As it turns out, the RP2350 has hardware spinlocks that can be unlocked by writes to nearby addresses. The lower spinlocks currently in use in TinyGo happen to be unlocked by writes to the doorbell interrupt registers used to signal between cores, very possibly leading to some unexpected unlocks. This was not corrected in the A3 or A4 steppings; instead, software spinlocks are used by default on RP2350 in pico-sdk:
https://www.raspberrypi.com/documentation/pico-sdk/hardware.html#group_hardware_sync
Another thing I noticed in that section, they also always disable interrupts when the spinlocks are being held, we may want to do the same:
These are the software spinlock macros ported over:
https://github.com/raspberrypi/pico-sdk/blob/2.2.0/src/rp2_common/hardware_sync_spin_lock/include/hardware/sync/spin_lock.h#L112
https://github.com/raspberrypi/pico-sdk/blob/2.2.0/src/rp2_common/hardware_sync_spin_lock/include/hardware/sync/spin_lock.h#L197