-
Notifications
You must be signed in to change notification settings - Fork 100
Description
Which version of MSAL Go are you using?
Latest in main
Where is the issue?
- Public client
- Device code flow
- Username/Password (ROPC grant)
- Authorization code flow
- Confidential client
- Authorization code flow
- Client credentials:
- client secret
- client certificate
- Token cache serialization
- In-memory cache
- Other (please describe)
- Managed identity client/token caching
Is this a new or an existing app?
The app is in production, although the area with most likelihood of issue have the mitigation installed.
What version of Go are you using (go version)?
$ go version go version go1.24.3 linux/amd64
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
Repro
Adding this unit test to managedidentity package:
package managedidentity
import (
"context"
"encoding/json"
"fmt"
"net/http"
"sync"
"testing"
"time"
"github.com/AzureAD/microsoft-authentication-library-for-go/apps/internal/base/storage"
"github.com/AzureAD/microsoft-authentication-library-for-go/apps/internal/mock"
)
func TestAcquireTokenConcurrency(t *testing.T) {
resource := "https://management.azure.com"
miType := SystemAssigned()
setEnvVars(t, DefaultToIMDS)
before := cacheManager
defer func() { cacheManager = before }()
cacheManager = storage.New(nil)
// Track the number of HTTP requests made to IMDS, and the number of unique tokens received from AcquireToken
// Optimally, both are supposed to be 1, as the first request should write to the cache, then become reusable
var requestCount int32
var requestCountMutex sync.Mutex
var acquiredTokens []string
var acquiredTokensMutex sync.Mutex
tries := 10
// Assume no caching on token provider server
mockClient := mock.NewClient()
for i := 0; i < tries; i++ {
token := fmt.Sprintf("[%d]", i)
responseBody, err := json.Marshal(SuccessfulResponse{
AccessToken: token,
ExpiresIn: 3600,
ExpiresOn: time.Now().Add(time.Hour).Unix(),
Resource: resource,
TokenType: "Bearer",
})
if err != nil {
t.Fatal(err)
}
mockClient.AppendResponse(
mock.WithHTTPStatusCode(http.StatusOK),
mock.WithBody(responseBody),
mock.WithCallback(func(r *http.Request) {
requestCountMutex.Lock()
t.Logf("token provider server providing token: %s", token)
requestCount++
requestCountMutex.Unlock()
}),
)
}
client, err := New(miType, WithHTTPClient(mockClient))
if err != nil {
t.Fatal(err)
}
// Launch multiple goroutines for AcquireToken() simultaneously
numGoroutines := tries
var wg sync.WaitGroup
for i := 0; i < numGoroutines; i++ {
wg.Add(1)
go func(routineID int) {
defer wg.Done()
// Call AcquireToken in each goroutine
result, err := client.AcquireToken(context.Background(), resource)
if err != nil {
t.Errorf("goroutine %d failed: %v", routineID, err)
return
}
// Capture the token received
acquiredTokensMutex.Lock()
t.Logf("AcquireToken() caller receives token: %s", result.AccessToken)
acquiredTokens = append(acquiredTokens, result.AccessToken)
acquiredTokensMutex.Unlock()
}(i)
}
wg.Wait()
// Although, in the current implementation, we expect
// - Multiple HTTP requests to IMDS until the cache is written and the subsequent call/goroutine sees the cached token
// - Each new token received from the server overwrites the previous one in cache
// - Different goroutines may receive different tokens
uniqueTokens := make(map[string]bool)
for _, token := range acquiredTokens {
uniqueTokens[token] = true
}
// Log the results to show the race condition
t.Logf("- Total HTTP requests made: %d", requestCount)
t.Logf("- All tokens captured: %v", acquiredTokens)
t.Logf("- Unique tokens received: %d", len(uniqueTokens))
}Expected behavior
AcquireToken(...) sends the request to identity provider once, caches, and subsequent (concurrent) calls to AcquireToken(...) reuses the cache.
$ go test -v -run TestAcquireTokenConcurrency
=== RUN TestAcquireTokenConcurrency
race_condition_test.go:53: token provider server providing token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:98: - Total HTTP requests made: 1
race_condition_test.go:99: - All tokens captured: [[0] [0] [0] [0] [0] [0] [0] [0] [0] [0]]
race_condition_test.go:100: - Unique tokens received: 1
Actual behavior
Cache read of the subsequent (concurrent) calls may occur before cache write by the previous call(s). This have resulted in these calls also sending requests to identity provider, when it could have used the previous one.
Examples:
=== RUN TestAcquireTokenConcurrency
race_condition_test.go:55: token provider server providing token: [0]
race_condition_test.go:83: AcquireToken() caller receives token: [0]
race_condition_test.go:55: token provider server providing token: [1]
race_condition_test.go:55: token provider server providing token: [2]
race_condition_test.go:83: AcquireToken() caller receives token: [0]
race_condition_test.go:83: AcquireToken() caller receives token: [0]
race_condition_test.go:83: AcquireToken() caller receives token: [0]
race_condition_test.go:83: AcquireToken() caller receives token: [0]
race_condition_test.go:83: AcquireToken() caller receives token: [0]
race_condition_test.go:83: AcquireToken() caller receives token: [0]
race_condition_test.go:83: AcquireToken() caller receives token: [0]
race_condition_test.go:83: AcquireToken() caller receives token: [1]
race_condition_test.go:83: AcquireToken() caller receives token: [2]
race_condition_test.go:100: - Total HTTP requests made: 3
race_condition_test.go:101: - All tokens captured: [[0] [0] [0] [0] [0] [0] [0] [0] [1] [2]]
race_condition_test.go:102: - Unique tokens received: 3
=== RUN TestAcquireTokenConcurrency
race_condition_test.go:53: token provider server providing token: [0]
race_condition_test.go:53: token provider server providing token: [1]
race_condition_test.go:53: token provider server providing token: [2]
race_condition_test.go:53: token provider server providing token: [3]
race_condition_test.go:53: token provider server providing token: [4]
race_condition_test.go:81: AcquireToken() caller receives token: [1]
race_condition_test.go:81: AcquireToken() caller receives token: [2]
race_condition_test.go:81: AcquireToken() caller receives token: [3]
race_condition_test.go:81: AcquireToken() caller receives token: [3]
race_condition_test.go:81: AcquireToken() caller receives token: [0]
race_condition_test.go:81: AcquireToken() caller receives token: [4]
race_condition_test.go:53: token provider server providing token: [5]
race_condition_test.go:81: AcquireToken() caller receives token: [5]
race_condition_test.go:53: token provider server providing token: [6]
race_condition_test.go:81: AcquireToken() caller receives token: [6]
race_condition_test.go:53: token provider server providing token: [7]
race_condition_test.go:81: AcquireToken() caller receives token: [7]
race_condition_test.go:53: token provider server providing token: [8]
race_condition_test.go:81: AcquireToken() caller receives token: [8]
race_condition_test.go:98: - Total HTTP requests made: 9
race_condition_test.go:99: - All tokens captured: [[1] [2] [3] [3] [0] [4] [5] [6] [7] [8]]
race_condition_test.go:100: - Unique tokens received: 9
Possible solution
The evaluation of whether to fetch the new token should be made after the cache is stabilized. Maybe wrap the logic of AcquireToken() with a lock?
Additional context / logs / screenshots
One of its uses in azure-sdk-for-go seems to lock this anyway. However, I don't think this lock requirement is intentional(?), and we might just get lucky that this use case locks this(?).
My use case is not through azure-sdk-for-go, so we have to install a lock like azure-sdk-for-go at the moment.