Skip to content

[Bug] Concurrent managedidentity.Client.AcquireToken() calls cause multiple token requests to identity provider before caching #569

@comtalyst

Description

@comtalyst

Which version of MSAL Go are you using?
Latest in main

Where is the issue?

  • Public client
    • Device code flow
    • Username/Password (ROPC grant)
    • Authorization code flow
  • Confidential client
    • Authorization code flow
    • Client credentials:
      • client secret
      • client certificate
  • Token cache serialization
    • In-memory cache
  • Other (please describe)
    • Managed identity client/token caching

Is this a new or an existing app?

The app is in production, although the area with most likelihood of issue have the mitigation installed.

What version of Go are you using (go version)?

$ go version
go version go1.24.3 linux/amd64

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

Repro
Adding this unit test to managedidentity package:

package managedidentity

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
	"testing"
	"time"

	"github.com/AzureAD/microsoft-authentication-library-for-go/apps/internal/base/storage"
	"github.com/AzureAD/microsoft-authentication-library-for-go/apps/internal/mock"
)

func TestAcquireTokenConcurrency(t *testing.T) {
	resource := "https://management.azure.com"
	miType := SystemAssigned()
	setEnvVars(t, DefaultToIMDS)
	before := cacheManager
	defer func() { cacheManager = before }()
	cacheManager = storage.New(nil)

	// Track the number of HTTP requests made to IMDS, and the number of unique tokens received from AcquireToken
	// Optimally, both are supposed to be 1, as the first request should write to the cache, then become reusable
	var requestCount int32
	var requestCountMutex sync.Mutex
	var acquiredTokens []string
	var acquiredTokensMutex sync.Mutex

	tries := 10

	// Assume no caching on token provider server
	mockClient := mock.NewClient()
	for i := 0; i < tries; i++ {
		token := fmt.Sprintf("[%d]", i)
		responseBody, err := json.Marshal(SuccessfulResponse{
			AccessToken: token,
			ExpiresIn:   3600,
			ExpiresOn:   time.Now().Add(time.Hour).Unix(),
			Resource:    resource,
			TokenType:   "Bearer",
		})
		if err != nil {
			t.Fatal(err)
		}

		mockClient.AppendResponse(
			mock.WithHTTPStatusCode(http.StatusOK),
			mock.WithBody(responseBody),
			mock.WithCallback(func(r *http.Request) {
				requestCountMutex.Lock()
				t.Logf("token provider server providing token: %s", token)
				requestCount++
				requestCountMutex.Unlock()
			}),
		)
	}
	client, err := New(miType, WithHTTPClient(mockClient))
	if err != nil {
		t.Fatal(err)
	}

	// Launch multiple goroutines for AcquireToken() simultaneously
	numGoroutines := tries
	var wg sync.WaitGroup
	for i := 0; i < numGoroutines; i++ {
		wg.Add(1)
		go func(routineID int) {
			defer wg.Done()

			// Call AcquireToken in each goroutine
			result, err := client.AcquireToken(context.Background(), resource)
			if err != nil {
				t.Errorf("goroutine %d failed: %v", routineID, err)
				return
			}

			// Capture the token received
			acquiredTokensMutex.Lock()
			t.Logf("AcquireToken() caller receives token: %s", result.AccessToken)
			acquiredTokens = append(acquiredTokens, result.AccessToken)
			acquiredTokensMutex.Unlock()
		}(i)
	}
	wg.Wait()
	// Although, in the current implementation, we expect
	// - Multiple HTTP requests to IMDS until the cache is written and the subsequent call/goroutine sees the cached token
	// - Each new token received from the server overwrites the previous one in cache
	// - Different goroutines may receive different tokens

	uniqueTokens := make(map[string]bool)
	for _, token := range acquiredTokens {
		uniqueTokens[token] = true
	}

	// Log the results to show the race condition
	t.Logf("- Total HTTP requests made: %d", requestCount)
	t.Logf("- All tokens captured: %v", acquiredTokens)
	t.Logf("- Unique tokens received: %d", len(uniqueTokens))
}

Expected behavior
AcquireToken(...) sends the request to identity provider once, caches, and subsequent (concurrent) calls to AcquireToken(...) reuses the cache.

$ go test -v -run TestAcquireTokenConcurrency
=== RUN   TestAcquireTokenConcurrency
    race_condition_test.go:53: token provider server providing token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:98: - Total HTTP requests made: 1
    race_condition_test.go:99: - All tokens captured: [[0] [0] [0] [0] [0] [0] [0] [0] [0] [0]]
    race_condition_test.go:100: - Unique tokens received: 1

Actual behavior
Cache read of the subsequent (concurrent) calls may occur before cache write by the previous call(s). This have resulted in these calls also sending requests to identity provider, when it could have used the previous one.
Examples:

=== RUN   TestAcquireTokenConcurrency
    race_condition_test.go:55: token provider server providing token: [0]
    race_condition_test.go:83: AcquireToken() caller receives token: [0]
    race_condition_test.go:55: token provider server providing token: [1]
    race_condition_test.go:55: token provider server providing token: [2]
    race_condition_test.go:83: AcquireToken() caller receives token: [0]
    race_condition_test.go:83: AcquireToken() caller receives token: [0]
    race_condition_test.go:83: AcquireToken() caller receives token: [0]
    race_condition_test.go:83: AcquireToken() caller receives token: [0]
    race_condition_test.go:83: AcquireToken() caller receives token: [0]
    race_condition_test.go:83: AcquireToken() caller receives token: [0]
    race_condition_test.go:83: AcquireToken() caller receives token: [0]
    race_condition_test.go:83: AcquireToken() caller receives token: [1]
    race_condition_test.go:83: AcquireToken() caller receives token: [2]
    race_condition_test.go:100: - Total HTTP requests made: 3
    race_condition_test.go:101: - All tokens captured: [[0] [0] [0] [0] [0] [0] [0] [0] [1] [2]]
    race_condition_test.go:102: - Unique tokens received: 3
=== RUN   TestAcquireTokenConcurrency
    race_condition_test.go:53: token provider server providing token: [0]
    race_condition_test.go:53: token provider server providing token: [1]
    race_condition_test.go:53: token provider server providing token: [2]
    race_condition_test.go:53: token provider server providing token: [3]
    race_condition_test.go:53: token provider server providing token: [4]
    race_condition_test.go:81: AcquireToken() caller receives token: [1]
    race_condition_test.go:81: AcquireToken() caller receives token: [2]
    race_condition_test.go:81: AcquireToken() caller receives token: [3]
    race_condition_test.go:81: AcquireToken() caller receives token: [3]
    race_condition_test.go:81: AcquireToken() caller receives token: [0]
    race_condition_test.go:81: AcquireToken() caller receives token: [4]
    race_condition_test.go:53: token provider server providing token: [5]
    race_condition_test.go:81: AcquireToken() caller receives token: [5]
    race_condition_test.go:53: token provider server providing token: [6]
    race_condition_test.go:81: AcquireToken() caller receives token: [6]
    race_condition_test.go:53: token provider server providing token: [7]
    race_condition_test.go:81: AcquireToken() caller receives token: [7]
    race_condition_test.go:53: token provider server providing token: [8]
    race_condition_test.go:81: AcquireToken() caller receives token: [8]
    race_condition_test.go:98: - Total HTTP requests made: 9
    race_condition_test.go:99: - All tokens captured: [[1] [2] [3] [3] [0] [4] [5] [6] [7] [8]]
    race_condition_test.go:100: - Unique tokens received: 9

Possible solution
The evaluation of whether to fetch the new token should be made after the cache is stabilized. Maybe wrap the logic of AcquireToken() with a lock?

Additional context / logs / screenshots
One of its uses in azure-sdk-for-go seems to lock this anyway. However, I don't think this lock requirement is intentional(?), and we might just get lucky that this use case locks this(?).
My use case is not through azure-sdk-for-go, so we have to install a lock like azure-sdk-for-go at the moment.

Metadata

Metadata

Assignees

Labels

bugSomething isn't workingp2

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions