feat(aws): add warning for provider-specific properties without setIdentifier #5799

u-kai · 2025-09-03T12:37:48Z

What does it do ?

Add warning logs when AWS Route53 provider-specific routing policy properties (weight, region, failover, geolocation, geoproximity, multi-value) are specified without the required setIdentifier.

This helps users identify misconfigurations where their routing policies are silently ignored by Route53.

Motivation

Fixes #5775

When users configure AWS Route53 routing policies using ExternalDNS annotations like external-dns.alpha.kubernetes.io/aws-weight: "200" but forget to include external-dns.alpha.kubernetes.io/set-identifier, Route53 silently ignores the routing policy and creates a standard DNS record instead.

This leads to confusion as users expect weighted/failover routing to be active but see no effect.

More

Yes, this PR title follows Conventional Commits
Yes, I added unit tests
Yes, I updated end user documentation accordingly

k8s-ci-robot · 2025-09-03T12:37:55Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign szuecs for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2025-09-03T12:37:58Z

Hi @u-kai. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

mloiseleur

@u-kai Thanks for this PR to improve userXP.

The use case looks good to me 👍 .

For the implementation, it seems quite inefficient to re-browse all those fields.
Wdyt of adding this debug log inside this loop instead of creating a dedicated loop ? Would it work or did I miss something ?

u-kai · 2025-09-03T13:15:56Z

@mloiseleur

Thank you for the suggestion!
I checked the loop you mentioned, but it's in the records method for reading existing records from Route53,
while my implementation is in the newChange method for creating/updating records.
These serve different purposes.

However, I agree about efficiency.
I can move providerSpecificRequiringSetIdentifier to a variable to avoid recreating it on every call.
This maintains the current approach (single consolidated warning) while improving performance.
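The approach described above (a package-level list plus the existing getter, producing one consolidated warning) could look roughly like the sketch below. The types are pared-down stand-ins for the external-dns endpoint types, the property names are placeholder strings rather than the provider's real constants, and `ignoredProperties` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// ProviderSpecificProperty and Endpoint are minimal stand-ins for the
// external-dns types; only the fields needed for this sketch are included.
type ProviderSpecificProperty struct {
	Name  string
	Value string
}

type Endpoint struct {
	DNSName          string
	SetIdentifier    string
	ProviderSpecific []ProviderSpecificProperty
}

// GetProviderSpecificProperty scans the slice linearly for key.
func (e *Endpoint) GetProviderSpecificProperty(key string) (string, bool) {
	for _, p := range e.ProviderSpecific {
		if p.Name == key {
			return p.Value, true
		}
	}
	return "", false
}

// requiresSetIdentifier is built once at package level instead of on every call.
// The names are placeholders (assumption), not the provider's actual constants.
var requiresSetIdentifier = []string{"aws/weight", "aws/region", "aws/failover"}

// ignoredProperties (hypothetical helper) lists the routing-policy properties
// that are present on ep while its set identifier is empty.
func ignoredProperties(ep *Endpoint) []string {
	if ep.SetIdentifier != "" {
		return nil
	}
	var ignored []string
	for _, name := range requiresSetIdentifier {
		if _, ok := ep.GetProviderSpecificProperty(name); ok {
			ignored = append(ignored, name)
		}
	}
	return ignored
}

func main() {
	ep := &Endpoint{
		DNSName:          "app.example.com",
		ProviderSpecific: []ProviderSpecificProperty{{Name: "aws/weight", Value: "200"}},
	}
	// Emit a single consolidated warning rather than one per property.
	if ignored := ignoredProperties(ep); len(ignored) > 0 {
		fmt.Printf("warning: %s sets %s without a setIdentifier\n",
			ep.DNSName, strings.Join(ignored, ", "))
	}
}
```

Keeping the check in a small helper also keeps the body of newChange unchanged apart from a single call.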

mloiseleur · 2025-09-04T06:21:54Z

I can move providerSpecificRequiringSetIdentifier to a variable to avoid recreating it on every call.

That would be a good first step.

I also noticed that GetProviderSpecificProperty is iterating over all specific properties, so it means double loop with O(n*m) complexity.

=> Wdyt of adding an intersect method ?

Using a hash for providerSpecificRequiringSetIdentifier, you may reach O(n*x) complexity, with x between 1 and 2 (source)

u-kai · 2025-09-04T14:06:38Z

@mloiseleur

Thanks — does this implementation look correct?

var providerSpecificRequiringSetIdentifier = []string{
	providerSpecificWeight,
	providerSpecificRegion,
	providerSpecificFailover,
	providerSpecificGeolocationContinentCode,
	providerSpecificGeolocationCountryCode,
	providerSpecificGeolocationSubdivisionCode,
	providerSpecificGeoProximityLocationAWSRegion,
	providerSpecificGeoProximityLocationBias,
	providerSpecificGeoProximityLocationCoordinates,
	providerSpecificGeoProximityLocationLocalZoneGroup,
	providerSpecificMultiValueAnswer,
}

....
	if setIdentifier == "" {
		ignoredProperties := make([]string, 0, len(providerSpecificRequiringSetIdentifier))
		tmpMap := make(map[string]struct{}, len(ep.ProviderSpecific))
		for _, ps := range ep.ProviderSpecific {
			tmpMap[ps.Name] = struct{}{}
		}
		for _, prop := range providerSpecificRequiringSetIdentifier {
			if _, ok := tmpMap[prop]; ok {
				ignoredProperties = append(ignoredProperties, prop)
			}
		}
		if len(ignoredProperties) > 0 {
			log.Warnf("Endpoint %s has provider-specific properties %v that require a setIdentifier, but none was set; ignoring these properties",
				ep.DNSName, ignoredProperties)
		}
	}

It should indeed be faster. However, I have a couple of concerns:

We’re accessing the ProviderSpecific field directly and building a temporary map, rather than calling GetProviderSpecificProperty.
That makes the implementation slightly less idiomatic / a bit harder to follow at first glance.

Given the numbers involved — providerSpecificRequiringSetIdentifier is 11 items and ProviderSpecific is about 15 items for AWS — the absolute work is small, so the practical performance gain is modest even in the worst case.

So this is a readability vs. optimization trade-off.
I’d appreciate your opinion: prefer this small optimization now, or keep the simpler/clearer approach (e.g. keep using GetProviderSpecificProperty or extract a small helper) for better readability?

vflaux · 2025-09-04T14:13:57Z

@u-kai you can check k8s.io/utils/set package. There is an Intersection() method.

u-kai · 2025-09-04T23:51:09Z

@vflaux
Thanks — I didn’t know about that. How about something like this?

if setIdentifier == "" {
	providerSpecificSet := make(set.Set[string], len(ep.ProviderSpecific))
	for _, s := range ep.ProviderSpecific {
		providerSpecificSet.Insert(s.Name)
	}
	ignoredProperties := providerSpecificRequiringSetIdentifier.Intersection(providerSpecificSet)
	if len(ignoredProperties) > 0 {
		pMsg := ignoredProperties.SortedList()
		log.Warnf("Endpoint %s has provider-specific properties %v that require a setIdentifier, but none was set; ignoring these properties",
			ep.DNSName, pMsg)
	}
}

vflaux · 2025-09-05T11:16:56Z

No need to range over ep.ProviderSpecific, there is a constructor:

providerSpecificSet := set.New(ep.ProviderSpecific...)

u-kai · 2025-09-05T13:32:33Z

@vflaux

Thanks! Just to clarify: set.New expects ...string (since it’s defined as func New[E ordered](items ...E) Set[E]). Meanwhile, ep.ProviderSpecific is a slice of structs ([]ProviderSpecificProperty), each with fields like Name and Value. So we can’t pass ep.ProviderSpecific directly to set.New[string](...).

vflaux · 2025-09-05T13:38:53Z

@u-kai You're right, my mistake.

ivankatliarchuk

What I actually see here: this change exposes a design decision in the current code. At the moment we have:

// ProviderSpecific holds configuration which is specific to individual DNS providers
type ProviderSpecific []ProviderSpecificProperty

func (e *Endpoint) GetProviderSpecificProperty(key string) (string, bool) {
	for _, providerSpecific := range e.ProviderSpecific {
		if providerSpecific.Name == key {
			return providerSpecific.Value, true
		}
	}
	return "", false
}

func (e *Endpoint) SetProviderSpecificProperty(key string, value string) {
	for i, providerSpecific := range e.ProviderSpecific {
		if providerSpecific.Name == key {
			e.ProviderSpecific[i] = ProviderSpecificProperty{
				Name:  key,
				Value: value,
			}
			return
		}
	}

	e.ProviderSpecific = append(e.ProviderSpecific, ProviderSpecificProperty{Name: key, Value: value})
}

Throughout the codebase, whenever external-dns needs to retrieve a provider-specific property, it performs an iteration. In large environments, this could become a performance bottleneck.

Before adding a warning log (not something super critical), we should most likely first consider which data structures would better fit the ProviderSpecific use case.

ivankatliarchuk · 2025-09-06T07:52:35Z

provider/aws/aws.go


 	setIdentifier := ep.SetIdentifier
+
+	// Check if provider-specific values requiring setIdentifier are present but setIdentifier is empty


There are probably other design solutions. But could we at least move this into a private method? I see no point in increasing complexity the way it currently does.

u-kai · 2025-09-06T08:35:48Z

@ivankatliarchuk

You're right, the current slice-based design might not be the most efficient, and switching to a map could be a better fit here.
I'll give it a try to see if I can refactor it that way.
If it works out, I'll open a separate PR for it.

u-kai · 2025-09-06T10:29:09Z

@ivankatliarchuk

I looked into it, and changing ProviderSpecific directly to a map would be a breaking change since it’s part of the CRD schema.

To avoid that, my plan is:

Keep the CRD interface as-is ([]ProviderSpecificProperty) for backward compatibility.

Define a separate internal Endpoint type, where ProviderSpecific is represented as map[string][]string.

Add conversion helpers between the CRD type and the internal type.

This way we can improve performance and readability without introducing a breaking change to users.

Does that sound reasonable to you?
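A minimal sketch of the conversion helpers in that plan, assuming the internal form is a plain `map[string][]string` keyed by property name. `toInternal` and `toCRD` are hypothetical names, the type is a stand-in for the CRD type, and emitting one property per value on the way back is an assumption about how multi-valued entries would be handled.

```go
package main

import (
	"fmt"
	"sort"
)

// ProviderSpecificProperty is a stand-in for the CRD-facing type.
type ProviderSpecificProperty struct {
	Name  string
	Value string
}

// toInternal groups values by property name so lookups become map accesses.
// Values that share a name keep their original relative order.
func toInternal(props []ProviderSpecificProperty) map[string][]string {
	m := make(map[string][]string, len(props))
	for _, p := range props {
		m[p.Name] = append(m[p.Name], p.Value)
	}
	return m
}

// toCRD flattens the map back into a slice, one property per value.
// Names are sorted so the output is deterministic despite map iteration order.
func toCRD(m map[string][]string) []ProviderSpecificProperty {
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)

	props := make([]ProviderSpecificProperty, 0, len(m))
	for _, name := range names {
		for _, v := range m[name] {
			props = append(props, ProviderSpecificProperty{Name: name, Value: v})
		}
	}
	return props
}

func main() {
	crd := []ProviderSpecificProperty{
		{Name: "aws/weight", Value: "200"},
		{Name: "aws/failover", Value: "PRIMARY"},
	}
	internal := toInternal(crd)
	_, hasWeight := internal["aws/weight"]
	fmt.Println(hasWeight, toCRD(internal))
}
```

One consequence of this shape: a round trip returns properties sorted by name, not in their original order, so any code that depends on slice order would need separate handling.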

ivankatliarchuk · 2025-09-06T14:59:26Z

Makes sense on paper)))

ivankatliarchuk · 2025-10-02T14:26:36Z

If the other change (the O(1) logic) isn't approved, it doesn't make sense to add a warning message, as that would only increase complexity. So most likely this is going to be on hold

u-kai · 2025-10-02T23:09:55Z

Could you clarify a bit more on your first point? 🙇

If the other change (the O(1) logic) isn't approved, it doesn't make sense to add a warning message, as that would only increase complexity.

I’d like to better understand why the warning message would lose its meaning or increase complexity if the O(1) logic isn’t approved.

From my perspective, based on some quick benchmarks I ran, for our typical workload in external-dns (3–5 ProviderSpecific keys, ~30 membership checks in the AWS provider paths), a simple for loop over the slice is actually faster and simpler than building a set/map.

So my current plan is to adjust this PR to keep the underlying representation as a slice and switch back to a straightforward loop, while still adding the warning log as originally intended.

I’d appreciate a quick sanity check on this assumption. If this looks reasonable, I’d like to proceed in that direction.
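For anyone who wants to reproduce this kind of comparison, here is a rough sketch. It is not the benchmark referred to above: the type, property names, and key counts are illustrative assumptions, and the numbers it prints will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// ProviderSpecificProperty is a stand-in for the external-dns type.
type ProviderSpecificProperty struct {
	Name  string
	Value string
}

// countHitsSlice answers every membership check with a linear scan.
func countHitsSlice(props []ProviderSpecificProperty, keys []string) int {
	hits := 0
	for _, key := range keys {
		for _, p := range props {
			if p.Name == key {
				hits++
				break
			}
		}
	}
	return hits
}

// countHitsMap builds a temporary set once, then answers each check with a
// map lookup, so the cost of building the set is part of the measurement.
func countHitsMap(props []ProviderSpecificProperty, keys []string) int {
	names := make(map[string]struct{}, len(props))
	for _, p := range props {
		names[p.Name] = struct{}{}
	}
	hits := 0
	for _, key := range keys {
		if _, ok := names[key]; ok {
			hits++
		}
	}
	return hits
}

// sink keeps the compiler from discarding the benchmarked calls.
var sink int

func main() {
	// Four properties and thirty checks, roughly the workload described above.
	props := []ProviderSpecificProperty{
		{Name: "aws/weight", Value: "200"},
		{Name: "aws/region", Value: "eu-west-1"},
		{Name: "alias", Value: "true"},
		{Name: "aws/evaluate-target-health", Value: "true"},
	}
	candidates := []string{"aws/weight", "aws/failover", "aws/region", "aws/multi-value-answer"}
	keys := make([]string, 0, 30)
	for i := 0; i < 30; i++ {
		keys = append(keys, candidates[i%len(candidates)])
	}

	bySlice := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = countHitsSlice(props, keys)
		}
	})
	byMap := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = countHitsMap(props, keys)
		}
	})
	fmt.Println("slice scan:", bySlice)
	fmt.Println("temp map:  ", byMap)
}
```

Both functions return the same count, so the only thing being compared is the cost of the lookup strategy at this input size.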

ivankatliarchuk · 2025-10-05T15:35:52Z

/hold

mloiseleur · 2025-10-05T19:26:57Z

@u-kai I'll try to explain.

AWS users are only a part of all ExternalDNS users.
Users adding a provider-specific property such as weight are only a part of all ExternalDNS AWS users.

So let's take a high-level point of view: is adding this kind of check for all resources loaded by ExternalDNS an improvement on UserXP or overengineering (i.e. adding more complexity to the code base)?

Normally, when a required parameter is missing, the API being called should fail. In this specific case, the call to the AWS API should fail and return an error, so the log would be displayed to the user, as for all the other error cases with this provider.

We can improve our documentation but, like @ivankatliarchuk, I'm not sure we should complicate our code base because of this specific edge case. The AWS APIs are evolving, and maybe this case will return an error in a few months.

But, maybe I missed something. Feel free to share your thoughts.

u-kai · 2025-10-05T23:31:52Z

@mloiseleur

Thanks for the detailed feedback.

Is adding this kind of check for all resources loaded by ExternalDNS an improvement on UserXP or overengineering?

I understand your concern, but just to clarify — this change only applies to the AWS provider, and only when setIdentifier is empty.

You're right that it adds some complexity, but as the issue reporter mentioned, users currently get no warning at all when this happens, which can be confusing. I think this small check improves UX in such cases.
In terms of code complexity, it doesn’t really differ between slice and map implementations — both access the same exposed interface of ProviderSpecific.
I’ve just pushed the latest version, so please take a look and see if the complexity feels acceptable. 🙇
This version no longer uses a set Intersection; based on the earlier benchmarks I shared, iterating over a slice is actually faster for the expected number of ProviderSpecific keys.

Normally, when a required parameter is not there, the API called should fail

Yes, the AWS API already returns an error like below.

 Error: An error occurred (InvalidInput) when calling the ChangeResourceRecordSets
     operation: Invalid request: Missing field 'SetIdentifier' in Change with [Action=CREATE,
     Name=test-failover.example.com, Type=A, SetIdentifier=null]

However, ExternalDNS currently only sets the providerSpecific property when setIdentifier is non-empty, which means the AWS API never gets a chance to validate the invalid case.
If we changed it to always set the property, AWS would indeed return an error — but that would break existing systems that currently succeed under this behavior.

As an alternative approach, in a future version, we could consider setting the property regardless of setIdentifier and let the AWS API error out.
That would make the behavior more explicit, though it might break setups that are currently “working” by accident.

For now, I believe adding a warn-level log strikes a good balance between UX improvement and safety.

feat(aws): add warning for provider-specific properties without setIdentifier

6e920e3

k8s-ci-robot requested a review from mloiseleur September 3, 2025 12:37

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 3, 2025

k8s-ci-robot requested a review from szuecs September 3, 2025 12:37

k8s-ci-robot added provider Issues or PRs related to a provider needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 3, 2025

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 3, 2025

mloiseleur reviewed Sep 3, 2025

refactor(aws): optimize provider-specific property validation using sets

3947553

u-kai requested a review from mloiseleur September 6, 2025 05:00

ivankatliarchuk reviewed Sep 6, 2025

u-kai mentioned this pull request Sep 6, 2025

perf(endpoint): optimize ProviderSpecific to use map for O(1) access #5814

Closed

3 tasks

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 5, 2025

fix set to map and slices

7106c67

