Feature/dns skip wait and partial state #1052
base: main
Makefile (outdated)
# GOLANGCI-LINT INSTALLATION
$(GOLANGCI_LINT):
	curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/master/install.sh | bash -s -- -b bin v$(GOLANGCI_VERSION)
Running bash scripts blindly from the master branch of another repository is a no-go for me, sorry.
Overall, what's the point of this? This whole thing feels wrong to me. For managing development dependencies there are things like dev containers, devenvs, nix flakes, ...
I'm aware we're not providing any of these currently, but this download process inside the Makefile seems pretty hacky to me 😄
Yeah, that was a bit too ambitious. You know, once you copy it from somewhere, you always copy it :P
I replaced it with a download from the releases, which should be secure.
It is actually quite typical to download binaries that are needed to interact with the application (like linters, kubectl, kind, helm, etc.) via scripts/make. In many STACKIT projects that is already the case, and many open-source projects do similar things, for example:
- https://github.com/cert-manager/cert-manager/blob/master/Makefile
- https://github.com/crossplane-contrib/provider-upjet-aws/blob/main/Makefile
I guess there are many ways to solve the same problem. Currently my biggest problem is that I cannot lint locally, since my installed golangci-lint version differs from the one in the pipeline. Therefore I want a make target that runs the same version locally as in the pipeline. Some might call this the shift-left approach.
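For illustration, a Go sketch of the pinned-release idea: one version string yields the same download URL locally and in CI. The asset naming scheme (golangci-lint-<version>-<os>-<arch>.tar.gz, with .zip on Windows) is an assumption about the golangci-lint releases page, and releaseURL is a hypothetical helper, not part of this PR's Makefile.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// releaseURL builds the download URL for a pinned golangci-lint release.
// Assumption: release assets are named golangci-lint-<version>-<os>-<arch>.<ext>,
// using .zip on Windows and .tar.gz everywhere else.
func releaseURL(version, goos, goarch string) string {
	v := strings.TrimPrefix(version, "v") // accept both "1.59.1" and "v1.59.1"
	ext := "tar.gz"
	if goos == "windows" {
		ext = "zip"
	}
	return fmt.Sprintf(
		"https://github.com/golangci/golangci-lint/releases/download/v%s/golangci-lint-%s-%s-%s.%s",
		v, v, goos, goarch, ext,
	)
}

func main() {
	// The same pinned (example) version would be used locally and in the pipeline.
	fmt.Println(releaseURL("1.59.1", runtime.GOOS, runtime.GOARCH))
}
```

Pinning the version in one place (a Makefile variable such as GOLANGCI_VERSION) is what removes the local-versus-pipeline drift; the URL construction itself is incidental.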
model.Id = utils.BuildInternalTerraformId(projectId, zoneId, recordSetId)

// Set all unknown/null fields to null before saving state
if err := utils.SetModelFieldsToNull(ctx, &model); err != nil {
Sorry, but I just don't get why one would want to have this. What's the point of this?
It might be because of weird clients. Currently we only set project_id and zone_id in the state before waiting. This led to the following error when the wait is skipped, which some clients want:
"error": "cannot get a terraform workspace for resource: cannot ensure tfstate file: cannot check whether the state is empty: cannot work with a non-string id: <nil>", "errorVerbose": "cannot work with a non-string id: <nil>
So I have set the id as well in the helper function. I think I observed an error in the past where the client failed because some fields in the state were "unknown", but I can no longer find that error message. Currently I get:
apply failed: Provider produced inconsistent result after apply: When applying changes to stackit_dns_zone.example-zone, provider "provider[\"registry.terraform.io/stackitcloud/stackit\"]" produced an unexpected new value: .description: was cty.StringVal("Example DNS zone for demonstration"), but now null.
This is a bug in the provider, which should be reported in the provider's own issue tracker.
Provider produced inconsistent result after apply: When applying changes to stackit_dns_zone.example-zone, provider "provider[\"registry.terraform.io/stackitcloud/stackit\"]" produced an unexpected new value: .dns_name: was cty.StringVal("patrick.test.patrick.patrick"), but now null.
This is a bug in the provider, which should be reported in the provider's own issue tracker.
Provider produced inconsistent result after apply: When applying changes to stackit_dns_zone.example-zone, provider "provider[\"registry.terraform.io/stackitcloud/stackit\"]" produced an unexpected new value: .name: was cty.StringVal("example-zone"), but now null.
This is a bug in the provider, which should be reported in the provider's own issue tracker.
Provider produced inconsistent result after apply: When applying changes to stackit_dns_zone.example-zone, provider "provider[\"registry.terraform.io/stackitcloud/stackit\"]" produced an unexpected new value: .is_reverse_zone: was cty.False, but now null.
This is a bug in the provider, which should be reported in the provider's own issue tracker.
Provider produced inconsistent result after apply: When applying changes to stackit_dns_zone.example-zone, provider "provider[\"registry.terraform.io/stackitcloud/stackit\"]" produced an unexpected new value: .type: was cty.StringVal("primary"), but now null.
This is a bug in the provider, which should be reported in the provider's own issue tracker.
Because of this, the client wants to destroy the resource:
"error": "cannot run plan: plan failed: Instance cannot be destroyed: Resource stackit_dns_zone.example-zone has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. To avoid this error and continue with the plan, either disable lifecycle.prevent_destroy or reduce the scope of the plan using the -target flag.", "errorVerbose": "plan failed: Instance cannot be destroyed: Resource stackit_dns_zone.example-zone has lifecycle.prevent_destroy set, but the plan calls for this resource to be destroyed. To avoid this error and continue with the plan, either disable lifecycle.prevent_destroy or reduce the scope of the plan using the -target flag.
For some reason, if you set the fields to null instead of unknown, the client accepts it and proceeds correctly. Maybe we should look into this topic together. If you have a better way to handle this case, feel free to suggest it :)
Just to make sure I don't mess things up here: what do you mean by client?
Crossplane + Upjet, which then executes Terraform CLI commands
I suspect this is a problem with complex objects, lists of lists, and lists of complex objects in the util function SetModelFieldsToNull.
I also tried adding the same logic as in the zone resource to the IaaS network resource and added a lot of unit tests to provoke the error, but couldn't reproduce it. You can check it here if you want.
Can you provide the input parameters so I can add unit tests for this case and verify whether it happens in the implementation or not?
Additionally, you can check in your setup whether the added functionality resolves the issue.
or do you imply that it is perfectly fine to have errors? because if we want to use upjet to generate a crossplane provider we cannot accept such error since it simply does not work :D
or do you imply that it is perfectly fine to have errors?
Clear no.
because if we want to use upjet to generate a crossplane provider we cannot accept such error since it simply does not work :D
Well, I guess it doesn't work because you modified the code of the terraform provider and didn't understand the impacts of your changes.
I have to start from scratch here: Unknown values are a core concept of Terraform (see https://developer.hashicorp.com/terraform/plugin/framework/handling-data/terraform-concepts#unknown-values). Unknown values are important for Terraform to apply resources in the correct order, ...
But what does this mean for us? After a terraform apply run that creates a new resource, the Terraform provider must explicitly set every field of the resource to a value or to null. If this isn't done for a field of the resource, you will get a message like this:
After the apply operation, the provider still indicated an unknown value...
Whenever you get a message like this, it's clear that this is a bug in the Terraform provider. And I'm going to go out on a limb here and say this doesn't happen for the stackit_dns_record_set resource on the main branch of our STACKIT Terraform provider repository. 😄
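To make the rule concrete, a small stdlib-only Go sketch. The attrState type and checkAfterApply function are hypothetical: the real check is performed by Terraform core, and the plugin framework exposes these states through IsNull() and IsUnknown() on its types.* values. The sketch only models the rule that nothing may remain unknown once apply has finished.

```go
package main

import (
	"fmt"
	"sort"
)

// attrState is a hypothetical stand-in for the three states a Terraform
// attribute value can be in.
type attrState int

const (
	stateKnown attrState = iota
	stateNull
	stateUnknown
)

// checkAfterApply models the post-apply rule: every attribute must be known
// or explicitly null. It returns an error naming all attributes left unknown.
func checkAfterApply(state map[string]attrState) error {
	var unknown []string
	for name, s := range state {
		if s == stateUnknown {
			unknown = append(unknown, name)
		}
	}
	if len(unknown) == 0 {
		return nil
	}
	sort.Strings(unknown) // deterministic error message
	return fmt.Errorf("provider still indicated an unknown value for %v", unknown)
}

func main() {
	// Only the id fields were written before returning; the rest stayed unknown.
	partial := map[string]attrState{
		"project_id": stateKnown,
		"zone_id":    stateKnown,
		"fqdn":       stateUnknown,
		"state":      stateUnknown,
	}
	fmt.Println(checkAfterApply(partial))

	// Every field set to a value or explicitly to null: accepted.
	full := map[string]attrState{"project_id": stateKnown, "error": stateNull}
	fmt.Println(checkAfterApply(full) == nil)
}
```

Null and unknown are deliberately different states here: null is a final answer ("no value"), while unknown means "not decided yet", which is only legitimate during planning.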
Let me explain why
We create the resource on API side and then use the wait handler.
terraform-provider-stackit/stackit/internal/services/dns/recordset/resource.go
Lines 215 to 235 in b5f82e7
recordSetResp, err := r.client.CreateRecordSet(ctx, projectId, zoneId).CreateRecordSetPayload(*payload).Execute()
if err != nil || recordSetResp.Rrset == nil || recordSetResp.Rrset.Id == nil {
	core.LogAndAddError(ctx, &resp.Diagnostics, "Error creating record set", fmt.Sprintf("Calling API: %v", err))
	return
}
// Write id attributes to state before polling via the wait handler - just in case anything goes wrong during the wait handler
utils.SetAndLogStateFields(ctx, &resp.Diagnostics, &resp.State, map[string]any{
	"project_id":    projectId,
	"zone_id":       zoneId,
	"record_set_id": *recordSetResp.Rrset.Id,
})
if resp.Diagnostics.HasError() {
	return
}
waitResp, err := wait.CreateRecordSetWaitHandler(ctx, r.client, projectId, zoneId, *recordSetResp.Rrset.Id).WaitWithContext(ctx)
if err != nil {
	core.LogAndAddError(ctx, &resp.Diagnostics, "Error creating record set", fmt.Sprintf("Instance creation waiting: %v", err))
	return
}
After the wait handler we use the mapFields function to map the API response to the Terraform state model.
terraform-provider-stackit/stackit/internal/services/dns/recordset/resource.go
Lines 237 to 248 in b5f82e7
// Map response body to schema
err = mapFields(ctx, waitResp, &model)
if err != nil {
	core.LogAndAddError(ctx, &resp.Diagnostics, "Error creating record set", fmt.Sprintf("Processing API payload: %v", err))
	return
}
// Set state to fully populated data
diags = resp.State.Set(ctx, model)
resp.Diagnostics.Append(diags...)
if resp.Diagnostics.HasError() {
	return
}
Now comes the important part: Here is the section in the mapFields function, which makes sure all fields of the resource get set to a value or null. [1]
terraform-provider-stackit/stackit/internal/services/dns/recordset/resource.go
Lines 432 to 445 in b5f82e7
model.Id = utils.BuildInternalTerraformId(
	model.ProjectId.ValueString(), model.ZoneId.ValueString(), recordSetId,
)
model.RecordSetId = types.StringPointerValue(recordSet.Id)
model.Active = types.BoolPointerValue(recordSet.Active)
model.Comment = types.StringPointerValue(recordSet.Comment)
model.Error = types.StringPointerValue(recordSet.Error)
if model.Name.IsNull() || model.Name.IsUnknown() {
	model.Name = types.StringPointerValue(recordSet.Name)
}
model.FQDN = types.StringPointerValue(recordSet.Name)
model.State = types.StringValue(string(recordSet.GetState()))
model.TTL = types.Int64PointerValue(recordSet.Ttl)
model.Type = types.StringValue(string(recordSet.GetType()))
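A stripped-down sketch of what this part of mapFields achieves, with hypothetical types: attr stands in for the framework's value types, and stringPointerValue mimics types.StringPointerValue, which turns a nil pointer into a null value. Every model field is assigned from the response, so none can stay unknown; a user-configured name is kept, as in the IsNull()/IsUnknown() check above.

```go
package main

import "fmt"

// attr is a hypothetical tri-state attribute value: known, null, or unknown.
type attr struct {
	value   string
	null    bool
	unknown bool
}

func unknownAttr() attr { return attr{unknown: true} }

// stringPointerValue mimics types.StringPointerValue:
// a nil pointer becomes null, anything else becomes a known value.
func stringPointerValue(p *string) attr {
	if p == nil {
		return attr{null: true}
	}
	return attr{value: *p}
}

// apiRecordSet is a hypothetical subset of the API response.
type apiRecordSet struct {
	Id      *string
	Name    *string
	Comment *string
}

// recordSetModel is a hypothetical subset of the Terraform model.
type recordSetModel struct {
	RecordSetId attr
	Name        attr
	Comment     attr
}

// mapFields assigns every model field from the response, so none stays unknown.
func mapFields(resp apiRecordSet, m *recordSetModel) {
	m.RecordSetId = stringPointerValue(resp.Id)
	// Keep a configured name; only fill it in from the API otherwise.
	if m.Name.null || m.Name.unknown {
		m.Name = stringPointerValue(resp.Name)
	}
	m.Comment = stringPointerValue(resp.Comment)
}

func main() {
	id, name := "rs-123", "www"
	m := recordSetModel{RecordSetId: unknownAttr(), Name: unknownAttr(), Comment: unknownAttr()}
	mapFields(apiRecordSet{Id: &id, Name: &name}, &m) // no comment in the response
	fmt.Printf("%+v\n", m)
}
```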
Well, and after that the model struct must be persisted in the Terraform state (this doesn't happen automatically):
terraform-provider-stackit/stackit/internal/services/dns/recordset/resource.go
Lines 243 to 248 in b5f82e7
// Set state to fully populated data
diags = resp.State.Set(ctx, model)
resp.Diagnostics.Append(diags...)
if resp.Diagnostics.HasError() {
	return
}
To sum it up, here's what happens in the main branch implementation of this resource:
1. Create request for the API resource
2. (Write id fields to the state in case anything goes wrong during the wait handler)
3. Wait handler to wait for creation of the API resource to complete
4. Map API response to Terraform resource model struct (mapFields)
5. Persist the Terraform model struct of the resource in the Terraform state
Now to your changes
Now to your changes and why it's not working (without setting all fields to null using your new reflection-powered util func):
In your func (r *recordSetResource) Create(...) implementation:
- You also do the Create request for the API resource (see no. 1 above).
- You write the id fields to the state (see no. 2 above).
- And then you jump out of the Create implementation of the Terraform resource prematurely with the code below.

if !utils.ShouldWait() {
	tflog.Info(ctx, "Skipping wait; async mode for Crossplane/Upjet")
	return
}

The problem is: this doesn't only skip the wait handler (no. 3 above), but also the mapFields func call (no. 4 above), which (as said) explicitly sets every field to a value or null.
Again, you just skip this. This is a core part of the resource implementation. You don't call it. That's why Terraform complains about unknown values. Terraform says this is a bug in the provider implementation, and it's correct.
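A toy model of the control flow described above, with hypothetical attribute names and a string-based state instead of the framework's types. It only illustrates why the early return leaves computed attributes unknown; it does not model the plugin framework's state handling or Terraform's real check.

```go
package main

import (
	"fmt"
	"sort"
)

// create walks steps 1 to 4 for a hypothetical record set whose attributes
// are tracked as "known", "null" or "unknown".
func create(shouldWait bool) map[string]string {
	// Configured attributes are known from the plan; computed ones start unknown.
	state := map[string]string{
		"project_id": "known", "zone_id": "known", "name": "known",
		"record_set_id": "unknown", "fqdn": "unknown", "state": "unknown", "error": "unknown",
	}
	// Steps 1 and 2: the create request returns an id, which is written right away.
	state["record_set_id"] = "known"
	if !shouldWait {
		// Early return: steps 3 and 4 (wait handler and mapFields) never run.
		return state
	}
	// Steps 3 and 4: wait, then map the response; every field gets a value or null.
	state["fqdn"] = "known"
	state["state"] = "known"
	state["error"] = "null" // e.g. the API reported no error
	return state
}

// unknownFields lists attributes still unknown after apply, sorted for stable output.
func unknownFields(state map[string]string) []string {
	var out []string
	for k, v := range state {
		if v == "unknown" {
			out = append(out, k)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(unknownFields(create(true)))  // full flow
	fmt.Println(unknownFields(create(false))) // early return
}
```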
But it's sadly not a bug in our implementation on the main branch, but in your implementation.
You circumvent this problem by setting all fields of the Terraform resource state model explicitly to null by using your new util func. This circumvents the problem (Terraform doesn't complain anymore about unknown values), but it doesn't really fix the problem (at least not in a clean way).
In fact setting all fields of the Terraform resource model struct to null circumvents existing checks of Terraform which we want to take advantage of during our resource implementations (at least for pure Terraform usage, without thinking of crossplane here).
[1] Btw, if you forget to set one field of the Terraform resource model struct to a value or null here during the implementation of the Terraform resource, you will also get exactly the error After the apply operation, the provider still indicated an unknown value... from above. I consider this a Terraform feature. As said, unknown values are a concept of Terraform.
"errors"
"fmt"
"os"
"reflect"
It's an "avoid", not a strict "don't" :P
As mentioned above, we may need to set all fields in the model to null instead of unknown. I don't want to write a function in every single resource that does that, since it would be quite error-prone: it is repetitive, and every time a new API field is introduced we must not forget to set that field to null as well.
Therefore I attempted to create one function that sets all fields of a model to null if they are unknown, so that we can reuse it in all resources.
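A minimal sketch of the reflection approach, assuming every attribute field uses a single hypothetical tri-state type Value. It is not the PR's utils.SetModelFieldsToNull, which works on the plugin framework's types; it also handles only top-level fields, so nested objects and lists (the cases suspected earlier in this thread) would need recursive handling.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// Value is a hypothetical tri-state attribute type standing in for the
// plugin framework's types.String, types.Int64, etc.
type Value struct {
	Unknown bool
	Null    bool
	V       string
}

var valueType = reflect.TypeOf(Value{})

// setUnknownFieldsToNull walks the top-level fields of the struct that model
// points to and replaces every unknown Value with a null one.
func setUnknownFieldsToNull(model any) error {
	rv := reflect.ValueOf(model)
	if rv.Kind() != reflect.Pointer || rv.Elem().Kind() != reflect.Struct {
		return errors.New("model must be a pointer to a struct")
	}
	s := rv.Elem()
	for i := 0; i < s.NumField(); i++ {
		f := s.Field(i)
		if !f.CanSet() || f.Type() != valueType {
			continue // skip unexported fields and non-attribute types
		}
		if f.Interface().(Value).Unknown {
			f.Set(reflect.ValueOf(Value{Null: true}))
		}
	}
	return nil
}

// zoneModel is a hypothetical model with one known and two unknown fields.
type zoneModel struct {
	Id   Value
	Name Value
	TTL  Value
}

func main() {
	m := zoneModel{Id: Value{V: "p,z"}, Name: Value{Unknown: true}, TTL: Value{Unknown: true}}
	if err := setUnknownFieldsToNull(&m); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", m)
}
```

The trade-off raised in the review still applies to this sketch: it removes every unknown, including ones that signal a real mapping bug.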
If you have a better idea of how to achieve the goal, feel free to suggest it. Depending on the outcome of the discussion above, it may not be needed at all.

Description
I want to get initial buy-in for skipping the wait handlers (needed for some client libraries) and, in the Create implementation, setting the state to the model with null values plus the ids. With the current implementation, some clients fail with errors saying that the model has attributes that are "unknown".
Checklist
- make fmt
- … examples/ directory)
- make generate-docs (will be checked by CI)
- make test (will be checked by CI)
- make lint (will be checked by CI)