StevenPG

I will raise an issue to discuss. This is my proposed solution to the issue I've run into.

While deploying the DSS into an AWS VPC, the networking components are named in such a way that there can only be one running instance. Subsequent deployments result in 400 errors on the Service objects in Kubernetes with a "duplicate name" error.

This can be confusing for a developer, since the terraform is designed in such a way that two environments can be deployed.

Contributor

@barroco barroco left a comment

@StevenPG many thanks for this contribution.

You will find some comments inline.

Please request a new review once ready.

@StevenPG StevenPG force-pushed the deploy-multiple-dss-single-vpc-allow-unique-networking-names branch from 38536f0 to 34d494b Compare May 27, 2025 13:58
@StevenPG
Author

StevenPG commented May 27, 2025

Testing helm template without new variables in ee48780

    service.beta.kubernetes.io/aws-load-balancer-name: dss-gateway-external
    service.beta.kubernetes.io/aws-load-balancer-name: cockroach-db-external-node

with new variables

dss:
  image: docker.io/interuss/dss:v0.15.0 # See https://hub.docker.com/r/interuss/dss/tags for official image releases.
  # When running local images in minikube, uncomment the following line
  # imagePullPolicy: Never
  gatewayId: myGatewayIdentifier-703ecd1d-b1bf-4f35-a80d-c7654e41437d
...
cockroachdb:
  conf:
    join: []
    cockroachDbLoadbalancerName: cockroach-db-node-bfb5d477-1f44-48b6-a711-0db21845b1b4

Updated values

    service.beta.kubernetes.io/aws-load-balancer-name: myGatewayIdentifier-703ecd1d-b1bf-4f35-a80d-c7654e41437d
    service.beta.kubernetes.io/aws-load-balancer-name: cockroach-db-node-bfb5d477-1f44
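The second annotation is shorter than the configured `cockroachDbLoadbalancerName`, which is consistent with the `trunc 31` applied in the template. A minimal Python sketch of that effect, assuming `trunc` with a positive length simply keeps the leading characters; the helper below is a stand-in written for illustration, not the chart's code:

```python
def trunc(value: str, length: int) -> str:
    """Keep at most `length` leading characters (stand-in for the template's trunc)."""
    return value[:length]


# Value taken from the "with new variables" example above.
configured = "cockroach-db-node-bfb5d477-1f44-48b6-a711-0db21845b1b4"
rendered = trunc(configured, 31)

print(rendered)       # cockroach-db-node-bfb5d477-1f44
print(len(rendered))  # 31
```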

@StevenPG StevenPG requested a review from barroco May 27, 2025 13:58
@StevenPG StevenPG force-pushed the deploy-multiple-dss-single-vpc-allow-unique-networking-names branch from 34d494b to ee48780 Compare May 27, 2025 14:04
…SS can be deployed in one VPC

Set old values for new configurable vars to be default, moved documentation into schema file

Update with the missing ordinal value and moved it to the front to avoid truncation issues
@StevenPG StevenPG force-pushed the deploy-multiple-dss-single-vpc-allow-unique-networking-names branch from 6e654ca to d8750be Compare May 27, 2025 17:54
Contributor

@barroco barroco left a comment

@StevenPG, thanks for the changes. Please find new comments inline.

@@ -12,7 +13,7 @@ metadata:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
{{- include (printf "%s-lb-crdb-annotations" $cloudProvider)
(dict
- "name" (printf "%s-%s" "cockroach-db-external-node" ( $i | toString) )
+ "name" (printf "%s-%s" ( $i | toString) $cockroachDbLoadbalancerName | trunc 31)
Contributor

Looks like the arguments have been inverted. May I kindly ask you to keep the name and index in the same order to ensure backward compatibility?

Contributor

In addition, on second thought, it would be preferable for the helm chart to fail if the name is too long instead of truncating it silently. Could you please remove the trunc?

Author

@StevenPG StevenPG May 28, 2025

Just to add a consideration, @barroco: a value that is too long actually fails quietly, and the failure can be difficult to diagnose.

For example, with a value that is too long, the service still gets deployed to Kubernetes:
[image]

However, when you describe it, you do see the issue in the events, and that issue keeps the load balancer from being deployed. In AWS, you just won't see an LB being created.
[image]

This was the reason for the truncation, and the name and index were reordered to reduce possible issues caused by the truncation.

I just wanted to clarify what you meant by "it would be preferable that the helm chart fails". If the successful publishing but failed deployment to AWS is still acceptable, then I can continue with the requested changes! Please let me know if that's the case.
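The ordering concern can be sketched in Python. This is an illustrative model only: it assumes the truncation keeps the first 31 characters of the formatted string, and the function names are hypothetical. With the index as a suffix, a long base name pushes the index past the cut, so every node renders to the same name; with the index as a prefix, it survives.

```python
LIMIT = 31  # truncation length used in the template under review


def name_then_index(base: str, index: int) -> str:
    # Models: printf "%s-%s" $name ($i | toString) | trunc 31
    return f"{base}-{index}"[:LIMIT]


def index_then_name(base: str, index: int) -> str:
    # Models: printf "%s-%s" ($i | toString) $name | trunc 31
    return f"{index}-{base}"[:LIMIT]


base = "cockroach-db-node-bfb5d477-1f44-48b6-a711-0db21845b1b4"

suffix_names = {name_then_index(base, i) for i in range(3)}
prefix_names = {index_then_name(base, i) for i in range(3)}

print(len(suffix_names))  # 1: the index is cut off, so all three nodes collide
print(len(prefix_names))  # 3: the index survives truncation
```

Neither ordering helps when the base name itself is what must stay unique between deployments, which is part of why failing loudly on an over-long name was raised as an alternative.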

Contributor

@barroco barroco May 30, 2025

Thanks for the clarification. Indeed, it is not ideal if the helm installation does not fail and the problem is not immediately detectable. An alternative would be to enforce the maximum length in the values.schema.json definition with the "maxLength" keyword.
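A rough Python sketch of what a `maxLength` constraint buys: the over-long name is rejected before anything is rendered or deployed. The schema fragment, the property name `name`, and the limit of 32 are assumptions made for illustration, not the values used in the chart, and the checker is a deliberately tiny stand-in for a real JSON Schema validator.

```python
# Hypothetical schema fragment; the property name and the limit are assumptions.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "maxLength": 32},
    },
}


def check_max_length(values: dict, schema: dict) -> list[str]:
    """Return one message per string property that exceeds its maxLength."""
    errors = []
    for key, rules in schema.get("properties", {}).items():
        value = values.get(key)
        limit = rules.get("maxLength")
        if isinstance(value, str) and limit is not None and len(value) > limit:
            errors.append(f"{key}: length {len(value)} exceeds maxLength {limit}")
    return errors


ok = check_max_length({"name": "dss-gateway-external"}, SCHEMA)
too_long = check_max_length(
    {"name": "myGatewayIdentifier-703ecd1d-b1bf-4f35-a80d-c7654e41437d"}, SCHEMA
)

print(ok)        # []
print(too_long)  # one error reporting the over-long name
```

Compared with `trunc`, this turns a silent rename into an explicit validation error that the operator sees at install time.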

Author

Added in 3341b21

@@ -35,6 +35,10 @@
"description": "Name of CockroachDB cluster",
"type": "string"
},
"cockroachDbLoadbalancerName": {
"description": "In AWS, used to name the load balancer. This must be overridden to run multiple DSS instances in a single region.",
Contributor

Since it is directly linked to the load balancer, I would suggest moving this information into loadbalancers.cockroachdbNodes with the description: Optional and AWS only: Load balancer name of this Cockroach DB node. This must be overridden to run multiple DSS instances in a single region. (default: cockroach-db-external-node-X where X is the index of this node)

@@ -155,6 +159,10 @@
"type": "string",
"description": "Image of the DSS. Please note that the usage of the `latest` tag is discouraged to prevent accidental upgrades in case of restart. Example: `docker.io/interuss/dss:v0.15.0`. Official image releases: https://hub.docker.com/r/interuss/dss/tags"
},
"gatewayId": {
Contributor

Since it is directly linked to the load balancer definition, I would suggest moving this property to loadbalancers.dssGateway with the description: Optional and AWS only: Load balancer name of the DSS Gateway. This must be overridden to run multiple DSS instances in a single region. (default: cockroach-db-external-node-X where X is the index)

@@ -13,7 +14,7 @@ metadata:
{{- include (printf "%s-ingress-dss-gateway-annotations" $cloudProvider)
(merge .
(dict
- "name" "dss-gateway-external"
+ "name" ($gatewayId | trunc 63)
Contributor

On second thought, it would be preferable for the helm chart to fail if the name is too long instead of truncating it silently. Could you please remove the trunc?

@StevenPG
Author

StevenPG commented May 30, 2025

@barroco Updated in 3341b21

Debugging the changes was much easier when the names of the lb components were the same, so I mirrored them.

@@ -104,6 +104,11 @@
"description": "Load balancers configuration",
"type": "object",
"properties": {
"cockroachDbLoadBalancerId": {
Contributor

I was expecting it to be part of the object on line 111, as an id property alongside the ip and the subnet.

@@ -12,21 +13,21 @@ metadata:
service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
{{- include (printf "%s-lb-crdb-annotations" $cloudProvider)
(dict
"name" (printf "%s-%s" "cockroach-db-external-node" ( $i | toString) )
"name" (printf "%s-%s" $cockroachDbLoadbalancerName ( $i | toString))
Contributor

Let's move the default value entirely into the template. This would look something like this:

Suggested change
- "name" (printf "%s-%s" $cockroachDbLoadbalancerName ( $i | toString))
+ "name" ($lb.id | default (printf "cockroach-db-external-node-%s" ( $i | toString)))
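The fallback behaviour of the suggested expression can be modelled in Python as below. This is a sketch under assumptions: `default` is taken to return the fallback whenever the value is empty or missing, the function names are hypothetical, and the example `ip` and `id` values are made up.

```python
def default(fallback, value):
    """Return `value` unless it is empty (None, "", 0, False, empty collection)."""
    return value if value else fallback


def crdb_lb_name(lb: dict, index: int) -> str:
    # Models: $lb.id | default (printf "cockroach-db-external-node-%s" ($i | toString))
    return default(f"cockroach-db-external-node-{index}", lb.get("id"))


# No id configured: the historical name is kept, so existing deployments are unchanged.
print(crdb_lb_name({"ip": "192.0.2.10"}, 0))     # cockroach-db-external-node-0

# id configured: the operator-supplied name is used verbatim.
print(crdb_lb_name({"id": "team-a-crdb-0"}, 0))  # team-a-crdb-0
```

Keeping the default inside the template means the schema does not need to carry an index-dependent default value.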

@@ -125,6 +130,11 @@
"dssGateway": {
"type": "object",
"properties": {
"loadBalancerId": {
"type": "string",
"description": "Optional and AWS only: Load balancer name of the DSS Gateway. This must be overridden to run multiple DSS instances in a single region. (default: cockroach-db-external-node-X where X is the index)",
Contributor

My apologies for the inaccurate proposal, here is a small correction:

Suggested change
- "description": "Optional and AWS only: Load balancer name of the DSS Gateway. This must be overridden to run multiple DSS instances in a single region. (default: cockroach-db-external-node-X where X is the index)",
+ "description": "Optional and AWS only: Load balancer name of the DSS Gateway. This must be overridden to run multiple DSS instances in a single region. (default: dss-gateway-external)",

@barroco
Contributor

barroco commented May 30, 2025

> @barroco Updated in 3341b21
>
> Debugging the changes was much easier when the names of the lb components were the same, so I mirrored them.

I need to think a bit more about this. Could you please revert it for the moment? Happy to review this in another PR.

edit: The downside of doing this is that we would lose the mapping with the underlying cockroachdb pod in the statefulset. So the question is which is more useful in terms of visibility: mapping to the internal resources or to the external ones? I suggest that we discuss this in an issue.

@StevenPG
Author

StevenPG commented Jun 2, 2025

@barroco I created a companion issue here! #1189

I'll link this thread and make an initial comment there

@barroco
Contributor

barroco commented Jun 3, 2025

@StevenPG, just to be clear, my remark was only related to mirroring the resource names with the load balancer. The rest is valid.
