externalTrafficPolicy: Local via BGP #10537

amayacitta · 2025-06-08T20:27:02Z

hey guys,

I want to use externalTrafficPolicy: Local for optimised endpoints. However, when I do this with a Service of type LoadBalancer, the /32 external address, which is handed out from a Calico LoadBalancer IP pool, continues to be advertised from all nodes. Traffic only reaches the pod if I manually constrain BGP to peer only with the node running the pod.

Here is an example:

kubectl get pod,svc,ep -o wide

NAME                                     READY   STATUS    RESTARTS   AGE     IP            NODE                                NOMINATED NODE 
pod/apache-deployment-757c6c5857-5cdzq   1/1     Running   0          7h48m   10.42.193.7   rke2-test-worker-pool-58fmt-hzt7v   

NAME                     TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)         AGE     SELECTOR
service/apache-service   LoadBalancer   10.43.37.73   10.44.0.9     80:30676/TCP    7h48m   app=web

NAME                       ENDPOINTS                                                  AGE
endpoints/apache-service   10.42.193.7:80                                             7h48m

kubectl get nodes -o wide

NAME                                STATUS   ROLES                       AGE    VERSION          INTERNAL-IP     EXTERNAL-IP   OS-IMAGE               KERNEL-VERSION     CONTAINER-RUNTIME
rke2-test-system-pool-k6zmb-2mfxs   Ready    control-plane,etcd,master   22h    v1.32.5+rke2r1   10.254.32.106   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
rke2-test-system-pool-k6zmb-xv6ns   Ready    control-plane,etcd,master   46h    v1.32.5+rke2r1   10.254.32.102   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
rke2-test-system-pool-k6zmb-zglrg   Ready    control-plane,etcd,master   31h    v1.32.5+rke2r1   10.254.32.104   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
rke2-test-worker-pool-58fmt-hzt7v   Ready    worker                      22h    v1.32.5+rke2r1   10.254.32.105   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
rke2-test-worker-pool-58fmt-mv52b   Ready    worker                      2d1h   v1.32.5+rke2r1   10.254.32.100   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1

This is the routing table on the ToR for the service address 10.44.0.9.

B       10.44.0.9/32 [20/0] via 10.254.32.100 (recursive is directly connected, DC1-RANCHER-K8S), 00:15:24, [1/0]
                     [20/0] via 10.254.32.102 (recursive is directly connected, DC1-RANCHER-K8S), 00:15:24, [1/0]
                     [20/0] via 10.254.32.104 (recursive is directly connected, DC1-RANCHER-K8S), 00:15:24, [1/0]
                     [20/0] via 10.254.32.105 (recursive is directly connected, DC1-RANCHER-K8S), 00:15:24, [1/0]
                     [20/0] via 10.254.32.106 (recursive is directly connected, DC1-RANCHER-K8S), 00:15:24, [1/0]

If I constrain BGP on the ToR to only communicate with node rke2-test-worker-pool-58fmt-hzt7v on 10.254.32.105, then everything works fine. Also, if I change the service to externalTrafficPolicy: Cluster, it works fine, as every node can then forward to the endpoints.

The page I was reading says "The nodes with a pod backing the service advertise a specific route (/32 or /128) to the service's IP.", but it doesn't appear to work for me.

I'm running the versions below. Any ideas?

calicoctl version
Client Version:    v3.30.1
Git commit:        393b14e72
Cluster Version:   v3.30.0
Cluster Type:      k8s,operator,bgp,kdd,typha

Answered by amayacitta · Jun 13, 2025

caseydavenport · 2025-06-09T20:34:26Z · Maintainer

You may be hitting one of these scenarios: #7512, #3810

The TL;DR is that if you have full-mesh enabled between your Calico nodes, they will advertise the Service LoadBalancer IPs to each other, and may re-advertise those IPs - swapping themselves for the next hop - depending on your BGP topology.

Fixes include:

- Disabling the node-to-node mesh in Calico, and instead using your ToR to distribute routes.
- Setting keepOriginalNextHop: true in your BGPPeer objects, so that Calico doesn't replace the next hop when re-advertising to the ToR.
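For the second fix, a minimal BGPPeer sketch might look like the following. It assumes the ToR peer address and AS number (10.254.32.254, AS 65010) from the config posted later in this thread; keepOriginalNextHop is a projectcalico.org/v3 BGPPeer field, and the peer name is only illustrative. The first fix corresponds to nodeToNodeMeshEnabled: false in the default BGPConfiguration.

```yaml
# Sketch only: every Calico node peers with the ToR, and BIRD is asked to
# keep the original next hop when re-advertising routes learned from the mesh.
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: lab-tor            # illustrative name
spec:
  peerIP: 10.254.32.254    # ToR address (assumed from the config in this thread)
  asNumber: 65010          # ToR AS number (assumed from the same config)
  keepOriginalNextHop: true
```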

amayacitta · 2025-06-09T23:07:09Z · Author

Cheers for the assist, though it still isn't quite right. I tried with the mesh disabled and a route reflector on the ToR, and separately with keepOriginalNextHop: true. Neither produced the right result.

Here is what I see with keepOriginalNextHop: true, with no route reflector on the ToR and the mesh enabled.

NAME                                     READY   STATUS    RESTARTS   AGE     IP             NODE                                NOMINATED NODE   READINESS GATES
pod/apache-deployment-6c79984869-mwwhs   1/1     Running   0          3m51s   10.42.193.12   rke2-test-worker-pool-58fmt-hzt7v   <none>           <none>

NAME                     TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE     SELECTOR
service/apache-service   LoadBalancer   10.43.185.64   10.44.0.13    80:30291/TCP    3m51s   app=web
service/kubernetes       LoadBalancer   10.43.0.1      10.44.0.5     443:32051/TCP   3d4h    <none>

NAME                       ENDPOINTS                                                  AGE
endpoints/apache-service   10.42.193.12:80                                            3m51s
endpoints/kubernetes       10.254.32.102:6443,10.254.32.104:6443,10.254.32.106:6443   3d4h

NAME                                     STATUS   ROLES                       AGE     VERSION          INTERNAL-IP     EXTERNAL-IP   OS-IMAGE               KERNEL-VERSION     CONTAINER-RUNTIME
node/rke2-test-system-pool-k6zmb-2mfxs   Ready    control-plane,etcd,master   2d1h    v1.32.5+rke2r1   10.254.32.106   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-system-pool-k6zmb-xv6ns   Ready    control-plane,etcd,master   3d1h    v1.32.5+rke2r1   10.254.32.102   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-system-pool-k6zmb-zglrg   Ready    control-plane,etcd,master   2d10h   v1.32.5+rke2r1   10.254.32.104   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-worker-pool-58fmt-hzt7v   Ready    worker                      2d1h    v1.32.5+rke2r1   10.254.32.105   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-worker-pool-58fmt-mv52b   Ready    worker                      3d4h    v1.32.5+rke2r1   10.254.32.100   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1

B       10.44.0.13/32 [20/0] via 10.254.32.105 (recursive is directly connected, DC1-RANCHER-K8S), 00:07:04, [1/0]
                      [20/0] via 10.254.32.104 [4] (recursive is directly connected, DC1-RANCHER-K8S), 00:07:04, [1/0]

*  10.44.0.13/32    10.254.32.104   0                      0        0 64512 i <-/->
*                   10.254.32.104   0                      0        0 64512 i <-/->
*                   10.254.32.104   0                      0        0 64512 i <-/->
*                   10.254.32.104   0                      0        0 64512 i <-/->
*>                  10.254.32.105   0                      0        0 64512 i <-/1>

I should see routes to 10.254.32.105 only, but I am also seeing 4 via 10.254.32.104. I restarted all BGP processes and deleted the calico-node pod on 10.254.32.104, but it came back and continued to advertise the route, even though that node does not run the service's pod. Hmm.

caseydavenport · 2025-06-10T15:42:14Z · Maintainer

*> 10.254.32.105 0 0 0 64512 i <-/1>

Doesn't *> mean this is the selected route? When the mesh is still enabled, you will still see advertisements from all nodes, but they should all resolve to the node(s) hosting the service.

A route reflector is going to behave much the same as a full mesh, as its job is to distribute routes within the AS and effectively replace the need for a full mesh, so I'd expect similar results when using an RR.

3 replies

amayacitta · Jun 10, 2025 · Author

Yeah, *> means best path, but because eBGP multipath is enabled it's actually installing all 5 paths into the routing table, and only one of them points to the correct node, hence the behaviour is random. If I disable eBGP multipath, the BGP best-path algorithm is respected and only 10.254.32.105 is used.

However, for BGP best-path selection as described here, you have to go through the algorithm for each prefix in the RIB.

In my example, all the paths share the same attributes for points 1-9.
10 depends heavily on the environment: it depends on when the calico daemonset pods came up, and after a hard BGP reset, for example, the paths are not all going to be learnt at exactly the same time.
11: How does Calico set the router IDs? Are they bound to loopbacks, or are they the k8s host node IP?
12 might not be relevant if 10 or 11 already picks a winner.

1. Weight
2. Local Pref
3. Prefer the path that was locally originated via a network or aggregate BGP subcommand, or through redistribution from an IGP
4. Shortest AS Path
5. Lowest Origin Type
6. MED
7. eBGP over iBGP
8. iBGP metric
9. Determine if multiple paths require installation in the routing table for BGP Multipath
10. External path learnt time (oldest wins)
11. Router ID
12. If the originator or router ID is the same for multiple paths, prefer the path with the minimum cluster list length. This is only present in BGP RR environments. It allows clients to peer with RRs or clients in other clusters. In this scenario, the client must be aware of the RR-specific BGP attribute.
13. Lowest neighbor address

I set it all up again with eBGP multipath, keepOriginalNextHop: true and the mesh enabled. Here are the results:

Kubectl outputs

NAME                                                  ADDRESSTYPE   PORTS   ENDPOINTS                                   AGE
endpointslice.discovery.k8s.io/apache-service-pvsc2   IPv4          80      10.42.193.12                                22h
endpointslice.discovery.k8s.io/kubernetes             IPv4          6443    10.254.32.102,10.254.32.104,10.254.32.106   4d2h

NAME                                     READY   STATUS    RESTARTS   AGE   IP             NODE                                NOMINATED NODE   READINESS GATES
pod/apache-deployment-6c79984869-mwwhs   1/1     Running   0          22h   10.42.193.12   rke2-test-worker-pool-58fmt-hzt7v   <none>           <none>

NAME                                     STATUS   ROLES                       AGE     VERSION          INTERNAL-IP     EXTERNAL-IP   OS-IMAGE               KERNEL-VERSION     CONTAINER-RUNTIME
node/rke2-test-system-pool-k6zmb-2mfxs   Ready    control-plane,etcd,master   2d23h   v1.32.5+rke2r1   10.254.32.106   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-system-pool-k6zmb-xv6ns   Ready    control-plane,etcd,master   3d23h   v1.32.5+rke2r1   10.254.32.102   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-system-pool-k6zmb-zglrg   Ready    control-plane,etcd,master   3d8h    v1.32.5+rke2r1   10.254.32.104   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-worker-pool-58fmt-hzt7v   Ready    worker                      2d23h   v1.32.5+rke2r1   10.254.32.105   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-worker-pool-58fmt-mv52b   Ready    worker                      4d2h    v1.32.5+rke2r1   10.254.32.100   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1

NAME                     TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE    SELECTOR
service/apache-service   LoadBalancer   10.43.185.64   10.44.0.13    80:30291/TCP    22h    app=web
service/kubernetes       LoadBalancer   10.43.0.1      10.44.0.5     443:32051/TCP   4d2h   <none>

BGP RIB

*  10.44.0.13/32    10.254.32.105   0                      0        0 64512 i <-/->
*                   10.254.32.105   0                      0        0 64512 i <-/->
*                   10.254.32.106   0                      0        0 64512 i <-/->
*                   10.254.32.105   0                      0        0 64512 i <-/->
*>                  10.254.32.105   0                      0        0 64512 i <-/1>

BGP routing table; because eBGP multipath is enabled, all 5 paths are installed:

B       10.44.0.13/32 [20/0] via 10.254.32.105 [4] (recursive is directly connected, DC1-RANCHER-K8S), 00:24:05, [1/0]
                      [20/0] via 10.254.32.106 (recursive is directly connected, DC1-RANCHER-K8S), 00:24:05, [1/0]

The correct node should be 10.254.32.105, but for whatever reason I also have an advertisement from 10.254.32.106. Given that the calico daemonset pods all advertise the prefix with the same attributes, I'm not convinced we can rely on BGP best-path selection; if I disable multipath, it may select the 10.254.32.106 path.

Any ideas how I can debug what's happening on node 10.254.32.106 for it to be advertising a prefix it "shouldn't"?

caseydavenport · Jun 10, 2025 · Maintainer

11: How does Calico set the router IDs? Are they bound to loopbacks, or are they the k8s host node IP?

Should be the k8s host IP - so they won't be the same across nodes.

Any ideas how I can debug what's happening on node 10.254.32.106 for it to be advertising a prefix it "shouldn't"?

You could take a look at the BIRD configuration file inputs. They should be located within calico/node at /etc/calico/confd/config and are responsible for configuring which static routes are exported, specifically bird_aggr.cfg if I recall correctly. The next-hop setting is in bird.cfg.

You can also use birdc to access the BIRD CLI and look at BGP imports and exports.

Calico definitely shouldn't be advertising routes for Local-type services from nodes that don't host a backing pod, and if keepOriginalNextHop is set we shouldn't be replacing the original next hop.

amayacitta · Jun 10, 2025 · Author

Should be the k8s host IP - so they won't be the same across nodes.

If that's the case, we 100% can't rely on the best-path algorithm, as it will pick the lowest router ID (the node address) as best, which could be any node over time as things are built, upgraded, replaced, etc.

I found a tool inside the calico-node pod that shows this information and explains why; see the comments below.

amayacitta · 2025-06-10T21:11:23Z · Author

OK, so here is why, but I don't get it.

calico-node pods and their nodes, for reference:

calico-node-ckwkp                          1/1     Running   2 (3d8h ago)    3d23h   10.254.32.102   rke2-test-system-pool-k6zmb-xv6ns   <none>           <none>
calico-node-mgf7l                          1/1     Running   0               2d23h   10.254.32.106   rke2-test-system-pool-k6zmb-2mfxs   <none>           <none>
calico-node-ptdrd                          1/1     Running   1 (2d22h ago)   2d23h   10.254.32.105   rke2-test-worker-pool-58fmt-hzt7v   <none>           <none>
calico-node-splqf                          1/1     Running   0               22h     10.254.32.104   rke2-test-system-pool-k6zmb-zglrg   <none>           <none>
calico-node-xh28t                          1/1     Running   5 (9h ago)      3d23h   10.254.32.100   rke2-test-worker-pool-58fmt-mv52b   <none>           <none>

Output from calico-node -show-status on the pods.
For some reason, it believes that the external IP of the service (10.44.0.13) is directly connected on 10.254.32.106.
Not sure why; this is crazy.

rke2-test-system-pool-k6zmb-2mfxs:~ # kubectl exec -n calico-system calico-node-ckwkp -- calico-node -show-status | grep 10.44.0.13
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), install-cni (init)
| 10.44.0.13/32    | 10.254.32.105 | eth0            | Mesh_10_254_32_105   | *       |
| 10.44.0.13/32    | 10.254.32.106 | eth0            | Mesh_10_254_32_106   |         |
rke2-test-system-pool-k6zmb-2mfxs:~ # kubectl exec -n calico-system calico-node-mgf7l -- calico-node -show-status | grep 10.44.0.13
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), install-cni (init)
| 10.44.0.13/32    | N/A           | eth0            | direct1              | *       |
| 10.44.0.13/32    | 10.254.32.105 | eth0            | Mesh_10_254_32_105   |         |
rke2-test-system-pool-k6zmb-2mfxs:~ # kubectl exec -n calico-system calico-node-ptdrd -- calico-node -show-status | grep 10.44.0.13
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), install-cni (init)
| 10.44.0.13/32    | N/A           | blackhole       | static1              | *       |
| 10.44.0.13/32    | 10.254.32.106 | eth0            | Mesh_10_254_32_106   |         |
rke2-test-system-pool-k6zmb-2mfxs:~ # kubectl exec -n calico-system calico-node-splqf -- calico-node -show-status | grep 10.44.0.13
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), install-cni (init)
| 10.44.0.13/32    | 10.254.32.105 | eth0            | Mesh_10_254_32_105   | *       |
| 10.44.0.13/32    | 10.254.32.106 | eth0            | Mesh_10_254_32_106   |         |
rke2-test-system-pool-k6zmb-2mfxs:~ # kubectl exec -n calico-system calico-node-xh28t -- calico-node -show-status | grep 10.44.0.13
Defaulted container "calico-node" out of: calico-node, flexvol-driver (init), install-cni (init)
| 10.44.0.13/32    | 10.254.32.105 | eth0            | Mesh_10_254_32_105   | *       |
| 10.44.0.13/32    | 10.254.32.106 | eth0            | Mesh_10_254_32_106   |         |

amayacitta · 2025-06-10T21:23:14Z · Author

So this is the cause, but I don't know why it's happening, if you get me. The Calico-assigned external IP 10.44.0.13/32 has been placed onto node 10.254.32.106, where it is locally attached.

The pod behind the service is on node 10.254.32.105.

Calico is advertising a directly attached address alongside the route for the EndpointSlice/pod behind the service. Is this a bug?

IP config from node 10.254.32.106:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether ca:84:31:14:59:2f brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 10.254.32.106/24 brd 10.254.32.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.44.0.13/32 scope global eth0
       valid_lft forever preferred_lft forever

kubectl output for the service etc., to map it:

NAME                                                  ADDRESSTYPE   PORTS   ENDPOINTS                                   AGE
endpointslice.discovery.k8s.io/apache-service-pvsc2   IPv4          80      10.42.193.12                                22h
endpointslice.discovery.k8s.io/kubernetes             IPv4          6443    10.254.32.102,10.254.32.104,10.254.32.106   4d3h

NAME                                     READY   STATUS    RESTARTS   AGE   IP             NODE                                NOMINATED NODE   READINESS GATES
pod/apache-deployment-6c79984869-mwwhs   1/1     Running   0          22h   10.42.193.12   rke2-test-worker-pool-58fmt-hzt7v   <none>           <none>

NAME                                     STATUS   ROLES                       AGE     VERSION          INTERNAL-IP     EXTERNAL-IP   OS-IMAGE               KERNEL-VERSION     CONTAINER-RUNTIME
node/rke2-test-system-pool-k6zmb-2mfxs   Ready    control-plane,etcd,master   2d23h   v1.32.5+rke2r1   10.254.32.106   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-system-pool-k6zmb-xv6ns   Ready    control-plane,etcd,master   3d23h   v1.32.5+rke2r1   10.254.32.102   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-system-pool-k6zmb-zglrg   Ready    control-plane,etcd,master   3d8h    v1.32.5+rke2r1   10.254.32.104   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-worker-pool-58fmt-hzt7v   Ready    worker                      2d23h   v1.32.5+rke2r1   10.254.32.105   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1
node/rke2-test-worker-pool-58fmt-mv52b   Ready    worker                      4d2h    v1.32.5+rke2r1   10.254.32.100   <none>        SUSE Linux Micro 6.1   6.4.0-19-default   containerd://2.0.5-k3s1

NAME                     TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE    SELECTOR
service/apache-service   LoadBalancer   10.43.185.64   10.44.0.13    80:30291/TCP    22h    app=web
service/kubernetes       LoadBalancer   10.43.0.1      10.44.0.5     443:32051/TCP   4d3h   <none>

amayacitta · 2025-06-12T23:10:36Z · Author

The solution for me was to combine Calico BGP peers with nodeSelectors to exclude the control plane nodes that host the LoadBalancer service IP directly on eth0 (which is being advertised out directly).

I now correctly see paths only to the nodes that are hosting pods that are part of the service. externalTrafficPolicy: Local is now functioning correctly.

It does sound like the documentation needs a bit of an edit to explain this better. Out of the box, as it's described, it doesn't "just work", or at least it's not as simple as the documentation suggests.

2 replies

caseydavenport · Jun 12, 2025 · Maintainer

Thanks for the detailed write up @amayacitta

to exclude the control plane nodes that host the LoadBalancer service IP directly on eth0 (which is being advertised out directly).

Do you mean that the control plane nodes have the LB IP address assigned to eth0, and calico/node was advertising it as a directly connected route as a result?

Does sound like the documentation needs a bit of an edit to explain this better

Yes, I agree the docs could use improvement. The right configuration is going to vary based on each user's BGP topology and other factors, and the docs should explain the various tools available.

amayacitta · Jun 12, 2025 · Author

Do you mean that the control plane nodes have the LB IP address assigned to eth0, and calico/node was advertising it as a directly connected route as a result?

Yes, exactly. I've excluded that by not peering with them, which to be fair has other challenges: I do want to advertise some control plane services too. However, for that I can do nodeSelector peering with the control plane nodes and then use BGPFilters to be very specific about what I advertise.

Yes, I agree the docs could use improvement. The right configuration is going to vary based on each user's BGP topology and other factors, and the docs should explain the various tools available.
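A minimal sketch of that control-plane peering, assuming RKE2's node-role.kubernetes.io/control-plane label is set on those nodes and that the kubernetes service's LoadBalancer IP (10.44.0.5, from the output above) is the only control-plane address to advertise. The filter and peer names are hypothetical. As I understand BGPFilter, export rules are evaluated in order and the first matching rule applies, so the specific Accept sits before the broader Reject.

```yaml
# Sketch only: export just one LoadBalancer IP from control-plane nodes.
apiVersion: projectcalico.org/v3
kind: BGPFilter
metadata:
  name: control-plane-export        # hypothetical name
spec:
  exportV4:
    # Allow the kube-apiserver LoadBalancer IP seen earlier in the thread.
    - action: Accept
      matchOperator: Equal
      cidr: 10.44.0.5/32
    # Drop every other address inside the LoadBalancer pool.
    - action: Reject
      matchOperator: In
      cidr: 10.44.0.0/24
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: labfw-control-plane         # hypothetical name
spec:
  peerIP: 10.254.32.254
  asNumber: 65010
  # Assumes the RKE2 control-plane role label is present on these nodes.
  nodeSelector: has(node-role.kubernetes.io/control-plane)
  filters:
    - control-plane-export
```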

Yeah there are many permutations. Happy to help where I can.

amayacitta · 2025-06-13T07:07:36Z · Author

I just wanted to be a little clearer on what exactly worked for me, the below configuration is the only way I got it working.

- BGP RR on the ToR
- nodeSelector filter to only include workers
- Disabled the default iBGP mesh
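On the Calico side, the mesh-disabled part of that setup might look roughly like this (the route reflector lives on the ToR and is not shown). It's a sketch assuming the service CIDRs from the BGPConfiguration posted below; nodeToNodeMeshEnabled is the BGPConfiguration field that turns off the default full mesh, and the existing serviceClusterIPs / serviceLoadBalancerIPs entries are repeated so the fragment is complete.

```yaml
# Sketch only: turn off Calico's default node-to-node iBGP mesh so nodes
# stop learning (and re-advertising) each other's service routes.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false
  serviceClusterIPs:
  - cidr: 10.43.0.0/16
  serviceLoadBalancerIPs:
  - cidr: 10.44.0.0/24
```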

With the mesh enabled, the service IP that sits directly on eth0 of the control plane node is re-advertised back to the ToR. Interestingly, "keepOriginalNextHop: true" made no difference to the routes received on the ToR; for some reason it was ignored for the directly connected IP on eth0 of the control plane node.

It's a pretty basic setup, I think. The question for me is: can you configure BIRD via Calico to ignore locally connected addresses on eth0? What effect would that have on the wider system? Given that I've tailored the BGP routing to effectively ignore the directly connected IP on eth0, it seems it won't have a negative effect, at least for services of type LoadBalancer. Most of the interfaces we care about are on cali* interfaces, so why is this external load balancer IP on eth0?

It might be worth a deeper investigation to better align things. Let me know if you want any more information than we already have.

Calico Config

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - name: default-ipv4-ippool
      blockSize: 26
      cidr: 10.42.0.0/16
      encapsulation: None
      natOutgoing: Disabled
      nodeSelector: all()
---
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  bpfConnectTimeLoadBalancing: TCP
  bpfEnabled: false
  bpfHostNetworkedNATWithoutCTLB: Enabled
  bpfLogLevel: ""
  defaultEndpointToHostAction: Drop
  featureDetectOverride: ChecksumOffloadBroken=true
  floatingIPs: Disabled
  healthPort: 9099
  logSeverityScreen: Info
  logSeveritySys: Info
  nftablesMode: Disabled
  reportingInterval: 0s
  vxlanVNI: 4096
  wireguardEnabled: false
  xdpEnabled: true
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: lab-tor
spec:
  peerIP: 10.254.32.254
  asNumber: 65010
---
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
  - cidr: 10.43.0.0/16
  serviceLoadBalancerIPs:
  - cidr: 10.44.0.0/24
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  allowedUses:
  - Workload
  - Tunnel
  assignmentMode: Automatic
  blockSize: 26
  cidr: 10.42.0.0/16
  ipipMode: Never
  nodeSelector: all()
  vxlanMode: Never
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: loadbalancer-ip-pool
spec:
  cidr: 10.44.0.0/24
  blockSize: 24
  natOutgoing: false
  disabled: false
  assignmentMode: Automatic
  ipipMode: Never
  vxlanMode: Never
  allowedUses:
  - LoadBalancer
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: labfw-worker
spec:
  peerIP: 10.254.32.254
  asNumber: 65010
  nodeSelector: ( node-role.kubernetes.io/worker == 'true' )

Basic config of the deployment and service. My naming and choice of container change a lot, so forgive me for it not being overly tidy.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: apache-deployment-01
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-01
  template:
    metadata:
      labels:
        app: web-01
    spec:
      containers:
      - name: nginx
        image: nginx:latest          
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: apache-service-01
  labels: 
    app: web-01
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  internalTrafficPolicy: Cluster
  ports:
  - protocol: TCP
    port: 80
  selector:
    app: web-01

4 replies

caseydavenport · Jun 13, 2025 · Maintainer

Most of the interfaces we care about are on cali* interfaces, so why is this external load balancer IP on eth0?

This is the question that I had as well. This isn't something that Calico would be programming, so it must be coming from somewhere else. I assumed there was an intention behind it, but perhaps not!

caseydavenport · Jun 13, 2025 · Maintainer

Calico does support configurable BGP export filters, which I would expect could be used to prevent export of this IP - but I think the question is still "Why was that IP on eth0 in the first place?"

amayacitta · Jun 14, 2025 · Author

OK, so any new service using the LoadBalancer pool in Calico gets its address added to eth0; it's not coming from something external or left over from before.

10.44.0.0/24 is the IP pool, and the 4 addresses below are from that pool. But why? :(

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether fe:90:15:63:f1:77 brd ff:ff:ff:ff:ff:ff
    altname enp1s0
    inet 10.254.32.104/24 brd 10.254.32.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.44.0.5/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.44.0.2/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.44.0.16/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 10.44.0.17/32 scope global eth0
       valid_lft forever preferred_lft forever

I did find an ignoredInterfaces field in the BGPConfiguration resource, so I have added eth0. I'm hoping that will stop them being advertised and will be running a test shortly to confirm. However, I'd like to work out why it's attaching them to eth0 on a control plane node.

apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceClusterIPs:
  - cidr: 10.43.0.0/16
  serviceLoadBalancerIPs:
  - cidr: 10.44.0.0/24
  ignoredInterfaces: 
  - eth0

amayacitta · Jun 14, 2025 · Author

OK, so the results are in: if I add eth0 to the ignore list, I get no advertisements of LoadBalancer services at all. So it appears Calico uses eth0 as part of its mechanism to publish the serviceLoadBalancerIPs.

That is a proper chicken-and-egg situation. I presume you don't expect this?

Is it time to reference this as a bug so it can be properly investigated?
