Linux Network Devices #4538

Merged
merged 3 commits into opencontainers:main
Jun 19, 2025

Conversation

aojea
Contributor

@aojea aojea commented Nov 21, 2024

Implementation of opencontainers/runtime-spec#1271

It implements the new OCI spec proposal for specifying network devices that get attached to and detached from containers (updated to match the merged proposal opencontainers/runtime-spec#1271).
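
For orientation, `linux.netDevices` is a map keyed by the host interface name. As far as the merged spec text goes, each entry may carry an optional `name` that renames the interface inside the container; treat that field as an assumption here and check it against the spec. A minimal shell sketch that prints such a fragment (the helper `netdev_fragment` is made up for illustration and is not part of runc):

```shell
# netdev_fragment HOST_IF [CONTAINER_IF]: print a config.json fragment that
# asks the runtime to move HOST_IF into the container's network namespace.
# CONTAINER_IF, if given, is emitted as the optional "name" field.
netdev_fragment() {
	host_if=$1
	ctr_if=${2:-}
	if [ -n "$ctr_if" ]; then
		printf '{"linux":{"netDevices":{"%s":{"name":"%s"}}}}\n' "$host_if" "$ctr_if"
	else
		printf '{"linux":{"netDevices":{"%s":{}}}}\n' "$host_if"
	fi
}

netdev_fragment dummy0
netdev_fragment dummy0 ctr_dummy0
```

The first call prints `{"linux":{"netDevices":{"dummy0":{}}}}`, which is the same shape the integration test in this PR sets with `update_config`.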

@aojea aojea force-pushed the netdevices branch 2 times, most recently from 07d3b0b to 3833056 Compare December 2, 2024 15:40

@kad kad left a comment

LGTM. We are also interested in this use case for our accelerator devices.

@aojea aojea force-pushed the netdevices branch 2 times, most recently from 67f12e0 to d114afe Compare December 12, 2024 07:15
@aojea aojea force-pushed the netdevices branch 2 times, most recently from ec90a02 to f4f5d02 Compare December 20, 2024 12:11
@aojea aojea force-pushed the netdevices branch 2 times, most recently from 735f9d5 to ce1f612 Compare January 14, 2025 09:47
@aojea aojea force-pushed the netdevices branch 4 times, most recently from 6262c5e to c530772 Compare February 6, 2025 09:55
@aojea aojea force-pushed the netdevices branch 5 times, most recently from 4380e86 to f53d263 Compare February 10, 2025 21:27
@rata
Member

rata commented Feb 25, 2025

@aojea friendly reminder that this should be ready soon; we are cutting 1.3.0-rc.1 soon (maybe this week). Is it still unclear when the spec part will be merged?

@lifubang
Member

I'm not familiar with those operations but I'm happy to investigate more if you provide some guidance

diff --git a/tests/integration/checkpoint.bats b/tests/integration/checkpoint.bats
index 3db34061..7608e4ff 100644
--- a/tests/integration/checkpoint.bats
+++ b/tests/integration/checkpoint.bats
@@ -2,14 +2,34 @@
 
 load helpers
 
+function create_netns() {
+       # Create a temporary name for the test network namespace.
+       tmp=$(mktemp -u)
+       ns_name=$(basename "$tmp")
+
+       # Create the network namespace.
+       ip netns add "$ns_name"
+       ns_path=$(ip netns add "$ns_name" 2>&1 | sed -e 's/.*"\(.*\)".*/\1/')
+}
+
+function delete_netns() {
+       # Delete the namespace only if the ns_name variable is set.
+       [ -v ns_name ] && ip netns del "$ns_name"
+}
+
 function setup() {
        # XXX: currently criu require root containers.
        requires criu root
 
        setup_busybox
+
+       # Create a dummy interface to move to the container.
+       ip link add dummy0 type dummy
 }
 
 function teardown() {
+       ip link del dev dummy0
+       delete_netns
        teardown_bundle
 }
 
@@ -100,10 +120,16 @@ function runc_restore_with_pipes() {
 }
 
 function simple_cr() {
+       # Tell runc which network namespace to use.
+       # create_netns
+       # update_config '(.. | select(.type? == "network")) .path |= "'"$ns_path"'"'
+       update_config ' .linux.netDevices |= {"dummy0": {} }'
        runc run -d --console-socket "$CONSOLE_SOCKET" test_busybox
        [ "$status" -eq 0 ]
 
        testcontainer test_busybox running
+       runc exec test_busybox ip address show dev dummy0
+       [ "$status" -eq 0 ]
 
        for _ in $(seq 2); do
                # checkpoint the running container
@@ -119,6 +145,8 @@ function simple_cr() {
 
                # busybox should be back up and running
                testcontainer test_busybox running
+               runc exec test_busybox ip address show dev dummy0
+               [ "$status" -eq 0 ]
        done
 }
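
The second `ip netns add` in `create_netns` looks redundant, but it appears to be deliberate: the call fails because the namespace already exists, and the `sed` expression pulls the quoted path out of the error message. Below is a small sketch of just that extraction, runnable without root. The sample message is an assumption about what iproute2 prints, and `extract_quoted` is a hypothetical helper name:

```shell
# extract_quoted: read a line on stdin and print the text between the
# double quotes, using the same sed expression as create_netns.
extract_quoted() {
	sed -e 's/.*"\(.*\)".*/\1/'
}

# Assumed shape of the error printed when the namespace already exists.
msg='Cannot create namespace file "/run/netns/tmp.abc123": File exists'
printf '%s\n' "$msg" | extract_quoted
```

For the sample message this prints `/run/netns/tmp.abc123`. The trick depends on the exact wording of the error, so it would break if the message ever stopped quoting the path.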

@aojea
Contributor Author

aojea commented May 29, 2025

@lifubang at the cost of duplicating code, but to make test failures easier to troubleshoot, I duplicated the test cases so we have simple_cr and simple_cr_with_netdevice. If there is a problem with the netdevice logic, we can spot it very easily, since it will only affect one of them and not the others.

@aojea
Contributor Author

aojea commented May 29, 2025

failed job with

ssh: Could not resolve hostname localhost: Name or service not known

@lifubang
Member

lifubang commented May 29, 2025

  • ci / test (actuated-arm64-6cpu-8gb, 1.24.x, rootless) (pull_request)

@alexellis May I ask for your help here? Is there some special change that prevents us from using localhost on actuated-arm64-6cpu-8gb? Do you have a suggestion to help us resolve this issue? 🙏

The failure seems to have started 4 days ago. Please see: https://github.com/opencontainers/runc/actions/runs/15232466902/job/42841958687

The other solution is to change ssh rootless@localhost to ssh rootless@127.0.0.1.

@lifubang
Member

lifubang commented Jun 3, 2025

The rootless test on the arm64 architecture continues to fail; I opened a new issue to track it: #4776.

@alexellis

Hi @lifubang, happy to help, but we do not provide any support for actuated via GitHub, only via Slack. I've only just seen these mentions.

The other solution is to change ssh rootless@localhost to ssh rootless@127.0.0.1.

That sounds like a better solution. I always prefer 127.0.0.1 over "localhost", especially on systems with IPv6, where those resolutions will sometimes hang indefinitely.

I can't think of a reason why localhost wouldn't resolve off the top of my head, but you can explore the VM image in an SSH session and poke around. That's the best way - access it here - https://docs.actuated.com/tasks/debug-ssh/

You can also create a dummy repo and job and run a command like cat /etc/hosts to see if the entries you expect are present.

Support for rootless containers is built into the kernel. Are there any specific CONFIG_ settings that you typically need?

Alex

@alexellis

alexellis commented Jun 3, 2025

What is the context in which this command is being run? On the host directly, in a container?

There is no entry in /etc/hosts, if that is what is being parsed. But I am seeing resolution work.

Can you try a test in your build of adding 127.0.0.1 localhost to your /etc/hosts file for this job?

Perhaps conditionally if needed? Something like this should work

      - name: Add entry to hosts
        if: ${{ runner.arch == 'ARM64' }}
        run: |
          echo "127.0.0.1 localhost" | sudo tee -a /etc/hosts

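
If the step can run more than once, or on runners that already have the entry, an idempotent variant may be safer than an unconditional write. A minimal sketch, assuming a plain hosts-format file; `ensure_hosts_entry` is a hypothetical helper, not something from this PR or from actuated:

```shell
# ensure_hosts_entry ADDR NAME [FILE]: append "ADDR NAME" to FILE (default
# /etc/hosts) unless some line already maps ADDR to NAME. Existing lines
# are left untouched.
ensure_hosts_entry() {
	addr=$1
	name=$2
	file=${3:-/etc/hosts}
	if awk -v a="$addr" -v n="$name" '
		{ sub(/#.*/, "") }    # ignore comments
		$1 == a { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
		END { exit !found }   # exit 0 only if the mapping was seen
	' "$file"; then
		return 0
	fi
	printf '%s %s\n' "$addr" "$name" >> "$file"
}
```

Writing to the real /etc/hosts needs root, so in a workflow this would run under `sudo`. Passing a scratch file as the third argument makes it easy to try out first.
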
runner@87db36ae71e63a0c3a141cd9d597246cd6aaa628:~$ ssh localhost
The authenticity of host '(localhost (::1)' can't be established.
ED25519 key fingerprint is SHA256:adNuHUaOs2yQacN8ERU4t5+OzdmE5ONsE9nNzgkPKQw.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? 

runner@87db36ae71e63a0c3a141cd9d597246cd6aaa628:~$ ping -c1 localhost
PING localhost(localhost (::1)) 56 data bytes
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.038 ms

--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.038/0.038/0.038/0.000 ms
runner@87db36ae71e63a0c3a141cd9d597246cd6aaa628:~$ nslookup localhost
Server:		127.0.0.53
Address:	127.0.0.53#53

Name:	localhost
Address: 127.0.0.1
Name:	localhost
Address: ::1

runner@87db36ae71e63a0c3a141cd9d597246cd6aaa628:~$ cat /etc/hosts
runner@87db36ae71e63a0c3a141cd9d597246cd6aaa628:~$ cat /etc/hostname
87db36ae71e63a0c3a141cd9d597246cd6aaa628
runner@87db36ae71e63a0c3a141cd9d597246cd6aaa628:~$ curl localhost:80
curl: (7) Failed to connect to localhost port 80 after 0 ms: Connection refused
runner@87db36ae71e63a0c3a141cd9d597246cd6aaa628:~$ curl -4 localhost:80
curl: (7) Failed to connect to localhost port 80 after 0 ms: Connection refused
runner@87db36ae71e63a0c3a141cd9d597246cd6aaa628:~$ 

If I add 127.0.0.1 localhost, then I see resolution to IPv4 (exactly what I was describing above).

Seems like you should be more explicit: if you have only bound SSH to 127.0.0.1 over IPv4, the default is to resolve to the IPv6 loopback, so your workaround is probably the correct solution: ssh 127.0.0.1
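
To see which addresses a hosts file maps a name to, something like the following sketch can help. It reads only a hosts-format file, so it says nothing about DNS or the rest of the resolver chain (the transcript above shows a stub resolver at 127.0.0.53 answering even though /etc/hosts is empty). `hosts_lookup` is a hypothetical helper name:

```shell
# hosts_lookup NAME [FILE]: print every address that FILE (default
# /etc/hosts) maps to NAME, one per line, in file order.
hosts_lookup() {
	name=$1
	file=${2:-/etc/hosts}
	awk -v name="$name" '
		{ sub(/#.*/, "") }    # strip comments
		NF < 2 { next }       # need an address plus at least one name
		{
			for (i = 2; i <= NF; i++)
				if ($i == name) { print $1; break }
		}
	' "$file"
}

hosts_lookup localhost
```

If the output contains only `::1`, a client that resolves the name through the hosts file will get the IPv6 loopback, which matches the behaviour described above. On glibc systems, `getent ahosts localhost` shows what the full resolver returns.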

@lifubang
Member

lifubang commented Jun 4, 2025

Thanks, @alexellis!

ip address add "$global_ip" dev dummy0

# Tell runc which network namespace to use.
create_netns
Member

If we only use the network namespace created by runc, this test will fail.
I think there may be no such scenario, so we can leave it to be implemented in the future if someone needs it.

Member

@lifubang can you elaborate? How do you expect this to fail exactly? Also, tests are green, is that unexpected?

Member

@rata Please see:
#4538 (comment)

It's all about the netns managed only by runc, not by the high-level container runtimes.
That is, the case where we don't specify the netns path in config.json.

@alexellis

I've put 127.0.0.1 first in /etc/hosts so if you'd still like to use ssh localhost - that should work now.

@lifubang
Member

lifubang commented Jun 5, 2025

I've put 127.0.0.1 first in /etc/hosts so if you'd still like to use ssh localhost - that should work now.

Thanks, Alex. It does indeed work now, but strangely, I can't see the change you mentioned:

+ cat /etc/hosts
localhost 936ea945c6b4019ff071f4088aea0c722b77ddc1
::1 localhost ip6-localhost ip6-loopback
+ cat /etc/resolv.conf
nameserver 8.8.8.8
nameserver 1.1.1.1
options edns0

And looking at sshd_config, the listener configuration looks like this:

#Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

So maybe sshd is listening on all IPv4 and IPv6 addresses now?
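
With every `Port`, `AddressFamily` and `ListenAddress` line commented out, sshd falls back to its defaults, which means listening on all local addresses for both address families. Below is a rough way to check whether a config file overrides any of these. `sshd_listen_overrides` is a hypothetical helper, and it deliberately ignores `Include` files and `Match` blocks:

```shell
# sshd_listen_overrides [FILE]: print the uncommented Port, AddressFamily and
# ListenAddress lines from FILE (default /etc/ssh/sshd_config), or a note
# that none are set.
sshd_listen_overrides() {
	file=${1:-/etc/ssh/sshd_config}
	grep -Ei '^[[:space:]]*(Port|AddressFamily|ListenAddress)[[:space:]]' "$file" ||
		echo "none (sshd defaults apply)"
}
```

On a live system, `sshd -T` (run as root) prints the effective configuration, which is the more reliable check.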

aojea added 3 commits June 18, 2025 15:52
Signed-off-by: Antonio Ojea <[email protected]>
Implement support for passing Linux Network Devices to the container
network namespace.

The network device is passed during the creation of the container,
before the process is started.

It implements the logic defined in the OCI runtime specification.

Signed-off-by: Antonio Ojea <[email protected]>
@aojea
Contributor Author

aojea commented Jun 18, 2025

kind reminder @rata and/or @kolyshkin 😄

@aojea
Contributor Author

aojea commented Jun 18, 2025

    exec_test.go:1747: extra fd 25 ->
    exec_test.go:1750: found 1 extra fds after container.Run
--- FAIL: TestFdLeaks (0.19s)

does not look related

@rata
Member

rata commented Jun 18, 2025

Yep, doesn't seem related. I've triggered a re-run.

Member

@rata rata left a comment

@aojea Thanks for the PR and the patience with the reviews. LGTM :)

Sorry for the late review, I was AFK.

I'll wait for @lifubang to comment on my question (or if @kolyshkin wants to have a look). I'll aim to merge tomorrow, unless @lifubang opposes. Worst case, we can improve it in a follow-up. Can I count on you @aojea if it's needed? :)

@aojea
Contributor Author

aojea commented Jun 18, 2025

Can I count on you @aojea if it's needed? :)

@rata I consider myself accountable for all code I add to any project, so you can count on me for maintaining it forever

@rata rata merged commit 82fe6e2 into opencontainers:main Jun 19, 2025
44 of 45 checks passed
@aojea
Contributor Author

aojea commented Jun 19, 2025

thank you very much folks

6 participants