Skip to content

Conversation

@kernel-patches-daemon-bpf
Copy link

Pull request for series with
subject: Make TC BPF helpers preserve skb metadata
version: 3
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1015900

Lay groundwork for fixing BPF helpers available to TC(X) programs.

When skb_push() or skb_pull() is called in a TC(X) ingress BPF program, the
skb metadata must be kept in front of the MAC header. Otherwise, BPF
programs using the __sk_buff->data_meta pseudo-pointer lose access to it.

Introduce a helper that moves both metadata and a specified number of
packet data bytes together, suitable as a drop-in replacement for
memmove().

Signed-off-by: Jakub Sitnicki <[email protected]>
pskb_expand_head() copies headroom, including skb metadata, into the newly
allocated head, but then clears the metadata. As a result, metadata is lost
when BPF helpers trigger an skb head reallocation.

Let the skb metadata remain in the newly created copy of head.

Signed-off-by: Jakub Sitnicki <[email protected]>
Currently bpf_dynptr_from_skb_meta() marks the dynptr as read-only when
the skb is cloned, preventing writes to metadata.

Remove this restriction and unclone the skb head on bpf_dynptr_write() to
metadata, now that the metadata is preserved during uncloning. This makes
metadata dynptr consistent with skb dynptr, allowing writes regardless of
whether the skb is cloned.

Signed-off-by: Jakub Sitnicki <[email protected]>
All callers ignore the return value.

Prepare to reorder memmove() after skb_pull() which is a common pattern.

Signed-off-by: Jakub Sitnicki <[email protected]>
Use the metadata-aware helper to move packet bytes after skb_pull(),
ensuring metadata remains valid after calling the BPF helper.

Signed-off-by: Jakub Sitnicki <[email protected]>
Use the metadata-aware helper to move packet bytes after skb_push(),
ensuring metadata remains valid after calling the BPF helper.

Also, take care to reserve sufficient headroom for metadata to fit.

Signed-off-by: Jakub Sitnicki <[email protected]>
bpf_skb_adjust_room() may push or pull bytes from skb->data. In both cases,
skb metadata must be moved accordingly to stay accessible.

Replace existing memmove() calls, which only move payload, with a helper
that also handles metadata. Reserve enough space for metadata to fit after
skb_push.

Signed-off-by: Jakub Sitnicki <[email protected]>
bpf_skb_change_proto reuses the same headroom operations as
bpf_skb_adjust_room, already updated to handle metadata safely.

The remaining step is to ensure that there is sufficient headroom to
accommodate metadata on skb_push().

Signed-off-by: Jakub Sitnicki <[email protected]>
Although bpf_skb_change_head() doesn't move packet data after skb_push(),
skb metadata still needs to be relocated. Use the dedicated helper to
handle it.

Signed-off-by: Jakub Sitnicki <[email protected]>
Move metadata verification into the BPF TC programs. Previously,
userspace read metadata from a map and verified it once at test end.

Now TC programs compare metadata directly using __builtin_memcmp() and
set a test_pass flag. This enables verification at multiple points during
test execution rather than a single final check.

Signed-off-by: Jakub Sitnicki <[email protected]>
Add diagnostic output when metadata verification fails to help with
troubleshooting test failures. Introduce a check_metadata() helper that
prints both expected and received metadata to the BPF program's stderr
stream on mismatch. The userspace test reads and dumps this stream on
failure.

Signed-off-by: Jakub Sitnicki <[email protected]>
Since pskb_expand_head() no longer clears metadata on unclone, update tests
for cloned packets to expect metadata to remain intact.

Also simplify the clone_dynptr_kept_on_{data,meta}_slice_write tests.
Creating an r/w dynptr slice is sufficient to trigger an unclone in the
prologue, so remove the extraneous writes to the data/meta slice.

Signed-off-by: Jakub Sitnicki <[email protected]>
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_vlan_push() and bpf_skb_vlan_pop(), which modify the packet
headroom.

Signed-off-by: Jakub Sitnicki <[email protected]>
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_adjust_room(), which modifies the packet headroom and can trigger
head reallocation.

The helper expects an Ethernet frame carrying an IP packet so switch test
packet identification by source MAC address since we can no longer rely on
Ethernet proto being set to zero.

Signed-off-by: Jakub Sitnicki <[email protected]>
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_head() and bpf_skb_change_tail(), which modify packet
headroom/tailroom and can trigger head reallocation.

Signed-off-by: Jakub Sitnicki <[email protected]>
Add a test to verify that skb metadata remains accessible after calling
bpf_skb_change_proto(), which modifies packet headroom to accommodate
different IP header sizes.

Signed-off-by: Jakub Sitnicki <[email protected]>
@kernel-patches-daemon-bpf
Copy link
Author

Upstream branch: e758657
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1015900
version: 3

@kernel-patches-review-bot
Copy link

commit fabe6b25026c20ad312b1eefd3b3f95f55b19b96
Author: Jakub Sitnicki <[email protected]>

selftests/bpf: Dump skb metadata on verification failure

This commit adds diagnostic output to help troubleshoot BPF selftest
failures by printing expected vs received metadata when verification
fails.

> diff --git a/tools/testing/selftests/bpf/progs/test_xdp_meta.c b/tools/testing/selftests/bpf/progs/test_xdp_meta.c
> index 11288b20f..74d7e2aab 100644
> --- a/tools/testing/selftests/bpf/progs/test_xdp_meta.c
> +++ b/tools/testing/selftests/bpf/progs/test_xdp_meta.c
> @@ -27,6 +27,23 @@ static const __u8 meta_want[META_SIZE] = {
>  	0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
>  };
>
> +static bool check_metadata(const char *file, int line, __u8 *meta_have)
> +{
> +	if (!__builtin_memcmp(meta_have, meta_want, META_SIZE))
> +		return true;
> +
> +	bpf_stream_printk(BPF_STREAM_STDERR,
> +			  "FAIL:%s:%d: metadata mismatch\n"
> +			  "  have:\n    %pI6\n    %pI6\n"
> +			  "  want:\n    %pI6\n    %pI6\n",
> +			  file, line,
> +			  &meta_have[0x00], &meta_have[0x10],
> +			  &meta_want[0x00], &meta_have[0x10]);
> +	return false;
> +}

Is the last argument correct in check_metadata()? The format string
shows "want:\n    %pI6\n    %pI6\n" but the fourth argument is
&meta_have[0x10] instead of &meta_want[0x10]. This would print the
received metadata in the "want" section, making the diagnostic output
confusing when troubleshooting test failures.



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: selftests/bpf: Dump skb metadata on verification failure
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/18819281165

@kernel-patches-daemon-bpf
Copy link
Author

Forwarding comment 3448594380 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

@kernel-patches-review-bot
Copy link

commit 8056f67d49fa399e08d13005af8c60e5f6261d0c
Author: Jakub Sitnicki <[email protected]>

bpf: Make bpf_skb_vlan_push helper metadata-safe

This patch updates __vlan_insert_inner_tag() to use metadata-aware
helpers and reserve headroom for metadata. However, the metadata
protection appears incomplete for the mac_len <= ETH_TLEN case.

> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
> index 4ecc2509b..f7f34eb15 100644
> --- a/include/linux/if_vlan.h
> +++ b/include/linux/if_vlan.h
> @@ -355,16 +355,17 @@ static inline int __vlan_insert_inner_tag(struct sk_buff *skb,
>  					  __be16 vlan_proto, u16 vlan_tci,
>  					  unsigned int mac_len)
>  {
> +	const u8 meta_len = mac_len > ETH_TLEN ? skb_metadata_len(skb) : 0;

When mac_len <= ETH_TLEN (e.g., mac_len=0 as seen in net/nsh/nsh.c),
meta_len is set to 0 even if metadata exists on the skb. Does this mean
insufficient headroom is reserved for metadata?

Consider this scenario: an skb with metadata passes through NSH
encapsulation (which sets mac_len=0) and later reaches
vlan_insert_inner_tag()->__vlan_insert_inner_tag(). The metadata length
calculation would return 0, but the skb still has metadata.

>  	struct vlan_ethhdr *veth;
>
> -	if (skb_cow_head(skb, VLAN_HLEN) < 0)
> +	if (skb_cow_head(skb, meta_len + VLAN_HLEN) < 0)
>  		return -ENOMEM;
>
>  	skb_push(skb, VLAN_HLEN);

With meta_len=0, skb_cow_head() reserves only VLAN_HLEN bytes. If the
current headroom already exceeds VLAN_HLEN (common case), no expansion
occurs. Then skb_push() moves skb->data backward by VLAN_HLEN, potentially
into metadata space.

>
>  	/* Move the mac header sans proto to the beginning of the new header. */
>  	if (likely(mac_len > ETH_TLEN))
> -		memmove(skb->data, skb->data + VLAN_HLEN, mac_len - ETH_TLEN);
> +		skb_postpush_data_move(skb, VLAN_HLEN, mac_len - ETH_TLEN);

When mac_len <= ETH_TLEN, skb_postpush_data_move() is not called, so
metadata is not relocated. But skb_push() already moved skb->data into
metadata space, and subsequent writes to skb->data area (lines 372-388)
would corrupt the metadata. Can this corrupt metadata when mac_len is
small but metadata exists?

The commit message from 596212ca3 (net: Preserve metadata on
pskb_expand_head) notes: "If you skb_push() the start of the buffer after
reallocating the header, call skb_postpush_data_move() first to move the
metadata out of the way before writing to &sk_buff->data." This suggests
metadata protection is needed regardless of mac_len value.

Should the metadata length be calculated unconditionally as:
    const u8 meta_len = skb_metadata_len(skb);

>  	if (skb_mac_header_was_set(skb))
>  		skb->mac_header -= VLAN_HLEN;



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: bpf: Make bpf_skb_vlan_push helper metadata-safe
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/18819281165

@kernel-patches-review-bot
Copy link

commit 596212ca387db4047fa024bc7367827033d7b4e7
Author: Jakub Sitnicki <[email protected]>

net: Preserve metadata on pskb_expand_head

This patch removes skb_metadata_clear() from pskb_expand_head() to
preserve XDP metadata when the skb head is reallocated. It also adds
documentation warning that skb_postpush_data_move() must be called if
skb_push() is used after reallocation.

> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 6be01454f..b4fa9aa2d 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -2218,6 +2218,10 @@ EXPORT_SYMBOL(__pskb_copy_fclone);
>   *
>   *	All the pointers pointing into skb header may change and must be
>   *	reloaded after call to this function.
> + *
> + *	Note: If you skb_push() the start of the buffer after reallocating the
> + *	header, call skb_postpush_data_move() first to move the metadata out of
> + *	the way before writing to &sk_buff->data.
>   */
>
>  int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
> @@ -2289,8 +2293,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
>  	skb->nohdr    = 0;
>  	atomic_set(&skb_shinfo(skb)->dataref, 1);
>
> -	skb_metadata_clear(skb);
> -
>  	/* It is not generally safe to change skb->truesize.
>  	 * For the moment, we really care of rx path, or
>  	 * when skb is orphaned (not attached to a socket).

The documentation warns about the new requirement, but there appear to be
existing code paths that violate it. Can metadata be corrupted in the
following locations?

In net/ipv6/seg6_iptunnel.c, the seg6_do_srh() function in
SEG6_IPTUN_MODE_L2ENCAP mode:

    if (pskb_expand_head(skb, skb->mac_len, 0, GFP_ATOMIC) < 0)
        return -ENOMEM;

    skb_mac_header_rebuild(skb);
    skb_push(skb, skb->mac_len);

After pskb_expand_head() now preserves metadata, skb_mac_header_rebuild()
sets the mac header to skb->data - mac_len and copies mac_len bytes there
via memmove(). If XDP metadata exists immediately before skb->data (as
expected by skb_data_move() which checks meta_end + len == skb->data),
doesn't this memmove overwrite the metadata?

Similarly in net/ipv6/exthdrs.c, the ipv6_rthdr_rcv() function when
processing compressed routing headers:

    if (pskb_expand_head(skb, sizeof(struct ipv6hdr) +
                         ((chdr->hdrlen + 1) << 3), 0, GFP_ATOMIC)) {
        [ ... ]
    }

    [ ... ]

    skb_push(skb, ((chdr->hdrlen + 1) << 3) + sizeof(struct ipv6hdr));
    skb_reset_network_header(skb);
    skb_mac_header_rebuild(skb);
    skb_set_transport_header(skb, sizeof(struct ipv6hdr));

    memmove(ipv6_hdr(skb), oldhdr, sizeof(struct ipv6hdr));
    memcpy(skb_transport_header(skb), chdr, (chdr->hdrlen + 1) << 3);

This follows the same pattern: pskb_expand_head(), then skb_push(), then
skb_mac_header_rebuild() which writes to the headroom where metadata would
be located. The subsequent memmove/memcpy operations would also write to
this area.

Both code paths can be reached when forwarding packets that had XDP
metadata set at ingress. Neither calls skb_postpush_data_move() as the
new documentation suggests.



AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

In-Reply-To-Subject: net: Preserve metadata on pskb_expand_head
CI run summary: https://github.com/kernel-patches/bpf/actions/runs/18819281165

@kernel-patches-daemon-bpf
Copy link
Author

Forwarding comment 3448597695 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

@kernel-patches-daemon-bpf
Copy link
Author

Forwarding comment 3448598543 via email
In-Reply-To: [email protected]
Patch: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant