The XDP(eXpress Data Path) is a powerful new network feature in Linux based on eBPF, that enables high-performance programmable access to networking packets before they enter the networking stack. But it also has a high start curve to learn, many developers have written introduction blogs for this feature. e.g. Paolo Abeni’s Using XDP in RHEL8 and Toke’s Using XDP in RHEL8.

As the XDP is a new technology and is still in fast-moving. The eBPF/XDP coding format and style are also changing. So developers are trying to make it easy to write. Some tools/frameworks are created to help develop BPF applications. e.g. libbpf, xdp-tools which we will use in this blog.

In this blog, I will introduce how to start with XDP program with 4 steps:

  1. Write a small program to drop everything to show what a simple XDP program looks like
    • What a samll program looks like
    • How to build and dump bpf object
    • How to load bpf object
    • How to show bpf programs
    • How to unload bpf object
  2. Extend the program to let you know how to deal with specific packets
  3. Use a packet counter to show how to use BPF maps
  4. Add a userspace tool to show how to load the BPF program with customer tools

The reader needs to be familiar with C code and IP header structures. All examples are tested with RHEL-8.3

The article should take you about 30m.

Prerequisites

As a preparation of the developing environment. We should install the following packages.

$ sudo dnf install clang llvm gcc libbpf libbpf-devel libxdp libxdp-devel xdp-tools bpftool kernel-headers

Task 1: XDP: Drop everything

Task 1.1: Write a Hello World program

First, let’s start with a simple XDP program in C:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp_drop")
int xdp_drop_prog(struct xdp_md *ctx)
{
	return XDP_DROP;
}

char _license[] SEC("license") = "GPL";

The linux/bpf.h is provided by the kernel-header package, which defines all the supported bpf helpers and xdp_actions, like XDP_DROP in this example.

The bpf/bpf_helpers.h is provided by libbpf-devel, which provided some useful eBPF macros, like SEC in this example. The SEC, short for section, is used to place a fragment of the compiled object in different ELF sections, which we could see in the later llvm-objdump result.

In function xdp_drop_prog() we have a parameter struct xdp_md *ctx, which is not used yet. I will talk about it later. This function returns XDP_DROP, which means we will drop all incoming packets. There are also some other XDP actions like XDP_PASS, XDP_TX, XDP_REDIRECT.

Finally, the last line formally specifies the license associated with this program. Some eBPF helpers are accessible only by GPL licensed programs, and the verifier will use this info to enforce such a restriction.

Task 1.2: Build and dump the bpf object

After the programming, let’s build it with clang: $ clang -O2 -g -Wall -target bpf -c xdp_drop.c -o xdp_drop.o

-O Specify which optimization level to use, -g could generate debug information.

You can use llvm-objdump to show the ELF format after build. With -h you can see the sections in the object. With -S it will display the source interleaved with the disassembly.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
$ llvm-objdump -h xdp_drop.o

xdp_drop:       file format ELF64-BPF

Sections:
Idx Name            Size     VMA              Type
  0                 00000000 0000000000000000
  1 .strtab         000000ad 0000000000000000
  2 .text           00000000 0000000000000000 TEXT
  3 xdp_drop        00000010 0000000000000000 TEXT
  4 license         00000004 0000000000000000 DATA
  5 .debug_str      00000125 0000000000000000
  6 .debug_abbrev   000000ba 0000000000000000
  7 .debug_info     00000114 0000000000000000
  8 .rel.debug_info 000001c0 0000000000000000
  9 .BTF            000001df 0000000000000000
 10 .rel.BTF        00000010 0000000000000000
 11 .BTF.ext        00000050 0000000000000000
 12 .rel.BTF.ext    00000020 0000000000000000
 13 .eh_frame       00000030 0000000000000000 DATA
 14 .rel.eh_frame   00000010 0000000000000000
 15 .debug_line     00000084 0000000000000000
 16 .rel.debug_line 00000010 0000000000000000
 17 .llvm_addrsig   00000002 0000000000000000
 18 .symtab         000002d0 0000000000000000

$ llvm-objdump -S -no-show-raw-insn xdp_drop.o

xdp_drop:       file format ELF64-BPF


Disassembly of section xdp_drop:

0000000000000000 xdp_drop_prog:
;       return XDP_DROP;
       0:       r0 = 1
       1:       exit

The llvm-objdump is very useful if you want to know what the object does without source code.

Task 1.3: load the bpf object

Warn: do not load it on default interface, please use veth interface for testing.

After building the object, there are multiple ways to load the object. The easiest way to load it is using ip cmd, like $ sudo ip link set veth1 xdpgeneric obj xdp_drop.o sec xdp_drop

But the ip cmd doesn’t support the BTF(BPF Type Format) type map that we will talk later. Although the latest ip-next has fixed this issue by adding libbpf support, it’s not backport to main line yet.

The recommand way to load XDP object on RHEL is using xdp-loader. Not only because it depends on libbpf which has full BTF support, but also that’s the only way we can get multiple programs on one interface.

To get Red Hat support for XDP feature, please use libxdp based on document XDP is conditionally supported in RHEL 8.3 release note

Now let’s load it the object on interface veth1 with xdp-loader, -m sbk means to use skb mode. There are some other modes like native, offload. As these modes are not supported on all NIC driver, we just use skb mode in this article. -s xdp_drop specify to use the section xdp_drop.

$ sudo xdp-loader load -m skb -s xdp_drop veth1 xdp_drop.o

Task 1.4: show the XDP programs

There are also multi ways to show the XDP program we just load:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
$ sudo xdp-loader status
CURRENT XDP PROGRAM STATUS:

Interface        Prio  Program name     Mode     ID   Tag               Chain actions
-------------------------------------------------------------------------------------
lo               <no XDP program>
ens3             <no XDP program>
veth1                  xdp_dispatcher   skb      15   d51e469e988d81da
 =>              50    xdp_drop_prog             20   57cd311f2e27366b  XDP_PASS
$ sudo bpftool prog show
15: xdp  name xdp_dispatcher  tag d51e469e988d81da  gpl
        loaded_at 2021-01-13T03:24:43-0500  uid 0
        xlated 616B  jited 638B  memlock 4096B  map_ids 8
        btf_id 12
20: ext  name xdp_drop_prog  tag 57cd311f2e27366b  gpl
        loaded_at 2021-01-13T03:24:43-0500  uid 0
        xlated 16B  jited 40B  memlock 4096B
        btf_id 16
$ sudo ip link show veth1
4: veth1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdpgeneric qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ba:4d:98:21:3b:b3 brd ff:ff:ff:ff:ff:ff link-netns ns
    prog/xdp id 15 tag d51e469e988d81da jited

If you load with ip cmd, there will only 1 XDP program load. If you load with xdp-loader, there will be 2 programs loaded by default. One is xdp_dispatcher, created by xdp-loader, another is xdp_drop_prog, write by us.

From the above result, we can see the xdp_dispatcher with id 15 and xdp_drop_prog with id 20.

Task 1.5: unload the XDP programs

If you use ip cmd to load the program, you can use $ sudo ip link set veth1 xdpgeneric off to unload the program. Note you need to use Correspond xdp flag to unload, e.g. use xdpgeneric off if you load it by ip link set veth1 xdpgeneric obj ... or xdp off if you load it by ip link set veth1 xdp obj ...

Or we can just unload all the programs by xdp-loader, -a means to unload all the programs on the interface: $ sudo xdp-loader unload -a veth1

Task 2: XDP: Drop specific packets

The first example will drop every packet. It is just an intro example and doesn’t have any real meaning. Now let’s do some real stuff. Let’s try to drop some specific packets. Like IPv6 packets.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>


SEC("xdp_drop")
int xdp_drop_prog(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;
	__u16 h_proto;

	if (data + sizeof(struct ethhdr) > data_end)
		return XDP_DROP;

	h_proto = eth->h_proto;

	if (h_proto == htons(ETH_P_IPV6))
		return XDP_DROP;

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Compare with the first example. We added two more header files linux/if_ether.h and arpa/inet.h for struct ethhdr and function htons().

The struct xdp_md is also used in this example. It’s defined(in linux 5.10) like

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
struct xdp_md {
	__u32 data;
	__u32 data_end;
	__u32 data_meta;
	/* Below access go through struct xdp_rxq_info */
	__u32 ingress_ifindex; /* rxq->dev->ifindex */
	__u32 rx_queue_index;  /* rxq->queue_index  */

	__u32 egress_ifindex;  /* txq->dev->ifindex */
};

The packet contents are between ctx->data and ctx->data_end. And the data starts with ethernet header, so we assign ethhdr like

1
2
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;

When accessing the data in struct ethhdr, we must make sure we don’t access invalid areas by checking data + sizeof(struct ethhdr) > data_end. This check is compulsory by the BPF verifer.

Then if the protocol in ether header is IPv6 with checking h_proto == htons(ETH_P_IPV6), we will drop it by returning XDP_DROP. For others we just return XDP_PASS.

Task 3: MAP: Count the processed packets

In the previous example we dropped IPv6 packets, but how to know how many packets are dropped? Here we got another BPF feature, MAPS. BPF maps are used to share data between kernel and userspace. We can update the map data in kernel and read it from userspace, or vice versa.

Here is an example of new BTF-defined map(introduced by upstream commit abd29c931459).

1
2
3
4
5
6
struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __type(key, __u32);
        __type(value, long);
        __uint(max_entries, 1);
} rxcnt SEC(".maps");

The map is named as rxcnt with type BPF_MAP_TYPE_PERCPU_ARRAY. This type indicates that we will have one instance of the map per CPU core (if you have 4 cores, you will have 4 instances of the map). It will be used to count how many packets are processed per core. Then we defined the key/value type and limit the max entries to 1, as we only need to count one number(the received IPv6 packet). In C code it would look like we defined an array on each cpu with a size of one. .e.g unsigned int rxcnt[1].

BPF also supports more map types like BPF_MAP_TYPE_HASH, BPF_MAP_TYPE_ARRAY, etc.

Then we lookup the value with function bpf_map_lookup_elem() and update it. Here is the full code.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __type(key, __u32);
        __type(value, long);
        __uint(max_entries, 1);
} rxcnt SEC(".maps");

SEC("xdp_drop_ipv6")
int xdp_drop_ipv6_prog(struct xdp_md *ctx)
{
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct ethhdr *eth = data;
        __u16 h_proto;
        __u32 key = 0;
        long *value;

        if (data + sizeof(struct ethhdr) > data_end)
                return XDP_DROP;

        h_proto = eth->h_proto;

        if (h_proto == htons(ETH_P_IPV6)) {
                value = bpf_map_lookup_elem(&rxcnt, &key);
                if (value)
                        *value += 1;
                return XDP_DROP;
        }

        return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Now let’s name the new program to xdp_drop_ipv6_count.c and build it to xdp_drop_ipv6_count.o.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
$ sudo xdp-loader load -m skb -s xdp_drop_ipv6 veth1 xdp_drop_ipv6_count.o

<receive some IPv6 packets>

$ sudo bpftool map show
bpftool map show
13: percpu_array  name rxcnt  flags 0x0
        key 4B  value 8B  max_entries 1  memlock 4096B
        btf_id 20
19: array  name xdp_disp.rodata  flags 0x480
        key 4B  value 84B  max_entries 1  memlock 8192B
        btf_id 28  frozen
# bpftool map dump id 13
[{
        "key": 0,
        "values": [{
                "cpu": 0,
                "value": 13
            },{
                "cpu": 1,
                "value": 7
            },{
                "cpu": 2,
                "value": 0
            },{
                "cpu": 3,
                "value": 0
            }
        ]
    }
]

After loading the object, then send some IPv6 packets to this interface. With bpftool map show we can see the map rxcnt's id is 13. Then with bpftool map dump id 13 we could see there are 13 packets are processed on cpu 0 and 7 packets are processed on cpu 1.

Task 4: Loader: load XDP objects with custom loader

We can load the XDP objects with ip and show the map number with bpftool. But if we want more advanced features(create, read, write maps, attach xdp programs to interfaces, etc), we need to write the loader ourselves.

Here is an example of how to show the total packets and PPS we dropped. In this example I hard code a lot of stuff like kernel object name, section name, etc. These all can be set via parameters in your own code.

The purpose of each function has commented in the code.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <linux/if_link.h>
#include <signal.h>
#include <net/if.h>
#include <assert.h>

/* In this example we use libbpf-devel and libxdp-devel */
#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <xdp/libxdp.h>

/* We define the following variables global */
static int ifindex;
struct xdp_program *prog = NULL;

/* This function will remove XDP from the link when program exit */
static void int_exit(int sig)
{
	xdp_program__detach(prog, ifindex, XDP_MODE_SKB, 0);
	xdp_program__close(prog);
	exit(0);
}

/* This function will count the per CPU number and print out total
 * dropped packets number and PPS(packets per second).
 */
static void poll_stats(int map_fd, int interval)
{
	int ncpus = libbpf_num_possible_cpus();
	if (ncpus < 0) {
		printf("Error get possible cpus\n");
		return;
	}
	long values[ncpus], prev[ncpus], total_pkts;
	int i, key = 0;

	memset(prev, 0, sizeof(prev));

	while (1) {
		long sum = 0;

		sleep(interval);
		assert(bpf_map_lookup_elem(map_fd, &key, values) == 0);
		for (i = 0; i < ncpus; i++)
			sum += (values[i] - prev[i]);
		if (sum) {
			total_pkts += sum;
			printf("total dropped %10llu, %10llu pkt/s\n",
			       total_pkts, sum / interval);
		}
		memcpy(prev, values, sizeof(values));
	}
}

int main(int argc, char *argv[])
{
	int prog_fd, map_fd, ret;
	struct bpf_object *bpf_obj;

	if (argc != 2) {
		printf("Usage: %s IFNAME\n", argv[0]);
		return 1;
	}

	ifindex = if_nametoindex(argv[1]);
	if (!ifindex) {
		printf("get ifindex from interface name failed\n");
		return 1;
	}

	/* load XDP object by libxdp */
	prog = xdp_program__open_file("xdp_drop_ipv6_count.o", "xdp_drop_ipv6", NULL);
	if (libxdp_get_error(prog)) {
		printf("Error, load xdp prog failed\n");
		return 1;
	}

	/* attach XDP program to interface with skb mode.
	 * Please set ulimit if you got an -EPERM error.
	 */
	ret = xdp_program__attach(prog, ifindex, XDP_MODE_SKB, 0);
	if (ret) {
		printf("Error, Set xdp fd on iface %d failed\n", ifindex);
		return ret;
	}

	/* Find the map fd from bpf object */
	bpf_obj = xdp_program__bpf_obj(prog);
	map_fd = bpf_object__find_map_fd_by_name(bpf_obj, "rxcnt");
	if (map_fd < 0) {
		printf("Error, get map fd from bpf obj failed\n");
		return map_fd;
	}

	/* Remove attached program when program is interrupted or killed */
	signal(SIGINT, int_exit);
	signal(SIGTERM, int_exit);

	poll_stats(map_fd, 2);

	return 0;
}

Set the ulimit to unlimited and build the program with flag -lbpf -lxdp. then run the program and we could see the pkt count output.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
$ sudo ulimit -l unlimited
$ gcc xdp_drop_ipv6_count_user.c -o xdp_drop_ipv6_count -lbpf -lxdp
$ sudo ./xdp_drop_ipv6_count veth1
total dropped          2,          1 pkt/s
total dropped        129,         63 pkt/s
total dropped        311,         91 pkt/s
total dropped        492,         90 pkt/s
total dropped        674,         91 pkt/s
total dropped        856,         91 pkt/s
total dropped       1038,         91 pkt/s
^C

Summary

With this article we know what a XDP program looks like, how to add a BPF map and how to write customer loader. If you want to learn more about XDP programming, you can reference to xdp-tutorial.