Data source

  • tracepoints: kernel static tracing
  • kprobes: kernel dynamic tracing
  • uprobes: user level dynamic tracing
  • USDT / dtrace probes

These tracepoints are hard coded in interesting and logical locations of the kernel, so that higher-level behavior can be easily traced. You can find all the tracepoints in include/trace/events/

Dynamic tracing can see everything, it’s also an unstable interface since it is instrumenting raw code.

cBPF used for filter. eBPF has (key/value) maps and more helper functions.

Ref: https://jvns.ca/blog/2017/07/05/linux-tracing-systems/

Mechanisms for collecting data

  • ftrace/kprobes/tracepoints: /sys/kernel/debug/tracing
  • perf_events
  • eBPF
  • Systemtap kernel module
  • sysdig

user frontends: display data

  • perf
  • ftrace/trace-cmd
  • bpftrace/bcc
  • systemtap

For user to choose, here are some reference

  • PMC statistics: perf
  • CPU sampling with timestamps: perf, as its perf.data output has been optimized
  • Function flow tracing: ftrace function graphing.

bpfbrace usage

One-Liners

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
# bpftrace -l "tracepoint:net:*"
# bpftrace -lv tracepoint:net:net_dev_xmit

# bpftrace -lv "struct path"
 struct path {
	         struct vfsmount *mnt;
		         struct dentry *dentry;
 };
# bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

# bpftrace -e 'BEGIN { printf("hello world\n"); }'
# bpftrace -lv "t:syscalls:sys_enter_openat"
# bpftrace -e 'tracepoint:syscalls:sys_enter_openat / comm == "vim" /{ printf("%s %s\n", comm, str(args->filename)); }'

# bpftrace -e 'kprobe:ip_output { @syscalls = count(); }'
# bpftrace -e 'kprobe:ip_output { @syscalls = count(); } interval:s:1 { print(@syscalls); clear(@syscalls); } '
# bpftrace -e 'kprobe:ip_output { @[kstack()] = count(); }'

bpftrace file

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
# cat path.bt
#!/usr/bin/bpftrace

kprobe:vfs_open
{
    printf("open path: %s\n", str(((struct path *)arg0)->dentry->d_name.name));
}

# bpftrace path.bt
Attaching 1 probe...
open path: dev
open path: if_inet6
open path: retrans_time_ms

buildin examples

1
2
3
4
5
6
7
# rpm -ql bpftrace | grep tcp.*bt
/usr/share/bpftrace/tools/tcpaccept.bt
/usr/share/bpftrace/tools/tcpconnect.bt
/usr/share/bpftrace/tools/tcpdrop.bt
/usr/share/bpftrace/tools/tcplife.bt
/usr/share/bpftrace/tools/tcpretrans.bt
/usr/share/bpftrace/tools/tcpsynbl.bt

Drop monitor

Drop monitor:

  • software drops: rigister probe on kfree_skb tracepoint
  • hardware drops
    • devlink call back (origin)
    • devlink tracepoint devlink_trap_report (5.10)

comparation: perf, bpftrace, dropwatch

1
2
3
# tc qdisc add dev lo clsact
# tc filter add dev lo egress proto ip flower dst_ip 127.0.0.1 action drop
# ping 127.0.0.1

##drop watch

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
# dropwatch -l kas
> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
6 drops at ip_rcv+98 (0xffffffff83e276d8)
1 drops at __netif_receive_skb_core+12c (0xffffffff83d9b61c)
2 drops at __netif_receive_skb_core+12c (0xffffffff83d9b61c)
3 drops at ip_rcv+98 (0xffffffff83e276d8)
5 drops at ip_rcv+98 (0xffffffff83e276d8)
...

perf

1
2
3
4
5
6
7
8
9
# perf record -e skb:kfree_skb -ag
# perf script
swapper     0 [001] 3557578.978726: skb:kfree_skb: skbaddr=0xffff9850400bba00 protocol=2048 location=0xffffffff83e276d8
        ffffffff83d81563 kfree_skb+0x73 (/usr/lib/debug/lib/modules/4.18.0-300.el8.x86_64/vmlinux)
        ffffffff83d81563 kfree_skb+0x73 (/usr/lib/debug/lib/modules/4.18.0-300.el8.x86_64/vmlinux)
        ffffffff83e276d8 ip_rcv+0x98 (/usr/lib/debug/lib/modules/4.18.0-300.el8.x86_64/vmlinux)
        ffffffff83d9c03b __netif_receive_skb_core+0xb4b (/usr/lib/debug/lib/modules/4.18.0-300.el8.x86_64/vmlinux)
        ffffffff83d9c1cd netif_receive_skb_internal+0x3d (/usr/lib/debug/lib/modules/4.18.0-300.el8.x86_64/vmlinux)
	...

bpftrace

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# bpftrace -e 'tracepoint:skb:kfree_skb { @[kstack] = count(); }'
@[
    kfree_skb+115
    kfree_skb+115
    ip_rcv+152
    __netif_receive_skb_core+2891
    netif_receive_skb_internal+61
    napi_gro_receive+186
    ...
    secondary_startup_64_no_verify+194
]: 4

drop watch usage

For software

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
# dropwatch -lkas
Initalizing kallsyms db
dropwatch> start
Enabling monitoring...
Kernel monitoring activated.
Issue Ctrl-C to stop monitoring
2 drops at ip_rcv+98 (0xffffffffafc276d8)
1 drops at br_net_exit+7160 (0xffffffffc075f510)
1 drops at ip_rcv_finish+20d (0xffffffffafc2702d)
3 drops at ip_rcv+98 (0xffffffffafc276d8)
1 drops at __init_scratch_end+e0ad2db (0xffffffffc06ad2db)
1 drops at br_net_exit+7160 (0xffffffffc075f510)
2 drops at __udp4_lib_rcv+ab0 (0xffffffffafc60da0)
2 drops at ip_rcv+98 (0xffffffffafc276d8)
1 drops at __init_scratch_end+e0ad2db (0xffffffffc06ad2db)
3 drops at ip_rcv+98 (0xffffffffafc276d8)
2 drops at __init_scratch_end+e0ad2db (0xffffffffc06ad2db)
1 drops at br_net_exit+7160 (0xffffffffc075f510)
4 drops at ip_rcv+98 (0xffffffffafc276d8)
1 drops at br_net_exit+7160 (0xffffffffc075f510)
4 drops at ip_rcv+98 (0xffffffffafc276d8)
2 drops at __init_scratch_end+e0ad2db (0xffffffffc06ad2db)
3 drops at ip6_mc_input+17a (0xffffffffafcb7b9a)
4 drops at ip_rcv+98 (0xffffffffafc276d8)
1 drops at br_net_exit+7160 (0xffffffffc075f510)
1 drops at __init_scratch_end+e0ad2db (0xffffffffc06ad2db)
1 drops at icmpv6_rcv+17c (0xffffffffafcdf55c)
96 drops at ip6_mc_input+17a (0xffffffffafcb7b9a)
71 drops at ip6_mc_input+17a (0xffffffffafcb7b9a)
1 drops at ip_rcv+98 (0xffffffffafc276d8)
1 drops at br_net_exit+7160 (0xffffffffc075f510)
2 drops at ip_rcv+98 (0xffffffffafc276d8)
^CGot a stop message
dropwatch> exit
Shutting down ...

For hardware

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
# modprobe netdevsim
# echo "10 2" > /sys/bus/netdevsim/new_device
# ls /sys/bus/netdevsim/devices/netdevsim10/net
eth4  eth5
# ip link set eth4 up
# devlink dev show
netdevsim/netdevsim10
# devlink trap show
netdevsim/netdevsim10:
  name source_mac_is_multicast type drop generic true action drop group l2_drops
  name vlan_tag_mismatch type drop generic true action drop group l2_drops
  name ingress_vlan_filter type drop generic true action drop group l2_drops
  ...

# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap
# devlink trap show netdevsim/netdevsim10 trap blackhole_route
netdevsim/netdevsim10:
  name blackhole_route type drop generic true action trap group l3_drops

Reference: https://lwn.net/Articles/796088/