Linux bridge features introduction

A Linux bridge behaves like a network switch. It forwards packets between interfaces that are connected to it.

The Linux bridge has added basic STP, multicast, netfilter support since the 2.4 and 2.6 kernel series. And it added more features after that. e.g.

Config via netlink
VLAN filter
VxLAN tunnel mapping
IGMPv3/MLDv2
switchdev
netfilter
Others

We will introduce all the features in this article. At the end I will also show the limits of bridge and when OVS should be used.

Basic Usage

All the cmds used in this article are via iproute2, which is using netlink message to config the bridge.

There are 2 iproute2 cmds for bridge setting/configuring. ip link and bridge. ip link cmd used for add/set/remove bridge. bridge is used to show/manipulate bridge fdb/mdb/vlan. I will introduce the details of these 2 cmds in the following sections.

Show ip link bridge help info

1
2
# ip link help bridge
# bridge -h

Create a bridge with name br0

1
# ip link add br0 type bridge

Show bridge details

1
# ip -d link show br0

Show bridge details with pretty JSON format (which is a good way to get bridge key/values)

1
# ip -j -p -d link show br0

Add interfaces to bridge

1
2
# ip link set veth0 master br0
# ip link set tap0 master br0

STP

Linux bridge has supported STP(Spanning Tree Protocol) since 2.4 and 2.6 kernel series. STP is used to prevent a networking loop.

Note: Linux bridge does not support RSTP.

To enable bridge STP

1
# ip link set br0 type bridge stp_state 1

Here is an example that a user creating a loop in the network. If STP is not enabled, there will be a traffic storm in the network.

br_1

With STP enabled, the bridges will send BPDUs and elect a root bridge and block an interface to make the network topology loop-free.

br_2

Show bridge STP blocking state

1
2
3
4
5
6
7
8
9
# ip -j -p -d link show br0 | grep root_port
                "root_port": 1,
# ip -j -p -d link show br1 | grep root_port
                "root_port": 0,
# bridge link show
7: veth0@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 2
8: veth1@veth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br1 state forwarding priority 32 cost 2
9: veth2@veth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state blocking priority 32 cost 2
10: veth3@veth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br1 state forwarding priority 32 cost 2

We can see the interface veth2 is in blocking state after enabling STP

br_3

To change the STP hello time,

1
2
3
# ip link set br0 type bridge hello_time 300
# ip -j -p -d link show br0 | grep \"hello_time\"
                "hello_time": 300,

You can use the same way to change other STP parameters like: max age, forward delay, ageing time, etc.

VLAN filter

VLAN filter was introduced in Linux kernel 3.8. Before this feature, the user needs to create multi bridge/VLAN interfaces when separating VLAN traffic on the bridge.

e.g. with the following topo, users need to add multi bridges and VLANs to make sure VLAN traffic goes to correspond netns.

br_4 But with VLAN filter, only one bridge device is enough to set all the VLAN configs.

br_5

we can do like

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# ip link set br0 type bridge vlan_filtering 1
# ip link set eth0 master br0
# ip link set eth0 up
# ip link set br0 up

# bridge vlan add dev veth1 vid 2
# bridge vlan add dev veth2 vid 2 pvid untagged
# bridge vlan add dev veth3 vid 3 pvid untagged master

# bridge vlan add dev eth1 vid 2-3

# bridge vlan show
port              vlan-id
eth1              1 PVID Egress Untagged
                  2
                  3
br0               1 PVID Egress Untagged
veth1             1 Egress Untagged
                  2
veth2             1 Egress Untagged
                  2 PVID Egress Untagged
veth3             1 Egress Untagged
                  3 PVID Egress Untagged

Command ip link set br0 type bridge vlan_filtering 1 enables VLAN filter on bridge br0.

Command bridge vlan add dev veth1 vid 2 makes bridge port veth1 only transmit VLAN 2 data.

Command bridge vlan add dev veth2 vid 2 pvid untagged make bridge port veth2 transmit VLAN 2 data. With parameter pvid, any untagged frames will be assigned to this VLAN at ingress (veth2 to bridge). With parameter untagged, the VLAN 2 tag will be untagged on egress (bridge to veth2).

Command bridge vlan add dev veth3 vid 3 pvid untagged master do the same thing as previous one, the parameter master is a default value, which means the link setting is configured on the software bridge. You can omit this as the previous cmd.

Command bridge vlan add dev eth1 vid 2-3 enalbes VLAN 2 and VLAN 3 traffic on eth1.

To show the VLAN traffic state, we can enable VLAN stats (added in kernel 4.7) by ip link set br0 type bridge vlan_stats_enabled 1. But this is a global VLAN stats on the bridge and we could not find each VLAN’s state. To enalbe per-vlan stats, we also need to enable vlan_stats_per_port (added in kernel 4.20) when there is no port vlans in the bridge. With ip link set br0 type bridge vlan_stats_per_port 1, and show the stats with

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
# bridge -s vlan show
port              vlan-id
br0               1 PVID Egress Untagged
                    RX: 248 bytes 3 packets
                    TX: 333 bytes 1 packets
eth1              1 PVID Egress Untagged
                    RX: 333 bytes 1 packets
                    TX: 248 bytes 3 packets
                  2
                    RX: 0 bytes 0 packets
                    TX: 56 bytes 1 packets
                  3
                    RX: 0 bytes 0 packets
                    TX: 224 bytes 7 packets
veth1             1 Egress Untagged
                    RX: 0 bytes 0 packets
                    TX: 581 bytes 4 packets
                  2 PVID Egress Untagged
                    RX: 6356 bytes 77 packets
                    TX: 6412 bytes 78 packets
veth2             1 Egress Untagged
                    RX: 0 bytes 0 packets
                    TX: 581 bytes 4 packets
                  2 PVID Egress Untagged
                    RX: 6412 bytes 78 packets
                    TX: 6356 bytes 77 packets
veth3             1 Egress Untagged
                    RX: 0 bytes 0 packets
                    TX: 581 bytes 4 packets
                  3 PVID Egress Untagged
                    RX: 224 bytes 7 packets
                    TX: 0 bytes 0 packets

We can see each VLAN’s stats now.

VLAN to Tunnel mapping

As we know VxLAN is used to build layer 2 virtual networks across the underlay layer3 infrastructure. A VxLAN tunnel endpoint (VTEP) originates and terminates VxLAN tunnels. VxLAN bridging is the function provided by VTEPs to terminate VxLAN tunnels and map the VxLAN vni to traditional end host VLAN.

To achieve this, previously we need to add local ports and VxLAN netdevs into a VLAN filtering bridge. The local ports are configured as trunk ports carrying all VLANs. A VxLAN netdev per vni is added to the bridge. VLAN mapping to vni is achieved by configuring the VLAN as pvid on the corresponding VxLAN netdev. like:

br_6

Since kernel 4.11, kernel has provided a native way to support Vxlan bridging. First, add related VIDs to the interfaces (Note: vxlan0 was added with lwt to handle multi vni)

br_7

1
2
3
4
bridge vlan add dev eth1 vid 100-101
bridge vlan add dev eth1 vid 200
bridge vlan add dev vxlan0 vid 100-101
bridge vlan add dev vxlan0 vid 200

Now enable VLAN tunnel mapping on a bridge port:

1
2
3
4
5
# ip link set dev vxlan0 type bridge_slave vlan_tunnel on

or

# bridge link set dev vxlan0 vlan_tunnel on

Then add vlan tunnel mapping

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
# bridge vlan add dev vxlan0 vid 2000 tunnel_info id 2000
# bridge vlan add dev vxlan0 vid 1000-1001 tunnel_info id 1000-1001

# bridge -j -p vlan tunnelshow
[ {
        "ifname": "vxlan0",
        "tunnels": [ {
                "vlan": 100,
                "vlanEnd": 101,
                "tunid": 100,
                "tunidEnd": 101
            },{
                "vlan": 200,
                "tunid": 200
            } ]
    } ]

Multicast

Linux bridge had IGMPv2/MLDv1 support since 2.6. And IGMPv3/MLDv2 support has been added in kernel 5.10.

Enable bridge multicast snooping, querier, and stats

1
2
3
4
5
6
# ip link set br0 type bridge mcast_snooping 1
# ip link set br0 type bridge mcast_querier 1
# ip link set br0 type bridge mcast_stats_enabled 1
# tcpdump -i br0 -nn -l
02:47:03.417331 IP 0.0.0.0 > 224.0.0.1: igmp query v2
02:47:03.417340 IP6 fe80::3454:82ff:feb9:d7b4 > ff02::1: HBH ICMP6, multicast listener querymax resp delay: 10000 addr: ::, length 24

By default, the bridge use IGMPv2/MLDv1 when snooping enabled. You can change the version like

1
2
ip link set br0 type bridge mcast_igmp_version 3
ip link set br0 type bridge mcast_mld_version 2

After a port join a group, we can show the mdb(multicast data base) by

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
# bridge mdb show
dev br0 port br0 grp ff02::fb temp
dev br0 port eth1 grp ff02::fb temp
dev br0 port eth2 grp ff02::fb temp
dev br0 port eth2 grp 224.1.1.1 temp
dev br0 port br0 grp ff02::6a temp
dev br0 port eth1 grp ff02::6a temp
dev br0 port eth2 grp ff02::6a temp
dev br0 port br0 grp ff02::1:ffe2:de9f temp
dev br0 port eth1 grp ff02::1:ffe2:de9f temp
dev br0 port eth2 grp ff02::1:ffe2:de9f temp

Bridge also supports setting mcast snooping and querier on a single VLAN. e.g.

1
bridge vlan set vid 10 dev eth1 mcast_snooping 1 mcast_querier 1

Show bridge xstats (mcast RX/TX info)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# ip link xstats type bridge
br0
                    IGMP queries:
                      RX: v1 0 v2 1 v3 0
                      TX: v1 0 v2 131880 v3 0
                    IGMP reports:
                      RX: v1 0 v2 1 v3 0
                      TX: v1 0 v2 496 v3 18956
                    IGMP leaves: RX: 0 TX: 0
                    IGMP parse errors: 0
                    MLD queries:
                      RX: v1 1 v2 0
                      TX: v1 51327 v2 0
                    MLD reports:
                      RX: v1 66 v2 6
                      TX: v1 3264 v2 213794
                    MLD leaves: RX: 0 TX: 0
                    MLD parse errors: 0

There are also some other multicast parameters you can config like mcast_router, mcast_query_interval, mcast_hash_max, etc.

Bridge switchdev

Linux bridge is always used for connecting Virtual Machines with physical networks, by using tap/virtio driver. We can also attach a SRIOV VF in VM guest to get better performance. But the way SR-IOV embedded switches were dealt with in Linux was limited in its expressiveness and flexibility. And the kernel model for controlling the SR-IOV e-switch did not allow the configuration of anything beyond MAC/VLAN based forwarding.

br_8

To make VFs also get dynamic FDB, VLAN filter benefits while still keeping the best performance. Since kernel v4.9 the Linux bridge added switchdev support, which could supply L2 forwarding offloading via switchdev to HW switch like Mellanox Spectrum, DSA based switches, MLX5 CX6 Dx cards etc.

br_9

If we are in switchdev mode(bridge is up and related config enabled, e.g. MLX5_BRIDGE for MLX5 SRIOV E-Switch), we can connect the VF’s representors to the bridge, and the frames that suppose to be transmitted by the bridge, will be transmitted by hardware only. Their routing will be done in the eswitch of the NIC.

Once a frame passes through the VF to its representor, the bridge learns that the source MAC of the VF is behind a particular port, and it adds an entry with the MAC & port to its FDB. Immediately afterward it will send a message to the mlx5 driver, and the driver will add a relevant rule/line to 2 tables located in the eswitch (on the NIC). Later, if such frame (with the same destination MAC address) comes from the VF, it will not go through the kernel - it will go directly through the NIC to the appropriate port.

More info: Switchdev support for embedded switches in NICs is simple but for full featured switches like Mellanox Spectrum the offloading capabilities are much rich and supports things like LAG (team, bonding), tunneling (VxLAN etc.), routing and TC offloading. The last 2 go out of scope of bridge but LAGs can be attached to bridge as well as VxLAN tunnels and offloading of such scenario is fully supported.

bridge netfilter

By default, the traffic forwarded by bridge does not go through iptables. If users want the L2 traffic also be filtered by iptables forward rules. You can enable this option by

1
ip link set br0 type bridge nf_call_iptables 1

The same with ip6tables and arptables

Other bridge configs

bridge aging time the number of seconds a MAC address will be kept in the FDB after a packet has been received from that address. after this time has passed, entries are cleaned up. To change the timer, you can try

1
ip link set br0 type bridge ageing_time 20000

Bridge vs OVS

Linux bridge is very useful and popular in the past years as it supplies L2 forwarding, connects VMs and networks with VLAN/Multicast support. It’s stable, reliable, and easy for setup and config. On the other hand, it also has some limitations, like fewer tunnel supports. If you want to get easier network management, more tunnel supports (GRE, VXLAN, etc), L3 forwarding and incorporated with SDN, you can try OVS.

Contents