Local deliver

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
- ip_local_deliver
  - nf_hook
    - if (hook_head) -> nf_hook_slow() -> nf_hook_entry_hookfn()
      - ipvlan_nf_input()
  - ip_local_deliver_finish
    - ip_protocol_deliver_rcu
      - raw_local_deliver
        - raw_v4_input
          - raw_v4_match()
      - ipprot = rcu_dereference(inet_protos[protocol]);

- ip6_input
  - nf_hook
    - if (hook_head) -> nf_hook_slow() -> nf_hook_entry_hookfn()
      - ipvlan_nf_input()
  - ip6_input_finish
    - ip6_protocol_deliver_rcu
      - raw6_local_deliver
        - ipv6_raw_deliver
          - raw_v6_match
      - ipprot->handler

New/Del addr

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
- inet_rtm_deladdr
  - __inet_del_ifa
    - fib_del_ifaddr    /* remove all routes and add back later */
      - fib_sync_down_addr
        - fib_flush
          - fib_table_flush

- inet6_rtm_deladdr
  - inet6_addr_del
    - ipv6_del_addr
      - ipv6_ifa_notify
      - cleanup_prefix_route
      - rt6_remove_prefsrc

New/Del route

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
struct fib_nh {
    struct fib_nh_common    nh_common;
#define fib_nh_weight           nh_common.nhc_weight
}
struct fib_info {
    int     fib_nhs;
    struct nexthop  *nh;        /* new nexthop api */
    struct fib_nh   fib_nh[];   /* old nexthop api */
}

- inet_rtm_newroute
  - rtm_to_fib_config
    - RTA_MULTIPATH: lwtunnel_valid_encap_type_attr
    - RTA_ENCAP_TYPE: lwtunnel_valid_encap_type
  - fib_new_table
  - fib_table_insert
    - fib_find_alias()
    - Check keys [prefix,dscp,priority] and return -EEXIST if NLM_F_EXCL is set
    - fib_create_info
      - if (cfg->fc_nh_id): iproute nhid, new nexthop api
      - if (cfg->fc_mp): old multipath api
        - nhs = fib_count_nexthops
        - fi->fib_nhs = nhs
        - fib_get_nhs
          - fib_nh_init
        - fib_check_nh
      - if (cfg->fc_scope == RT_SCOPE_HOST), nh->fib_nh_scope = RT_SCOPE_NOWHERE;
    - fib_find_node
    - fib_find_alias
    - rtmsg_fib()

- inet_rtm_delroute
  - rtm_to_fib_config
  - fib_get_table
  - fib_table_delete
    - fib_find_node
    - fib_find_alias
    - fib_nh_match && fib_metrics_match
    - rtmsg_fib()
    - fib_remove_alias

rt netlink msg notify
- rtmsg_fib
  - fib_dump_info
    - if (nhs == 1): fib_nexthop_info
      - put RTA_GATEWAY
    - else: fib_add_multipath
      - Start with RTA_MULTIPATH flag
      - fib_add_nexthop(with fib_nh_weight)
        - add struct rtnexthop *rtnh in front of skb
        - fib_nexthop_info
  - rtnl_notify

- inet6_rtm_newroute
  - rtm_to_fib6_config
    - RTN_PROHIBIT -> cfg->fc_flags |= RTF_REJECT
  - if (cfg.fc_mp): ip6_route_multipath_add
    - ip6_route_info_create -> new struct fib6_info
      - fib6_nh_init
        - fib6_is_reject() -> check if addr need to goto reject route(lo blackhold)
    - ip6_route_info_append -> append to each fib6_info to list rt6_nh_list
      - rt6_duplicate_nexthop -> check each nexthop if dup
    - __ip6_ins_rt
      - fib6_add
        - fib6_add_1
          - Check NLM_F_CREATE/NLM_F_REPLACE flags
        - fib6_add_rt2node
          - rt6_duplicate_nexthop
  - else : ip6_route_add
    - ip6_route_info_create
      - fib6_nh_init
    - __ip6_ins_rt

- inet6_rtm_delroute
  - rtm_to_fib6_config
  - if fc_mp: ip6_route_multipath_del
    - for_each_mp: ip6_route_del()
  - else: ip6_route_del

FIB

Reference from http://ja.ssi.bg/fib.txt

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
		FIB - Forwarding Information Base

- Routes are organized in routing tables
- For "fib_hash" algorithm routing tables have 33 zones (for prefix
lengths 0..32), routing lookup walks them from 32 to 0 to find a
node containing all routing information
- Zones are implemented as hash tables where nodes are hashed by
key (prefix=network) because there can be lots of prefixes in a zone.
- Nodes can be stored with other methods, eg. trie, where nodes are
searched (we hope faster) by prefix and length, no zones are used
in this case
- Nodes have a list of aliases (tos+type+scope+fib_info ptr) sorted by
decreasing TOS because TOS=0 must be a last hit when looking for route,
TOS 0 matches packet with any TOS. type is unicast, local, prohibit, etc.
scope is host, link, etc. Additionally, aliases with same TOS are
sorted by fib_info priority (ascending).
- fib_info is a structure containing protocol (kernel, boot, zebra, etc),
prefsrc, priority (metric), metrics, nexthop(s). Fallback routes have
higher value for priority, they are used if more priority routes
disappear or their nexthops are dead.
- fib_info structures are organized in 2 global hash tables, one
keyed by prefsrc and another by nexthop_count+protocol+prefsrc+priority
- fib_info is a shared structure, different aliases can point to same
fib_info, even aliases from different prefixes, from different routing
tables. By this way if fib_info contains multipath route then many
aliases share same route path scheduling context.
- Nexthop contains gateway, output device, scope and weight. Weight
is used for path scheduling where nexthops have relative priority
compared to other nexthops in multipath route.
- There can be many aliases with same tos, there can be alternative
routes (aliases) with same tos and priority (metric) but only one alias
with particular tos, type, scope and fib_info can exist to avoid duplicate
alternative routes.
- The operation to replace route includes replacing of alias. The alias
in node (table -> prefix/len) is matched by tos and fib_info priority and
they can not be changed. The parameters that are changed are type, scope
and fib_info (except priority).
- The 'ip' tool maps route operations to NLM_F_* flags as follows: 
	- ip route add		-> NLM_F_CREATE | NLM_F_EXCL
		- add unique route
	- ip route change	-> NLM_F_REPLACE
		- change existing route
	- ip route replace	-> NLM_F_CREATE | NLM_F_REPLACE
		- change existing route or create new route
	- ip route prepend	-> NLM_F_CREATE
		- create new alternative route as first
	- ip route append	-> NLM_F_CREATE | NLM_F_APPEND
		- create new alternative route as last
	- ip route test		-> NLM_F_EXCL
		- check if route exists
- By default, 'ip route add' adds no more than one route for particular
prefix/len, tos and fib_info priority. This is guaranteed by the
NLM_F_EXCL flag. Extension to this rule is the support for alternative
routes where 'ip route prepend' and 'ip route append' which avoid the
NLM_F_EXCL flag and allow many routes for prefix/len, tos and
fib_info priority to be added. Still, there should be no more than one
alternative route with same prefix/len, tos and all remaining fib_info
parameters.
- As for the 'route' tool, it works just like 'ip route prepend' which
allows alternative routes to be created.
- Additionally, the IP stack can automatically add local or broadcast
'proto kernel' routes when IP address is added and unicast subnet route
when primary IP address for subnet is added. The routes are created in
the 'ip route append' way in local or main table.

FIB tree:

* routing table
	* node (prefix/len)
		* alias (tos, type, scope)
			-> fib_info (priority, protocol, prefsrc, metrics)
				* nexthop (gateway, outdev, scope, weight)

- one or many routing tables with fast access to multiple nodes, each
node has list with aliases with unique parameters sorted by decreasing tos
and increasing priority.