CVE-2026-46223
published 2026-05-28CVE-2026-46223: In the Linux kernel, the following vulnerability has been resolved: cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated A chain of commits…
In the Linux kernel, the following vulnerability has been resolved:
cgroup: Defer css percpu_ref kill on rmdir until cgroup is depopulated
A chain of commits going back to v7.0 reworked rmdir to satisfy the
controller invariant that a subsystem's ->css_offline() must not run while
tasks are still doing kernel-side work in the cgroup.
[1] d245698d727a ("cgroup: Defer task cgroup unlink until after the task is done switching out")
[2] a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
[3] 1b164b876c36 ("cgroup: Wait for dying tasks to leave on rmdir")
[4] 4c56a8ac6869 ("cgroup: Fix cgroup_drain_dying() testing the wrong condition")
[5] 13e786b64bd3 ("cgroup: Increment nr_dying_subsys_* from rmdir context")
[1] moved task cset unlink from do_exit() to finish_task_switch() so a
task's cset link drops only after the task has fully stopped scheduling.
That made tasks past exit_signals() linger on cset->tasks until their final
context switch, which led to a series of problems as what userspace expected
to see after rmdir diverged from what the kernel needs to wait for. [2]-[5]
tried to bridge that divergence: [2] filtered the exiting tasks from
cgroup.procs; [3] had rmdir(2) sleep in TASK_UNINTERRUPTIBLE for them; [4]
fixed the wait's condition; [5] made nr_dying_subsys_* visible
synchronously.
The cgroup_drain_dying() wait in [3] turned out to be a dead end. When the
rmdir caller is also the reaper of a zombie that pins a pidns teardown (e.g.
host PID 1 systemd reaping orphan pids that were re-parented to it during
the same teardown), rmdir blocks in TASK_UNINTERRUPTIBLE waiting for those
pids to free, the pids can't free because PID 1 is the reaper and it's stuck
in rmdir, and the system A-A deadlocks. No internal lock ordering breaks
this; the wait itself is the bug.
The css killing side that drove the original reorder, however, can be made
cleanly asynchronous: ->css_offline() is already async, run from
css_killed_work_fn() driven by percpu_ref_kill_and_c
Affected
6 ranges
| Vendor | Product | Version range | Fixed in |
|---|---|---|---|
| linux | linux | — | — |
| linux | linux | — | — |
| linux | linux | >= 1b164b876c36c3eb5561dd9b37702b04401b0166 < 33fa2e6b1507a0a377a151a8826438bedad1d0b0 | 33fa2e6b1507a0a377a151a8826438bedad1d0b0 |
| linux | linux | >= 1b164b876c36c3eb5561dd9b37702b04401b0166 < 93618edf753838a727dbff63c7c291dee22d656b | 93618edf753838a727dbff63c7c291dee22d656b |
| linux | linux | >= 6.19.12 < 6.20 | 6.20 |
| linux | linux_kernel | — | — |