Michal Hocko
2020-03-17 07:59:39 UTC
Killing a user process as a result of hitting memcg limits is a serious
decision that is unfortunately needed only when no forward progress in
reclaiming memory can be made.
Deciding the appropriate oom victim can take a sufficient amount of time
that allows another process that is exiting to actually uncharge to the
same memcg hierarchy and prevent unnecessarily killing user processes.
An example is to prevent *multiple* unnecessary oom kills on a system
with two cores where the oom kill occurs when there is an abundance of
Memory cgroup out of memory: Killed process 628 (repro) total-vm:41944kB, anon-rss:40888kB, file-rss:496kB, shmem-rss:0kB, UID:0 pgtables:116kB oom_score_adj:0
<immediately after>
repro invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
CPU: 1 PID: 629 Comm: repro Not tainted 5.6.0-rc5+ #130
dump_stack+0x78/0xb6
dump_header+0x55/0x240
oom_kill_process+0xc5/0x170
out_of_memory+0x305/0x4a0
try_charge+0x77b/0xac0
mem_cgroup_try_charge+0x10a/0x220
mem_cgroup_try_charge_delay+0x1e/0x40
handle_mm_fault+0xdf2/0x15f0
do_user_addr_fault+0x21f/0x420
async_page_fault+0x2f/0x40
memory: usage 61336kB, limit 102400kB, failcnt 74
Notice the second memcg oom kill shows usage is >40MB below its limit of
100MB but a process is still unnecessarily killed because the decision has
already been made to oom kill by calling out_of_memory() before the
initial victim had a chance to uncharge its memory.
Could you be more specific about the specific workload please?decision that is unfortunately needed only when no forward progress in
reclaiming memory can be made.
Deciding the appropriate oom victim can take a sufficient amount of time
that allows another process that is exiting to actually uncharge to the
same memcg hierarchy and prevent unnecessarily killing user processes.
An example is to prevent *multiple* unnecessary oom kills on a system
with two cores where the oom kill occurs when there is an abundance of
Memory cgroup out of memory: Killed process 628 (repro) total-vm:41944kB, anon-rss:40888kB, file-rss:496kB, shmem-rss:0kB, UID:0 pgtables:116kB oom_score_adj:0
<immediately after>
repro invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
CPU: 1 PID: 629 Comm: repro Not tainted 5.6.0-rc5+ #130
dump_stack+0x78/0xb6
dump_header+0x55/0x240
oom_kill_process+0xc5/0x170
out_of_memory+0x305/0x4a0
try_charge+0x77b/0xac0
mem_cgroup_try_charge+0x10a/0x220
mem_cgroup_try_charge_delay+0x1e/0x40
handle_mm_fault+0xdf2/0x15f0
do_user_addr_fault+0x21f/0x420
async_page_fault+0x2f/0x40
memory: usage 61336kB, limit 102400kB, failcnt 74
Notice the second memcg oom kill shows usage is >40MB below its limit of
100MB but a process is still unnecessarily killed because the decision has
already been made to oom kill by calling out_of_memory() before the
initial victim had a chance to uncharge its memory.
caused it to initially get reported?
one and we should have a clear data to support the check actually helps
(in how many instances etc...)
--
Michal Hocko
SUSE Labs
Michal Hocko
SUSE Labs