Discussion:
interaction of MADV_PAGEOUT with CoW anonymous mappings?
Minchan Kim
2020-03-17 01:43:40 UTC
[...]
From eca97990372679c097a88164ff4b3d7879b0e127 Mon Sep 17 00:00:00 2001
Date: Thu, 12 Mar 2020 09:04:35 +0100
Subject: [PATCH] mm: do not allow MADV_PAGEOUT for CoW pages
Jann has brought up a very interesting point [1]. While shared pages are
excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
that way. This can lead to all sorts of hard to debug problems, e.g. the
performance problems outlined by Daniel [2]. There are runtime
environments where a substantial amount of memory is shared among security
domains via CoW memory, and an easy way to reclaim that memory, which
MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation
for the parent process, which might be more privileged, or even open
side channel attacks. The feasibility of the latter is not really clear
I am not sure it's a good idea to mention the performance stuff because
it's rather arguable. You and Johannes already pointed it out when I submitted
an early draft which had shared-page filtering logic for performance
reasons. You guys suggested that shared pages have a higher chance of being
touched, so if they are really hot pages, they would be kept in memory. I agree.
Yes, the hot memory is likely to be referenced but the point was an
unexpected latency because of the major fault. I have to say that I have
I don't understand your point here. If it's likely to be referenced
among several processes, it doesn't have the major fault latency.
What's your point here?
a) the particular CoW page might be cold enough to be reclaimed and b)
If it is, that means it's *cold*, so it's really worth reclaiming.
nothing really prevents MADV_PAGEOUT from being called faster than the
reference bit is set again.
Yes, that's undesirable. I should admit it was not intended when I implemented
PAGEOUT. The thing is that page_check_references clears the access bit of the
pte for every process sharing the page, so two MADV_PAGEOUT calls from one
process could evict the page. That's the real bug.
I do not really think this is a bug. This is a side effect of the
reclaim process and we do not really want MADV_{PAGEOUT,COLD} to behave
No, that's a bug since we didn't consider the side effect.
differently here because then the behavior would be even harder to
No, I do want a difference because it's a per-process hint. IOW, what the
caller knows applies only to its own context, not to others', so it shouldn't
clear others' ptes. That is the difference between LRU aging and the hint.
understand.
It's not hard to understand. MADV_PAGEOUT should consider only the caller's
context since it's a per-process hint (the caller can't even know others'
context), so it shouldn't bother others.

Actually, Dave's suggestion would be the correct fix if there
were no issue with the side channel attack. However, due to the attack
issue, a page_mapcount check could prevent the problem effectively.
That's why I am not against the patch now, since it fixes
the bug as well as the vulnerability.
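
The aging side effect described above can be illustrated with a toy userspace model. This is only a sketch, not kernel code: `struct toy_page`, `NPROC` and the `toy_*` functions are made-up names, and the model assumes, as stated in the thread, that the reference check clears the access bit of every process mapping the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPROC 3 /* hypothetical number of processes mapping the page */

/* Toy stand-in for one physical page shared via CoW. */
struct toy_page {
    bool young[NPROC]; /* per-process pte access bit */
    bool resident;     /* false once the page has been paged out */
};

/* A process touches the page: bring it back if needed, set its own bit. */
static void toy_touch(struct toy_page *pg, size_t proc)
{
    assert(proc < NPROC);
    pg->resident = true;
    pg->young[proc] = true;
}

/*
 * Models the reference check as described in the thread: count the
 * access bits of every mapper and clear all of them as a side effect.
 */
static int toy_check_references(struct toy_page *pg)
{
    int referenced = 0;

    for (size_t i = 0; i < NPROC; i++) {
        if (pg->young[i]) {
            referenced++;
            pg->young[i] = false;
        }
    }
    return referenced;
}

/* Models one pageout hint; returns true if the page was evicted. */
static bool toy_pageout_hint(struct toy_page *pg)
{
    if (!pg->resident)
        return false;
    if (toy_check_references(pg) > 0)
        return false; /* recently referenced: kept this time */
    pg->resident = false;
    return true;
}
```

In this model, if processes 1 and 2 have touched the page, the first hint keeps it but wipes their access bits, and a second hint issued before they touch the page again evicts it.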
Michal Hocko
2020-03-17 07:12:39 UTC
Post by Minchan Kim
[...]
Just to make it clear, are you really suggesting to special case
page_check_references for madvise path?
--
Michal Hocko
SUSE Labs
Minchan Kim
2020-03-17 15:00:55 UTC
Post by Michal Hocko
Post by Minchan Kim
[...]
Just to make it clear, are you really suggesting to special case
page_check_references for madvise path?
No, the (page_mapcount() > 1) check *effectively* fixes the performance
bug as well as the vulnerability issue.
Michal Hocko
2020-03-17 15:58:55 UTC
[...]
Post by Minchan Kim
Post by Michal Hocko
Just to make it clear, are you really suggesting to special case
page_check_references for madvise path?
No, the (page_mapcount() > 1) check *effectively* fixes the performance
bug as well as the vulnerability issue.
Ahh, ok then we are on the same page. You were replying to the part
where I have pointed out that you can control aging by these calls
and your response suggested that this is somehow undesirable behavior or
even a bug.
--
Michal Hocko
SUSE Labs
Minchan Kim
2020-03-17 17:20:22 UTC
Post by Michal Hocko
[...]
Post by Minchan Kim
Post by Michal Hocko
Just to make it clear, are you really suggesting to special case
page_check_references for madvise path?
No, the (page_mapcount() > 1) check *effectively* fixes the performance
bug as well as the vulnerability issue.
Ahh, ok then we are on the same page. You were replying to the part
where I have pointed out that you can control aging by these calls
and your response suggested that this is somehow undesirable behavior or
even a bug.
Sorry about the confusion.

I want to clarify what I said.

If we didn't have the vulnerability issue Jann raised, the performance issue
Daniel pointed out should be fixed by introducing a special flag in
page_check_references for the madvise path to avoid clearing the access bit
in other processes' ptes. With it, we wouldn't need to limit the semantics of
MADV_PAGEOUT to "exclusive pages only", so MADV_PAGEOUT would work on
*cold* shared pages as well as exclusive ones.
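
A rough sketch of what such a flag could mean, again as a toy userspace model rather than kernel code. The `hint_only` parameter and all `toy_*` names are hypothetical; no such flag is shown in this thread. The assumption is that with the flag set, the check only reads other mappers' access bits and clears nothing but the caller's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPROC 3 /* hypothetical number of processes mapping the page */

struct toy_page {
    bool young[NPROC]; /* per-process pte access bit */
    bool resident;
};

static void toy_touch(struct toy_page *pg, size_t proc)
{
    assert(proc < NPROC);
    pg->resident = true;
    pg->young[proc] = true;
}

/*
 * Reference check with a hypothetical hint_only flag. Without the flag
 * every access bit is counted and cleared (regular aging). With it, the
 * caller's own bit is dropped without counting (the caller declared the
 * page cold) and other mappers' bits are counted but left untouched.
 */
static int toy_check_references(struct toy_page *pg, size_t caller,
                                bool hint_only)
{
    int referenced = 0;

    assert(caller < NPROC);
    for (size_t i = 0; i < NPROC; i++) {
        if (!pg->young[i])
            continue;
        if (hint_only && i == caller) {
            pg->young[i] = false;
            continue;
        }
        referenced++;
        if (!hint_only)
            pg->young[i] = false;
    }
    return referenced;
}

/* Pageout hint from one process; returns true if the page was evicted. */
static bool toy_pageout_hint(struct toy_page *pg, size_t caller)
{
    if (!pg->resident)
        return false;
    if (toy_check_references(pg, caller, true) > 0)
        return false; /* someone else still uses the page */
    pg->resident = false;
    return true;
}
```

Under these assumptions, repeated hints from one process cannot age a page that another process has referenced, while a shared page nobody has touched is still evicted on the first hint.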

However, since we have the vulnerability issue, *unfortunately*, we need
to make MADV_PAGEOUT's semantics work with exclusive pages only.
Thus, the page_mapcount check in the madvise patch will fix both issues
*effectively*.
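
The "exclusive pages only" semantics can be sketched the same way. This is a toy userspace model, not the patch itself: the `mapcount` field stands in for what page_mapcount() would report, and the `toy_*` names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
    int mapcount;  /* number of processes currently mapping the page */
    bool resident; /* false once the page has been paged out */
};

/* Another process starts sharing the page, e.g. a CoW mapping after fork. */
static void toy_share(struct toy_page *pg)
{
    pg->mapcount++;
}

/* One process drops its mapping, e.g. on exit or after breaking CoW. */
static void toy_unshare(struct toy_page *pg)
{
    assert(pg->mapcount > 0);
    pg->mapcount--;
}

/*
 * Pageout hint limited to exclusively mapped pages: a page mapped by
 * more than one process is skipped, so the hint never touches state
 * that belongs to another process.
 */
static bool toy_pageout_hint(struct toy_page *pg)
{
    if (!pg->resident)
        return false;
    if (pg->mapcount > 1)
        return false; /* shared: left to regular reclaim */
    pg->resident = false;
    return true;
}
```

In this model the hint is a no-op for a shared page no matter how often it is repeated, and it takes effect again once the caller is the only mapper.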

Thanks.
