Discussion:
[PATCH] mm: set khugepaged_max_ptes_none by 1/8 of HPAGE_PMD_NR
Ebru Akagunduz
2015-02-27 18:30:01 UTC
Using THP, programs can access memory faster, by having the
kernel collapse small pages into large pages. The parameter
max_ptes_none specifies how many extra small pages (that are
not already mapped) can be allocated when collapsing a group
of small pages into one large page.

A larger value of max_ptes_none can cause the kernel
to collapse more incomplete areas into THPs, speeding
up memory access at the cost of increased memory use.
A smaller value of max_ptes_none will reduce memory
waste, at the expense of collapsing fewer areas into
THPs.

The problem was reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=93111

Signed-off-by: Ebru Akagunduz <***@gmail.com>
Reviewed-by: Rik van Riel <***@redhat.com>
---
mm/huge_memory.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e08e37a..497fb5a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -59,11 +59,10 @@ static DEFINE_MUTEX(khugepaged_mutex);
static DEFINE_SPINLOCK(khugepaged_mm_lock);
static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
/*
- * default collapse hugepages if there is at least one pte mapped like
- * it would have happened if the vma was large enough during page
- * fault.
+ * The default value should be a compromise between memory use and THP speedup.
+ * To collapse hugepages, unmapped ptes should not exceed 1/8 of HPAGE_PMD_NR.
*/
-static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
+static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR/8;

static int khugepaged(void *none);
static int khugepaged_slab_init(void);
--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Rik van Riel
2015-02-27 21:00:02 UTC
Post by Ebru Akagunduz
Using THP, programs can access memory faster, by having the
kernel collapse small pages into large pages. The parameter
max_ptes_none specifies how many extra small pages (that are
not already mapped) can be allocated when collapsing a group
of small pages into one large page.
Post by David Rientjes
Not exactly, khugepaged isn't "allocating" small pages to collapse into a
hugepage; rather, it is allocating a hugepage and then remapping the
pageblock's mapped pages.
How would you describe the amount of extra memory
allocated, as a result of converting a partially
mapped 2MB area into a THP?

It is not physically allocating 4kB pages, but
I would like to keep the text understandable to
people who do not know the THP internals.
Post by Ebru Akagunduz
A larger value of max_ptes_none can cause the kernel
to collapse more incomplete areas into THPs, speeding
up memory access at the cost of increased memory use.
A smaller value of max_ptes_none will reduce memory
waste, at the expense of collapsing fewer areas into
THPs.
Post by David Rientjes
This changelog only describes what max_ptes_none does; it doesn't state
why you want to change it from HPAGE_PMD_NR-1, which is 511 on x86_64
(largest value, more thp), to HPAGE_PMD_NR/8, which is 64 (smaller value,
less thp, less rss as a result of collapsing).
This has particular performance implications for users who already have thp
enabled, so it's difficult to change the default. This is a tunable that
you could easily set in an initscript, so I don't think we need to change
the value for everybody.
I think we do need to change the default.
Post by Ebru Akagunduz
https://bugzilla.kernel.org/show_bug.cgi?id=93111
Now, there may be a better value than HPAGE_PMD_NR/8, but
I am not sure what it would be, or why.

I do know that HPAGE_PMD_NR-1 results in undesired behaviour,
as seen in the bug above...
--
All rights reversed
David Rientjes
2015-02-27 21:20:01 UTC
Post by Rik van Riel
Post by Ebru Akagunduz
Using THP, programs can access memory faster, by having the
kernel collapse small pages into large pages. The parameter
max_ptes_none specifies how many extra small pages (that are
not already mapped) can be allocated when collapsing a group
of small pages into one large page.
Not exactly, khugepaged isn't "allocating" small pages to collapse into a
hugepage; rather, it is allocating a hugepage and then remapping the
pageblock's mapped pages.
How would you describe the amount of extra memory
allocated, as a result of converting a partially
mapped 2MB area into a THP?
It is not physically allocating 4kB pages, but
I would like to keep the text understandable to
people who do not know the THP internals.
I would say it specifies how much unmapped memory can become mapped by a
hugepage.
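A minimal sketch of that rule, written as a simplified user-space model rather than the kernel's actual code: the 2MB range is modeled as an array of 512 flags (assuming x86_64 with 4kB pages), and the range qualifies for collapse only while the count of unmapped (pte_none) slots stays at or below max_ptes_none. The names `SKETCH_HPAGE_PMD_NR` and `sketch_collapse_allowed` are hypothetical; the real scan in mm/huge_memory.c checks further conditions on the mapped pages that are omitted here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: x86_64 geometry, 512 ptes of 4 KiB per 2 MiB huge page. */
#define SKETCH_HPAGE_PMD_NR 512

/*
 * Hypothetical, simplified model of khugepaged's per-range check.
 * present[i] is true when the i-th 4 KiB slot of the 2 MiB range has a
 * mapped pte and false when the pte is none (nothing mapped there yet).
 * Collapsing is allowed only while the number of none ptes stays at or
 * below max_ptes_none; each such slot would become backed by the new
 * huge page.
 */
bool sketch_collapse_allowed(const bool present[SKETCH_HPAGE_PMD_NR],
                             unsigned int max_ptes_none)
{
    unsigned int none = 0;

    for (size_t i = 0; i < SKETCH_HPAGE_PMD_NR; i++) {
        /* Bail out as soon as the unmapped count exceeds the tunable. */
        if (!present[i] && ++none > max_ptes_none)
            return false;
    }
    return true;
}
```

With max_ptes_none == 0 only fully mapped ranges qualify; with 511 a single mapped pte is enough to make the whole 2MB range eligible.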
Post by Rik van Riel
I think we do need to change the default.
Post by Ebru Akagunduz
https://bugzilla.kernel.org/show_bug.cgi?id=93111
Now, there may be a better value than HPAGE_PMD_NR/8, but
I am not sure what it would be, or why.
I do know that HPAGE_PMD_NR-1 results in undesired behaviour,
as seen in the bug above...
I know that the value of 64 would also be undesirable for Google: we
tightly constrain memory usage and have used max_ptes_none == 0 since it
was introduced. We can get away with that because our malloc() is
modified to periodically give large contiguous ranges of memory back to
the system, also using madvise(MADV_DONTNEED), and it tries to avoid
splitting thp memory.

The value is determined by how the system will be used: do you tightly
constrain memory usage and not allow any unmapped memory to be collapsed into
a hugepage, or do you have an abundance of memory and really want an
aggressive value like HPAGE_PMD_NR-1? Depending on the properties of the
system, you can tune this to anything you want, just as we do in
initscripts.

I'm only concerned here about changing a default that has been around for
four years and the possibly negative implications that this will have on users
who never touch this value. They undoubtedly get less memory backed by
thp, and that can lead to a performance regression. So if this patch is
merged and we get a bug report for the 4.1 kernel, do we tell that user
that we changed behavior out from under them and to adjust the tunable
back to HPAGE_PMD_NR-1?

Meanwhile, the bug report you cite has a workaround that has always been
available for thp kernels:
# echo 64 > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
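For setups that apply the tunable from a program rather than a shell initscript, the same write can be done in C. A minimal sketch: `set_max_ptes_none` and `MAX_PTES_NONE_PATH` are hypothetical names, the path is the one used in the echo command above, and the range check assumes the kernel accepts only 0..HPAGE_PMD_NR-1 (0..511 on x86_64). stdio typically issues the underlying write() when it flushes its buffer, so an error returned by the kernel surfaces at fclose(), which the sketch checks.

```c
#include <stdio.h>

/* Assumption: x86_64, where HPAGE_PMD_NR is 512. */
#define SKETCH_HPAGE_PMD_NR 512u
#define MAX_PTES_NONE_PATH \
    "/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none"

/* Write value to the tunable at path; returns 0 on success, -1 on error. */
int set_max_ptes_none(const char *path, unsigned int value)
{
    FILE *f;

    /* Reject values outside the assumed valid range 0..HPAGE_PMD_NR-1. */
    if (value > SKETCH_HPAGE_PMD_NR - 1)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    if (fprintf(f, "%u\n", value) < 0) {
        fclose(f);
        return -1;
    }

    /* The buffered data is flushed here, so check the close result too. */
    return fclose(f) == 0 ? 0 : -1;
}
```

Run as root, `set_max_ptes_none(MAX_PTES_NONE_PATH, 64)` would be the equivalent of the echo workaround.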
Vlastimil Babka
2015-03-02 14:10:02 UTC
Post by David Rientjes
Post by Rik van Riel
I think we do need to change the default.
Post by Ebru Akagunduz
https://bugzilla.kernel.org/show_bug.cgi?id=93111
Now, there may be a better value than HPAGE_PMD_NR/8, but
I am not sure what it would be, or why.
I do know that HPAGE_PMD_NR-1 results in undesired behaviour,
as seen in the bug above...
I know that the value of 64 would also be undesirable for Google: we
tightly constrain memory usage and have used max_ptes_none == 0 since it
was introduced. We can get away with that because our malloc() is
modified to periodically give large contiguous ranges of memory back to
the system, also using madvise(MADV_DONTNEED), and it tries to avoid
splitting thp memory.
The value is determined by how the system will be used: do you tightly
constrain memory usage and not allow any unmapped memory to be collapsed into
a hugepage, or do you have an abundance of memory and really want an
aggressive value like HPAGE_PMD_NR-1? Depending on the properties of the
system, you can tune this to anything you want, just as we do in
initscripts.
I'm only concerned here about changing a default that has been around for
four years and the possibly negative implications that this will have on users
who never touch this value. They undoubtedly get less memory backed by
thp, and that can lead to a performance regression. So if this patch is
merged and we get a bug report for the 4.1 kernel, do we tell that user
that we changed behavior out from under them and to adjust the tunable
back to HPAGE_PMD_NR-1?
Note that the new default has no effect on THP page faults, which will
still effectively act like max_ptes_none == 511. That means anyone who
would notice this change of default has been relying on khugepaged,
which is quite slow in its default settings and (before Ebru's other
patches) wouldn't collapse pmds with zero pages or swapcache pages. So
I think the chances of a bug report due to the new default are lower than
those of hitting bug 93111.
Post by David Rientjes
Meanwhile, the bug report you cite has a workaround that has always been
available for thp kernels:
# echo 64 > /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

David Rientjes
2015-02-27 21:00:04 UTC
Post by Ebru Akagunduz
Using THP, programs can access memory faster, by having the
kernel collapse small pages into large pages. The parameter
max_ptes_none specifies how many extra small pages (that are
not already mapped) can be allocated when collapsing a group
of small pages into one large page.
Not exactly, khugepaged isn't "allocating" small pages to collapse into a
hugepage; rather, it is allocating a hugepage and then remapping the
pageblock's mapped pages.
Post by Ebru Akagunduz
A larger value of max_ptes_none can cause the kernel
to collapse more incomplete areas into THPs, speeding
up memory access at the cost of increased memory use.
A smaller value of max_ptes_none will reduce memory
waste, at the expense of collapsing fewer areas into
THPs.
This changelog only describes what max_ptes_none does; it doesn't state
why you want to change it from HPAGE_PMD_NR-1, which is 511 on x86_64
(largest value, more thp), to HPAGE_PMD_NR/8, which is 64 (smaller value,
less thp, less rss as a result of collapsing).
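The two defaults and their memory cost follow from simple arithmetic. A minimal sketch, assuming x86_64 with 4kB base pages and 2MB PMD-level huge pages (so HPAGE_PMD_NR is 512); the macro and function names are hypothetical. The bound is per collapsed range: at most max_ptes_none previously unmapped 4kB slots become backed by memory.

```c
/* Assumptions: x86_64, 4 KiB base pages, 512 ptes per 2 MiB PMD. */
#define SKETCH_PAGE_KIB     4u
#define SKETCH_HPAGE_PMD_NR 512u

/* Default before the patch: HPAGE_PMD_NR-1. */
unsigned int old_default_max_ptes_none(void)
{
    return SKETCH_HPAGE_PMD_NR - 1;
}

/* Default proposed by the patch: HPAGE_PMD_NR/8. */
unsigned int new_default_max_ptes_none(void)
{
    return SKETCH_HPAGE_PMD_NR / 8;
}

/*
 * Upper bound, in KiB, on memory that a single collapse can newly back:
 * each of the (at most max_ptes_none) unmapped 4 KiB slots becomes part
 * of the resident huge page.
 */
unsigned int worst_case_extra_kib(unsigned int max_ptes_none)
{
    return max_ptes_none * SKETCH_PAGE_KIB;
}
```

Under these assumptions the worst case per collapsed range drops from 2044 KiB (almost the full 2MB) with 511 to 256 KiB with 64, and to zero with max_ptes_none == 0.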

This has particular performance implications for users who already have thp
enabled, so it's difficult to change the default. This is a tunable that
you could easily set in an initscript, so I don't think we need to change
the value for everybody.
Post by Ebru Akagunduz
https://bugzilla.kernel.org/show_bug.cgi?id=93111
---
mm/huge_memory.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e08e37a..497fb5a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -59,11 +59,10 @@ static DEFINE_MUTEX(khugepaged_mutex);
static DEFINE_SPINLOCK(khugepaged_mm_lock);
static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
/*
- * default collapse hugepages if there is at least one pte mapped like
- * it would have happened if the vma was large enough during page
- * fault.
+ * The default value should be a compromise between memory use and THP speedup.
+ * To collapse hugepages, unmapped ptes should not exceed 1/8 of HPAGE_PMD_NR.
*/
-static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
+static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR/8;
static int khugepaged(void *none);
static int khugepaged_slab_init(void);