Discussion:
[locks] 6d390e4b5d: will-it-scale.per_process_ops -96.6% regression
Linus Torvalds
2020-03-16 17:26:59 UTC
+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread "owns"
+ * the lock and is the only one that might try to claim the lock.
+ * Because fl_blocker is explicitly set last during a delete, it's
+ * safe to locklessly test to see if it's NULL. If it is, then we know
+ * that no new locks can be inserted into its fl_blocked_requests list,
+ * and we can therefore avoid doing anything further as long as that
+ * list is empty.
+ */
+ if (!smp_load_acquire(&waiter->fl_blocker) &&
+ list_empty(&waiter->fl_blocked_requests))
+ return status;
Ack. This looks sane to me now.

yangerkun - how did you find the original problem?

Would you mind using whatever stress test that caused commit
6d390e4b5d48 ("locks: fix a potential use-after-free problem when
wakeup a waiter") with this patch? And if you did it analytically,
you're a champ and should look at this patch too!

Thanks,

Linus
NeilBrown
2020-03-16 22:45:37 UTC
@@ -740,6 +739,12 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
waiter->fl_lmops->lm_notify(waiter);
else
wake_up(&waiter->fl_wait);
+
+ /*
+ * Tell the world we're done with it - see comment at
+ * top of locks_delete_block().
+ */
+ smp_store_release(&waiter->fl_blocker, NULL);
}
}
@@ -753,11 +758,30 @@ int locks_delete_block(struct file_lock *waiter)
{
int status = -ENOENT;
+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread "owns"
+ * the lock and is the only one that might try to claim the lock.
+ * Because fl_blocker is explicitly set last during a delete, it's
+ * safe to locklessly test to see if it's NULL. If it is, then we know
+ * that no new locks can be inserted into its fl_blocked_requests list,
+ * and we can therefore avoid doing anything further as long as that
+ * list is empty.
I think it would be worth spelling out what the 'acquire' is needed
for. We seem to have a general policy of requiring comments to explain
the presence of barriers.

The 'acquire' on fl_blocker guarantees two things.
1/ that fl_blocked_requests can be tested locklessly. If something was
recently added to that list it must have been in a locked region
*before* the locked region when fl_blocker was set to NULL.
2/ that no other thread is accessing 'waiter', so it is safe to free it.
__locks_wake_up_blocks is careful not to touch waiter after
fl_blocker is released.
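As a sketch (illustrative only, reusing the fs/locks.c names):

	/* some thread, under blocked_lock_lock, queues a request on 'waiter' */
	list_add_tail(&new->fl_blocked_member, &waiter->fl_blocked_requests);
	...
	/* the waker, in a later blocked_lock_lock section, publishes "done" */
	smp_store_release(&waiter->fl_blocker, NULL);

	/* so when the owner's lockless load observes NULL ... */
	if (!smp_load_acquire(&waiter->fl_blocker) &&
	    list_empty(&waiter->fl_blocked_requests))  /* ... this test is stable */
		return status;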
+ */
+ if (!smp_load_acquire(&waiter->fl_blocker) &&
+ list_empty(&waiter->fl_blocked_requests))
+ return status;
+
spin_lock(&blocked_lock_lock);
if (waiter->fl_blocker)
status = 0;
__locks_wake_up_blocks(waiter);
__locks_delete_block(waiter);
+
+ /*
+ * Tell the world we're done with it - see comment at top
+ * of this function
This comment might be misleading. The world doesn't care.
Only this thread cares whether ->fl_blocker is NULL. We need the release
semantics when some *other* thread sets fl_blocker to NULL, not when
this thread does.
I don't think we need to spell that out and I'm not against using
store_release here, but locks_delete_block cannot race with itself, so
referring to the comment at the top of this function is misleading.
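That is, the case the release semantics are for looks like this (sketch):

	/* thread B, waking a waiter it does not own */
	__locks_wake_up_blocks(blocker):
		smp_store_release(&waiter->fl_blocker, NULL);
		/* B must never touch 'waiter' again */

	/* thread A, the owner of 'waiter' */
	locks_delete_block(waiter):
		if (!smp_load_acquire(&waiter->fl_blocker) &&
		    list_empty(&waiter->fl_blocked_requests))
			return status;	/* A may now safely free 'waiter' */

The store at the bottom of locks_delete_block() is made by thread A
itself, which cannot race with its own load at the top.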

So:
Reviewed-by: NeilBrown <***@suse.de>

but I'm not totally happy with the comments.

Thanks,
NeilBrown
+ */
+ smp_store_release(&waiter->fl_blocker, NULL);
spin_unlock(&blocked_lock_lock);
return status;
}
Jeff Layton
2020-03-17 15:59:24 UTC
Post by NeilBrown
+
+ /*
+ * Tell the world we're done with it - see comment at top
+ * of this function
This comment might be misleading. The world doesn't care.
Only this thread cares whether ->fl_blocker is NULL. We need the release
semantics when some *other* thread sets fl_blocker to NULL, not when
this thread does.
I don't think we need to spell that out and I'm not against using
store_release here, but locks_delete_block cannot race with itself, so
referring to the comment at the top of this function is misleading.
but I'm not totally happy with the comments.
Thanks Neil. We can clean up the comments before merge. How about this
revision to the earlier patch? I took the liberty of poaching your
proposed verbiage:

------------------8<---------------------

From c9fbfae0ab615e20de0bdf1ae7b27591d602f577 Mon Sep 17 00:00:00 2001
From: Jeff Layton <***@kernel.org>
Date: Mon, 16 Mar 2020 18:57:47 -0400
Subject: [PATCH] SQUASH: update with Neil's comments

Signed-off-by: Jeff Layton <***@kernel.org>
---
fs/locks.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index eaf754ecdaa8..e74075b0e8ec 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -741,8 +741,9 @@ static void __locks_wake_up_blocks(struct file_lock *blocker)
wake_up(&waiter->fl_wait);

/*
- * Tell the world we're done with it - see comment at
- * top of locks_delete_block().
+ * The setting of fl_blocker to NULL marks the official "done"
+ * point in deleting a block. Paired with acquire at the top
+ * of locks_delete_block().
*/
smp_store_release(&waiter->fl_blocker, NULL);
}
@@ -761,11 +762,23 @@ int locks_delete_block(struct file_lock *waiter)
/*
* If fl_blocker is NULL, it won't be set again as this thread "owns"
* the lock and is the only one that might try to claim the lock.
- * Because fl_blocker is explicitly set last during a delete, it's
- * safe to locklessly test to see if it's NULL. If it is, then we know
- * that no new locks can be inserted into its fl_blocked_requests list,
- * and we can therefore avoid doing anything further as long as that
- * list is empty.
+ *
+ * We use acquire/release to manage fl_blocker so that we can
+ * optimize away taking the blocked_lock_lock in many cases.
+ *
+ * The smp_load_acquire guarantees two things:
+ *
+ * 1/ that fl_blocked_requests can be tested locklessly. If something
+ * was recently added to that list it must have been in a locked region
+ * *before* the locked region when fl_blocker was set to NULL.
+ *
+ * 2/ that no other thread is accessing 'waiter', so it is safe to free
+ * it. __locks_wake_up_blocks is careful not to touch waiter after
+ * fl_blocker is released.
+ *
+ * If a lockless check of fl_blocker shows it to be NULL, we know that
+ * no new locks can be inserted into its fl_blocked_requests list, and
+ * can avoid doing anything further if the list is empty.
*/
if (!smp_load_acquire(&waiter->fl_blocker) &&
list_empty(&waiter->fl_blocked_requests))
@@ -778,8 +791,8 @@ int locks_delete_block(struct file_lock *waiter)
__locks_delete_block(waiter);

/*
- * Tell the world we're done with it - see comment at top
- * of this function
+ * The setting of fl_blocker to NULL marks the official "done" point in
+ * deleting a block. Paired with acquire at the top of this function.
*/
smp_store_release(&waiter->fl_blocker, NULL);
spin_unlock(&blocked_lock_lock);
--
2.24.1
NeilBrown
2020-03-17 21:27:40 UTC
Post by Jeff Layton
Post by NeilBrown
+
+ /*
+ * Tell the world we're done with it - see comment at top
+ * of this function
This comment might be misleading. The world doesn't care.
Only this thread cares whether ->fl_blocker is NULL. We need the release
semantics when some *other* thread sets fl_blocker to NULL, not when
this thread does.
I don't think we need to spell that out and I'm not against using
store_release here, but locks_delete_block cannot race with itself, so
referring to the comment at the top of this function is misleading.
but I'm not totally happy with the comments.
Thanks Neil. We can clean up the comments before merge. How about this
revision to the earlier patch? I took the liberty of poaching your
proposed verbiage:
Thanks. I'm happy with that.

(Well.... actually I hate the use of the word "official" unless there is
a well defined office holder being blamed. But the word has come to
mean something vaguer in common usage and there is probably no point
fighting it. In this case "formal" is close but less personally
annoying, but I'm not sure the word is needed at all).

Thanks,
NeilBrown
yangerkun
2020-03-17 01:41:27 UTC
Post by Linus Torvalds
+ /*
+ * If fl_blocker is NULL, it won't be set again as this thread "owns"
+ * the lock and is the only one that might try to claim the lock.
+ * Because fl_blocker is explicitly set last during a delete, it's
+ * safe to locklessly test to see if it's NULL. If it is, then we know
+ * that no new locks can be inserted into its fl_blocked_requests list,
+ * and we can therefore avoid doing anything further as long as that
+ * list is empty.
+ */
+ if (!smp_load_acquire(&waiter->fl_blocker) &&
+ list_empty(&waiter->fl_blocked_requests))
+ return status;
Ack. This looks sane to me now.
yangerkun - how did you find the original problem?
While trying to fix CVE-2019-19769, adding some logging in
__locks_wake_up_blocks helped me reproduce the problem quickly, which in
turn made the root cause easy to discern.
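It was only something simple, roughly like this (not the exact patch I
used):

	/* hypothetical debug output in __locks_wake_up_blocks() */
	pr_info("%s: waking waiter %p (fl_blocker %p)\n",
		__func__, waiter, waiter->fl_blocker);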
Post by Linus Torvalds
Would you mind using whatever stress test that caused commit
6d390e4b5d48 ("locks: fix a potential use-after-free problem when
wakeup a waiter") with this patch? And if you did it analytically,
you're a champ and should look at this patch too!
I will try to understand this patch, and if it looks good to me, I will
do the performance test!

Thanks
yangerkun
2020-03-17 14:05:09 UTC
Post by yangerkun
Post by Linus Torvalds
+       /*
+        * If fl_blocker is NULL, it won't be set again as this thread "owns"
+        * the lock and is the only one that might try to claim the lock.
+        * Because fl_blocker is explicitly set last during a delete, it's
+        * safe to locklessly test to see if it's NULL. If it is, then we know
+        * that no new locks can be inserted into its fl_blocked_requests list,
+        * and we can therefore avoid doing anything further as long as that
+        * list is empty.
+        */
+       if (!smp_load_acquire(&waiter->fl_blocker) &&
+           list_empty(&waiter->fl_blocked_requests))
+               return status;
Ack. This looks sane to me now.
yangerkun - how did you find the original problem?
While trying to fix CVE-2019-19769, adding some logging in
__locks_wake_up_blocks helped me reproduce the problem quickly, which in
turn made the root cause easy to discern.
Post by Linus Torvalds
Would you mind using whatever stress test that caused commit
6d390e4b5d48 ("locks: fix a potential use-after-free problem when
wakeup a waiter") with this patch? And if you did it analytically,
you're a champ and should look at this patch too!
I will try to understand this patch, and if it looks good to me, I will
do the performance test!
This patch looks good to me. With it, the bug that '6d390e4b5d48
("locks: fix a potential use-after-free problem when wakeup a waiter")'
describes won't happen again. Actually, I found that syzkaller has
reported this bug before[1], and its log can help us reproduce the bug
by adding some latency in __locks_wake_up_blocks!
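For example, against the pre-6d390e4b5d48 code, a hypothetical delay
like this in __locks_wake_up_blocks makes the window easy to hit:

	__locks_delete_block(waiter);	/* back then, this cleared fl_blocker */
	mdelay(100);			/* hypothetical; needs <linux/delay.h> */
	if (waiter->fl_lmops && waiter->fl_lmops->lm_notify)
		waiter->fl_lmops->lm_notify(waiter);

With the delay, the owner can see fl_blocker == NULL and free 'waiter'
before the wakeup touches it.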

Also, the LTP testcases described in [2] pass with the patch too!

As for the performance test, I have tried to understand will-it-scale/lkp,
but it seems a little complex to me and may need some more time. So, Rong
Chen, can you help with this? Otherwise the results may come a little
later...
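My understanding is that the testcase itself is essentially a per-process
lock/unlock loop like the sketch below (simplified, not the actual
will-it-scale source); it is mostly the lkp harness around it that I
still need to learn:

	#include <fcntl.h>
	#include <unistd.h>

	/* each process repeatedly write-locks and unlocks its own file */
	static void lock_loop(int fd, unsigned long *iterations)
	{
		struct flock fl = {
			.l_whence = SEEK_SET,
			.l_start  = 0,
			.l_len    = 0,		/* whole file */
		};

		for (;;) {
			fl.l_type = F_WRLCK;
			fcntl(fd, F_SETLK, &fl);
			fl.l_type = F_UNLCK;
			fcntl(fd, F_SETLK, &fl);
			(*iterations)++;
		}
	}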

Thanks,
----
[1] https://syzkaller.appspot.com/bug?extid=922689db06e57b69c240
[2] https://lkml.org/lkml/2020/3/11/578
Jeff Layton
2020-03-17 16:07:15 UTC
Post by yangerkun
This patch looks good to me. With it, the bug that '6d390e4b5d48
("locks: fix a potential use-after-free problem when wakeup a waiter")'
describes won't happen again. Actually, I found that syzkaller has
reported this bug before[1], and its log can help us reproduce the bug
by adding some latency in __locks_wake_up_blocks!
Also, the LTP testcases described in [2] pass with the patch too!
As for the performance test, I have tried to understand will-it-scale/lkp,
but it seems a little complex to me and may need some more time. So, Rong
Chen, can you help with this? Otherwise the results may come a little
later...
Thanks,
----
[1] https://syzkaller.appspot.com/bug?extid=922689db06e57b69c240
[2] https://lkml.org/lkml/2020/3/11/578
Thanks yangerkun. Let me know if you want to add your Reviewed-by tag.

Cheers,
--
Jeff Layton <***@kernel.org>