[PATCH kcsan 27/32] kcsan: Add option to allow watcher interruptions

Discussion:

(too old to reply)

Paul E. McKenney

2020-03-16 15:45:35 UTC

Add option to allow interrupts while a watchpoint is set up. This can be
enabled either via CONFIG_KCSAN_INTERRUPT_WATCHER or via the boot
parameter 'kcsan.interrupt_watcher=1'.
Note that, currently not all safe per-CPU access primitives and patterns
are accounted for, which could result in false positives. For example,
asm-generic/percpu.h uses plain operations, which by default are
instrumented. On interrupts and subsequent accesses to the same
variable, KCSAN would currently report a data race with this option.
Therefore, this option should currently remain disabled by default, but
may be enabled for specific test scenarios.
To avoid new warnings, changes all uses of smp_processor_id() to use the
raw version (as already done in kcsan_found_watchpoint()). The exact SMP
processor id is for informational purposes in the report, and
correctness is not affected.

And I get silent hangs that bisect to this patch when running the
following rcutorture command, run in the kernel source tree on a
bash tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 12 --duration 10 --kconfig "CONFIG_DEBUG_INFO=y CONFIG_KCSAN=y CONFIG_KCSAN_ASSUME_PLAIN_WRITES_ATOMIC=n CONFIG_KCSAN_REPORT_VALUE_CHANGE_ONLY=n CONFIG_KCSAN_REPORT_ONCE_IN_MS=100000 CONFIG_KCSAN_VERBOSE=y CONFIG_KCSAN_INTERRUPT_WATCHER=y" --configs TREE03
It works fine on some (but not all) of the other rcutorture test
scenarios. It fails on TREE01, TREE02, TREE03, TREE09. The common thread
is that these are the TREE scenarios are all PREEMPT=y. So are RUDE01,
SRCU-P, TASKS01, and TASKS03, but these scenarios are not hammering
on Tree RCU, and thus have far less interrupt activity and the like.
Given that it is an interrupt-related feature being added by this commit,
this seems like expected (mis)behavior.
Can you reproduce this? If not, are there any diagnostics I can add to
my testing? Or a diagnostic patch I could apply?

I think I can reproduce it. Let me debug some more, so far I haven't
found anything yet.
What I do know is that it's related to reporting. Turning kcsan_report
into a noop makes the test run to completion.

I should hasten to add that this feature was quite helpful in recent work!

Good to know. :-) We can probably keep this patch, since the default
config doesn't turn this on. But I will try to see what's up with the
hangs, and hopefully find a fix.

So this one turned out to be quite interesting. We can get deadlocks
if we can set up multiple watchpoints per task in case it's
interrupted and the interrupt sets up another watchpoint, and there
are many concurrent races happening; because the other_info struct in
report.c may never be released if an interrupt blocks the consumer due
to waiting for other_info to become released.

Been there, done that! ;-)

Give me another day or 2 to come up with a decent fix.

My thought is to send a pull request for the commits up to but not
including this patch, allowing ample development and testing time for
the fix. My concern with sending this, even with a fix, is that any
further bugs might cast a shadow on the whole series, further slowing
acceptance into mainline.

Fair enough?

Thanx, Paul

Marco Elver

2020-03-17 11:12:36 UTC

Permalink

Hi Marco,

Introduces ASSERT_EXCLUSIVE_WRITER and ASSERT_EXCLUSIVE_ACCESS, which
may be used to assert properties of synchronization logic, where
violation cannot be detected as a normal data race.
==================================================================
BUG: KCSAN: assert: race in test_thread / test_thread
test_thread+0x8d/0x111
debugfs_write.cold+0x32/0x44
...
test_thread+0xa3/0x111
debugfs_write.cold+0x32/0x44
...
==================================================================
==================================================================
BUG: KCSAN: assert: race in test_thread / test_thread
test_thread+0xb9/0x111
debugfs_write.cold+0x32/0x44
...
test_thread+0x77/0x111
debugfs_write.cold+0x32/0x44
...
==================================================================
---
include/linux/kcsan-checks.h | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/include/linux/kcsan-checks.h b/include/linux/kcsan-checks.h
index 5dcadc2..cf69617 100644
--- a/include/linux/kcsan-checks.h
+++ b/include/linux/kcsan-checks.h
@@ -96,4 +96,44 @@ static inline void kcsan_check_access(const volatile void *ptr, size_t size,
kcsan_check_access(ptr, size, KCSAN_ACCESS_ATOMIC | KCSAN_ACCESS_WRITE)
#endif
+/**
+ *
+ * allowed. This assertion can be used to specify properties of concurrent code,
+ * where violation cannot be detected as a normal data race.
+ *

I like the idea that we can assert no other writers, however I think
assertions like ASSERT_EXCLUSIVE_WRITER() are a little limited. For
preempt_disable();
do_sth();
raw_cpu_write(var, 1);
do_sth_else();
preempt_enable();
preempt_disable();
do_sth();
ASSERT_EXCLUSIVE_WRITER(var);
raw_cpu_write(var, 1);
do_sth_else();
preempt_enable();
, but, if I understand how KCSAN works correctly, it only works if the
another writer happens when the ASSERT_EXCLUSIVE_WRITER(var) is called,
IOW, it can only detect another writer between do_sth() and
raw_cpu_write(). But our intent is to prevent other writers for the
whole preemption-off section. With this assertion introduced, people may

To confirm: KCSAN will detect a race if it sets up a watchpoint on
ASSERT_EXCLUSIVE_WRITER(var), and a concurrent write happens. Note
that the watchpoints aren't always set up, but only periodically
(discussed more below). For every watchpoint, we also inject an
if watchpoint for access already set up {
consume watchpoint;
else if should set up watchpoint {
setup watchpoint;
udelay(...);
check watchpoint consumed;
release watchpoint;
}

Yes, I get this part.

preempt_disable();
ASSERT_EXCLUSIVE_WRITER(var);
do_sth();
ASSERT_EXCLUSIVE_WRITER(var);
raw_cpu_write(var, 1);
ASSERT_EXCLUSIVE_WRITER(var);
do_sth_else();
ASSERT_EXCLUSIVE_WRITER(var);
preempt_enable();
and that is horrible...

It is, and I would strongly discourage any such use, because it's not
necessary. See below.

So how about making a pair of annotations
ASSERT_EXCLUSIVE_WRITER_BEGIN() and ASSERT_EXCLUSIVE_WRITER_END(), so
preempt_disable();
ASSERT_EXCLUSIVE_WRITER_BEGIN(var);
do_sth();
raw_cpu_write(var, 1);
do_sth_else();
ASSERT_EXCLUSIVE_WRITER_END(var);
preempt_enable();
ASSERT_EXCLUSIVE_WRITER_BEGIN() could be a rough version of watchpoint
setting up and ASSERT_EXCLUSIVE_WRITER_END() could be watchpoint
removing. So I think it's feasible.

Keep in mind that the time from ASSERT_EXCLUSIVE_WRITER_BEGIN to END
might be on the order of a few nanosec, whereas KCSAN's default
watchpoint delay is 10s of microsec (default ~80 for tasks). That
means we would still have to set up a delay somewhere, and the few
nanosec between BEGIN and END are insignificant and don't buy us
anything.

Yeah, the delay doesn't buy us anything given the default watchpoint
delay, and I agree even with *_{BEGIN/END}, we still need to set up a
delay somewhere. Adding a delay makes the watchpoint live longer so that
a problem will more likely happen, but sometimes the delay won't be
if (per_cpu(var, cpu) == 1)
per_cpu(var, cpu) = 0;
in this user case, percpu variable "var" is used for maintaining some
state machine, and a CPU set a state with its own variable so that other
preempt_disable();
do_sth();
ASSERT_EXCLUSIVE_WRITER(var);
raw_cpu_write(var, 1);
do_sth_else();
preempt_enable();

Right, the example makes sense.

That is assuming there are various other expected racy reads that are
fine. If that's not true, ASSERT_EXCLUSIVE_ACCESS should be
considered.

, no matter how long the delay is set. Another example: let's say the
do_sth_else() above is actually an operation that queues a callback
which writes to "var". In one version, do_sth_else() uses call_rcu(),
which works, because preemption-off is treated as RCU read-side critical
section, so we are fine. But if someone else changes it to queue_work()
for some reason, the code is just broken, and KCSAN cannot detect it, no
matter how long the delay is.
To summarize, a delay is helpful to trigger a problem because it allows
_other_ CPU/threads to run more code and do more memory accesses,
however it's not helpful if a particular problem happens due to some
memory effects of the current/watched CPU/thread. While *_{BEGIN/END}
can be helpful in this case.

Makes sense.

Re feasibility: Right now setting up and removing watchpoints is not
exposed, and doing something like this would be an extremely intrusive
change. Because of that, without being able to quantify the actual
usefulness of this, and having evaluated better options (see below),
I'd recommend not pursuing this.

Thoughts?

Firstly, what is your objective? From what I gather you want to
increase the probability of detecting a race with 'var'.

Right, I want to increase the probablity.

I agree, and have been thinking about it, but there are other options
that haven't been exhausted, before we go and make the interface more
complicated.
== Interface design ==
The interface as it is right now, is intuitive and using it is hard to
get wrong. Demanding begin/end markers introduces complexity that will

Yeah, the interface is intuitive, however it's still an extra effort to
put those assertions, right? Which means it doesn't come for free,
compared to other detection KCSAN can do, the developers don't need to
put extra lines of code. Given the extra effort for developers to use
the detect, I think we should dicuss the design thoroughly.
Besides the semantics of assertions is usually "do some checking right
now to see if things go wrong", and I don't think it quite matches the
semantics of an exclusive writer: "in this piece of code, I'm the only
one who can do the write".

undoubtedly result in incorrect usage, because as soon as you somehow
forget to end the region, you'll get tons of false positives. This may
be due to control-flow that was missed etc. We had a similar problem
with seqlocks, and getting them to work correctly with KCSAN was
extremely difficult, because clear begin and end markers weren't
always given. I imagine introducing an interface like this will
ultimately result in similar problems, as much as we'd like to believe
this won't ever happen.

Well, if we use *_{BEGIN,END} approach, one solution is combining them
with sections introducing primitives (such as preemp_disable() and
preempt_enable()), for example, we can add
#define preempt_disable_for(var) \
do { \
preempt_disable(); \
ASSERT_EXCLUSIVE_WRITER_BEGIN(var); \
}
#define preempt_enable_for(var) \
do { \
ASSERT_EXCLUSIVE_WRITER_END(var); \
preempt_enable(); \
}
(similar for spin lock)
#define spin_lock_for(lock, var) \
do { \
spin_lock(lock); \
ASSERT_EXCLUSIVE_WRITER_BEGIN(var); \
}
#define spin_unlock_for(lock, var) \
do { \
ASSERT_EXCLUSIVE_WRITER_END(var); \
spin_unlock(lock); \
}
I admit that I haven't thought this thoroughly, but I think this works,
and besides primitives like above can help the reader to understand the
questions like: what this lock/preemption-off critical sections are
protecting?

I can't say anything about introducing even more macros. I'd say we
need at least a dozen use-cases or more and understand them, otherwise
we may end up with the wrong API that we can never take back.

Thoughts?

Makes sense for the cases you described.

Changing KCSAN to do this is a major change. On surface, it seems like
a refactor and exporting some existing functionality, but there are
various new corner cases, because now 2 accesses don't really have to
be concurrent anymore to detect a race (and simple properties like a
thread can't race with itself need to be taken care of). The existing
ASSERT_EXCLUSIVE macros were able to leverage existing functionality
mostly as-is. So, to motivate something like this, we need at least a
dozen or so good use-cases, where careful placement of an existing
ASSERT_EXCLUSIVE would not catch what you describe.

Thanks,
-- Marco

Regards,
Boqun

== Improving race detection for KCSAN_ACCESS_ASSERT access types ==
1. Always set up a watchpoint for assert-type accesses, and ignore
KCSAN_SKIP_WATCH/kcsan_skip counter (see 'should_watch()'). One
problem with this is that it would seriously impact overall
performance as soon as we get a few ASSERT_EXCLUSIVE_*() in a hot path
somewhere. A compromise might be simply being more aggressive with
setting up watchpoints on assert-type accesses.
2. Let's say in the above example (without BEGIN/END) the total
duration (via udelay) of watchpoints for 'var' being set up is 4*D.
Why not just increase the watchpoint delay for assert-type accesses to
4*D? Then, just having one ASSERT_EXCLUSIVE_WRITER(var) somewhere in
the region would have the same probability of catching a race.
(Assuming that the region's remaining execution time is on the order
of nanosecs.)
I have some limited evidence that (1) is going to help, but not (2).
This is based on experiments trying to reproduce racy use-after-free
bugs that KASAN found, but with KCSAN. The problem is that it does
slow-down overall system performance if in a hot path like an
allocator. Which led me to a 3rd option.
3. Do option (1) but do the opposite of (2), i.e. always set up a
watchpoint on assert-type accesses, but *reduce* the watchpoint delay.
I haven't yet sent a patch for any one of 1-3 because I'm hesitant
until we can actually show one of them would always be useful and
improve things. For now, the best thing is to dynamically adjust
udelay_{task,interrupt} and skip_watch either via Kconfig options or
/sys/modules/kcsan/parameters/ and not add more complexity without
good justification. A good stress test will also go a long way.
There are some more (probably bad) ideas I have, but the above are the
best options for now.
So, anything that somehow increases the total time that a watchpoint
is set up will increase the probability of detecting a race. However,
we're also trying to balance overall system performance, as poor
performance could equally affect race detection negatively (fewer
instructions executed, etc.). Right now any one of 1-3 might sound
like a decent idea, but I don't know what it will look like once we
have dozens of ASSERT_EXCLUSIVE_*() in places, especially if a few of
them are in hot paths.
Thanks,
-- Marco

Regards,
Boqun

+ * For example, if a per-CPU variable is only meant to be written by a single
+ * CPU, but may be read from other CPUs; in this case, reads and writes must be
+ * marked properly, however, if an off-CPU WRITE_ONCE() races with the owning
+ * CPU's WRITE_ONCE(), would not constitute a data race but could be a harmful
+ * race condition. Using this macro allows specifying this property in the code
+ * and catch such bugs.
+ *
+ */
+#define ASSERT_EXCLUSIVE_WRITER(var) \
+ __kcsan_check_access(&(var), sizeof(var), KCSAN_ACCESS_ASSERT)
+
+/**
+ *
+ * assertion can be used to specify properties of concurrent code, where
+ * violation cannot be detected as a normal data race.
+ *
+ * For example, in a reference-counting algorithm where exclusive access is
+ * expected after the refcount reaches 0. We can check that this property
+ *
+ * if (refcount_dec_and_test(&obj->refcnt)) {
+ * ASSERT_EXCLUSIVE_ACCESS(*obj);
+ * safely_dispose_of(obj);
+ * }
+ *
+ */
+#define ASSERT_EXCLUSIVE_ACCESS(var) \
+ __kcsan_check_access(&(var), sizeof(var), KCSAN_ACCESS_WRITE | KCSAN_ACCESS_ASSERT)
+
#endif /* _LINUX_KCSAN_CHECKS_H */
--
2.9.5

Paul E. McKenney

2020-03-17 17:13:20 UTC

Permalink

Post by Paul E. McKenney

I should hasten to add that this feature was quite helpful in recent work!

Good to know. :-) We can probably keep this patch, since the default
config doesn't turn this on. But I will try to see what's up with the
hangs, and hopefully find a fix.

Been there, done that! ;-)

Give me another day or 2 to come up with a decent fix.

That's fine. I think the features changes can stay on -rcu/kcsan-dev
for now, but the documentation updates don't depend on them.
If it'd be useful, the updated documentation could be moved before
this patch to -rcu/kcsan, so we'd have
kcsan: Add current->state to implicitly atomic accesses
kcsan: Add option for verbose reporting
kcsan: Add option to allow watcher interruptions
-- cut --
kcsan: Update API documentation in kcsan-checks.h
kcsan: Update Documentation/dev-tools/kcsan.rst
kcsan: Fix a typo in a comment
.. rest of series ..
Although I'm fine with either.

Given my churn with a recent merge window, I am more reluctant than
I might otherwise be to do that sort of rearrangement. Sorry to be
so cowardly!

Thanx, Paul

Marco Elver

2020-03-17 17:44:21 UTC

Permalink

Post by Paul E. McKenney

I should hasten to add that this feature was quite helpful in recent work!

Good to know. :-) We can probably keep this patch, since the default
config doesn't turn this on. But I will try to see what's up with the
hangs, and hopefully find a fix.

Been there, done that! ;-)

Give me another day or 2 to come up with a decent fix.

Given my churn with a recent merge window, I am more reluctant than
I might otherwise be to do that sort of rearrangement. Sorry to be
so cowardly!

No problem. This should be fine either way.

Thank you!
-- Marco