Discussion:
Using TASK_SIZE for kernel threads
(too old to reply)
Martin Schwidefsky
2017-02-24 16:20:01 UTC
Permalink
Hello,

this week we had some fun with kernel 4.10 on s390. Carsten found that
the kernel kept crashing reproducibly on his system. Not on mine or any
other system, just his.

The kdevtmpfs kernel thread crashed in __queued_work as it tried to
terminate. The devtmpfsd function got an error on the sys_mount() call.
Why it crashes on termination is a different story, the interesing part
is why sys_mount() got an error.

After some more debugging Carsten found out that copy_mount_options()
only got 2 bytes of the option string, "mo" instead of "mode=0755".
It turned out that the s390 definition of TASK_SIZE together with the
size calculation in copy_mount_options causes this:

#define TASK_SIZE_OF(tsk) ((tsk)->mm->context.asce_limit)
and
size = TASK_SIZE - (unsigned long)data;

For a kernel thread (tsk)->mm is zero and the value located at
(0)->context.asce_limit happened to be close enough to the data
pointer that the 'size' result is a small number, in this case 2.

Now I fixed this in the s390 code, the patch is queued and will be
included in next weeks please-pull. But I am wondering about the use
of TASK_SIZE in kernel threads. For x86 copy_mount_options works
because the size calculation will give a negative result for 'data'
pointing to kernel space. Which is corrected by the size limit:

if (size > PAGE_SIZE)
size = PAGE_SIZE;

Wouldn't it be cleaner to test "get_fs()==KERNEL_DS" and just use
size=4096 in this case? The detour via TASK_SIZE does not make much
sense to me.

To find out how big the problem is, I have added a warning to TASK_SIZE
to create a console messsage if it is called for a task without an mm.
The only hit has been copy_mount_options.

For reference I have included the s390 patch.

Martin Schwidefsky (1):
s390: TASK_SIZE for kernel threads

arch/s390/include/asm/processor.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--
2.7.4
Martin Schwidefsky
2017-02-24 16:20:02 UTC
Permalink
Return a sensible value if TASK_SIZE if called from a kernel thread.

This gets us around an issue with copy_mount_options that does a magic
size calculation "TASK_SIZE - (unsigned long)data" while in a kernel
thread and data pointing to kernel space.

Cc: <***@vger.kernel.org>
Signed-off-by: Martin Schwidefsky <***@de.ibm.com>
---
arch/s390/include/asm/processor.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index dacba34..4045639 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -89,7 +89,8 @@ extern void execve_tail(void);
* User space process size: 2GB for 31 bit, 4TB or 8PT for 64 bit.
*/

-#define TASK_SIZE_OF(tsk) ((tsk)->mm->context.asce_limit)
+#define TASK_SIZE_OF(tsk) ((tsk)->mm ? \
+ (tsk)->mm->context.asce_limit : TASK_MAX_SIZE)
#define TASK_UNMAPPED_BASE (test_thread_flag(TIF_31BIT) ? \
(1UL << 30) : (1UL << 41))
#define TASK_SIZE TASK_SIZE_OF(current)
--
2.7.4
Linus Torvalds
2017-02-25 18:20:02 UTC
Permalink
On Fri, Feb 24, 2017 at 8:15 AM, Martin Schwidefsky
Post by Martin Schwidefsky
Now I fixed this in the s390 code, the patch is queued and will be
included in next weeks please-pull. But I am wondering about the use
of TASK_SIZE in kernel threads. For x86 copy_mount_options works
because the size calculation will give a negative result for 'data'
if (size > PAGE_SIZE)
size = PAGE_SIZE;
Wouldn't it be cleaner to test "get_fs()==KERNEL_DS" and just use
size=4096 in this case? The detour via TASK_SIZE does not make much
sense to me.
To find out how big the problem is, I have added a warning to TASK_SIZE
to create a console messsage if it is called for a task without an mm.
The only hit has been copy_mount_options.
So copy_mount_options() is a horrible hack. It doesn't have a size
limit, and it can copy binary data, so our good auto-limiting code in
strncpy_from_user() isn't usable either.

It probably *should* use the same user_addr_max() logic that
strncpy_from_user() uses, but that wouldn't actually have helped s390,
because s390 doesn't use the generic strncpy_from_user(), and doesn't
have that user_addr_max() thing.

So from everything I see, I think this is actually a s390 bug in every
way. Your TASK_SIZE_OF() implementation is simply bogus and broken,
and that's the core problem.

For example, you could have just had

#define user_addr_max() (current_thread_info()->addr_limit.seg)

like some other architectures, and it would have been all good.

If somebody is willing to add user_addr_max() to all architectures and
make copy_mount_options() use the same logic as
lib/strncpy_from_user.c, then that would certainly be acceptable to
me. As it is, I think it uses TASK_SIZE in ways that are not pretty,
but are what they are..

Linus
Martin Schwidefsky
2017-02-27 09:50:02 UTC
Permalink
On Sat, 25 Feb 2017 10:19:04 -0800
Post by Linus Torvalds
On Fri, Feb 24, 2017 at 8:15 AM, Martin Schwidefsky
Post by Martin Schwidefsky
Now I fixed this in the s390 code, the patch is queued and will be
included in next weeks please-pull. But I am wondering about the use
of TASK_SIZE in kernel threads. For x86 copy_mount_options works
because the size calculation will give a negative result for 'data'
if (size > PAGE_SIZE)
size = PAGE_SIZE;
Wouldn't it be cleaner to test "get_fs()==KERNEL_DS" and just use
size=4096 in this case? The detour via TASK_SIZE does not make much
sense to me.
To find out how big the problem is, I have added a warning to TASK_SIZE
to create a console messsage if it is called for a task without an mm.
The only hit has been copy_mount_options.
So copy_mount_options() is a horrible hack. It doesn't have a size
limit, and it can copy binary data, so our good auto-limiting code in
strncpy_from_user() isn't usable either.
It probably *should* use the same user_addr_max() logic that
strncpy_from_user() uses, but that wouldn't actually have helped s390,
because s390 doesn't use the generic strncpy_from_user(), and doesn't
have that user_addr_max() thing.
I see, set_fs(KERNEL_DS) sets a different address for user_addr_max to
return. That would work but requires that all architectures have the
define.
Post by Linus Torvalds
So from everything I see, I think this is actually a s390 bug in every
way. Your TASK_SIZE_OF() implementation is simply bogus and broken,
and that's the core problem.
For example, you could have just had
#define user_addr_max() (current_thread_info()->addr_limit.seg)
like some other architectures, and it would have been all good.
The background is that TASK_SIZE on s390 is not a constant, it depends
on the layout of the mm. There are three, 2GB for 31-bit with a 2-level
page table, 4TB for a standard 64-bit process with a 3-level page table
and 8PB with 4 levels for a process that did a really large mmap.
The upgrade from 4TB to 8PB is at runtime, that is why the size
of the mm is stored in mm->context. It is an attribute of the mm, if
one thread changes it, it changes for all threads.
Post by Linus Torvalds
If somebody is willing to add user_addr_max() to all architectures and
make copy_mount_options() use the same logic as
lib/strncpy_from_user.c, then that would certainly be acceptable to
me. As it is, I think it uses TASK_SIZE in ways that are not pretty,
but are what they are..
I guess that won't happen anytime soon. I will use the proposed fix
within the arch code. Thanks.
--
blue skies,
Martin.

"Reality continues to ruin my life." - Calvin.
Loading...