Salvatore Bonaccorso
2023-10-24 21:20:01 UTC
Hi Timo,
Thanks for the excellently constructed report! I think it's best to
forward this directly to upstream, including the people for the
bisected commit, to get some idea.
Can you reproduce the issue with 6.5.8-1 in unstable as well?
If not, are you able to isolate an upstream fix which should be
backported to the 6.1.y series as well?
Regards,
Salvatore
Package: src:linux
Version: 6.1.55-1
Severity: normal
1) Install Debian 12 as a virtual machine using virt-manager, choose qxl
graphics card. You only need basic installation without wayland or X.
2) Save the following script as reproduce.bash:
#!/bin/bash
chvt 3
for j in $(seq 80); do
    echo "$(date) starting round $j"
    if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ]; then
        echo "bug was reproduced after $j tries"
        exit 1
    fi
    for i in $(seq 100); do
        dmesg > /dev/tty3
    done
done
echo "bug could not be reproduced"
exit 0
3) Run chmod a+x reproduce.bash
4) Run ./reproduce.bash and wait for up to 20 minutes.
4) The system prints a steady flow of text without kernel error messages
4) At some point the text stops flowing and the script prints "bug was
reproduced". If you run "journalctl --boot" you see
kernel: [TTM] Buffer eviction failed
kernel: qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
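To check a log for all three messages at once, something along these lines might help. It is only a sketch and not part of the original report: the function name count_qxl_errors is made up here, and it reads log text from stdin so it can be fed from "journalctl --boot" or from a saved file.

```shell
#!/bin/bash
# Hypothetical helper (not from the report): count the log lines that
# contain any of the three kernel messages quoted above.
# -F treats the patterns as fixed strings, so "[" and "*" need no escaping.
count_qxl_errors() {
    grep -c -F \
        -e '[TTM] Buffer eviction failed' \
        -e 'object_init failed for' \
        -e 'failed to allocate VRAM BO'
}

# Possible uses:
#   journalctl --boot | count_qxl_errors
#   count_qxl_errors < saved_journal.txt
```

Note that grep -c still prints 0 when nothing matches, but then returns a non-zero exit status, which matters if the caller runs under "set -e".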
1) The bug does not occur if I downgrade the kernel to
linux-image-5.10.0-26-amd64_5.10.197-1_amd64.deb from Debian 11.
2) I used the following test_linux.bash to bisect this issue against
#!/bin/bash
set -x
git checkout drivers/gpu/drm/ttm/ttm_bo.c include/drm/ttm/ttm_bo_api.h
git show bec771b5e0901f4b0bc861bcb58056de5151ae3a | patch -p1
# Build
cp ~/kernel.config .config
# cp /boot/config-$(uname -r) .config
# scripts/config --enable LOCALVERSION_AUTO
# scripts/config --disable DEBUG_INFO
# scripts/config --disable SYSTEM_TRUSTED_KEYRING
# scripts/config --set-str SYSTEM_TRUSTED_KEYS ''
# scripts/config --disable STACKPROTECTOR_STRONG
make olddefconfig
# make localmodconfig
make -j$(nproc --all) bindeb-pkg
rc="$?"
if [ "$rc" != "0" ]; then
    exit 125
fi
git checkout drivers/gpu/drm/ttm/ttm_bo.c include/drm/ttm/ttm_bo_api.h
package="$(ls --sort=time ../linux-image-*_amd64.deb|head -n1)"
version=$(echo $package | cut -d_ -f1|cut -d- -f3-)
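# Note: gitversion is not assigned anywhere in this script as posted, so
# unless the caller exports it this message is always printed. It is
# informational only, because the exit below is commented out.
if [ "$gitversion" != "$version" ]; then
    echo "Build produced version $gitversion but got $version, ignoring"
    #exit 255
fi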
# Deploy
scp $package target:a.deb
ssh target sudo apt install ./a.deb
ssh target rm -f a.deb
ssh target ./grub_set_default_version.bash $version
ssh target sudo shutdown -r now
sleep 40
detected_version=$(ssh target uname -r)
if [ "$detected_version" != "$version" ]; then
    echo "Booted to $detected_version but expected $version"
    exit 255
fi
# Test
exec ssh target sudo ./reproduce.bash
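The fixed "sleep 40" in test_linux.bash assumes the VM is back up within 40 seconds. Polling until the expected kernel is running is one alternative. The sketch below is not from the report; retry and kernel_is are hypothetical names, and "target" is the ssh host alias used in the script above.

```shell
#!/bin/bash
# Hypothetical helper: run a command until it succeeds or the attempts
# run out. Arguments: <attempts> <delay-seconds> <command...>
retry() {
    local attempts delay n
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}

# Possible use in place of "sleep 40" plus the uname check:
#   kernel_is() { [ "$(ssh -o ConnectTimeout=5 target uname -r)" = "$1" ]; }
#   retry 30 5 kernel_is "$version" || exit 255
```

Comparing the running kernel version inside the loop, rather than only testing that ssh connects, avoids a false success if ssh still reaches the old kernel before the reboot has actually started.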
git bisect start
# bad: [ed29c2691188cf7ea2a46d40b891836c2bd1a4f5] drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.
git bisect bad ed29c2691188cf7ea2a46d40b891836c2bd1a4f5
# bad: [762949bb1da78941b25e63f7e952af037eee15a9] drm: fix drm_mode_create_blob comment
git bisect bad 762949bb1da78941b25e63f7e952af037eee15a9
# bad: [e40f97ef12772f8eb04b6a155baa1e0e2e8f3ecc] drm/gma500: Drop DRM_GMA600 config option
git bisect bad e40f97ef12772f8eb04b6a155baa1e0e2e8f3ecc
# bad: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait
git bisect bad 5a838e5d5825c85556011478abde708251cc0776
# bad: [d2b6f8a179194de0ffc4886ffc2c4358d86047b8] Merge tag 'xfs-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad d2b6f8a179194de0ffc4886ffc2c4358d86047b8
# bad: [68a32ba14177d4a21c4a9a941cf1d7aea86d436f] Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
git bisect bad 68a32ba14177d4a21c4a9a941cf1d7aea86d436f
# bad: [0698b13403788a646073fcd9b2294f2dce0ce429] drm/amdgpu: skip PP_MP1_STATE_UNLOAD on aldebaran
git bisect bad 0698b13403788a646073fcd9b2294f2dce0ce429
# bad: [e1a5e6a8c48bf99ea374fb3e535661cfe226bca4] drm/doc: Add RFC section
git bisect bad e1a5e6a8c48bf99ea374fb3e535661cfe226bca4
# bad: [ed29c2691188cf7ea2a46d40b891836c2bd1a4f5] drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.
git bisect bad ed29c2691188cf7ea2a46d40b891836c2bd1a4f5
# bad: [2c8ab3339e398bbbcb0980933e266b93bedaae52] drm/i915: Pin timeline map after first timeline pin, v4.
git bisect bad 2c8ab3339e398bbbcb0980933e266b93bedaae52
# bad: [2eb8e1a69d9f8cc9c0a75e327f854957224ba421] drm/i915/gem: Drop relocation support on all new hardware (v6)
git bisect bad 2eb8e1a69d9f8cc9c0a75e327f854957224ba421
# bad: [b5b6f6a610127b17f20c0ca03dd27beee4ddc2b2] drm/i915/gem: Drop legacy execbuffer support (v2)
git bisect bad b5b6f6a610127b17f20c0ca03dd27beee4ddc2b2
# bad: [06debd6e1b28029e6e77c41e59a162868f377897] Merge tag 'drm-intel-next-2021-03-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
git bisect bad 06debd6e1b28029e6e77c41e59a162868f377897
# good: [e19eede54240d64b4baf9b0df4dfb8191f7ae48b] Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
git bisect good e19eede54240d64b4baf9b0df4dfb8191f7ae48b
# good: [1e28eed17697bcf343c6743f0028cc3b5dd88bf0] Linux 5.12-rc3
git bisect good 1e28eed17697bcf343c6743f0028cc3b5dd88bf0
# bad: [6af70eb3b40edfc8bdf2373cdc2bcf9d5a20c8c7] drm/atmel-hlcdc: Rename custom plane state variable
git bisect bad 6af70eb3b40edfc8bdf2373cdc2bcf9d5a20c8c7
# good: [4ca77c513537700d3fae69030879f781dde1904c] drm/qxl: release shadow on shutdown
git bisect good 4ca77c513537700d3fae69030879f781dde1904c
# bad: [4a11bd1e88af130f50a72e0f54391c1c7d268e03] drm/ast: Add constants for VGACRCB register bits
git bisect bad 4a11bd1e88af130f50a72e0f54391c1c7d268e03
# bad: [5c209d8056b9763ce544ecd7dadb3782cdaf96ed] drm/gma500: psb_spank() doesn't need it's own file
git bisect bad 5c209d8056b9763ce544ecd7dadb3782cdaf96ed
# bad: [db0c6bd2c0c0dada8927cd46a7c34c316a3a6c04] drm/gem: Export drm_gem_vmap() and drm_gem_vunmap()
git bisect bad db0c6bd2c0c0dada8927cd46a7c34c316a3a6c04
# bad: [f4a84e165e6d58606097dd07b5b78767a94b870c] drm/qxl: allocate dumb buffers in ram
git bisect bad f4a84e165e6d58606097dd07b5b78767a94b870c
# good: [a7709b9b89a67f3ead2d188b1d0c261059b1f291] drm/qxl: handle shadow in primary destroy
git bisect good a7709b9b89a67f3ead2d188b1d0c261059b1f291
# bad: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait
git bisect bad 5a838e5d5825c85556011478abde708251cc0776
# good: [5f6c871fe919999774e8535ea611a6f84ee43ee4] drm/qxl: properly free qxl releases
git bisect good 5f6c871fe919999774e8535ea611a6f84ee43ee4
# first bad commit: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait
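The exit codes used by test_linux.bash line up with what "git bisect run" expects, which suggests (the report does not say so explicitly) that the log above came from something like "git bisect run ./test_linux.bash". As a small illustration of that convention, with bisect_verdict being a made-up name:

```shell
#!/bin/bash
# Hypothetical helper: describe how "git bisect run" interprets the exit
# code of the script it runs.
#   0              -> commit is good
#   125            -> commit cannot be tested, skip it
#   1..127 (other) -> commit is bad
#   anything else  -> abort the bisection
bisect_verdict() {
    local rc
    rc="$1"
    if [ "$rc" -eq 0 ]; then
        echo good
    elif [ "$rc" -eq 125 ]; then
        echo skip
    elif [ "$rc" -ge 1 ] && [ "$rc" -le 127 ]; then
        echo bad
    else
        echo abort
    fi
}
```

Under that reading, a failed build (exit 125) skips the commit, a reproduced bug (reproduce.bash exits 1) marks it bad, and booting the wrong kernel (exit 255) stops the bisection instead of recording a wrong result.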
I took a look at
commit 5a838e5d5825c85556011478abde708251cc0776 (refs/bisect/bad)
Date: Thu Feb 4 15:57:10 2021 +0100
drm/qxl: simplify qxl_fence_wait
Now that we have the new release_event wait queue we can just
use that in qxl_fence_wait() and simplify the code a lot.
and noticed that the bug does not occur if I boot a 6.1 kernel with this
patch reverted (see attached file).