Random freezes under Fedora (amdgpu)

I've been using Fedora 37 (and more recently 38) on two ThinkPads (T14 G3, P14s G3) with an AMD Ryzen SoC (6850U) for a few months now.

Fedora 37 originally shipped with Linux kernel 6.0.7. Later updates brought kernels 6.1 and 6.2. The latter unfortunately also introduced random freezes on my systems. Regardless of the running applications, power profile or system load, the system simply froze and had to be powered off forcibly. The firmware was always up to date.

I had first suspected a RAM error, but Memtest86+ ruled that out.

A look at the system logs showed nothing abnormal on the P14s G3, but on the T14 G3 I saw the following:

Apr 20 15:32:46 tppinkepank kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=6054011, emitted seq=6054013
Apr 20 15:32:46 tppinkepank kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2613 thread gnome-shel:cs0 pid 2642
Apr 20 15:32:46 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Apr 20 15:32:48 tppinkepank gnome-shell[2613]: amdgpu: amdgpu_cs_query_fence_status failed.
Apr 20 15:32:48 tppinkepank gnome-shell[2613]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Apr 20 15:32:48 tppinkepank gnome-shell[2613]: amdgpu: The process will be terminated.
...
...
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32770, for process firefox pid 3641 thread firefox:cs0 pid 3741)
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000800141c11000 from client 0x1b (UTCL2)
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00201031
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: TCP (0x8)
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x1
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x0
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0x3
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x0
Apr 20 16:09:15 tppinkepank kernel: amdgpu 0000:04:00.0: amdgpu: RW: 0x0
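
To look for the same signatures on another machine, something like the following might help. It is only a rough sketch: the helper name filter_amdgpu_errors is made up for this post, the pattern covers just the three message types visible in my log above, and reading the previous boot assumes the journal is stored persistently.

```shell
#!/bin/sh
# Hypothetical helper: keep only log lines that carry the amdgpu failure
# signatures from the excerpt above (*ERROR*, GPU reset, page fault).
# Reads log text on stdin, so it works on journalctl output or a saved file.
filter_amdgpu_errors() {
    grep -E 'amdgpu.*(\*ERROR\*|GPU reset|page fault)'
}

# Typical use after a freeze and forced power-off
# (-k = kernel messages only, -b -1 = previous boot):
#   journalctl -k -b -1 --no-pager | filter_amdgpu_errors
```

An empty result does not prove the GPU is innocent: on my P14s G3 the freezes left no trace in the logs at all.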

Finally, I found the solution in the Fedora community: a user there had a similar problem with the related T14s G3 AMD and was able to solve it, along with the known screen glitches. It is sufficient to add the boot parameter amdgpu.dcdebugmask=0x10 to the GRUB configuration (/etc/default/grub):

...
GRUB_CMDLINE_LINUX="... amdgpu.dcdebugmask=0x10"
...

Afterwards, the GRUB configuration must be regenerated and the computer rebooted:

# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
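
After the reboot it is worth confirming that the kernel actually received the parameter before looking at anything else. A minimal sketch, assuming a POSIX shell; the function name cmdline_has_param is my own invention, not a Fedora tool:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line in $1 contains
# the exact space-separated parameter given in $2.
cmdline_has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# /proc/cmdline holds the command line the running kernel was booted with.
if cmdline_has_param "$(cat /proc/cmdline 2>/dev/null)" "amdgpu.dcdebugmask=0x10"; then
    echo "amdgpu.dcdebugmask=0x10 is active"
else
    echo "amdgpu.dcdebugmask=0x10 is missing"
fi
```

If the parameter is reported as missing, the change in /etc/default/grub has not reached the boot entry that was actually started.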

If the boot parameter is set correctly, the DRI (Direct Rendering Infrastructure) debug entries for the internal panel should report that the driver no longer supports PSR and that the PSR state is 0:

# cat /sys/kernel/debug/dri/1/eDP-1/psr_capability
Sink support: yes [0x03]
Driver support: no [0xffffff]

# cat /sys/kernel/debug/dri/1/eDP-1/psr_state
0
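
The card index (1) and the connector name (eDP-1) in these paths come from my machines and may well differ elsewhere. The following sketch simply walks every eDP connector it can find. The function name show_psr_state is hypothetical, and it must be run as root because debugfs is not readable by normal users.

```shell
#!/bin/sh
# Hypothetical helper: print psr_state for every eDP connector below a
# DRI debugfs root. $1 = root directory (default: /sys/kernel/debug/dri).
show_psr_state() {
    root="${1:-/sys/kernel/debug/dri}"
    for f in "$root"/*/eDP-*/psr_state; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}

show_psr_state
```

With the boot parameter active, each line printed should end in 0, as in the output above. No output at all usually means the script was not run as root or no eDP panel was found.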

Apparently the problem lies with PSR (Panel Self Refresh), a feature that is supposed to reduce the panel's power consumption. This matches my earlier observation on the T14 G3, which had massive display problems when I first installed Fedora 37 (with Linux 6.0.7): the screen often went blank and could only be brought back with a reboot or a suspend/resume cycle.
