Gentoo Archives: gentoo-user

From: Devrin Talen <dct23@×××××××.edu>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Intermittent nouveau graphics failures
Date: Wed, 01 Mar 2017 13:39:23
Message-Id: CA+FDhMOM6Tq5oaZ3qoCQWMFrO8LWNAD1UrDVfERaSVsy-+mRTA@mail.gmail.com
Hey all,

My desktop system has an NVIDIA graphics card that identifies as:

% lspci -v
# snip...
01:00.0 VGA compatible controller: NVIDIA Corporation GK107 [GeForce GTX 650] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Gigabyte Technology Co., Ltd GK107 [GeForce GTX 650]
        Flags: bus master, fast devsel, latency 0, IRQ 29
        Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [b4] Vendor Specific Information: Len=14 <?>
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] #19
        Kernel driver in use: nouveau

From time to time, maybe once or twice a week, my system fails. The
symptoms are:

- The display freezes and the mouse pointer stops moving; neither recovers
  no matter how long I wait
- Sound keeps working (Spotify keeps playing)
- Network connectivity works (I can ssh in)

When this happens, I ssh in and check dmesg, and I always see an error
like the following:

[11741.905192] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
[11741.905202] nouveau 0000:01:00.0: fifo: gr engine fault on channel 10, recovering...

Sometimes I see a lot of those errors, sometimes just one. They never
appear while the system is running normally. I'm always able to ssh in
and reboot cleanly.
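
For anyone who wants to run the same check, here is a minimal Python sketch of how the SCHED_ERROR lines could be counted in saved dmesg output. The function name and the regular expression are my own assumptions, based only on the two log lines quoted above; other nouveau message formats are not handled.

```python
import re
from collections import Counter

# Assumed format, taken from the one sample line in this message:
# [11741.905192] nouveau 0000:01:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
SCHED_ERROR_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+nouveau\s+(?P<dev>\S+):\s+fifo:\s+"
    r"SCHED_ERROR\s+(?P<code>[0-9a-fA-F]+)\s+\[(?P<name>[A-Z_]+)\]"
)


def summarize_sched_errors(dmesg_text):
    """Count nouveau fifo SCHED_ERROR lines in dmesg output.

    Returns (counts per error name, first timestamp, last timestamp).
    Timestamps are seconds since boot; both are None if nothing matched.
    """
    counts = Counter()
    stamps = []
    for line in dmesg_text.splitlines():
        match = SCHED_ERROR_RE.match(line.strip())
        if match:
            counts[match.group("name")] += 1
            stamps.append(float(match.group("ts")))
    first = min(stamps) if stamps else None
    last = max(stamps) if stamps else None
    return counts, first, last

# Hypothetical usage: save the log over ssh first (for example
# `dmesg > dmesg.txt`, which may need root on some systems), then:
#   with open("dmesg.txt") as fh:
#       print(summarize_sched_errors(fh.read()))
```

Comparing the first and last timestamps against the time the display froze would show whether a single error or a burst of them lines up with the hang.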

Does anyone know where I should start digging to find out what's
happening? Are these fifo errors coming from some code path that I can
disable with a kernel command-line option?

Thanks,
Devrin