Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1( - gentoo-user

From:	tuxic@××××××.de
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1(
Date:	Wed, 15 Aug 2018 15:12:11
Message-Id:	`20180815151148.fkk3donqiedpqswl@solfire`
In Reply to:	Re: [gentoo-user] Re: "No CUDA device found" with nvidia-drivers newer than nvidia-drivers-396.24-r1( by "Corentin “Nado” Pazdera"

On 08/15 02:32, Corentin “Nado” Pazdera wrote:
> August 15, 2018 2:59 PM, tuxic@××××××.de wrote:
>
> > Yes, I did reboot the system. In my initial mail I mentioned a tool
> > called CUDA-Z and Blender, both of which report a missing CUDA device.
>
> Ok, so you do not have a specific error which might have been thrown by the module?
> Other ideas: check the dev-util/nvidia-cuda-toolkit version and double check nvidia/nvidia_uvm with modinfo to ensure they are installed and loaded correctly with the right version.
> Could you also run /opt/cuda/extras/demo_suite/deviceQuery (from nvidia-cuda-toolkit) and show its output?
>
> My installation works, so at least we know their version is not completely broken...
> Driver version: 396.51
> Cuda version: 9.2.88
>
> --
> Corentin “Nado” Pazdera
>
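
For reference, the "loaded with the right version" check suggested above could be scripted roughly as follows. This is only a sketch: it assumes the nvidia module exposes its version under /sys/module/nvidia/version, and versions_match is a hypothetical helper name, not part of any NVIDIA or Gentoo tool.

```shell
#!/bin/sh
# Sketch: compare the version of the nvidia module that is currently loaded
# with the version of the module file that modprobe would load from disk.

# versions_match LOADED ONDISK
# Prints "match" (exit 0) or "mismatch" (exit 1). Hypothetical helper.
versions_match() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
        return 1
    fi
}

# Only query the system when the module is actually loaded.
# Assumption: the module publishes its version in sysfs.
if [ -r /sys/module/nvidia/version ]; then
    loaded=$(cat /sys/module/nvidia/version)
    ondisk=$(modinfo -F version nvidia 2>/dev/null)
    echo "loaded: ${loaded}  on disk: ${ondisk}"
    versions_match "$loaded" "$ondisk" || true
fi
```

A mismatch would suggest that an old module is still resident in the kernel, for example because the driver was rebuilt but the machine was not rebooted or the module was not reloaded.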

I compiled the new version of the driver again and rebooted the
system.

# dmesg | grep -i nvidia

[   11.375631] nvidia_drm: module license 'MIT' taints kernel.
[   12.313260] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[   12.313586] nvidia 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   12.313691] nvidia 0000:02:00.0: enabling device (0000 -> 0003)
[   12.313737] nvidia 0000:02:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   12.313826] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.51  Tue Jul 31 10:43:06 PDT 2018 (using threaded interrupts)
[   12.491106] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input9
[   12.492291] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:0b.0/0000:02:00.1/sound/card2/input10
[   12.493772] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input11
[   12.494605] input: HDA NVidia HDMI as /devices/pci0000:00/0000:00:02.0/0000:07:00.1/sound/card1/input12
[   13.963644] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
[   34.236553] caller _nv001112rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
[   34.516495] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.51  Tue Jul 31 14:52:09 PDT 2018

# modprobe -a nvidia-uvm

# dmesg | grep uvm

[  209.441956] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 245

# /opt/cuda/extras/demo_suite/deviceQuery
/opt/cuda/extras/demo_suite/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
[1]    5086 exit 1     /opt/cuda/extras/demo_suite/deviceQuery
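
One thing that could be checked after a failure like this is whether the device nodes that CUDA programs normally open are present. The following is only a sketch, not a confirmed diagnosis: the node list (nvidiactl, nvidia0, nvidia-uvm) is an assumption for the first GPU, and missing_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report which of the device nodes a CUDA program normally opens are
# missing. The node list below is an assumption and covers the first GPU only.

# missing_nodes DIR NAME...
# Prints every NAME that does not exist in DIR, one per line.
missing_nodes() {
    dir=$1
    shift
    for name in "$@"; do
        [ -e "${dir}/${name}" ] || echo "${name}"
    done
}

missing=$(missing_nodes /dev nvidiactl nvidia0 nvidia-uvm)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "missing under /dev:" $missing
else
    echo "all expected nodes exist; check permissions with: ls -l /dev/nvidia*"
fi
```

With two cards, as in the dmesg output, a second node (nvidia1) would presumably be expected as well and can be appended to the argument list.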

CUDA-Z also shows "no CUDA device".

# modinfo nvidia-uvm
filename:       /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
supported:      external
license:        MIT
depends:        nvidia
name:           nvidia_uvm
vermagic:       4.18.0-RT SMP preempt mod_unload
parm:           uvm_perf_prefetch_enable:uint
parm:           uvm_perf_prefetch_threshold:uint
parm:           uvm_perf_prefetch_min_faults:uint
parm:           uvm_perf_thrashing_enable:uint
parm:           uvm_perf_thrashing_threshold:uint
parm:           uvm_perf_thrashing_pin_threshold:uint
parm:           uvm_perf_thrashing_lapse_usec:uint
parm:           uvm_perf_thrashing_nap_usec:uint
parm:           uvm_perf_thrashing_epoch_msec:uint
parm:           uvm_perf_thrashing_max_resets:uint
parm:           uvm_perf_thrashing_pin_msec:uint
parm:           uvm_perf_map_remote_on_native_atomics_fault:uint
parm:           uvm_hmm:Enable (1) or disable (0) HMM mode. Default: 0. Ignored if CONFIG_HMM is not set, or if NEXT settings conflict with HMM. (int)
parm:           uvm_global_oversubscription:Enable (1) or disable (0) global oversubscription support. (int)
parm:           uvm_leak_checker:Enable uvm memory leak checking. 0 = disabled, 1 = count total bytes allocated and freed, 2 = per-allocation origin tracking. (int)
parm:           uvm_force_prefetch_fault_support:uint
parm:           uvm_debug_enable_push_desc:Enable push description tracking (int)
parm:           uvm_page_table_location:Set the location for UVM-allocated page tables. Choices are: vid, sys. (charp)
parm:           uvm_perf_access_counter_mimc_migration_enable:Whether MIMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
parm:           uvm_perf_access_counter_momc_migration_enable:Whether MOMC access counters will trigger migrations.Valid values: <= -1 (default policy), 0 (off), >= 1 (on) (int)
parm:           uvm_perf_access_counter_batch_count:uint
parm:           uvm_perf_access_counter_granularity:Size of the physical memory region tracked by each counter. Valid values asof Volta: 64k, 2m, 16m, 16g (charp)
parm:           uvm_perf_access_counter_threshold:Number of remote accesses on a region required to trigger a notification.Valid values: [1, 65535] (uint)
parm:           uvm_perf_reenable_prefetch_faults_lapse_msec:uint
parm:           uvm_perf_fault_batch_count:uint
parm:           uvm_perf_fault_replay_policy:uint
parm:           uvm_perf_fault_replay_update_put_ratio:uint
parm:           uvm_perf_fault_max_batches_per_service:uint
parm:           uvm_perf_fault_max_throttle_per_service:uint
parm:           uvm_perf_fault_coalesce:uint
parm:           uvm_fault_force_sysmem:Force (1) using sysmem storage for pages that faulted. Default: 0. (int)
parm:           uvm_perf_map_remote_on_eviction:int
parm:           uvm_channel_num_gpfifo_entries:uint
parm:           uvm_channel_gpfifo_loc:charp
parm:           uvm_channel_gpput_loc:charp
parm:           uvm_channel_pushbuffer_loc:charp
parm:           uvm_enable_debug_procfs:Enable debug procfs entries in /proc/driver/nvidia-uvm (int)
parm:           uvm8_ats_mode:Override the default ATS (Address Translation Services) UVM mode by disabling (0) or enabling (1) (int)
parm:           uvm_driver_mode:Set the uvm kernel driver mode. Choices include: 8 (charp)
parm:           uvm_debug_prints:Enable uvm debug prints. (int)
parm:           uvm_enable_builtin_tests:Enable the UVM built-in tests. (This is a security risk) (int)

# ls -l /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
-rw-r--r-- 1 root root 1405808 Aug 15 16:49 /lib/modules/4.18.0-RT/video/nvidia-uvm.ko
(installed just minutes before)

# uname -a
Linux solfire 4.18.0-RT #1 SMP PREEMPT Mon Aug 13 05:15:26 CEST 2018 x86_64 AMD Phenom(tm) II X6 1090T Processor AuthenticAMD GNU/Linux
(the kernel version matches)
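
The by-eye comparison of the module's vermagic with the running kernel could also be done by a small script. This is a minimal sketch, assuming the kernel release is the first word of the vermagic string (as it is in the modinfo output in this mail); vermagic_kernel is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check that nvidia-uvm was built for the kernel that is running.

# vermagic_kernel "VERMAGIC STRING"
# Prints the first word of the vermagic string, i.e. the kernel release.
vermagic_kernel() {
    echo "${1%% *}"
}

if command -v modinfo >/dev/null 2>&1; then
    vm=$(modinfo -F vermagic nvidia-uvm 2>/dev/null)
    if [ -n "$vm" ] && [ "$(vermagic_kernel "$vm")" = "$(uname -r)" ]; then
        echo "nvidia-uvm was built for the running kernel"
    else
        echo "nvidia-uvm is missing or was built for another kernel"
    fi
fi
```

This only compares the release string; it does not look at the remaining vermagic flags (SMP, preempt, mod_unload).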

# eix nvidia-cuda-toolkit

[I] dev-util/nvidia-cuda-toolkit
     Available versions:  [M](~)6.5.14(0/6.5.14) [M](~)6.5.19-r1(0/6.5.19) [M](~)7.5.18-r2(0/7.5.18) [M](~)8.0.44(0/8.0.44) [M](~)8.0.61(0/8.0.61) (~)9.0.176(0/9.0.176) (~)9.1.85(0/9.1.85) (~)9.2.88(0/9.2.88) {debugger doc eclipse profiler}
     Installed versions:  9.2.88(0/9.2.88)(06:31:32 PM 08/14/2018)(-debugger -doc -eclipse -profiler)
     Homepage:            https://developer.nvidia.com/cuda-zone
     Description:         NVIDIA CUDA Toolkit (compiler and friends)

It gets even weirder...
