On Wednesday, 9 May 2018 19:10:13 CEST Mike Gilbert wrote:
> On Wed, May 9, 2018 at 12:34 PM, Matt Turner <mattst88@g.o> wrote:
> > On Tue, May 8, 2018 at 11:51 PM, Dennis Schridde <devurandom@×××.net>
> > wrote:
> >> Hello!
> >>
> >> I see sandbox violations similar to "ACCESS DENIED: open_wr: /dev/dri/
> >> renderD128" pop up for more and more packages, probably because OpenCL
> >> is becoming more widely used. Hence I would like to ask: Could we in
> >> Gentoo treat GPUs just like CPUs and allow any process to access render
> >> nodes (i.e. the GPU's compute capabilities via the specific interface
> >> the Linux kernel's DRM offers for that purpose) without sandbox
> >> restrictions?
> >>
> >> --Dennis
> >>
> >> See-Also: https://bugs.gentoo.org/654216
> >
> > This seems like a bad idea. With CPUs we've had decades to work out
> > how to isolate processes and prevent them from taking down the system.
> >
> > GPUs are not there yet. It's simple to trigger an unrecoverable GPU
> > hang and not much harder to turn it into a full system lock-up.
> >
> > This is not safe.
>
> It's worth noting that the default rules shipped with udev assign mode
> 0666 to the /dev/dri/renderD* device nodes. So, outside of a sandbox
> environment, any user may access these devices.
>
> This was merged as part of this PR:
> https://github.com/systemd/systemd/pull/7112
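[For reference, the udev default described above has roughly this shape; this is a paraphrased sketch of the rule, not the exact line from the systemd PR:]

```
# Sketch of systemd's udev default for DRM render nodes (paraphrased):
SUBSYSTEM=="drm", KERNEL=="renderD*", MODE="0666"
```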

Also, what's happening right now is that every ebuild that *does* somehow use
DRM render nodes receives SANDBOX_PREDICT or SANDBOX_WRITE access to them.

And the cycle is usually:
* Bump into a usage of render nodes that breaks the build at the very end
* Report a bug
* Wait
* The ebuild gets "allow access to the first render node" code added
* Someone with 2 GPUs runs into the same issue for the second render node
* ... rinse and repeat ...
* Eventually, after enough people have run into it, the ebuild gets its own
  custom "find all render nodes and allow access" code added
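In ebuild terms, the two stages of that cycle typically differ only in how many nodes they cover. The snippets below are illustrative; addpredict is the sandbox helper portage provides to ebuilds, stubbed here only so the sketch runs standalone:

```shell
# Stub so this sketch runs outside portage; in a real ebuild, addpredict
# is provided by the sandbox support.
type addpredict >/dev/null 2>&1 || addpredict() { echo "addpredict $1"; }

# Stage 1: hard-code the first render node only -- this breaks again on
# hosts with more than one GPU.
addpredict /dev/dri/renderD128

# Stage 2: cover every render node present on the build host.
for node in /dev/dri/renderD*; do
	[ -e "$node" ] && addpredict "$node"
done
```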

Additionally, it appears that the usage is often indirect, through another
tool or library, so for ebuild developers this is not really predictable.
48 |
|
49 |
Thus at the very least I would suggest adding code this code (to allow access |
50 |
to all render nodes) to an eclass, so it is easier for ebuild developers to |
51 |
fix their ebuild properly, once and for all. |
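Such an eclass helper might look roughly like this. The function name is hypothetical; addpredict is portage's sandbox helper, stubbed below only so the sketch is self-contained:

```shell
# Stub so this sketch runs outside portage; in a real ebuild, addpredict
# is provided by the sandbox support.
type addpredict >/dev/null 2>&1 || addpredict() { echo "addpredict $1"; }

# Hypothetical eclass helper: grant (predicted) sandbox access to every
# DRM render node on the build host, not just the first one.
sandbox_allow_render_nodes() {
	local node
	for node in /dev/dri/renderD*; do
		[ -e "$node" ] || continue  # glob did not match: no render nodes
		addpredict "$node"
	done
}
```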
|
But by then the process is so easy, and so many builds already use render
nodes, that the surface for builds to take down the system is very large.
If the chromium build (e.g.) could trigger a bug in Mesa that takes down
the system, so could anyone else. And if we trust chromium's toolchain
(and with a build time of several hours, I believe this to be a large set
of tools and a lot of code) not to bring down the system, without a
complete audit or anything of the sort, why don't we trust anyone else?
|
--Dennis |