Gentoo Archives: gentoo-dev

From: Dennis Schridde <devurandom@×××.net>
To: gentoo-dev@l.g.o
Cc: Mike Gilbert <floppym@g.o>
Subject: Re: [gentoo-dev] Access to DRM render nodes from portage sandbox?
Date: Thu, 10 May 2018 07:18:17
Message-Id: 2361023.kiO95Ok753@monk
In Reply to: Re: [gentoo-dev] Access to DRM render nodes from portage sandbox? by Mike Gilbert
1 On Wednesday, 9 May 2018 19:10:13 CEST Mike Gilbert wrote:
2 > On Wed, May 9, 2018 at 12:34 PM, Matt Turner <mattst88@g.o> wrote:
3 > > On Tue, May 8, 2018 at 11:51 PM, Dennis Schridde <devurandom@×××.net>
4 wrote:
5 > >> Hello!
6 > >>
7 > >> I see sandbox violations similar to "ACCESS DENIED: open_wr: /dev/dri/
8 > >> renderD128" pop up for more and more packages, probably since OpenCL
9 > >> becomes used more widely. Hence I would like to ask: Could we in Gentoo
10 > >> treat GPUs just like CPUs and allow any process to access render nodes
11 > >> (i.e. the GPUs compute capabilities via the specific interface the Linux
12 > >> kernel's DRM offers for that purpose) without sandbox restrictions?
13 > >>
14 > >> --Dennis
15 > >>
16 > >> See-Also: https://bugs.gentoo.org/654216
17 > >
18 > > This seems like a bad idea. With CPUs we've had decades to work out
19 > > how to isolate processes and prevent them from taking down the system.
20 > >
21 > > GPUs are not there yet. It's simple to trigger an unrecoverable GPU
22 > > hang and not much harder to turn it into a full system lock up.
23 > >
24 > > This is not safe.
25 >
26 > It's worth noting that the default rules shipped with udev assign mode
27 > 0666 to the /dev/dri/renderD* device nodes. So, outside of a sandbox
28 > environment, any user may access these devices.
29 >
30 > This was merged as part of this PR:
31 > https://github.com/systemd/systemd/pull/7112
32
33 Also, what's happening right now is that every ebuild that *does* somehow use
34 DRM render nodes receives SANDBOX_PREDICT or SANDBOX_WRITE access to them.
35
36 And the cycle is usually:
37 * Bump into a usage of render nodes that breaks the build at the very end
38 * Report a bug
39 * Wait
40 * The ebuild gets "allow access to the first render node" code added
41 * Someone with 2 GPUs runs into the same issue for the second render node
42 * ... rinse and repeat ...
43 * Eventually, after enough people have run into it, the ebuild gets its own
44 custom "find all render nodes and allow access" code added
45
46 Additionally it appears that often the usage is indirect, through another tool
47 or library. So for ebuild developers this is not really predictable.
48
49 Thus at the very least I would suggest adding this code (to allow access
50 to all render nodes) to an eclass, so it is easier for ebuild developers to
51 fix their ebuild properly, once and for all.
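
A minimal sketch of what such a shared helper might look like, assuming it would live in an eclass and be called from an ebuild phase. The function name allow_render_nodes and its optional directory argument are hypothetical and not taken from any existing eclass. addwrite is the helper Portage already provides to ebuilds; the stand-in defined below only approximates it (appending to SANDBOX_WRITE) so the sketch can be tried outside of Portage.

```shell
# Stand-in for Portage's addwrite, used only when the sketch runs outside
# of an ebuild environment. It approximates the real helper by appending
# the path to the colon-separated SANDBOX_WRITE list.
if ! command -v addwrite >/dev/null 2>&1; then
	addwrite() {
		SANDBOX_WRITE="${SANDBOX_WRITE:+${SANDBOX_WRITE}:}$1"
	}
fi

# Hypothetical helper: allow write access to every DRM render node, not
# just the first one. The directory defaults to /dev/dri and can be
# overridden for testing.
allow_render_nodes() {
	local dri_dir
	local node
	dri_dir="${1:-/dev/dri}"
	for node in "${dri_dir}"/renderD*; do
		# If nothing matches, the glob pattern is left as-is; skip it.
		[ -e "${node}" ] || continue
		addwrite "${node}"
	done
}
```

An ebuild would then call allow_render_nodes once, for example at the start of the phase that triggers the violation, instead of carrying its own per-node code. Whether addwrite or addpredict is the right call depends on the package; the sketch uses addwrite only for illustration.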
52
53 But by then the process is so easy, and so many builds already use render
54 nodes, that the surface through which builds can take down the system is very
55 large. If the chromium build (e.g.) could trigger a bug in Mesa that takes down
56 the system, so could any other build. And if we trust its toolchain (and with a
57 build time of several hours, I believe this to be a large set of tools and a
58 lot of code) not to bring down the system, without a complete audit or
59 something of the sort, why don't we trust anyone else's?
60
61 --Dennis

Attachments

File name      MIME type
signature.asc  application/pgp-signature