Gentoo Archives: gentoo-dev

From:	Dennis Schridde <devurandom@×××.net>
To:	gentoo-dev@l.g.o
Cc:	Mike Gilbert <floppym@g.o>
Subject:	Re: [gentoo-dev] Access to DRM render nodes from portage sandbox?
Date:	Thu, 10 May 2018 07:18:17
Message-Id:	`2361023.kiO95Ok753@monk`
In Reply to:	Re: [gentoo-dev] Access to DRM render nodes from portage sandbox? by Mike Gilbert

1	On Wednesday, 9 May 2018 19:10:13 CEST Mike Gilbert wrote:
2	> On Wed, May 9, 2018 at 12:34 PM, Matt Turner <mattst88@g.o> wrote:
3	> > On Tue, May 8, 2018 at 11:51 PM, Dennis Schridde <devurandom@×××.net>
4	wrote:
5	> >> Hello!
6	> >>
7	> >> I see sandbox violations similar to "ACCESS DENIED: open_wr: /dev/dri/
8	> >> renderD128" pop up for more and more packages, probably since OpenCL
9	> >> becomes used more widely. Hence I would like to ask: Could we in Gentoo
10	> >> treat GPUs just like CPUs and allow any process to access render nodes
11	> >> (i.e. the GPUs compute capabilities via the specific interface the Linux
12	> >> kernel's DRM offers for that purpose) without sandbox restrictions?
13	> >>
14	> >> --Dennis
15	> >>
16	> >> See-Also: https://bugs.gentoo.org/654216
17	> >
18	> > This seems like a bad idea. With CPUs we've had decades to work out
19	> > how to isolate processes and prevent them from taking down the system.
20	> >
21	> > GPUs are not there yet. It's simple to trigger an unrecoverable GPU
22	> > hang and not much harder to turn it into a full system lock up.
23	> >
24	> > This is not safe.
25	>
26	> It's worth noting that the default rules shipped with udev assign mode
27	> 0666 to the /dev/dri/renderD* device nodes. So, outside of a sandbox
28	> environment, any user may access these devices.
29	>
30	> This was merged as part of this PR:
31	> https://github.com/systemd/systemd/pull/7112
32
33	Also, what's happening right now is that every ebuild that somehow uses
34	DRM render nodes is given SANDBOX_PREDICT or SANDBOX_WRITE access to them.
35
36	And the cycle is usually:
37	* Bump into a usage of render nodes that breaks the build at the very end
38	* Report a bug
39	* Wait
40	* The ebuild gets "allow access to the first render node" code added
41	* Someone with 2 GPUs runs into the same issue for the second render node
42	* ... rinse and repeat ...
43	* Eventually, after enough people ran into it, the ebuild gets its own custom
44	"find all render nodes and allow access" code added
45
46	Additionally it appears that often the usage is indirect, through another tool
47	or library. So for ebuild developers this is not really predictable.
48
49	Thus at the very least I would suggest adding this code (to allow access
50	to all render nodes) to an eclass, so it is easier for ebuild developers to
51	fix their ebuild properly, once and for all.
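
A minimal sketch of what such a shared helper might look like, in bash as used by ebuilds and eclasses. The function name allow_render_nodes and its optional directory argument are hypothetical and exist only for this illustration; addwrite is the sandbox helper Portage provides to ebuilds. When run outside Portage, where addwrite is not defined, the sketch only prints the nodes it would allow.

```bash
# Hypothetical eclass helper: the name and the optional directory
# argument are assumptions for this sketch, not an existing eclass API.
allow_render_nodes() {
	local dri_dir=${1:-/dev/dri}
	local node
	for node in "${dri_dir}"/renderD*; do
		# An unmatched glob stays literal, so skip paths that do not exist.
		[[ -e ${node} ]] || continue
		if declare -F addwrite >/dev/null; then
			# Inside an ebuild: add the node to the sandbox write list.
			addwrite "${node}"
		else
			# Outside Portage: only report what would be allowed.
			echo "${node}"
		fi
	done
}
```

An ebuild would call this once, for example from pkg_setup or src_compile, instead of hard-coding renderD128. Because the loop covers every render node present, a machine with two GPUs is handled by the same call.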
52
53	But by then the process is so easy, and so many builds already use render
54	nodes, that the opportunity for a build to take down the system is large. If
55	the chromium build (e.g.) could trigger a bug in Mesa that takes down the
56	system, so could any other build. And if we trust its toolchain (and with a
57	build time of several hours, I believe this to be a large set of tools and a
58	lot of code) not to bring down the system, without a complete audit or
59	something of the sort, why don't we trust everyone else's?
60
61	--Dennis

Attachments

File name	MIME type
signature.asc	application/pgp-signature
