Gentoo Archives: gentoo-dev

From: Alec Warner <antarus@g.o>
To: Gentoo Dev <gentoo-dev@l.g.o>
Cc: "Robin H. Johnson" <robbat2@g.o>
Subject: Re: [gentoo-dev] network sandbox challenge
Date: Wed, 01 Apr 2020 15:50:11
Message-Id: CAAr7Pr-1qxYwfgjEW0pKBDHgw7smvLXRi7n9i9ae1UukTMP=gQ@mail.gmail.com
In Reply to: Re: [gentoo-dev] network sandbox challenge by Samuel Bernardo
1 On Wed, Apr 1, 2020 at 5:14 AM Samuel Bernardo <
2 samuelbernardo.mail@×××××.com> wrote:
3
4 > Hi Robin,
5 > On 4/1/20 6:36 AM, Robin H. Johnson wrote:
6 >
7 > Normally we don't bundle dependencies, avoiding that problem entirely.
8 > The Go eclasses however are badly designed, committed against protest by
9 > paid corporate interests, and serve only to facilitate large-scale
10 > copyright infringement and security vulnerabilities. If you're looking
11 > for a consistent explanation of how they're supposed to work with the
12 > rest of Gentoo, you won't find one.
13 >
14 > mjo: Can you please substantiate your claims?
15 >
16 > It would have been nice to have heard your concerns during February, any
17 > one of the three times that William and I posted the go-module.eclass
18 > EGO_SUM development work for review on this mailing list. I don't see a
19 > single email from you during that entire period.
20 >
21 > The EGO_SUM support explicitly ensured that upstream distfiles (for each
22 > dependency) remained absolutely as upstream provided them, without
23 > merging the distfiles together or altering their content in any way (I admit
24 > that the exact naming of the distfiles changed, because it was terrible,
25 > v0.0.0-20190311183353-d8887717615a.zip for example).
26 >
27 > Forgive my noobishness in this matter, which led Alec to comment on my own
28 > statement.
29 >
30 > Alec pointed out some very important issues in Go development around
31 > copyright infringement and security vulnerabilities, but I'm sure they are
32 > not related to the good work done in go-module.eclass to overcome the Go
33 > mess. npm is worse, and I take go-module as a good pattern to apply
34 > there as well.
35 >
36 I am antarus, not mjo (but more on that below!). I don't believe bundling
37 presents many challenges with regard to copyright infringement. As a
38 package maintainer you should know the licenses used in your packages. You
39 are required to reflect any licenses used in the LICENSE ebuild variable.
40 Obviously this becomes more work if you are using a bundle, because
41 bundling includes more code. In the golang ecosystem there is a
42 tool to help maintainers do this (
43 https://packages.gentoo.org/packages/dev-go/golicense). I get that with
44 bundling we cannot share the work from previous packages, because packages
45 are not shared in a bundled environment, but I expect the golicense tool to
46 have good coverage in practice. If the tool does the work, sharing the work
47 becomes moot.
48
49 I think licensing can be more challenging in other bundling scenarios where
50 tooling is not provided; but note that this is not significantly different
51 from the unbundled scenario in terms of license discovery. If I am
52 packaging a new program (A) and it depends on (B,C,D) I have two options. I
53 can either package [A,B,C,D] (normal Gentoo way) or I can package [A] (with
54 B,C,D bundled). Working out the union of the LICENSE variables is the same
55 effort in both cases. The benefit of the multiple packages is that future
56 users of B,C,D can re-use the license discovery work and that isn't nothing.
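
To illustrate the license discovery point, here is a minimal Python sketch. The license assignments for A, B, C and D are invented for the example and the function names are hypothetical. It only shows that a bundled ebuild would have to declare the same combined set of licenses that the four separate ebuilds declare between them.

```python
# Hypothetical license data; package names A-D follow the example above.
LICENSES = {
    "A": {"MIT"},
    "B": {"Apache-2.0"},
    "C": {"BSD", "MIT"},
    "D": {"MPL-2.0"},
}


def bundled_license(root, deps):
    """LICENSE set for a single ebuild that bundles its dependencies."""
    combined = set(LICENSES[root])
    for dep in deps:
        combined |= LICENSES[dep]
    return combined


def unbundled_licenses(root, deps):
    """LICENSE set per ebuild when every dependency is packaged separately."""
    return {name: set(LICENSES[name]) for name in [root, *deps]}


bundled = bundled_license("A", ["B", "C", "D"])
separate = unbundled_licenses("A", ["B", "C", "D"])

# Same licenses have to be discovered either way; only where they are
# recorded differs.
print(sorted(bundled))
print(bundled == set().union(*separate.values()))
```

In the unbundled case the per-package entries can be reused by the next consumer of B, C or D, which is the benefit described above.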
57
58 > Going back to my overlay use case, will go-modules download all modules to
59 > the distfiles directory? Will the naming convention ensure that no
60 > modules are duplicated?
61 >
62 > What about eclean-dist, will it work as expected for those module
63 > dependencies?
64 >
65 > I think some of these answers would be worth mentioning in the documentation.
66 >
67 > Sorry for anything I wrongly stated and thank you very much for your help,
68 >
69 > Samuel
70 >
71 I've chosen this part to write my treatise on packaging, but rest assured
72 it's mostly intended as a response to mgorny and mjo; not specifically in
73 response to you.
74
75 The very long answer is that Gentoo was designed around a paradigm of
76 programs written primarily in C. In C programs you have the ability to link
77 to libraries which offer APIs and in the ideal case, each API is offered
78 via a unique SONAME[0]. Upstream packages were written and built in this
79 way (with dynamic linking). So in the case of package A, that uses
80 libraries B, C, and D; the result in many distributions is 4 packages
81 (A,B,C,D) and users who want A will get B, C, and D installed. This in fact
82 was a major selling point of package managers at the time because finding
83 these dependencies by hand and building and merging them all was painful.
84
85 Many applications break this trend; I don't think golang or nodejs are
86 particularly new (python and ruby have had (pip, venv) and rubygems[1] for
87 years, for example, which are similar bundling paradigms.) The struggle as
88 packagers and distribution managers is when upstream decides "my software
89 should be installed via a bundling solution (golang, node, pip, rubygems,
90 and so on)" we are left to decide whether to map this to the ebuild
91 paradigm (no bundling of dependencies) or omit ebuilds entirely. In the
92 former case we are often left working at odds with upstream (who are
93 confused by our decomposition of their application) and in the latter case,
94 users often use the bundle anyway (e.g. they install the packages by hand
95 or use the ruby gems or whatever.) I assert this is somewhat of a false
96 choice. Bundling isn't all bad and we can learn from past mistakes[2] to
97 try to avoid problems.
98
99 Another challenge with bundling is that often bundling systems (bundler,
100 pip, venv, golang, etc.) specify specific versions, commits, or tags. This
101 is fine when bundling (because each bundle has its own version of a
102 dependency in the bundle) but when you are trying to share a system wide
103 package between N packages, you either need to SLOT the dependencies or
104 have a looser dependency specification. The fine-grained nature of the
105 upstream dependency specification can make this challenging[3].
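
A rough Python sketch of why pinned versions push toward SLOTs. The consumer names and pinned versions are made up, and "foo" stands for any shared dependency. The idea is only that every distinct pin needs to be installable side by side unless the consumers accept a looser specification.

```python
from collections import defaultdict

# Hypothetical pins: consumer package -> exact version of a shared "foo"
# that its upstream lock file asks for.
PINS = {
    "app-one": "1.2.0",
    "app-two": "1.2.0",
    "app-three": "1.4.1",
    "app-four": "2.0.0",
}


def slots_needed(pins):
    """Group consumers by pinned version; each group needs its own slot."""
    by_version = defaultdict(list)
    for consumer, version in sorted(pins.items()):
        by_version[version].append(consumer)
    return dict(by_version)


groups = slots_needed(PINS)
for version, consumers in sorted(groups.items()):
    print(version, consumers)
```

With a looser specification (say, every consumer accepting any 1.x release) the first two groups could share one installed version, which is the trade-off between SLOTs and relaxed dependencies.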
106
107 Unbundling then made it easier for system operators to operate a system;
108 and you see this often in the security space. A security notice will come
109 out saying "foo-X-Y-Z is vulnerable, move to foo-Y.1." So operators want to
110 know "do I have foo-x-y-z installed?" When every package is in the package
111 manager this is a trivial question. When software is bundled inside of a
112 package, this visibility is lost. I haven't seen any tooling for Gentoo to
113 address this problem.
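
The query itself is simple if each package records what it bundles. A minimal Python sketch, assuming (hypothetically) that every installed package exposes its bundled modules as "module version" strings in the style of EGO_SUM entries. The package names, module paths and data below are invented, and this is not an existing Gentoo tool.

```python
# Hypothetical data: installed package -> bundled "module version" entries.
BUNDLED = {
    "app-misc/alpha-1.0": [
        "example.org/foo v1.2.3",
        "example.org/foo v1.2.3/go.mod",
        "example.org/bar v0.9.0",
    ],
    "app-misc/beta-2.1": [
        "example.org/foo v1.3.0",
    ],
}


def packages_bundling(bundled, module, version):
    """Return the packages that bundle exactly `module` at `version`."""
    hits = set()
    for package, entries in bundled.items():
        for entry in entries:
            parts = entry.split()
            if len(parts) != 2:
                continue  # skip anything that is not "module version"
            name, ver = parts
            # Entries for the go.mod file alone carry a "/go.mod" suffix.
            if ver.endswith("/go.mod"):
                ver = ver[: -len("/go.mod")]
            if name == module and ver == version:
                hits.add(package)
    return sorted(hits)


affected = packages_bundling(BUNDLED, "example.org/foo", "v1.2.3")
print(affected)
```

This answers "do I have foo-x-y-z installed?" only for bundles that publish such a list; anything vendored inside an upstream tarball would remain invisible to it.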
114
115 In addition to the above, bundling can present exciting resource challenges
116 for some deployments. Imagine a common dep (CommonFoo-x-y-z) has a security
117 problem, so we must upgrade to CommonFoo-y-z. In the scenario where
118 CommonFoo is a dynamically linked package we can recompile it once[4] and
119 new consumers will just use the new dynamic shared object. In a bundling
120 scenario, we will be forced to rebuild[5] all consumers. This can take a
121 lot of time and resources depending on the deployment. Is the deployment
122 using a build farm? A binary packages host? How many disparate platforms
123 are in use?
124
125 Which is to say many people in Gentoo dislike bundling for various reasons;
126 many of them legitimate. I wish to present a narrative where bundling is an
127 engineering trade-off, rather than a decision that is settled engineering
128 law. This doesn't mean Gentoo needs to support all the bundling (clearly
129 most people don't want to) but not supporting it means that many packages
130 will not be in Gentoo at all (because unbundling is too costly) and so you
131 end up at this exciting discussion which happens every couple of years.
132
133 -A
134
135 [0] I understand this is not always true in practice, but let's assume
136 spherical cows momentarily.
137 [1] Gentoo has a rubygems-fakegem eclass that makes it pretty streamlined
138 to make an ebuild for a particular gem, but of course if my application
139 depends on 20 gems I still need to make 20 ebuilds in this scheme and merge
140 them all. Rubygems-fakegem is still pretty good though!
141 [2] On Windows https://en.wikipedia.org/wiki/DLL_Hell was common. In the
142 .NET ecosystem, assemblies addressed some of these problems.
143 [3] Similar to DLL hell but more generically:
144 https://en.wikipedia.org/wiki/Dependency_hell
145 [4] Practice, of course, leads to all kinds of weird edge cases where
146 upgrading your shared lib causes dependencies to break for various reasons;
147 which is one reason why application authors like to bundle; because their
148 application ends up being perceived as more reliable and less finicky.
149 [5] The number of package rebuilds in Gentoo is a fairly common complaint,
150 from my personal observation. Obviously binary packages make this problem
151 worse (not better.) I dunno if it's something the community should put more
152 effort into or not though; my expectation is that rebuilds are common and
153 making them more common is not a strategic problem; but I'm also not
154 compiling on some single core atom, so what do I know, eh? :)

Replies

Subject Author
Re: [gentoo-dev] network sandbox challenge Michael Orlitzky <mjo@g.o>
Re: [gentoo-dev] network sandbox challenge Samuel Bernardo <samuelbernardo.mail@×××××.com>