Many thanks for your detailed thoughts and for sharing your rich
experience on this! See below:

antarus:

> On Mon, Jan 2, 2023 at 4:48 AM m1027 <m1027@××××××.net> wrote:
> >
> > Hi and happy new year.
> >
> > When we create apps on Gentoo they become easily incompatible for
> > older Gentoo systems in production where unattended remote world
> > updates are risky. This is due to new glibc, openssl-3 etc.
>
> I wrote a very long reply, but I've removed most of it: I basically
> have a few questions, and then some comments:
>
> I don't quite grasp your problem statement, so I will repeat what I
> think it is and you can confirm / deny.
>
> - Your devs build using gentoo synced against some recent tree, they
> have recent packages, and they build some software that you deploy to
> prod.

Yes.

> - Your prod machines are running gentoo synced against some recent
> tree, but not upgraded (maybe only glsa-check runs) and so they are
> running 'old' packages because you are afraid to update them[0]

Well, we did sync (without updating packages) in the past, but today
we even fear to sync against recent trees. Without going into
details, as a rule of thumb, weekly or monthly sync + package updates
work almost perfectly. (It's impressive to see what a good job emerge
does on our own internal production systems.) Updating systems older
than 12 months or so can, however, be a huge task. And too risky for
our customers' remote production systems.


> - Your software builds OK in dev, but when you deploy it in prod it
> breaks, because prod is really old, and your developments are using
> packages that are too new.

Exactly.


> My main feedback here is:
> - Your "build" environment should be like prod. You said you didn't
> want to build "developer VMs" but I am unsure why. For example I run
> Ubuntu and I do all my gentoo development (admittedly very little
> these days)
> in a systemd-nspawn container, and I have a few shell scripts to
> mount everything and set it up (so it has a tree snapshot, some git
> repos, some writable space etc.)

Okay, yes. That is way (1) I mentioned in my OP. It works indeed, but
it has the mentioned drawbacks: VMs and their maintenance pile up,
and that per developer. And you never know when the moment has come
to create a new VM. But yes, it seems to me one of the ways to go:
*before* creating a production system you need to freeze portage,
create dev VMs, and prevent updates on the VMs, too. (Freezing, i.e.
not updating, has many disadvantages, of course.)


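For the record, freezing portage this way does not need much tooling.
A minimal sketch, assuming ::gentoo is synced via git; the branch
name and the upstream branch are only examples:

```shell
# Sketch: pin a git-synced ::gentoo so that an accidental "emerge --sync"
# cannot silently move the tree forward. Assumes sync-type = git in
# repos.conf; the branch name is only an example.
cd /var/db/repos/gentoo
git checkout -b frozen-2023-01   # freeze at the commit prod was built from

# Later, to inspect what an update would bring without applying it:
git fetch origin
git log --oneline frozen-2023-01..origin/master | head

# Additionally set "auto-sync = no" in the [gentoo] section of
# /etc/portage/repos.conf/gentoo.conf so emaint leaves the checkout alone.
```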
> - Your "prod" environment is too risky to upgrade, and you have
> difficulty crafting builds that run in every prod environment. I think
> this is fixable by making a build environment more like the prod
> environment.
> The challenge here is that if you have not done that (kept the
> copies of ebuilds around, the distfiles, etc) it can be challenging to
> "recreate" the existing older prod environments.
> But if you do the above thing (where devs build in a container)
> and you can make that container like the prod environments, then you
> can enable devs to build for the prod environment (in a container on
> their local machine) and get the outcome you want.

Not sure I got your point here. But yes, it comes down to what was
said above.


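For what it's worth, the container variant would not have to mean one
VM per developer: a throwaway systemd-nspawn shell over a shared
prod-like root would do for building. A minimal sketch; every path
here is an assumption:

```shell
# Sketch: build inside a prod-like Gentoo root with systemd-nspawn
# instead of a per-developer VM. All paths are examples: the machine
# root, the frozen tree and the source checkout must exist on the host.
systemd-nspawn \
  --directory=/var/lib/machines/prod-2023-01 \
  --bind=/home/dev/src:/src \
  --bind-ro=/var/db/repos/gentoo-frozen:/var/db/repos/gentoo \
  /bin/bash -lc 'cd /src && make'
```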
> - Understand that not upgrading prod is like, to use a finance term,
> picking up pennies in front of a steamroller. It's a great strategy,
> but eventually you will actually *need* to upgrade something. Maybe
> for a critical security issue, maybe for a feature. Having a build
> environment that matches prod is good practice, you should do it, but
> you should also really schedule maintenance for these prod nodes to
> get them upgraded. (For physical machines, I've often seen businesses
> just eat the risk and assume the machine will physically fail before
> the steamroller comes, but this is less true with virtualized
> environments that have longer real lifetimes.)

Yes, haha, I agree. And yes, I totally ignored backporting security
fixes here, as well as the case where we might *require* a dependent
package upgrade (e.g. to fix a known memory leak). I left that out
for simplicity only.


> > So, what we've thought of so far is:
> >
> > (1) Keeping outdated developer boxes around and compiling there. We
> > would freeze portage against accidental emerge sync by creating a
> > git branch in /var/db/repos/gentoo. This feels hacky and requires an
> > increasing number of developer VMs. And sometimes we are hit by a
> > silent incompatibility we were not aware of.
>
> In general when you build binaries for some target, you should build
> on that target when possible. To me, this is the crux of your issue
> (that you do not) and one of the main causes of your pain.
> You will need to figure out a way to either:
> - Upgrade the older environments to new packages.
> - Build in copies of the older environments.
>
> I actually expect the second one to take 1-2 sprints (so like 1 engineer month?)
> - One sprint to make some scripts that make a new production 'container'
> - One sprint to sort of integrate that container into your dev
> workflow, so devs build in the container instead of what they build in
> now.
>
> It might be more or less daunting depending on how many distinct
> (unique?) prod environments you have (how many containers will you
> actually need for good build coverage?), how experienced in Gentoo
> your developers are, and how many artifacts from prod you have.
> - A few crazy ideas are like:
> - Snapshot an existing prod machine, strip it of machine-specific
> bits, and use that as your container.
> - Use quickpkg to generate a bunch of bin pkgs from a prod machine,
> use that to bootstrap a container.
> - Probably some other exciting ideas on the list ;)

Thanks for the enthusiasm on it. ;-) Well:

We cannot build (develop) on that exact target. Imagine hardware
being sold to customers. They just want/need a software update of
our app.

And, unfortunately, we don't have hardware clones of all our
customers' different hardware on site to build on, test with, etc.

So, we come back to the question of how to have a solid LTS-like
OS / software stack onto which newly compiled developer apps can be
distributed and just work. And all this in Gentoo. :-)


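That said, the quickpkg idea above seems like the closest thing to a
prod clone without the physical hardware. A rough sketch of what the
bootstrap could look like; the target directory is an assumption, and
the "*/*" glob and flags should be checked against your portage
version:

```shell
# Sketch: clone a prod machine's package set into a fresh root that can
# serve as a prod-like build environment. Paths are examples.

# On the prod machine: turn every installed package into a binary
# package under ${PKGDIR} (typically /var/cache/binpkgs).
quickpkg --include-config=y "*/*"

# On the build host, after copying the binpkgs over: populate a new
# root from those binaries only, without compiling anything.
emerge --root=/var/lib/machines/prod-clone --usepkgonly @world

# The result can then be entered for builds, e.g. with
#   systemd-nspawn -D /var/lib/machines/prod-clone
```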
> > (2) Using Ubuntu LTS for production and Gentoo for development is
> > hit by subtle libjpeg incompatibilities and such.
>
> I would advise, if possible, to make dev and prod as similar as
> possible[1]. I'd be curious what blockers you think there are to this
> pattern.
> Remember that "dev" is not "whatever your devs are using" but is
> ideally some maintained environment; segmented from their daily driver
> computer (somehow).

That is again VMs per "release" and per dev, right? See above, way
(1).


> > (3) Distributing apps as VMs or docker: Even those tools advance and
> > become incompatible, right? And not suitable for smaller Arm
> > devices.
>
> I think if your apps are small and self-contained and easily rebuilt,
> your (3) and (4) can be workable.
>
> If you need 1000 dependencies at runtime, your containers are going to
> be expensive to build, expensive to maintain, you are gonna have to
> build them often (for security issues), it can be challenging to
> support incremental builds and incremental updates... you generally
> want a clearer problem statement to adopt this pain. Two problem
> statements that might be worth it are below ;)
>
> If you told me you had 100 different production environments, or
> needed to support 12 different OSes, I'd tell you to use containers
> (or similar)
> If you told me you didn't control your production environment (because
> users installed the software wherever) I'd tell you to use containers
> (or similar)
>
> >
> > (4) Flatpak: No experience, does it work well?
>
> Flatpak is conceptually similar to your (3). I know you are basically
> asking "does it work" and the answer is "probably", but see the other
> questions for (3). I suspect it's less about "does it work" and more
> about "is some container deployment thing really a great idea."

Well, thanks for your comments on containers and flatpak. It's
motivating to investigate that further.

Admittedly, we've been sticking to natively built apps for reasons
that might not be relevant these days. (Hardware-bound apps, bus
systems etc., performance reasons on IoT-like devices, no real
experience with lean containers yet, only Qemu.)


> Peter's comment about basically running your own fork of gentoo.git
> and sort of 'importing the updates' is workable. Google did this for
> debian testing (called project Rodete)[2]. I can't say it's a
> particularly cheap solution (significant automation and testing
> required) but I think as long as you are keeping up (I would advise
> never falling more than 365d behind time.now() in your fork) then I
> think it provides some benefits.
> - You control when you take updates.
> - You want to stay "close" to time.now() in the tree, since a
> rolling distro is how things are tested.
> - This buys you 365d or so to fix any problem you find.
> - It nominally requires that you test against ::gentoo and
> ::your-gentoo-fork, so you find problems in ::gentoo before they are
> pulled into your fork, giving you a heads up that you need to put work
> in.

I haven't commented on Peter's mail yet, but yes, I'll have a look at
what he added. Something tells me that distributing apps in a
container might be the cheaper way for us. We'll see.


> [0] FWIW this is basically what #gentoo-infra does on our boxes and
> it's terrible and I would not recommend it to most people in the
> modern era. Upgrade your stuff regularly.
> [1] When I was at Google we had a hilarious outage because someone
> switched login managers (gdm vs kdm) and kdm had a different default
> umask somehow? Anyway it resulted in a critical component having the
> wrong permissions and it caused a massive outage (luckily we had
> sufficient redundancy that it was not user visible) but it was one of
> the scariest outages I had ever seen. I was in charge of investigating
> (being on the dev OS team at the time) and it was definitely very
> difficult to figure out "what changed" to produce the bad build. We
> stopped building on developer workstations soon after, FWIW.
> [2] https://cloud.google.com/blog/topics/developers-practitioners/how-google-got-to-rolling-linux-releases-for-desktops

Thanks for sharing! Very interesting insights.

To sum up:

You described interesting ways to create and control our own releases
of Gentoo, so that production and developer systems could be aligned.
The effort depends.

Another way is containers.