On Mon, Jan 2, 2023 at 4:48 AM m1027 <m1027@××××××.net> wrote:
>
> Hi and happy new year.
>
> When we create apps on Gentoo they easily become incompatible with
> older Gentoo systems in production where unattended remote world
> updates are risky. This is due to new glibc, openssl-3, etc.

I wrote a very long reply, but I've removed most of it: I basically
have a few questions, and then some comments.

I don't quite grasp your problem statement, so I will repeat what I
think it is and you can confirm / deny:
- Your devs build on gentoo synced against some recent tree, they
have recent packages, and they build some software that you deploy to
prod.
- Your prod machines are running gentoo synced against some recent
tree, but not upgraded (maybe only glsa-check runs), and so they are
running 'old' packages because you are afraid to update them[0].
- Your software builds OK in dev, but when you deploy it to prod it
breaks, because prod is really old and your dev builds depend on
packages that are too new.

My main feedback here is:
- Your "build" environment should be like prod. You said you didn't
want to build "developer VMs", but I am unsure why. For example, I
run Ubuntu and I do all my gentoo development (admittedly very little
these days) in a systemd-nspawn container, and I have a few shell
scripts to mount everything and set it up (so it has a tree snapshot,
some git repos, some writable space, etc.).
- Your "prod" environment is too risky to upgrade, and you have
difficulty crafting builds that run in every prod environment. I
think this is fixable by making a build environment more like the
prod environment.
The challenge here is that if you have not done that (kept copies of
the ebuilds, the distfiles, etc. around) it can be challenging to
"recreate" the existing older prod environments.
But if you do the above thing (where devs build in a container) and
you can make that container like the prod environments, then you can
enable devs to build for the prod environment (in a container on
their local machine) and get the outcome you want.
- Understand that not upgrading prod is, to use a finance term, like
picking up pennies in front of a steamroller. It's a great strategy,
but eventually you will actually *need* to upgrade something. Maybe
for a critical security issue, maybe for a feature. Having a build
environment that matches prod is good practice, and you should do it,
but you should also really schedule maintenance for these prod nodes
to get them upgraded. (For physical machines, I've often seen
businesses just eat the risk and assume the machine will physically
fail before the steamroller comes, but this is less true with
virtualized environments that have longer real lifetimes.)
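
To make the "build in a prod-like container" idea concrete, a minimal
sketch might look like the following. All paths, snapshot names, and
the package atom are illustrative assumptions, not my actual scripts:

```shell
set -euo pipefail

# Illustrative paths: a rootfs cloned from a prod image, plus a frozen
# ::gentoo tree snapshot matching what prod was built against.
PROD_ROOT=/srv/containers/prod-2022-06
TREE_SNAPSHOT=/srv/snapshots/gentoo-20220601

# Boot a throwaway build container that looks like prod: read-only
# frozen tree, shared distfiles cache, writable source directory.
sudo systemd-nspawn \
    --directory="$PROD_ROOT" \
    --bind-ro="$TREE_SNAPSHOT":/var/db/repos/gentoo \
    --bind=/var/cache/distfiles:/var/cache/distfiles \
    --bind="$HOME/src":/build \
    /bin/bash -lc 'cd /build && emerge --oneshot --buildpkg my-app'
```

Because the tree is bind-mounted read-only, an accidental `emerge
--sync` inside the container cannot drag the build environment ahead
of prod.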

>
> So, what we've thought of so far is:
>
> (1) Keeping outdated developer boxes around and compiling there. We
> would freeze portage against accidental emerge sync by creating a
> git branch in /var/db/repos/gentoo. This feels hacky and requires an
> increasing number of developer VMs. And sometimes we are hit by a
> silent incompatibility we were not aware of.

In general, when you build binaries for some target, you should build
on that target when possible. To me, this is the crux of your issue
(that you do not) and one of the main causes of your pain.
You will need to figure out a way to either:
- Upgrade the older environments to new packages.
- Build in copies of the older environments.

I actually expect the second one to take 1-2 sprints (so roughly one
engineer-month?):
- One sprint to write some scripts that make a new production
'container'.
- One sprint to integrate that container into your dev workflow, so
devs build in the container instead of whatever they build in now.

It might be more or less daunting depending on how many distinct
(unique?) prod environments you have (how many containers will you
actually need for good build coverage?), how experienced in Gentoo
your developers are, and how many artifacts from prod you have.
- A few crazy ideas:
  - Snapshot an existing prod machine, strip it of machine-specific
bits, and use that as your container.
  - Use quickpkg to generate a bunch of binary packages from a prod
machine, and use those to bootstrap a container.
  - Probably some other exciting ideas on the list ;)
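
The quickpkg bootstrap idea could be sketched roughly like this
(hostnames and staging paths are made up for illustration; qlist
comes from app-portage/portage-utils):

```shell
set -euo pipefail

# --- on the prod machine ---
# Turn every installed package into a binary package under
# /var/cache/binpkgs, including its current config files.
quickpkg --include-config=y $(qlist -IC)

# Ship the binpkgs and the world file to the dev side
# ('devbox' is a hypothetical host).
tar -C /var/cache -cf binpkgs.tar binpkgs
scp binpkgs.tar /var/lib/portage/world devbox:/srv/prod-clone/

# --- on the dev side, inside a fresh stage3 chroot/container ---
# Reinstall the same package set purely from those binary packages,
# never touching a newer tree.
PKGDIR=/srv/prod-clone/binpkgs \
    emerge --usepkgonly $(cat /srv/prod-clone/world)
```

Note that quickpkg rebuilds packages from the live filesystem, so
anything mutated on prod after install ends up in the binpkg; it is a
bootstrap shortcut, not a pristine rebuild.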

>
> (2) Using Ubuntu LTS for production and Gentoo for development is
> hit by subtle libjpeg incompatibilities and such.

I would advise, if possible, making dev and prod as similar as
possible[1]. I'd be curious what blockers you think there are to this
pattern.
Remember that "dev" is not "whatever your devs are using" but is
ideally some maintained environment, segmented from their daily-driver
computer (somehow).

>
> (3) Distributing apps as VMs or docker: Even those tools advance and
> become incompatible, right? And they are not suitable for smaller
> Arm devices.

I think if your apps are small and self-contained and easily rebuilt,
your (3) and (4) can be workable.

If you need 1000 dependencies at runtime, your containers are going
to be expensive to build and expensive to maintain, you are going to
have to rebuild them often (for security issues), and it can be
challenging to support incremental builds and incremental updates...
you generally want a clearer problem statement before adopting this
pain. Two problem statements that might make it worth it are below ;)

If you told me you had 100 different production environments, or
needed to support 12 different OSes, I'd tell you to use containers
(or similar).
If you told me you didn't control your production environment
(because users installed the software wherever), I'd tell you to use
containers (or similar).

>
> (4) Flatpak: No experience, does it work well?

Flatpak is conceptually similar to your (3). I know you are basically
asking "does it work" and the answer is "probably", but see the other
questions for (3). I suspect it's less about "does it work" and more
about "is some container deployment thing really a great idea."

>
> (5) Inventing a full-fledged OTA Gentoo OS updater and distributing
> that together with the apps... Nah.

This sounds like a very expensive solution that is likely rife with
very exciting security problems, fwiw.

>
> Hm... Comments welcome.
>


Peter's comment about basically running your own fork of gentoo.git
and sort of 'importing the updates' is workable. Google did this for
Debian testing (project Rodete)[2]. I can't say it's a particularly
cheap solution (significant automation and testing required), but as
long as you are keeping up (I would advise never falling more than
365d behind time.now() in your fork) I think it provides some
benefits:
- You control when you take updates.
- You want to stay "close" to time.now() in the tree, since a
rolling distro is how things are tested.
- This buys you 365d or so to fix any problem you find.
- It nominally requires that you test against ::gentoo and
::your-gentoo-fork, so you find problems in ::gentoo before they are
pulled into your fork, giving you a heads-up that you need to put
work in.
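
Mechanically, such a fork can be cheap to start; the expensive part is
the testing around it. A rough sketch (the 'ourlab' branch name, the
90-day cutoff, and the sync-uri are assumptions; anongit.gentoo.org is
the official git mirror of the tree):

```shell
set -euo pipefail

# One-time: fork the tree and create your own branch.
git clone https://anongit.gentoo.org/git/repo/gentoo.git /srv/gentoo-fork
cd /srv/gentoo-fork
git checkout -b ourlab-stable

# Periodically: import upstream up to a chosen cutoff (here 90 days,
# comfortably under the 365d ceiling), then test before rolling prod.
git fetch origin
CUTOFF=$(git rev-list -n1 --before="90 days ago" origin/master)
git merge --no-edit "$CUTOFF"

# Prod machines then sync from the fork instead of upstream, e.g. in
# /etc/portage/repos.conf (sync-uri is a placeholder):
# [gentoo]
# location  = /var/db/repos/gentoo
# sync-type = git
# sync-uri  = https://git.example.com/ourlab/gentoo.git
```

Pinning imports to a date like this is what gives you the "you choose
when to take updates" property described above.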

[0] FWIW this is basically what #gentoo-infra does on our boxes, and
it's terrible and I would not recommend it to most people in the
modern era. Upgrade your stuff regularly.
[1] When I was at Google we had a hilarious outage because someone
switched login managers (gdm vs kdm) and kdm had a different default
umask somehow? Anyway, it resulted in a critical component having the
wrong permissions, which caused a massive outage (luckily we had
sufficient redundancy that it was not user-visible), but it was one
of the scariest outages I had ever seen. I was in charge of
investigating (being on the dev OS team at the time) and it was
definitely very difficult to figure out "what changed" to produce the
bad build. We stopped building on developer workstations soon after,
FWIW.
[2] https://cloud.google.com/blog/topics/developers-practitioners/how-google-got-to-rolling-linux-releases-for-desktops

> Thanks
>
>