Many thanks for your detailed thoughts and for sharing your rich
experience on this! See below:

antarus:

> On Mon, Jan 2, 2023 at 4:48 AM m1027 <m1027@××××××.net> wrote:
> >
> > Hi and happy new year.
> >
> > When we create apps on Gentoo they become easily incompatible for
> > older Gentoo systems in production where unattended remote world
> > updates are risky. This is due to new glibc, openssl-3 etc.
>
> I wrote a very long reply, but I've removed most of it: I basically
> have a few questions, and then some comments:
>
> I don't quite grasp your problem statement, so I will repeat what I
> think it is and you can confirm / deny.
>
> - Your devs build using gentoo synced against some recent tree, they
> have recent packages, and they build some software that you deploy to
> prod.

Yes.

> - Your prod machines are running gentoo synced against some recent
> tree, but not upgraded (maybe only glsa-check runs) and so they are
> running 'old' packages because you are afraid to update them[0]

Well, we did sync (without updating packages) in the past, but today
we even fear to sync against recent trees. Without going into
details, as a rule of thumb, weekly or monthly sync + package updates
work almost perfectly. (It's impressive to see what a good job emerge
does on our own internal production systems.) Updating systems older
than 12 months or so can, however, be a huge task. And too risky for
our customers' remote production systems.


> - Your software builds OK in dev, but when you deploy it in prod it
> breaks, because prod is really old, and your developments are using
> packages that are too new.

Exactly.


> My main feedback here is:
> - Your "build" environment should be like prod. You said you didn't
> want to build "developer VMs" but I am unsure why. For example I run
> Ubuntu and I do all my gentoo development (admittedly very little
> these days)
> in a systemd-nspawn container, and I have a few shell scripts to
> mount everything and set it up (so it has a tree snapshot, some git
> repos, some writable space etc.)

Okay, yes. That is way (1) I mentioned in my OP. It works indeed, but
it has the mentioned drawbacks: VMs and their maintenance pile up,
and that per developer. And you never know when the moment has come
to create a new VM. But yes, it seems to me one of the ways to go:
*before* creating a production system you need to freeze portage,
create dev VMs, and prevent updates on the VMs, too. (Freezing, i.e.
not updating, has many disadvantages, of course.)


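For the record, freezing portage this way does not need much tooling.
A minimal sketch, assuming ::gentoo is synced via git; the branch
name and the upstream branch are only examples:

```shell
# Sketch: pin a git-synced ::gentoo so that an accidental "emerge --sync"
# cannot silently move the tree forward. Assumes sync-type = git in
# repos.conf; the branch name is only an example.
cd /var/db/repos/gentoo
git checkout -b frozen-2023-01   # freeze at the commit prod was built from

# Later, to inspect what an update would bring without applying it:
git fetch origin
git log --oneline frozen-2023-01..origin/master | head

# Additionally set "auto-sync = no" in the [gentoo] section of
# /etc/portage/repos.conf/gentoo.conf so emaint leaves the checkout alone.
```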
> - Your "prod" environment is too risky to upgrade, and you have
> difficulty crafting builds that run in every prod environment. I think
> this is fixable by making a build environment more like the prod
> environment.
> The challenge here is that if you have not done that (kept the
> copies of ebuilds around, the distfiles, etc) it can be challenging to
> "recreate" the existing older prod environments.
> But if you do the above thing (where devs build in a container)
> and you can make that container like the prod environments, then you
> can enable devs to build for the prod environment (in a container on
> their local machine) and get the outcome you want.

Not sure I got your point here. But yes, it comes down to what was
said above.


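For what it's worth, the container variant would not have to mean one
VM per developer: a throwaway systemd-nspawn shell over a shared
prod-like root would do for building. A minimal sketch; every path
here is an assumption:

```shell
# Sketch: build inside a prod-like Gentoo root with systemd-nspawn
# instead of a per-developer VM. All paths are examples: the machine
# root, the frozen tree and the source checkout must exist on the host.
systemd-nspawn \
  --directory=/var/lib/machines/prod-2023-01 \
  --bind=/home/dev/src:/src \
  --bind-ro=/var/db/repos/gentoo-frozen:/var/db/repos/gentoo \
  /bin/bash -lc 'cd /src && make'
```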
> - Understand that not upgrading prod is like, to use a finance term,
> picking up pennies in front of a steamroller. It's a great strategy,
> but eventually you will actually *need* to upgrade something. Maybe
> for a critical security issue, maybe for a feature. Having a build
> environment that matches prod is good practice, you should do it, but
> you should also really schedule maintenance for these prod nodes to
> get them upgraded. (For physical machines, I've often seen businesses
> just eat the risk and assume the machine will physically fail before
> the steamroller comes, but this is less true with virtualized
> environments that have longer real lifetimes.)

Yes, haha, I agree. And yes, I totally ignored backporting security
fixes here, as well as the case where we might *require* a dependent
package upgrade (e.g. to fix a known memory leak). I left that out
for simplicity only.


> > So, what we've thought of so far is:
> >
> > (1) Keeping outdated developer boxes around and compiling there. We
> > would freeze portage against accidental emerge sync by creating a
> > git branch in /var/db/repos/gentoo. This feels hacky and requires an
> > increasing number of developer VMs. And sometimes we are hit by a
> > silent incompatibility we were not aware of.
>
> In general when you build binaries for some target, you should build
> on that target when possible. To me, this is the crux of your issue
> (that you do not) and one of the main causes of your pain.
> You will need to figure out a way to either:
> - Upgrade the older environments to new packages.
> - Build in copies of the older environments.
>
> I actually expect the second one to take 1-2 sprints (so like 1 engineer month?)
> - One sprint to make some scripts that make a new production 'container'
> - One sprint to sort of integrate that container into your dev
> workflow, so devs build in the container instead of what they build in
> now.
>
> It might be more or less daunting depending on how many distinct
> (unique?) prod environments you have (how many containers will you
> actually need for good build coverage?), how experienced in Gentoo
> your developers are, and how many artifacts from prod you have.
> - A few crazy ideas are like:
> - Snapshot an existing prod machine, strip it of machine-specific
> bits, and use that as your container.
> - Use quickpkg to generate a bunch of bin pkgs from a prod machine,
> use that to bootstrap a container.
> - Probably some other exciting ideas on the list ;)

Thanks for the enthusiasm on it. ;-) Well:

We cannot build (develop) on that exact target. Imagine hardware
being sold to customers. They just want/need a software update of
our app.

And, unfortunately, we don't have hardware clones of all our
customers' different hardware on site to build on, test with, etc.

So, we come back to the question of how to have a solid LTS-like
OS / software stack onto which newly compiled developer apps can be
distributed and just work. And all this in Gentoo. :-)


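That said, the quickpkg idea above seems like the closest thing to a
prod clone without the physical hardware. A rough sketch of what the
bootstrap could look like; the target directory is an assumption, and
the "*/*" glob and flags should be checked against your portage
version:

```shell
# Sketch: clone a prod machine's package set into a fresh root that can
# serve as a prod-like build environment. Paths are examples.

# On the prod machine: turn every installed package into a binary
# package under ${PKGDIR} (typically /var/cache/binpkgs).
quickpkg --include-config=y "*/*"

# On the build host, after copying the binpkgs over: populate a new
# root from those binaries only, without compiling anything.
emerge --root=/var/lib/machines/prod-clone --usepkgonly @world

# The result can then be entered for builds, e.g. with
#   systemd-nspawn -D /var/lib/machines/prod-clone
```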
> > (2) Using Ubuntu LTS for production and Gentoo for development is
> > hit by subtle libjpeg incompatibilities and such.
>
> I would advise, if possible, to make dev and prod as similar as
> possible[1]. I'd be curious what blockers you think there are to this
> pattern.
> Remember that "dev" is not "whatever your devs are using" but is
> ideally some maintained environment; segmented from their daily driver
> computer (somehow).

That is again VMs per "release" and per dev, right? See above, way
(1).


> > (3) Distributing apps as VMs or docker: Even those tools advance and
> > become incompatible, right? And not suitable for smaller Arm
> > devices.
>
> I think if your apps are small and self-contained and easily rebuilt,
> your (3) and (4) can be workable.
>
> If you need 1000 dependencies at runtime, your containers are going to
> be expensive to build, expensive to maintain, you are gonna have to
> build them often (for security issues), it can be challenging to
> support incremental builds and incremental updates... you generally
> want a clearer problem statement to adopt this pain. Two problem
> statements that might be worth it are below ;)
>
> If you told me you had 100 different production environments, or
> needed to support 12 different OSes, I'd tell you to use containers
> (or similar)
> If you told me you didn't control your production environment (because
> users installed the software wherever) I'd tell you to use containers
> (or similar)
>
> >
> > (4) Flatpak: No experience, does it work well?
>
> Flatpak is conceptually similar to your (3). I know you are basically
> asking "does it work" and the answer is "probably", but see the other
> questions for (3). I suspect it's less about "does it work" and more
> about "is some container deployment thing really a great idea."

Well, thanks for your comments on containers and flatpak. It's
motivating to investigate that further.

Admittedly, we've been sticking to natively built apps for reasons
that might not be relevant these days. (Hardware-bound apps, bus
systems etc., performance reasons on IoT-like devices, no real
experience with lean containers yet, only Qemu.)


> Peter's comment about basically running your own fork of gentoo.git
> and sort of 'importing the updates' is workable. Google did this for
> debian testing (called project Rodete)[2]. I can't say it's a
> particularly cheap solution (significant automation and testing
> required) but I think as long as you are keeping up (I would advise
> never falling more than 365d behind time.now() in your fork) then I
> think it provides some benefits.
> - You control when you take updates.
> - You want to stay "close" to time.now() in the tree, since a
> rolling distro is how things are tested.
> - This buys you 365d or so to fix any problem you find.
> - It nominally requires that you test against ::gentoo and
> ::your-gentoo-fork, so you find problems in ::gentoo before they are
> pulled into your fork, giving you a heads up that you need to put work
> in.

I haven't commented on Peter's mail yet, but yes, I'll have a look at
what he added. Something tells me that distributing apps in a
container might be the cheaper way for us. We'll see.


> [0] FWIW this is basically what #gentoo-infra does on our boxes and
> it's terrible and I would not recommend it to most people in the
> modern era. Upgrade your stuff regularly.
> [1] When I was at Google we had a hilarious outage because someone
> switched login managers (gdm vs kdm) and kdm had a different default
> umask somehow? Anyway it resulted in a critical component having the
> wrong permissions and it caused a massive outage (luckily we had
> sufficient redundancy that it was not user visible) but it was one of
> the scariest outages I had ever seen. I was in charge of investigating
> (being on the dev OS team at the time) and it was definitely very
> difficult to figure out "what changed" to produce the bad build. We
> stopped building on developer workstations soon after, FWIW.
> [2] https://cloud.google.com/blog/topics/developers-practitioners/how-google-got-to-rolling-linux-releases-for-desktops

Thanks for sharing! Very interesting insights.

To sum up:

You described interesting ways to create and control our own releases
of Gentoo, so that production and developer systems could be aligned.
The effort depends.

Another way is containers.