From: Daniel Cordero <gentoo.catalyst@xxoo.ws>
To: gentoo-catalyst@lists.gentoo.org
Cc: gentoo-releng@lists.gentoo.org
Subject: Re: [gentoo-catalyst] catalyst changes for improving automation
Date: Wed, 4 Nov 2020 10:46:54 +0000
Message-ID: <20201104104654.GB3468@dysnomia.localdomain>
In-Reply-To: <CAEdQ38GLe0o9bnRVauuUkAjkCtmqb8MveZ4HFpmFvg3EgORWZA@mail.gmail.com>
On Tue, Nov 03, 2020 at 01:19:51PM -0500, Matt Turner wrote:
> On Tue, Nov 3, 2020 at 5:56 AM Daniel Cordero wrote:
> >
> > On Mon, Nov 02, 2020 at 10:44:07PM -0500, Matt Turner wrote:
> > > The catalyst-auto automation scripts live in a repo separate from
> > > catalyst. That increases the difficulty of changing catalyst's
> > > interface, and it doesn't seem to offer any advantages otherwise.
> > > (Keeping build specs in a separate repo allows them to be updated
> > > independent of catalyst and that is valuable). Additionally, since the
> > > primary way catalyst is used is via this automation, it makes sense to
> > > support this workflow in catalyst directly.
> > >
> >
> > What would be more heavily impacted are those users who may not already
> > have infra set up to do builds, or who are just starting out with
> > catalyst for the first time and haven't written their own automation.
> >
> > I suggest prioritising the collection of up-to-date documentation,
> > especially regarding running catalyst manually, since it'll be
> > completely different to the literature that's currently out there.
>
> I'm a bit unsure what you mean. Do you suggest prioritizing
> documenting the current method of running catalyst before changing it?
>
I'm suggesting that documentation is more important than any trivial
changes to catalyst, especially given the large number of changes that
have happened recently. We'll still be running scripts on top of
catalyst that can handle these tasks on a day-to-day basis.
> > > But to get there, there are some changes to catalyst that I think are
> > > improvements on their own and simplify the path to integrating
> > > automation capabilities directly into catalyst. That's what I'd like
> > > to discuss here.
> > >
> > > I'd like to:
> > >
> > > 1) Replace the custom .spec file format with TOML
> > >
> >
> > Fine. Aside from the extra quotes and commas, I'd be happy with any
> > well-defined format that can handle strings and lists properly.
> >
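For what it's worth, the current stage1 keys rendered as TOML might look
roughly like this (the values are made up and the layout is only a
sketch, not a proposal for the exact schema):

    subarch = "amd64"
    target = "stage1"
    rel_type = "default"
    version_stamp = "20201104T104654Z"
    profile = "default/linux/amd64/17.1"
    snapshot = "20201104"
    source_subpath = "default/stage3-amd64-latest"
    compression_mode = "pixz"
    update_seed = "yes"
    update_seed_command = "--update --deep @world"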
> > > 2) Combine .spec file sequences (e.g., stage1 -> stage2 -> stage3 ->
> > > livecd-stage1 -> livecd-stage2) into a single file. I suggest naming
> > > this a ".build" file. This will also allow us to remove the redundant
> > > information that currently has to be specified in stage1.spec,
> > > stage2.spec, stage3.spec, like rel_type, version, profile, etc. It
> > > also means that we remove the nonsensical ability to change settings
> > > from one stage to the next that should not change (e.g., rel_type,
> > > version).
> > >
> >
> > How would a target that depends on a different rel_type work? That is,
> > how would forks in the dependency tree be handled?
> >
>
> I haven't given that a lot of thought yet, but it's something I would
> like to have a plan for.
>
> We build 32-bit and 64-bit systemd and non-systemd stages on SPARC, as
> well as a bootable ISO.
>
> 32-bit systemd: stage1 -> stage3
> 32-bit non-systemd: stage1 -> stage3
> 64-bit systemd: stage1 -> stage3
> 64-bit non-systemd: stage1 -> stage3 -> livecd-stage1 -> livecd-stage2
> (We skip stage2)
>
> This means that we have some build chains that are entirely
> independent from one another and could actually run in parallel. E.g.,
> a 32-bit build could happen at the same time a 64-bit build runs
> without any conflicts. Our SPARC system has 256 threads, so we would
> like to build in parallel if possible.
>
> Similarly, a stage1 build from one of the 32-bit build chains could
> happen in parallel with a stage3 build from the other. We wouldn't
> want to run the same type of build concurrently if they share a binary
> package cache, because we would inevitably spend CPU cycles doing
> duplicate work. E.g., the systemd stage3 build running in parallel
> with the non-systemd stage3.
>
> Whether all of those build chains should be specified in the same
> ".build" file... I don't know. It seems like it could get a bit
> unwieldy.
>
> Maybe we could have a top-level ".build" file that references each of
> these build chains, described in other files? If we did that, that
> would certainly allow us to specify a different rel_type per chain.
>
> I'm not aware of cases where we'd want different rel_types in the same
> chain. Do you know of such a case?
>
Well, rel_type is just a text field. I use it to create a server
(non-GUI) systemd stage4 and also a full KDE Plasma/systemd stage4.
They're both systemd stages, but they would otherwise use the same
output tarball name, so each gets its own rel_type.
https://wiki.gentoo.org/wiki/File:Substrate_Stage_Paths.svg
Would both target chains share the stage1/stage3 without rebuilding
them multiple times? I imagine that a single .spec file will still be
runnable, but I'm not really in a position to implement a
dependency-graph calculator in catalyst.
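To make the fork concrete, I could imagine a combined .build file
looking something like this. The [[chain]] structure and every key name
here are pure invention, just to illustrate two rel_types sharing a
single stage3 instead of rebuilding it:

    version = "20201104T104654Z"
    profile = "default/linux/amd64/17.1/systemd"
    snapshot = "20201104"

    [[chain]]
    rel_type = "systemd"
    stages = [ "stage1", "stage3", "stage4-server" ]

    [[chain]]
    rel_type = "systemd-plasma"
    # reuse the stage3 produced by the "systemd" chain
    source = { chain = "systemd", stage = "stage3" }
    stages = [ "stage4-plasma" ]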
> > > 3) Add ability to denote which stage builds produce artifacts we care
> > > about (and want to save and/or upload) and which are just temporary.
> > > If they're temporary (e.g., a stage1 build) we can delete the artifact
> > > after the build sequence has no further use of it, and we can skip
> > > compressing the result, etc.
> > >
> >
> > This feature should already exist (I haven't tested it) - it's just
> > not documented.
> >
> > compression_mode: rsync
> > options=['seedcache']
>
> Hah! I was completely unaware of this. Thanks.
>
I only figured this out because I've been so deep into the compression
code.
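For anyone following along, the combination above would look roughly
like this - the first part goes in the stage's .spec, the second in
catalyst.conf (untested, as I said, and the comments reflect my reading
of the compression code):

    # stageN.spec: leave the result as a plain directory tree instead of
    # compressing it into a tarball
    compression_mode: rsync

    # catalyst.conf: keep the uncompressed chroot around so the next
    # stage can seed from it
    options = [ 'seedcache' ]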
> > or don't call 'capture' and/or 'remove_chroot' in action_/finish_sequence.
> >
> > >
> > > To that end, I'm starting by figuring out what I would like the new
> > > spec file format to look like. Below are some open questions and then
> > > a strawman new-style spec file.
> > >
> > > • The .spec files in releng.git are really templates that are not
> > > directly usable without sed'ing @REPO_DIR@ and @TIMESTAMP@. It would
> > > be nice if they were directly usable as that would reduce confusion
> > > from users.
> > > • Can we make them directly usable?
> > > • Perhaps we can make catalyst handle the replacements directly?
> > > • Calculating @TIMESTAMP@ is trivially doable—we do it today (see below)
> >
> > Maybe a strftime() template, or even f-string-like tokens?
> > (e.g. "{year}-{month}-{day}")
>
> One goal I have is to make it more transparent what actually goes into
> a particular stage tarball or ISO, and, along with that, to make it
> easier to reproduce the result.
>
> Obviously we'll want to keep the ability to specify a particular
> version, as you describe, but for Gentoo releases I think we'll want to
> keep using a timestamp that is tied as unambiguously as possible to the
> git SHA1 of gentoo.git.
>
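As a sketch of that (not necessarily how catalyst-auto derives it
today), a stamp tied to the HEAD of gentoo.git could be computed like
this; the repository path is just a placeholder:

    import os
    import subprocess

    # Committer date of HEAD, formatted as a compact UTC stamp
    # (e.g. 20201104T104654Z).  The repository path is a placeholder.
    env = dict(os.environ, TZ="UTC")
    timestamp = subprocess.check_output(
        ["git", "-C", "/path/to/gentoo.git", "log", "-1",
         "--format=%cd", "--date=format-local:%Y%m%dT%H%M%SZ"],
        env=env,
        text=True,
    ).strip()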
> > > • We could configure @REPO_DIR@ in catalyst.conf and let catalyst
> > > do the replacement, or we could just make the field relative to some
> > > path specified in catalyst.conf?
> > >
> >
> > While that would be nice to have, I don't agree with locking users
> > into a particular repository layout.
>
> Can you explain what you mean? I don't know how what I said would
> require a particular repository layout.
>
> Perhaps you're confused by the @REPO_DIR@ name? It is the path to the
> releng.git repository (containing the .specs and the /etc/portage/
> files) on the build machine and is not in any way connected with the
> ebuild repositories.
>
I was just thinking that there could be more files outside of @REPO_DIR@
or /var/tmp/catalyst (or wherever) that may need to be referenced.
In practice, such cases might be limited; I have been wanting a feature
like this to exist - as long as it's configurable enough.
For me, I'd really just like paths to be relative to the current working
directory...
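Just to sketch what "configurable enough" could mean (the function and
token names below are only illustrative, not an existing catalyst API):
catalyst could expand the tokens itself, with @REPO_DIR@ defaulting to
the directory the spec was loaded from, so relative layouts keep
working:

    import os

    def expand_tokens(value, spec_path, timestamp):
        """Expand @TOKEN@ placeholders in a spec value (illustration only)."""
        replacements = {
            # default to the directory containing the spec file itself
            "@REPO_DIR@": os.path.dirname(os.path.abspath(spec_path)),
            "@TIMESTAMP@": timestamp,
        }
        for token, replacement in replacements.items():
            value = value.replace(token, replacement)
        return value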
> The name predates my involvement, so don't blame me :)
>
> > > • In the current automation scripts, we generate a value for
> > > @TIMESTAMP@ from the git HEAD used in creating the snapshot.
> > > • Would be nice to remove the dependence on the squashfs snapshot
> > > generation—not difficult to do
> > >
> >
> > I have no comment on this.
> >
> > > • Can we generate and upload a .build file with replacements done to
> > > make stage builds more easily reproducible? Seems easy.
> > >
> >
> > These can just be artifacts from the build.
>
> Yes, that's what I'm thinking too.
>