Gentoo Archives: gentoo-user

From: Martin Vaeth <martin@×××××.de>
To: gentoo-user@l.g.o
Subject: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning
Date: Sun, 08 Jul 2018 08:31:18
Message-Id: slrnpk3itt.u61.martin@clover.invalid
In Reply to: Re: [gentoo-user] Re: Re[4]: Re: Portage, git and shallow cloning by Rich Freeman
Rich Freeman <rich0@g.o> wrote:
>> I was speaking about gentoo's git repository, of course
>> (the one which was attacked on github), not about a Frankensteined one
>> with metadata history filling megabytes of disk space unnecessarily.
>> Who has that much disk space to waste?
>
> Doesn't portage create that metadata anyway when you run it

You would do better to have it created by egencache in portage-postsyncd;
moreover, you should also download some other repositories
(news announcements, GLSA, dtd, xml-schema) which are maintained
independently; see e.g.
https://github.com/vaeth/portage-postsyncd-mv

It is the Gentoo way: download only the sources and build from there.
That is also a question of mentality, and it is why I think most Gentoo
users who use git would prefer that way.

> negating any space savings at the cost of CPU to regenerate the cache?

It's the *history* of the metadata which matters here:
Since regenerating each changed metadata file takes a fraction of a
second, the time the regeneration needs gives a rather good estimate:
several tens of thousands of files change hourly/daily/weekly (the
frequency depending mainly on eclass changes: one change in some eclass
requires a change for practically every version of every package), so
the history of metadata changes produced by this over time is enormous.
This history, of course, is completely useless and stored entirely in vain.
One of the weaknesses of git is that it is impossible, by design,
to omit such superfluous history selectively (once the files *are*
maintained by git).

>> For the official git repository your assertions are simply false,
>> as you apparently admit: It is currently not possible to use the
>> official git repo (or the github clone of it which was attacked)
>> in a secure manner.
>
> Sure, but this also doesn't support signature verification at all
> [...] so your points still don't apply.

Huh? This actually *was* my point.

BTW, portage could easily support signature verification if only the
distribution of the developers' public keys were properly maintained
(e.g. via gkeys or, more simply, via some package):
After all, gentoo infra should always have an up-to-date list of
these keys anyway.
(If they don't, that makes it even more important to use the
source repo instead of trusting a signature which is given
without sufficient verification.)

>> Your implicit claim is untrue. rsync, as used by portage, always
>> transfers whole files only.
>
> rsync is capable of transferring partial files.

Yes, and portage explicitly disables this. (It costs a lot of
server CPU time and does not save much transfer data if the files
are small, because a lot of hashes have to be transferred
(and calculated, which again costs CPU time) instead.)

> However, this is based on offsets from the start of the file

There are newer algorithms which also support detection of insertions
and deletions via rolling hashes (e.g. for deduplicating filesystems).
Rsync uses quite an advanced algorithm as well, but I would
need to recheck its features.

Anyway, it plays no role in our discussion, because for such
small files it hardly matters, and portage disables
said algorithm anyway.
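
To make the rolling-hash idea mentioned above a bit more concrete, here
is a minimal Python sketch of content-defined chunking, the kind of
technique deduplicating storage uses. It is not rsync's algorithm and not
anything portage runs; the function names, the 16-byte window, the 0xFF
mask (roughly 256-byte average chunks) and the polynomial hash parameters
are all arbitrary assumptions chosen for illustration. It compares how
many chunks survive a 3-byte insertion with naive blocks cut at fixed
offsets.

```python
import random
from typing import List


def cut_points(data: bytes, window: int = 16, mask: int = 0xFF,
               base: int = 257, mod: int = (1 << 61) - 1) -> List[int]:
    """Return end offsets of content-defined chunk boundaries.

    A polynomial rolling hash is kept over the last `window` bytes; a
    boundary is placed after byte i whenever (hash & mask) == 0, so each
    boundary depends only on nearby content, not on its absolute offset.
    """
    if len(data) < window:
        return []
    top = pow(base, window - 1, mod)  # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * base + b) % mod
    cuts = [window] if h & mask == 0 else []
    for i in range(window, len(data)):
        h = (h - data[i - window] * top) % mod  # drop the oldest byte
        h = (h * base + data[i]) % mod          # append the newest byte
        if h & mask == 0:
            cuts.append(i + 1)
    return cuts


def split(data: bytes, cuts: List[int]) -> List[bytes]:
    """Cut data into chunks at the given end offsets."""
    chunks, start = [], 0
    for c in cuts:
        chunks.append(data[start:c])
        start = c
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_blocks(data: bytes, size: int = 256) -> List[bytes]:
    """Naive baseline: blocks cut at fixed offsets from the file start."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared(old: List[bytes], new: List[bytes]) -> int:
    """Count how many old chunks reappear unchanged in the new version."""
    new_set = set(new)
    return sum(1 for c in old if c in new_set)


if __name__ == "__main__":
    rng = random.Random(42)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    edited = original[:1000] + b"XYZ" + original[1000:]  # 3-byte insertion

    cdc_old = split(original, cut_points(original))
    cdc_new = split(edited, cut_points(edited))
    print("content-defined:", shared(cdc_old, cdc_new), "of", len(cdc_old))
    print("fixed offsets:  ",
          shared(fixed_blocks(original), fixed_blocks(edited)),
          "of", len(fixed_blocks(original)))
```

Since a boundary depends only on the 16 bytes before it, an insertion
disturbs just the chunk(s) around it, whereas every fixed-offset block
after the insertion point is shifted and no longer matches.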

> "The council does not require that ChangeLogs be generated or
> distributed through the rsync system. It is at the discretion of our
> infrastructure team whether or not this service continues."

The wording already makes it clear that it was not intended to
put pressure on infra, and at that time it was expected that
every user would switch to git anyway.
At that time the gkeys project was also very active, and git was
(besides webrsync) the only expected way to get checksums for the
full tree. In particular, rsync was inherently insecure.

The situation has changed since then on both sides: gkeys was
apparently practically abandoned, and instead gemato was introduced
and is actively supported. That the gentoo-mirror repository is suddenly
more secure than the git repository is also a side effect of
gemato, because only for its sake are the infra keys now
distributed in a package.

> If you're using squashfs git pull probably isn't the right solution for you.

Exactly. That's why I completely disagree with portage's regression
of replacing the previously working solution with the only partially
working "git pull".

>> 4. Even if the user made the mistake of editing a file, portage should
>> not just die on syncing.
>
> emerge --sync won't die in a situation like this in general.

It does: git pull refuses to proceed if uncommitted changes would be
overwritten by the update.

> but I don't think the correct default in this case should be
> to just wipe out the user's changes.

I do: Just as with rsync, a user should not make changes to the distributed
tree (unless he makes a PR) but in an overlay; otherwise he will
permanently have outdated files which are not correctly updated.
*If* a user wants such changes, he should use git correctly and commit.

But I am not against making this an opt-in option, to be enabled
by a developer (or advanced user) who is afraid of eventually losing
a change in case he forgot to commit before syncing.

Anyway, this has nothing to do with "git pull" vs. "git fetch + git reset";
it is only a question of whether the option "--hard" or "--merge" should be
used for "git reset".

One could certainly also live with "reset --merge" as the default (or even
the only option), as it previously was in portage, but the change to "pull"
was IMHO a regression.
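
The "git fetch + git reset" variants discussed above can be sketched in
Python roughly as follows. This is a hypothetical illustration, not
portage's actual sync code: the function names sync_commands and sync,
the remote name "origin" and the branch "master" are assumptions; only
the git commands themselves (git fetch, git reset --merge / --hard,
FETCH_HEAD) are real.

```python
import subprocess
from typing import List


def sync_commands(remote: str = "origin", branch: str = "master",
                  mode: str = "merge") -> List[List[str]]:
    """Build the commands for a hypothetical 'fetch + reset' sync.

    mode="merge": keep unstaged edits to files the update does not touch;
                  git aborts instead of overwriting an edited file that
                  the update does change.
    mode="hard":  discard all uncommitted changes to tracked files.
    """
    if mode not in ("merge", "hard"):
        raise ValueError(f"unsupported reset mode: {mode!r}")
    return [
        # Fetch only; FETCH_HEAD then points at the fetched commit.
        ["git", "fetch", remote, branch],
        # Move the current branch, index and work tree to that commit.
        ["git", "reset", f"--{mode}", "FETCH_HEAD"],
    ]


def sync(repo_path: str, **kwargs) -> None:
    """Run the commands in repo_path; stop at the first failing one."""
    for cmd in sync_commands(**kwargs):
        subprocess.run(cmd, cwd=repo_path, check=True)
```

Either reset mode moves the local branch to the fetched commit, so local
commits no longer sit on that branch (they remain reachable via the
reflog); the choice between --merge and --hard only decides what happens
to uncommitted edits.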
