Rich Freeman <rich0@g.o> wrote:
>> I was speaking about gentoo's git repository, of course
>> (the one which was attacked on github), not about a Frankensteined one
>> with metadata history filling megabytes of disk space unnecessarily.
>> Who has that much disk space to waste?
>
> Doesn't portage create that metadata anyway when you run it

You had better have it created by egencache in portage-postsyncd;
moreover, you should also download some other repositories
(news announcements, GLSA, dtd, xml-schema) which are maintained
independently, see e.g.
https://github.com/vaeth/portage-postsyncd-mv
|
It is the Gentoo way: Download only the sources and build from there.
That is also a question of mentality, and it is why I think most gentoo
users who use git would prefer that way.
|
> negating any space savings at the cost of CPU to regenerate the cache?

It's the *history* of the metadata which matters here:
Since regenerating each changed metadata file takes a fraction of a
second, one can estimate rather well from the regeneration time that
several tens of thousands of files are changed hourly/daily/weekly
(the frequency depending mainly on eclass changes: One change in some
eclass requires a change for practically every version of every
package), so the history of metadata changes produced by this over
time is enormous. This history, of course, is completely useless and
is stored completely in vain.
One of the weaknesses of git is that it is impossible, by design,
to omit such superfluous history selectively (once the files *are*
maintained by git).
|
>> For the official git repository your assertions are simply false,
>> as you apparently admit: It is currently not possible to use the
>> official git repo (or the github clone of it which was attacked)
>> in a secure manner.
>
> Sure, but this also doesn't support signature verification at all
> [...] so your points still don't apply.

Huh? This actually *was* my point.
|
BTW, portage could easily support signature verification if only the
distribution of the developers' public keys were properly maintained
(e.g. via gkeys or, more simply, via some package):
After all, gentoo infra should always have an up-to-date list of
these keys anyway.
(If they don't, that would make it even more important to use the
source repo instead of trusting a signature which is given
without sufficient verification.)
|
>> Your implicit claim is untrue. rsync - as used by portage - always
>> transfers whole files, only.
>
> rsync is capable of transferring partial files.

Yes, and portage explicitly disables this. (It costs a lot of
server CPU time and does not save much transfer data if the files
are small, because a lot of hashes have to be transferred
(and calculated - CPU time!) instead.)
|
> However, this is based on offsets from the start of the file

There are newer algorithms which also support detecting insertions
and deletions via rolling hashes (e.g. for deduplicating filesystems).
Rsync uses quite an advanced algorithm as well, but I would
need to recheck its features.
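
For illustration only, here is a minimal Python sketch of the
rolling-hash idea, assuming an rsync-style weak checksum made of two
16-bit sums. The names weak_checksum, roll and find_block and the
sample data are made up for this mail; this is not rsync's actual code.
It shows that the checksum of a window can be updated in constant time
when the window slides by one byte, so a known block can be found at
any byte offset, also after an insertion.

```python
MOD = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block):
    """Compute the weak checksum (a, b) of a whole block from scratch."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide a window of length n by one byte in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def find_block(block, data):
    """Return the first offset of block in data, or -1 if it is absent."""
    n = len(block)
    if n == 0 or n > len(data):
        return -1
    target = weak_checksum(block)
    a, b = weak_checksum(data[:n])
    for start in range(len(data) - n + 1):
        # Compare the bytes only when the cheap checksum matches.
        if (a, b) == target and data[start:start + n] == block:
            return start
        if start + n < len(data):
            a, b = roll(a, b, data[start], data[start + n], n)
    return -1


if __name__ == "__main__":
    old = b"EAPI=7\nDESCRIPTION=foo\n"
    new = b"# comment\n" + old  # ten bytes inserted at the front
    print(find_block(old[:8], new))
```

The point of roll() is that no window is ever summed again from
scratch; only one byte leaves and one byte enters per step.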
|
Anyway, this plays no role in our discussion, because for such
small files it hardly matters, and portage disables
said algorithm anyway.
|
> "The council does not require that ChangeLogs be generated or
> distributed through the rsync system. It is at the discretion of our
> infrastructure team whether or not this service continues."

The wording already makes it clear that the intention was not to
put pressure on infra, and at that time it was expected that
every user would switch to git anyway.
At that time the gkeys project was also very active, and git was
(besides webrsync) the only expected way to get checksums for the
full tree. In particular, rsync was inherently insecure.
|
The situation has since changed on both sides: gkeys was
apparently practically abandoned, and gemato was introduced instead
and is actively supported. That the gentoo-mirror repository is
suddenly more secure than the git repository is also a side effect of
gemato, because only for gemato are the infra keys now
distributed in a package.
|
> If you're using squashfs git pull probably isn't the right solution for you.

Exactly. That's why I completely disagree with portage's regression
of replacing the previously working solution with the only partially
working "git pull".
|
>> 4. Even if the user made the mistake to edit a file, portage should
>> not just die on syncing.
>
> emerge --sync won't die in a situation like in general.

It does: git pull refuses to start if there are uncommitted changes.
|
> but I don't think the correct default in this case should be
> to just wipe out the user's changes.

I do: As with rsync, a user should not make changes to the distributed
tree (unless he makes a PR) but in an overlay; otherwise he will
permanently have outdated files which are not correctly updated.
*If* a user wants such changes, he should use git correctly and commit.
|
But I am not against making this an opt-in option which can be enabled
by a developer (or advanced user) who is afraid of eventually losing
a change because he forgot to commit before syncing.
|
Anyway, this has nothing to do with "git pull" vs. "git fetch + git reset",
but is only a question of whether the option "--hard" or "--merge" should be
used for "git reset".
|
One certainly could also live with "reset --merge" as the default (or even
the only option), as it was previously in portage, but the change to "pull"
was IMHO a regression.
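
To make the "git fetch + git reset" variant concrete, here is a rough
Python sketch. It is not portage's actual sync code: the function names
sync_commands and sync, the opt-in parameter keep_local_changes, and the
remote/branch names "origin" and "master" are assumptions made for
this mail only.

```python
import subprocess


def sync_commands(keep_local_changes, remote="origin", branch="master"):
    """Return the git commands for a fetch + reset style sync."""
    # --merge keeps uncommitted edits where possible; --hard drops them.
    reset_mode = "--merge" if keep_local_changes else "--hard"
    return [
        ["git", "fetch", remote],
        ["git", "reset", reset_mode, f"{remote}/{branch}"],
    ]


def sync(repo_dir, keep_local_changes=True):
    """Run the commands in repo_dir; raises CalledProcessError on failure."""
    for cmd in sync_commands(keep_local_changes):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

With "--merge", uncommitted edits to files which the update does not
touch are kept, and the reset aborts if it would have to overwrite an
edited file; with "--hard", all uncommitted edits to tracked files are
discarded.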