Hi,

I'm quite tired of promises and all that perfectionist nonsense which
locks us up with CVS for the next 10 years of bikeshedding. Therefore,
I have prepared a plan for the git migration, and I believe it's doable
in less than 2 weeks (plus the testing). Of course, that assumes infra
is going to cooperate quickly or someone else is willing to provide the
infra for it.

I can provide some testing repos once someone is willing to provide
the hardware.


What needs to be done
---------------------

I can do most of the scripting. What I need others to do is provide
the hosting for git repos. We can't use public services like github
since they don't allow us to set our own update hook, so we can't
enforce signing policies etc.

Once basic infra is ready, I think the following is the best way to
switch:

1. send an announcement to devs explaining how to use git,

2. lock CVS down to read-only,

3. create all the git repos, get hooks rolling,

4. enable R/W access to the repos.

With some luck, no more than 2 hours of downtime.


The infra
---------

The general idea is based on a 3-level structure that's an extension
of how Funtoo works. The following ultimately pretty picture explains
that:

  +----------------+
  | developer repo | - - - - - - - - - - -,
  +----------------+                      v
          |               +------------------------------+
          |               | cache, DTDs and other extras |
          v               +------------------------------+
  +----------------+                      |
  | user sync repo | <--------------------'
  +----------------+ - - - - - - - - - - ,
          |                              v
          |               +-----------------------------+
          |               | ChangeLogs, thick Manifests |
          v               +-----------------------------+
  +----------------+                     |
  |     rsync      | <-------------------'
  +----------------+

Text version:

We have a main developer repo where developers work & commit and are
relatively happy. For every push into the developer repo, an automated
magic thingie merges stuff into the user sync repo and updates the
metadata cache there.

The user sync repo is for power users that want to fetch via git. It's
quite fast and efficient for frequent updates, and also saves space by
being free of ChangeLogs.

The rsync tree is propagated on top of the user sync repo. It is
populated with all the old ChangeLogs copied from CVS (stored in a 30M
git repo), new ChangeLogs are generated from git logs, and Manifests
are expanded.
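
To make the "ChangeLogs generated from git logs" step a bit more concrete,
here is a minimal sketch of rendering one commit as a ChangeLog-style entry.
The function name and its parameters are hypothetical, and the layout only
roughly follows existing entries. It assumes the commit data has already been
extracted, for example from `git log --name-status`.

```python
import textwrap
from datetime import datetime, timezone


def _wrap(text):
    # Indent by two spaces and wrap at 80 columns without splitting
    # filenames at hyphens.
    return textwrap.fill(text, width=80, initial_indent="  ",
                         subsequent_indent="  ",
                         break_long_words=False, break_on_hyphens=False)


def changelog_entry(timestamp, author, email, changes, message):
    """Render one commit as a ChangeLog-style entry.

    `changes` is a list of (status, filename) pairs using the status
    letters printed by `git log --name-status` (A, M, D).
    """
    date = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    prefix = {"A": "+", "D": "-"}  # modified files get no prefix
    files = ", ".join(prefix.get(status, "") + name
                      for status, name in changes)
    header = f"{date.strftime('%d %b %Y')}; {author} <{email}> {files}:"
    return _wrap(header) + "\n" + _wrap(message)
```

New entries built this way would be prepended to the old ChangeLog copied
from CVS when the rsync tree is populated.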


Main developer repo
-------------------

I was able to create a starting git repository that takes around 66M
as a git pack (this is how much you will have to fetch to start working
with it). The repository is stripped clean of history and ChangeLogs,
and has thin Manifests only.

This means we don't have to wait till someone figures out the perfect
way of converting the old CVS repository. You don't need that history
most of the time, and you can play with CVS to get it if you really do.
In any case, we would likely strip the history anyway to get a small
repo to work with.

I have prepared a basic git update hook that keeps master clean
and attached it to the bug [1]. It enforces basic policies, prevents
forced updates and checks GPG signatures on the left-most history line.
It can also be extended to do more extensive tree checks.

For GPG signing, I relied upon gpg to do the right thing. That is, git
checks the signatures and we accept only trusted signatures. So
an external tool (gentoo-keys) needs to play with gpg to import, trust
and revoke developer keys.
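
As a rough illustration of the checks described above (this is not the hook
attached to the bug), an update hook could look something like the sketch
below. It assumes only master is policed. The names `check_update` and `git`
are made up for the example. Trust is left entirely to the gpg keyring: the
`%G?` format of git log prints "G" only for a good signature from a trusted
key.

```python
import subprocess
import sys

ZERO = "0" * 40  # value git passes when a ref is created or deleted


def git(*args):
    """Run a git command and return (exit status, stdout)."""
    proc = subprocess.run(["git", *args], capture_output=True, text=True)
    return proc.returncode, proc.stdout


def check_update(refname, old, new, run=git):
    """Return a rejection reason, or None if the push is acceptable."""
    if refname != "refs/heads/master":
        return None  # assumption: other refs are not policed here
    if new == ZERO:
        return "deleting master is not allowed"
    if old == ZERO:
        rev_range = new
    else:
        # A fast-forward means the old tip is an ancestor of the new one.
        status, _ = run("merge-base", "--is-ancestor", old, new)
        if status != 0:
            return "forced (non-fast-forward) update rejected"
        rev_range = f"{old}..{new}"
    # Walk only the left-most (first-parent) history line and print one
    # signature status letter per commit.
    _, out = run("log", "--first-parent", "--format=%H %G?", rev_range)
    for line in out.splitlines():
        commit, sig = line.split()
        if sig != "G":
            return f"commit {commit} lacks a trusted signature ({sig})"
    return None


# git calls the update hook with: <refname> <old-sha> <new-sha>
if __name__ == "__main__" and len(sys.argv) == 4:
    reason = check_update(*sys.argv[1:4])
    if reason:
        print(f"rejected: {reason}", file=sys.stderr)
        sys.exit(1)
```

Keeping the git calls behind the `run` parameter means the policy can be
tested without a repository, and further tree checks can be added as extra
steps before the final `return None`.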

I think we should also merge gentoo-news & glsa & herds.xml into
the repository. They all reference Gentoo packages at a particular
state in time, and it would be much nicer to have them synced properly.

[1]:https://bugs.gentoo.org/show_bug.cgi?id=502060


User syncing repo
-----------------

IMO this will be the most useful syncing method. The user syncing repo
is updated automatically for developer repo commits, and afterwards
md5-cache is regenerated and committed. Other repositories (like
DTDs, glsas and others if you dislike the previous idea) are also
merged into it.

This repo is still free of ChangeLogs (since git logs are more
efficient) and has thin Manifests. It's the space-efficient Gentoo
variant. And commits are signed so users can verify the trust.


The rsync tree
--------------

We'd also propagate things to rsync. We'd have to populate it with old
ChangeLogs, new ChangeLog entries (autogenerated from git) and thick
Manifests. So users won't notice much of a change.
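
For the thick Manifests, the expansion step could look roughly like the
sketch below. It keeps the DIST lines of the thin Manifest and adds AUX,
EBUILD and MISC entries for the files in the package directory. The function
names are hypothetical, and the choice of SHA256 and SHA512 as the only
hashes is an assumption made to keep the example within Python's hashlib.

```python
import hashlib
from pathlib import Path


def manifest_entry(kind, name, data):
    """Format one Manifest line: TYPE name size HASH value HASH value."""
    sha256 = hashlib.sha256(data).hexdigest()
    sha512 = hashlib.sha512(data).hexdigest()
    return f"{kind} {name} {len(data)} SHA256 {sha256} SHA512 {sha512}"


def classify(relpath):
    """Map a path relative to the package directory to (type, name)."""
    parts = Path(relpath).parts
    if len(parts) > 1 and parts[0] == "files":
        return "AUX", "/".join(parts[1:])  # AUX names are relative to files/
    if relpath.endswith(".ebuild"):
        return "EBUILD", relpath
    return "MISC", relpath  # metadata.xml, ChangeLog, ...


def thicken(thin_lines, files):
    """Combine the DIST lines of a thin Manifest with entries for `files`,
    a dict mapping relative paths to file contents (bytes)."""
    lines = [line for line in thin_lines if line.startswith("DIST ")]
    for relpath, data in files.items():
        kind, name = classify(relpath)
        lines.append(manifest_entry(kind, name, data))
    return sorted(lines)


def read_package(pkgdir):
    """Collect the contents of every file in a package directory
    except the Manifest itself."""
    pkgdir = Path(pkgdir)
    return {
        path.relative_to(pkgdir).as_posix(): path.read_bytes()
        for path in pkgdir.rglob("*")
        if path.is_file() and path.name != "Manifest"
    }
```

A propagation script would call `thicken(old_manifest_lines,
read_package(pkgdir))` for each package after the generated ChangeLog has
been written, so that the ChangeLog itself gets a MISC entry too.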

The remaining issue is signing of stuff. We could supposedly sign
Manifests but IMO it's a waste of resources considering how poor
the signing system is for non-git repos.

--
Best regards,
Michał Górny