On Sat, Apr 11, 2009 at 02:40:27PM +0200, Maciej Mrozowski wrote:
> As git is generally not best suited for large repositories, there are some
> ideas how to make it perform better with gentoo-x86.
> - horizontal partitioning - a'ka cutting history
> - vertical partitioning - splitting based on some categorization (not sure
> whether this was considered here, hence my mail).
Both of these come under the aegis of partial trees.
On the Git side, I'd like to deliberately direct you to this GSoC
project:
http://git.or.cz/gitwiki/SoC2009Ideas#head-2cdf2f7bd7667427d1e20c714ca33bd92aaa4905
It's been in the works for a couple of years in Git, and is well fleshed
out as a proposal for now. When it does come to fruition, it will make
the entire matter of having to split the repository irrelevant.

> As cutting history have already been proposed here and received with mixed
> opinions, I'd like to suggest category based partitioning (divide and conquer
> approach FTW).
> Well, not really category-based...
I take it from this that you've looked at the past discussions.
I'd also like to bring your attention to the thread I started on the Git
mailing list, about memory usage, in which I also gave some of our
current numbers:
http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115611
http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115636
The first discusses the general case of overhead per VCS (there's a nice
summary at the bottom).
The second gives overall numbers for Git with single-repo and
repo-per-package. The overhead imposed by repo-per-package makes it a
non-starter.

> Let's look at tree - one thing can be said about each package - it belongs to
> some herd or (doesn't, and it's with status maintainer wanted or maintained by
> individual developers).
> So creating separate repository for each herd is the most obvious (and naive)
> idea.
Packages do change herds, and there are certainly cross-dependencies
between herds. Some packages even belong to multiple herds.

> Pros are the following:
> - project members taking care of some herd (or belonging to herd?) receive
> (and have access) (only) to repository they are interested in, resulting in
> smaller pulls/pushes
No, the push/pull beyond the initial clone only contains changes anyway,
so there is NO change in size.

> - some level of isolation - gives possibility to restrict access (for example:
> "only toolchain and arch teams allowed here")
There is only a single category (sec-policy) in the ENTIRE tree that's
restricted right now, and that restriction is probably going to go away
in future once we have guaranteed signed commits.

> - some testing overlays could now just track their tree counterparts - merging
> stuff from testing to tree could be semi-automatic and trivial
Overlays are a lot more of a mish-mash than that. dberkholz for example
had his own git.eclass and dev-util/git until recently, mixed right in
with other packages.

> - alternative projects - like hardened - can just have separate branches when
> appropriate - for easy merges with "main tree"
> - profile can be (should be actually) separated in another repository and
> developed easier
I don't know how you consider separate to be better. One of the major
reasons for designing the multi-parent stackable profiles was to handle
the unusual mixed cases. Say you wanted to do hardened on a machine that
hardened doesn't presently support: all you have to do is pick both in
your own make.profile/parent file.
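
To illustrate the stacking, here is a minimal Python sketch (not Portage's
actual code) of how a profile's parent file could be resolved. It assumes
each non-comment line names a parent directory, relative to the directory
holding the parent file, and that parents are applied before the profile
that lists them. The function name stacked_profiles and the directory names
in the usage lines are made up for illustration; cycle detection and any
other parent-file syntax are ignored.

```python
import os
import tempfile


def stacked_profiles(profile_dir):
    """Return profile directories in stacking order: parents first, child last.

    Simplified sketch, not Portage's implementation: no cycle detection,
    and lines starting with '#' are assumed to be comments.
    """
    profile_dir = os.path.realpath(profile_dir)
    order = []
    parent_file = os.path.join(profile_dir, "parent")
    if os.path.isfile(parent_file):
        with open(parent_file) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                # Relative entries resolve against the directory that holds
                # the parent file; os.path.join keeps absolute entries as-is.
                order.extend(stacked_profiles(os.path.join(profile_dir, line)))
    order.append(profile_dir)
    return order


if __name__ == "__main__":
    # Hypothetical layout: a local profile that picks two parents.
    root = tempfile.mkdtemp()
    for name in ("arch", "hardened", "make.profile"):
        os.mkdir(os.path.join(root, name))
    with open(os.path.join(root, "make.profile", "parent"), "w") as fh:
        fh.write("../arch\n../hardened\n")
    for path in stacked_profiles(os.path.join(root, "make.profile")):
        print(os.path.basename(path))
```

With that hypothetical parent file, the arch directory comes first, then
hardened, then the local profile itself, which is the "pick both" case.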

> Some cons:
> - projects are now more dependant on other projects and its responsiveness,
> unless access is granted to all repositories for every developer
As noted by rbu, we explicitly trust every developer to modify most of
the tree.

> - needs some basic tools to 'glue' final repository and ready it for rsync
> - to fully benefit from git - robbat2 would need to propose his slim manifest
> format as GLEP (or in case of lack of time - quite possible - get someone else
> to do it) and get it implemented by someone.
We need said tools for the thin Manifest stuff anyway.

> - possibly needs better multiple repositories support in Portage (not sure
> though)
> - profile no longer there
> - probably not easy way to migrate from monolithic gentoo-x86 to split sub-
> repositories retaining complete history
> - not settled yet what to do with orphaned/proxy maintained packages and herd-
> switching
More cons:
- If a package moves between repos, it consumes space in the history on
both sides, forever.
- Fewer opportunities for Git's amazing compression.
- If you want to track the entire tree, you need multiple repositories
now, vs. being able to clone just a single repo and maintain your own
branch on top of it.
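
As a rough analogy for the compression point (this is plain zlib, not Git's
packfile delta search), the Python sketch below compresses two nearly
identical blobs once as a single stream and once as two independent streams,
the way two unrelated repositories would have to store them. The ebuild text
and the function names are made up for illustration.

```python
import zlib

# Hypothetical ebuild revision; the content is invented for this example.
EBUILD_R0 = b"""DESCRIPTION="Example package used only for this illustration"
HOMEPAGE="http://example.org/"
SRC_URI="http://example.org/dist/${P}.tar.gz"
LICENSE="GPL-2"
SLOT="0"
KEYWORDS="~amd64 ~x86"
IUSE="doc"
"""

# A second revision that differs only in its KEYWORDS line.
EBUILD_R1 = EBUILD_R0.replace(b"~amd64 ~x86", b"amd64 x86")


def separate_size(blobs):
    """Total compressed size when every blob is compressed on its own."""
    return sum(len(zlib.compress(blob)) for blob in blobs)


def shared_size(blobs):
    """Compressed size when the blobs share a single compression stream."""
    return len(zlib.compress(b"".join(blobs)))


if __name__ == "__main__":
    blobs = [EBUILD_R0, EBUILD_R1]
    print("separate:", separate_size(blobs))
    print("shared:  ", shared_size(blobs))
```

The shared stream comes out smaller because the second revision is mostly
encoded as references back to the first. Splitting the tree removes that kind
of sharing across repository boundaries.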

> Zac, I'm CC-ing you here, I hope you don't mind. Sorry, but your input is too
> valuable here :)
He's on the list ;-).

--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail : robbat2@g.o
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85