On Saturday 11 of April 2009 18:38:05 Robin H. Johnson wrote:

>> - horizontal partitioning - a.k.a. cutting history
>> - vertical partitioning - splitting based on some categorization (not
[...]

> Both of these come under the aegis of partial trees.
> On the Git side, I'd like to deliberately direct you to this GSoC
> project:
> http://git.or.cz/gitwiki/SoC2009Ideas#head-2cdf2f7bd7667427d1e20c714ca33bd92aaa4905
> It's been in the works for a couple of years in Git, and is well
> fleshed out as a proposal for now. When it does come to fruition, it will
> make the entire matter of having to split the repository irrelevant.

I somewhat agree; at least in Gentoo's case, splitting the repository would
just be a workaround for Git's issues with the full repack upon initial clone
(it seems that the existing, carefully created packs on the server are, for
some reason, discarded and regenerated from scratch).
Thanks for the links to your Git ML thread, btw.
While there are some noticeable improvements (one memory leak has been fixed,
and Linus's patches keep some objects only temporarily), the whole issue of
memory usage and pack regeneration seems far from really solved yet.
As for git bundle, it seems to suit Gentoo's needs well in any case, as at
least the memory and CPU burned by "cloning the repo" would no longer be
triggered from outside (thus minimizing potential service abuse that could
lead to a denial of service). Still, disabling the git clone method may need
patches anyway.

>> As cutting history has already been proposed here and received with
>> mixed opinions, I'd like to suggest category-based partitioning (divide
>> and conquer approach FTW).
>> Well, not really category-based...

> I take it by this that you've looked at the past discussions.
> I'd also like to bring your attention to the thread I started on the Git
> mailing list, about memory usage, but in which I also gave some of our
> current numbers:
> http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115611
> http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115636
> The first discusses the general case of overhead per VCS (there's a nice
> summary at the bottom).
> The second gives overall numbers for Git with single-repo and
> repo-per-package. The overhead imposed by repo-per-package makes it a
> non-starter.

Package-per-repo would be way too fine-grained even if those overheads were
not present. What is really a pain is the current category-based
partitioning of the package space. It provides neither completeness nor
sufficient separation, and because of that it is unfortunately often subject
to change (judging from profiles/updates). From a software engineering point
of view that just means it is bad design, but unfortunately it is probably
too convenient to be dropped right away.

But if, for example, packages were given unique names or IDs (app-emacs
causes most of the name collisions; with unique names, categories could be
replaced by tags used only for searching, not for dependency resolution,
but this is not the thread for that discussion), then they could be stored
in a predictable and, most importantly, invariant manner. That would make it
possible to exploit this property for data partitioning, storage
optimization, etc. (for example, grouping by first letter).
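To illustrate what "invariant" buys here, a minimal sketch, assuming every package already has a unique name (which is not true today, see app-emacs): the partition key is derived from the name alone, so a package can only change partitions if it is renamed, unlike category moves recorded in profiles/updates. The function names and the "0-9" bucket for names not starting with a letter are hypothetical choices for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def shard_key(name: str) -> str:
    """Derive a partition key from the package name alone.

    Uses the lowercased first character; names that do not start with a
    letter share a single "0-9" bucket (an arbitrary choice for this sketch).
    """
    first = name[:1].lower()
    return first if first.isalpha() else "0-9"


def partition(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group (assumed unique) package names into buckets keyed by shard_key()."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        buckets[shard_key(name)].append(name)
    # Sort keys and members so the resulting layout is deterministic.
    return {key: sorted(members) for key, members in sorted(buckets.items())}


if __name__ == "__main__":
    print(partition(["vim", "emacs", "git", "gcc", "3proxy"]))
```

For the sample list this yields {'0-9': ['3proxy'], 'e': ['emacs'], 'g': ['gcc', 'git'], 'v': ['vim']}; recategorizing any of these packages would not change its bucket.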
In most cases, having one monolithic repository suits best; the problem is
just that atomicity of huge objects can't be provided efficiently, and
unfortunately the whole Portage tree is atomic by definition.

>> - alternative projects - like hardened - can just have separate branches
>> when appropriate - for easy merges with "main tree"
>> - profile can be (should be actually) separated in another repository and
>> developed easier

> I don't know how you consider separate to be better.

Well, I guess first I'd like it to be possible...
For now, hardened is scattered across many places: it is usually developed
outside the tree (like hardened-sources), then merged manually or just
patched in the tree (which, for example, makes it hard to track hardened
patches separately, e.g. to periodically diff against "vanilla" if needed).
But yeah, at least in this case a monolithic hardened seems to suit better.

> One of the major reasons for designing the multi-parent stackable
> profiles was for the unusual mixed cases. Say you wanted to do hardened
> on a machine that hardened doesn't presently support, all you have to do
> is pick both in your own make.profile/parent file.

And I'm not proposing to drop this.
Btw, judging from the responses, I definitely haven't made myself clear about
the profile: what I meant was to have one repository exclusively for
developing the main tree profile, not to split the profile across
repositories.

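A rough model of the stacking described in the quote, as a sketch only: it assumes that parents are applied depth-first, in the order they are listed, and before the profile itself (so later entries override earlier ones). The profile paths are hypothetical, the parent files are modelled as an in-memory dict rather than read from disk, and cycles or repeated ancestors are not handled.

```python
from typing import Dict, List

# Hypothetical profile graph. Each value models the lines of that profile's
# "parent" file, in file order. "local" stands in for the user's own
# make.profile, which picks both an arch profile and hardened.
PARENTS: Dict[str, List[str]] = {
    "base": [],
    "default/linux/x86": ["base"],
    "hardened/linux": [],
    "local": ["default/linux/x86", "hardened/linux"],
}


def stack_profiles(profile: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return profiles in application order (later entries win).

    Each parent is expanded depth-first, in listed order, before the
    profile itself. Cycles and repeated ancestors are not handled here.
    """
    order: List[str] = []
    for parent in parents.get(profile, []):
        order.extend(stack_profiles(parent, parents))
    order.append(profile)
    return order


if __name__ == "__main__":
    print(stack_profiles("local", PARENTS))
```

This prints ['base', 'default/linux/x86', 'hardened/linux', 'local'], so in this model hardened's values take precedence wherever the two simply override the same setting.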
>> - needs some basic tools to 'glue' final repository and ready it for
>> rsync - to fully benefit from git - robbat2 would need to propose his
>> slim manifest format as GLEP (or in case of lack of time - quite possible
>> - get someone else to do it) and get it implemented by someone.

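One possible shape for the 'glue' step, as a minimal sketch: copy the checkouts of several (hypothetical) split repositories into one tree suitable for rsync, refusing to continue if two repositories provide the same file. Manifest generation, which is where the slim manifest format would come in, is deliberately left out, and on a conflict the partially written output is not cleaned up.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable


def glue_trees(sources: Iterable[Path], dest: Path) -> Dict[Path, Path]:
    """Copy regular files from several checkouts into one tree at dest.

    Returns a map of relative path -> source checkout it came from.
    Raises FileExistsError when two checkouts provide the same relative
    path, since the split repositories are expected not to overlap.
    Anything under a .git directory is skipped.
    """
    owner: Dict[Path, Path] = {}
    for src in sources:
        for path in sorted(src.rglob("*")):
            rel = path.relative_to(src)
            # Skip VCS metadata and anything that is not a regular file.
            if ".git" in rel.parts or not path.is_file():
                continue
            if rel in owner:
                raise FileExistsError(f"{rel} is provided by both {owner[rel]} and {src}")
            owner[rel] = src
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    return owner
```

Failing hard on overlap keeps the invariant that every file in the final tree has exactly one owning repository.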
> More cons:
> - If a package moves between repos, it consumes space in the history on
> both sides, forever.

Agreed. A decision to move something between repositories needs to be well
justified. If repo-per-herd would suffer from many justified refactorings,
then of course it cannot be used.

> - Less opportunities for Git's amazing compression.
> - If you wanted to track the entire tree, you need multiple repositories
> now, vs. being able to clone just a single repo, and maintain your own
> branch on top of it.

This can of course be scripted (I believe git has nothing quite like svn
externals, which is fair enough since it is not CVS/SVN after all; git
submodules come closest, but they pin each sub-repository to a specific
commit).

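Scripting it could look roughly like this minimal sketch: for each repository in a hypothetical name-to-URL mapping, clone it if it is missing, otherwise fast-forward it. It only prints the commands unless dry_run is False; the repository names and URLs are placeholders, and `git -C` needs Git 1.8.5 or later.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def sync_commands(repos: Dict[str, str], root: Path) -> List[List[str]]:
    """Build one git command per repository under root.

    Clones repositories that are not checked out yet and fast-forwards
    the ones that already are.
    """
    cmds: List[List[str]] = []
    for name, url in sorted(repos.items()):
        target = root / name
        if (target / ".git").is_dir():
            cmds.append(["git", "-C", str(target), "pull", "--ff-only"])
        else:
            cmds.append(["git", "clone", url, str(target)])
    return cmds


def sync_all(repos: Dict[str, str], root: Path, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in sync_commands(repos, root):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder URLs for hypothetical split repositories.
    REPOS = {
        "main": "git://git.example.org/main.git",
        "profiles": "git://git.example.org/profiles.git",
    }
    sync_all(REPOS, Path("portage"))
```

Using `--ff-only` means a local branch that has diverged makes the script stop with an error instead of creating merge commits silently.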
If git clone is the blocker here, maybe a working workaround could be
implemented, such as distributing the git bundle mentioned above (it is
something like an SVN snapshot, after all) and forbidding the git clone
method.

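Roughly how the bundle-based workflow could look, as a sketch that only builds and prints the git commands (running them is opt-in): the server periodically writes a bundle that mirrors can serve as a static file, and clients clone from that file, point origin back at the real repository and then fetch only what is newer. The bundle file name, the branch name and the URL are placeholders, and actually forbidding the clone method on the server is not covered.

```python
import subprocess
from typing import List

# Placeholders: the bundle file name, the branch name and the upstream URL
# are assumptions for this sketch.
BUNDLE = "portage.bundle"
BRANCH = "master"
UPSTREAM = "git://git.example.org/portage.git"


def server_commands(repo: str, bundle: str = BUNDLE) -> List[List[str]]:
    """Commands to run periodically on the server, e.g. from cron.

    Writes all refs into one bundle file that can then be served as a plain
    static file instead of packing on demand for every clone. Because of -C,
    a relative bundle path is resolved inside repo.
    """
    return [
        ["git", "-C", repo, "bundle", "create", bundle, "--all"],
        ["git", "-C", repo, "bundle", "verify", bundle],
    ]


def client_commands(bundle: str, dest: str, upstream: str = UPSTREAM,
                    branch: str = BRANCH) -> List[List[str]]:
    """Bootstrap a clone from a downloaded bundle, then track upstream."""
    return [
        ["git", "clone", "-b", branch, bundle, dest],
        # The new clone's origin initially points at the bundle file.
        ["git", "-C", dest, "remote", "set-url", "origin", upstream],
        # Later updates only transfer objects the bundle did not contain.
        ["git", "-C", dest, "fetch", "origin"],
    ]


def run(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(server_commands("/srv/git/portage.git"))
    run(client_commands("portage.bundle", "portage"))
```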
--
regards
MM