Gentoo Archives: gentoo-scm

From: Maciej Mrozowski <reavertm@××××××.fm>
To: gentoo-scm@l.g.o
Subject: Re: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption
Date: Mon, 13 Apr 2009 22:54:27
Message-Id: 200904140054.19214.reavertm@poczta.fm
In Reply to: Re: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption by "Robin H. Johnson"
1 On Saturday 11 of April 2009 18:38:05 Robin H. Johnson wrote:
2
3 >> - horizontal partitioning - a'ka cutting history
4 >> - vertical partitioning - splitting based on some categorization (not
5 [...]
6
7 > Both of these come under the aegis of partial trees.
8 > On the Git side, I'd like to deliberately direct you to this GSoC
9 > project:
10 > http://git.or.cz/gitwiki/SoC2009Ideas#head-2cdf2f7bd7667427d1e20c714ca33bd92aaa4905
11 > It's been in the works for a couple of years in Git, and is well
12 > fleshed out as a proposal for now. When it does come to fruition, it will
13 > make the entire matter of having to split the repository irrelevant.
14
15 I somewhat agree; at least in Gentoo's case, splitting the repository would be
16 just a workaround for git's full repack upon initial clone (it seems that the
17 existing, carefully created packs on the server are for some
18 reason discarded and regenerated from scratch).
19 Thanks for the links to your git ML thread, btw.
20 While there are some noticeable improvements (one memory leak fixed, and some
21 objects now kept only temporarily thanks to Linus' patches), the whole issue
22 with memory and pack regeneration seems far from really solved.
23 About git bundle: in any case it seems to suit Gentoo's needs well, as at least
24 the memory and CPU burn caused by cloning the repo would no longer be triggered
25 from outside (thus minimizing potential abuse leading to denial of service).
26 Still, disabling the git clone method may need patches anyway.
27
28 >> As cutting history have already been proposed here and received with
29 >> mixed opinions, I'd like to suggest category based partitioning (divide
30 >> and conquer approach FTW).
31 >> Well, not really category-based...
32
33 > I take it by this that you've looked at the past discussions.
34 > I'd also like to bring your attention to the thread I started on the Git
35 > mailing list, about memory usage, but in which I also gave some of our
36 > current numbers:
37 > http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115611
38 > http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115636
39 > The first discusses the general case of overhead per VCS (there's a nice
40 > summary at the bottom).
41 > The second gives overall numbers for Git with single-repo and
42 > repo-per-package. The overhead imposed by repo-per-package makes it a
43 > non-starter.
44
45 Repo-per-package would be way too fine-grained even if those overheads were not
46 present. What really is a pain is the current category-based partitioning of
47 the package space. It provides neither completeness nor sufficient
48 separation, and because of that it is unfortunately often subject to change
49 (judging from profiles/updates). From a software engineering
50 point of view that just means it's bad design; unfortunately it's probably too
51 convenient to be dropped right away.
52
53 But if, for example, packages were given unique names or IDs (app-emacs
54 causes the most name collisions), categories could be replaced
55 by tags provided only for searching and not for dependency resolution (this is
56 not the thread for such a discussion anyway). Packages could then be stored in
57 some predictable and, most importantly, invariant manner, making it
58 possible to exploit that in data partitioning, storage
59 optimization, etc. (for example, grouping by first letter).
60 In most cases one monolithic repository suits best; the only
61 problem is that atomicity of huge objects can't be provided efficiently,
62 and unfortunately the whole Portage tree is atomic by definition.
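
To make the "grouping by first letter" idea above concrete, here is a minimal sketch. It assumes packages already have unique names (which is not the case today); the function name and the sample package list are purely illustrative and not part of any existing Gentoo tool.

```python
from collections import defaultdict


def partition_by_first_letter(package_names):
    """Group unique package names into buckets keyed by their first character.

    Assumption: names are globally unique, so the bucket a package lands in
    never changes (unlike its category, which may be renamed or moved).
    """
    buckets = defaultdict(list)
    for name in sorted(package_names):
        buckets[name[0].lower()].append(name)
    return dict(buckets)


# Hypothetical sample of unique package names.
packages = ["emacs", "eix", "git", "portage", "gcc"]
shards = partition_by_first_letter(packages)
```

Because the key depends only on the name, a package never moves between buckets, which is the invariance the paragraph above is after.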
63
64 >> - alternative projects - like hardened - can just have separate branches
65 >> when appropriate - for easy merges with "main tree"
66 >> - profile can be (should be actually) separated in another repository and
67 >> developed easier
68
69 > I don't know how you consider separate to be better.
70
71 Well, I guess first I'd like it to be possible.
72 For now, hardened is scattered in many places: usually developed outside the
73 tree (like hardened sources) and then merged manually, or just patched in tree
74 (which doesn't make it easy to track hardened patches separately,
75 for example to periodically see diffs against "vanilla" if needed).
76 But yeah, at least in this case a monolithic hardened seems to suit better.
77
78 > One of the major
79 > reasons for designing the multi-parent stackable profiles was for the
80 > unusual mixed cases. Say you wanted to do hardened on a machine that
81 > hardened doesn't presently support, all you have to do is pick both in
82 > your own make.profile/parent file.
83
84 And I'm not after dropping this.
85 Btw, judging from the responses, I definitely haven't made myself clear about
86 the profile: what I meant was to have one repository exclusively for developing
87 the main tree profile, not to split the profile across repositories.
88
89 >> - needs some basic tools to 'glue' final repository and ready it for
90 >> rsync - to fully benefit from git - robbat2 would need to propose his
91 >> slim manifest format as GLEP (or in case of lack of time - quite possible
92 >> - get someone else to do it) and get it implemented by someone.
93
94 > More cons:
95 > - If a package moves between repos, it consumes space in the history on
96 > both sides, forever.
97
98 Agreed. A decision to move something between repositories needs to be well
99 justified. If repo-per-herd would suffer from many justified refactorings, then
100 of course it cannot be used.
101
102 > - Less opportunities for Git's amazing compression.
103 > - If you wanted to track the entire tree, you need multiple repositories
104 > now, vs. being able to clone just a single repo, and maintain your own
105 > branch on top of it.
106
107 This can of course be scripted (I believe git doesn't have anything similar
108 to svn externals, since it is not CVS/SVN after all).
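
As a rough illustration of what such scripting could look like, the sketch below only builds the git command lines needed to clone or update a set of repositories; it does not run them. The repository names and URLs are hypothetical placeholders, and the per-herd layout is just the one discussed in this thread, not an existing setup.

```python
import os


def sync_commands(repos, base_dir):
    """Return one git command (as an argv list) per repository.

    Repositories that already have a checkout under base_dir are updated;
    the rest are cloned. `repos` maps a local directory name to a URL.
    """
    commands = []
    for name, url in sorted(repos.items()):
        target = os.path.join(base_dir, name)
        if os.path.isdir(os.path.join(target, ".git")):
            # Existing checkout: fast-forward only, never create merge commits.
            commands.append(["git", "-C", target, "pull", "--ff-only"])
        else:
            commands.append(["git", "clone", url, target])
    return commands


# Hypothetical per-herd repositories; these URLs do not exist.
repos = {
    "kde": "git://example.org/gentoo/kde.git",
    "profiles": "git://example.org/gentoo/profiles.git",
}
cmds = sync_commands(repos, "/nonexistent-tree")
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` to actually perform the sync.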
109
110 If git clone is the blocker here, maybe some workaround could be
111 implemented, like distributing the mentioned git bundle (it's something like
112 an SVN snapshot after all) and forbidding the git clone method.
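
A minimal sketch of that bundle workflow, again only building the command lines rather than running them. The paths and the upstream URL are hypothetical; how the bundle would be published (mirrors, rsync, HTTP) is left open, and forbidding the initial clone on the server is not covered here.

```python
def bundle_publish_commands(repo_dir, bundle_path):
    """Server side: pack all refs of the repository into a single bundle file."""
    return [["git", "-C", repo_dir, "bundle", "create", bundle_path, "--all"]]


def bundle_bootstrap_commands(bundle_path, target_dir, upstream_url):
    """Client side: clone from a downloaded bundle, then follow the live repo.

    The expensive initial transfer is served as a static file, so the server
    only has to answer small incremental fetches afterwards.
    """
    return [
        ["git", "clone", bundle_path, target_dir],
        ["git", "-C", target_dir, "remote", "set-url", "origin", upstream_url],
        ["git", "-C", target_dir, "fetch", "origin"],
    ]


# Hypothetical paths and URL, for illustration only.
publish = bundle_publish_commands("/srv/git/gentoo-x86.git", "/srv/www/gentoo-x86.bundle")
bootstrap = bundle_bootstrap_commands(
    "gentoo-x86.bundle", "gentoo-x86", "git://example.org/gentoo-x86.git"
)
```

The bundle itself can be regenerated periodically on the server, at a time of the administrators' choosing, instead of on every anonymous clone.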
113
114 --
115 regards
116 MM
