Gentoo Archives: gentoo-scm

From:	Maciej Mrozowski <reavertm@××××××.fm>
To:	gentoo-scm@l.g.o
Subject:	Re: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption
Date:	Mon, 13 Apr 2009 22:54:27
Message-Id:	`200904140054.19214.reavertm@poczta.fm`
In Reply to:	Re: [gentoo-scm] Splitting gentoo-x86 repository for easier consumption by "Robin H. Johnson"

1	On Saturday 11 of April 2009 18:38:05 Robin H. Johnson wrote:
2
3	>> - horizontal partitioning - a'ka cutting history
4	>> - vertical partitioning - splitting based on some categorization (not
5	[...]
6
7	> Both of these come under the aegis of partial trees.
8	> On the Git side, I'd like to deliberately direct you to this GSoC
9	> project:
10	> http://git.or.cz/gitwiki/SoC2009Ideas#head-2cdf2f7bd7667427d1e20c714ca33bd92aaa4905
11	> It's been in the works for a couple of years in Git, and is well
12	> fleshed out as a proposal for now. When it does come to fruition, it will
13	> make the entire matter of having to split the repository irrelevant.
14
15	I somewhat agree; at least in Gentoo's case, splitting the repository would
16	just be a workaround for git's full repack upon initial clone (it seems the
17	already existing and carefully created packs on the server are for some
18	reason discarded and regenerated from scratch).
19	Thanks for the links to your git ML thread, btw.
20	While there are some noticeable improvements (one memory leak fixed, and
21	some objects now kept only temporarily thanks to Linus's patches), the whole
22	issue with memory and pack regeneration seems far from really solved yet.
23	About git bundle: in any case it seems to suit Gentoo's needs well, as at
24	least the memory and CPU burn caused by "cloning the repo" could no longer be
25	triggered from outside (thus minimizing potential abuse leading to denial of
26	service). Still, disabling the git clone method may need patches anyway.
27
28	>> As cutting history have already been proposed here and received with
29	>> mixed opinions, I'd like to suggest category based partitioning (divide
30	>> and conquer approach FTW).
31	>> Well, not really category-based...
32
33	> I take it by this that you've looked at the past discussions.
34	> I'd also like to bring your attention to the thread I started on the Git
35	> mailing list, about memory usage, but in which I also gave some of our
36	> current numbers:
37	> http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115611
38	> http://thread.gmane.org/gmane.comp.version-control.git/115600/focus=115636
39	> The first discusses the general case of overhead per VCS (there's a nice
40	> summary at the bottom).
41	> The second gives overall numbers for Git with single-repo and
42	> repo-per-package. The overhead imposed by repo-per-package makes it a
43	> non-starter.
44
45	Package-per-repo is way too fine-grained even if those overheads were not
46	present. What really is a pain is the current category-based package-space
47	partitioning. It provides neither completeness nor sufficient
48	separation, and because of that it is unfortunately often subject to change
49	(judging from profiles/updates). From a software engineering point of view
50	that just means it's bad design, but it's probably too
51	convenient to be dropped right away.
52
53	But if, for example, packages were given unique names or IDs (app-emacs is
54	causing most name collisions), then categories could be replaced by tags
55	provided only for searching and not for dependency resolution (this is not
56	the thread for that discussion anyway). Packages could then be stored in some
57	predictable and, most importantly, invariant manner, which would make it
58	possible to exploit that property for data partitioning, storage
59	optimization, etc. (for example, grouping by first letter).
60	In most cases one monolithic repository suits best; the problem is just
61	that atomicity of huge objects can't be provided efficiently,
62	and unfortunately the whole Portage tree is atomic by definition.
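
To make the "grouping by first letter" idea concrete, here is a minimal sketch. It assumes package names are already globally unique, as proposed above; the function names and the sample package list are made up for illustration and are not part of any existing Gentoo tooling.

```python
from collections import defaultdict


def partition_key(package_name):
    """Return the bucket for a package: its first character, lowercased."""
    first = package_name[0].lower()
    # Assumption: names starting with a digit or symbol share one catch-all bucket.
    return first if first.isalpha() else "_"


def partition(package_names):
    """Group unique package names into buckets keyed by first letter."""
    buckets = defaultdict(list)
    for name in sorted(package_names):
        buckets[partition_key(name)].append(name)
    return dict(buckets)


# Hypothetical sample; in this scheme categories would survive only as tags.
packages = ["vim", "emacs", "git", "gcc", "7zip", "Valgrind"]
layout = partition(packages)
```

The point is that the bucket depends only on the name, so a package never has to move between partitions the way it does when a category is renamed or reshuffled.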
63
64	>> - alternative projects - like hardened - can just have separate branches
65	>> when appropriate - for easy merges with "main tree"
66	>> - profile can be (should be actually) separated in another repository and
67	>> developed easier
68
69	> I don't know how you consider separate to be better.
70
71	Well, I guess first I'd like it to be possible...
72	For now, hardened is scattered in many places: usually developed outside the
73	tree (like hardened sources), then merged manually or just patched in the tree
74	(which doesn't make it easy to track hardened patches separately,
75	for example to periodically see diffs against "vanilla" if needed).
76	But yeah, at least in this case a monolithic hardened seems to suit better.
77
78	> One of the major
79	> reasons for designing the multi-parent stackable profiles was for the
80	> unusual mixed cases. Say you wanted to do hardened on a machine that
81	> hardened doesn't presently support, all you have to do is pick both in
82	> your own make.profile/parent file.
83
84	And I'm not after dropping this.
85	Btw, judging from the responses, I definitely haven't made myself clear about
86	the profile: what I meant was to have one repository exclusively for developing
87	the main tree profile, not to split the profile across repositories.
88
89	>> - needs some basic tools to 'glue' final repository and ready it for
90	>> rsync - to fully benefit from git - robbat2 would need to propose his
91	>> slim manifest format as GLEP (or in case of lack of time - quite possible
92	>> - get someone else to do it) and get it implemented by someone.
93
94	> More cons:
95	> - If a package moves between repos, it consumes space in the history on
96	> both sides, forever.
97
98	Agreed. A decision to move something between repositories needs to be well
99	justified. If repo-per-herd would suffer from many justified refactorings, then
100	of course it cannot be used.
101
102	> - Less opportunities for Git's amazing compression.
103	> - If you wanted to track the entire tree, you need multiple repositories
104	> now, vs. being able to clone just a single repo, and maintain your own
105	> branch on top of it.
106
107	This of course can be scripted (as git, I believe, doesn't have anything
108	similar to svn externals; it is not CVS/SVN, after all).
109
110	If git clone is the blocker here, maybe some workaround could be
111	implemented, like distributing the mentioned git bundle (it's something like
112	an SVN snapshot after all) and forbidding the git clone method.
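
A rough sketch of how the bundle workaround could be scripted follows. The `git bundle create`, `git bundle verify` and `git clone <bundle>` commands are standard git; the helper names and any paths passed to them are assumptions for illustration only, not an existing Gentoo infrastructure script.

```python
import subprocess


def bundle_commands(bundle_path):
    """Build the git invocations that publish a repository as one bundle file."""
    return [
        # Pack every ref of the repository into a single file.
        ["git", "bundle", "create", bundle_path, "--all"],
        # Check that the resulting bundle is valid before distributing it.
        ["git", "bundle", "verify", bundle_path],
    ]


def clone_command(bundle_path, target_dir):
    """Build the client-side invocation that clones from a downloaded bundle."""
    return ["git", "clone", bundle_path, target_dir]


def publish(repo_dir, bundle_path):
    """Run the bundle commands inside repo_dir (server side, e.g. from cron)."""
    for argv in bundle_commands(bundle_path):
        subprocess.run(argv, cwd=repo_dir, check=True)
```

The bundle would be generated once on the server and served as a static file, so the expensive packing is never triggered by a client. After cloning from the bundle, a user would still have to point `origin` at the real repository and fetch to pick up newer commits.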
113
114	--
115	regards
116	MM

