Gentoo Archives: gentoo-dev

From: "Michał Górny" <mgorny@g.o>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] [RFC git*.eclass] Do we need user-friendly egit-src/?
Date: Wed, 28 Aug 2013 17:40:01
Message-Id: 20130828193959.7387e399@gentoo.org
Hello,

My previous mail didn't focus on the most important thing, so I'd like
to start another thread with a simple question: do we need to provide
a user-friendly ${DISTDIR}/egit-src/?

Currently the repository store consists of either bare or non-bare
clones of the remote repositories. We do not support committing to those
local clones, but people can easily clone them in order to obtain
a local development repository that can be used to work with the code
and push patches upstream.

However, supporting that increases the complexity of the eclass
and decreases space efficiency. For example, if we started to do
shallow clones, people would no longer be able to clone the repo
directly. We also need to worry about clone location collisions
and about reusing the same location when multiple packages use the
same repo. As you can guess, git hosting sites don't make this easy
for us.

The question would be: do you feel like we should really provide
a verbatim clone of upstream's repository? Or should we focus on
the eclass' main goal, that is, fetching the remote sources in the most
bandwidth- and space-efficient manner?


If we decide to go for 'sane' clones, we need the eclass to be able to
provide sane paths for local copies. Those paths need to satisfy
the following requirements:

1. multiple remote repos (e.g. forks) may need to reuse the same local
   clone,

2. multiple packages may reuse the same repo, and then they should
   create just one local clone,

3. a package may use multiple repos :),

4. submodules may reuse the same repo as another package, and then they
   should use the same local clone.

Honestly, I have no idea how to achieve that. The best idea that comes
to my mind is to use the whole 'path' part of the URI. That is, like:

  git://git.overlays.gentoo.org/proj/foo.git

would map to a path like:

  proj <something> foo.git

where <something> may be '/', '-', '_', '%2F', whatever.
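
To make the mapping concrete, here is a minimal shell sketch. The
function name egit_uri_to_dir and the default '_' separator are made up
for illustration and are not part of any eclass. It also assumes URIs of
the scheme://host/path form only (no scp-style user@host:path).

```shell
# Hypothetical helper (not an existing eclass function): map a repository
# URI to a directory name for the local clone.
# $1 = URI of the form scheme://host/path, $2 = separator (default '_').
egit_uri_to_dir() {
	uri=$1
	sep=${2:-_}

	# Drop the scheme ("git://") and then the host, keeping only the path.
	path=${uri#*://}
	path=${path#*/}

	# Replace every '/' left in the path with the chosen separator.
	printf '%s\n' "$path" | sed "s|/|${sep}|g"
}
```

With the URI above, `egit_uri_to_dir git://git.overlays.gentoo.org/proj/foo.git`
prints proj_foo.git, and passing '%2F' as the second argument prints
proj%2Ffoo.git.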

This solves 2.-4. but won't help with 1. Plus there's the incoming
bikeshed about which character should be used, a bikeshed about people
really wanting to override this, and probably one more bikeshed. Oh, and
some git hosting sites add a prefix like '/git', '/p' or
'/pub/scm/whatever' that would become part of the checkout directory
as well.

We could also supposedly use some unique identifier like the root commit
identifier, but I doubt users will like having hashes in egit-src.
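
For illustration, a root commit identifier could be obtained roughly as
follows. This is only a sketch: egit_root_id is a hypothetical name, it
needs an existing local clone with full history (in a shallow clone the
boundary commits look parentless), and picking the first id after
sorting is just one assumed way to handle repositories with more than
one root commit.

```shell
# Hypothetical helper: print an identifier derived from the root commit(s)
# of the repository at path $1.
egit_root_id() {
	# --max-parents=0 selects commits without parents, i.e. root commits.
	# A repository may have several, so sort them and keep the first one
	# to get a stable answer.
	git -C "$1" rev-list --max-parents=0 HEAD | sort | head -n 1
}
```

A fork that shares history with its upstream reports the same id, which
is what would let both map to one local clone.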


An alternative is to create a semi-obfuscated yet space-efficient store
for all the repositories. That is, fetch *all* git repositories into
a single location.

Since git uses hashes to identify everything, this will work better
than you'd think at first. Most importantly, we can avoid fetching
duplicates with no real effort since git simply reuses local objects
with the same ids.

This covers duplicates in the case of repos used by multiple
ebuilds, forked repos, and identical files that are used by different
projects. I doubt you could make git more space-efficient than that.
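
A rough sketch of such a single store, under stated assumptions: the
function name egit_store_fetch, the refs/egit/ layout and the idea of
namespacing refs by a hash of the URI are all invented here for
illustration, not something the eclass does.

```shell
# Hypothetical helper: fetch one branch of a remote into a single shared
# bare repository. $1 = store path, $2 = remote URI, $3 = branch name.
egit_store_fetch() {
	store=$1
	uri=$2
	branch=$3

	# Create the shared store on first use.
	[ -d "$store" ] || git init --quiet --bare "$store"

	# Namespace the refs by a hash of the URI so that different remotes
	# cannot overwrite each other's branches (assumed layout).
	ns=$(printf '%s' "$uri" | git hash-object --stdin)

	# History already reachable from refs in the store is not
	# transferred again.
	git --git-dir="$store" fetch --quiet "$uri" \
		"+refs/heads/${branch}:refs/egit/${ns}/${branch}"
}
```

Fetching an upstream and then its fork into the same store leaves two
refs but only one copy of the shared commits.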

We no longer have to worry about EGIT_PROJECT, about submodules, about
bikesheds. However, the local store structure would no longer be
familiar to our users. We are basically switching from using git as a VCS
to using git as an efficient file fetching tool.

There's also some increased risk wrt hash collisions, but I doubt that
should be considered a problem at the moment.


What are your thoughts?

--
Best regards,
Michał Górny
