Hello,

My previous mail didn't focus on the most important thing, so I'd like
to start another thread with a simple question: do we need to provide
a user-friendly ${DISTDIR}/egit-src/?

Currently the repository store consists of either bare or non-bare
clones of the remote repository. We do not support committing to those
local clones, but people can easily clone them in order to obtain
a local development repository that can be used to work with the code
and push patches upstream.

However, supporting that increases the complexity of the eclass
and decreases space efficiency. For example, if we started to do
shallow clones, people would no longer be able to clone the repo
directly. We also need to worry about clone location collisions
and about reusing the same location when multiple packages use
the same repo. As you can guess, git hosting services don't make this
easy for us.

The question would be: do you feel like we should really provide
a verbatim clone of upstream's repository? Or should we focus on
the eclass' main goal, that is fetching the remote sources in the most
bandwidth- and space-efficient manner?


If we decide to go for 'sane' clones, we need the eclass to be able to
provide sane paths for local copies. Those paths need to satisfy
the following requirements:

1. multiple remote repos (e.g. forks) may need to reuse the same local
clone,

2. multiple packages may reuse the same repo and then they should
create just one local clone,

3. a package may use multiple repos :),

4. submodules may reuse the same repo as another package, and then they
should use the same local clone.

Honestly, I have no idea how to achieve that. The best idea that comes
to my mind is to use the whole 'path' part of the URI. That is, like:

    git://git.overlays.gentoo.org/proj/foo.git

would map to a path like:

    proj <something> foo.git

where <something> may be '/', '-', '_', '%2F', whatever.
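
To make the mapping above concrete, here is a minimal sketch in Python
(the eclass itself is bash, so this is an illustration only). The
function name local_clone_name, the 'sep' parameter and the example.org
URLs are made up for this example; only the gentoo.org URL comes from
the text above.

```python
from urllib.parse import urlsplit


def local_clone_name(uri: str, sep: str = "_") -> str:
    """Join the components of the URI's path part with `sep`.

    Hypothetical helper: it ignores the scheme and the host entirely,
    which is exactly why two forks on different paths get different names.
    """
    path = urlsplit(uri).path
    components = [part for part in path.split("/") if part]
    return sep.join(components)


if __name__ == "__main__":
    # The example from the mail.
    print(local_clone_name("git://git.overlays.gentoo.org/proj/foo.git"))
    # -> proj_foo.git

    # Same mapping with another candidate separator.
    print(local_clone_name("git://git.overlays.gentoo.org/proj/foo.git", "%2F"))
    # -> proj%2Ffoo.git

    # Two forks of one project (made-up URLs) end up in separate clones.
    print(local_clone_name("git://example.org/alice/foo.git"))  # -> alice_foo.git
    print(local_clone_name("git://example.org/bob/foo.git"))    # -> bob_foo.git
```

Because the name depends only on the path, any package or submodule
pointing at the same URI gets the same directory, while forks living
under different paths do not.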

This solves 2.-4. but won't help with 1. Plus there's the incoming
bikeshed about which character should be used, a bikeshed about people
really wanting to override this, and probably one more bikeshed. Oh,
and some git hosting services add a prefix like '/git', '/p' or
'/pub/scm/whatever' that would become part of the checkout directory
as well.

We could also use some unique identifier, like the root commit
identifier, but I doubt users will like having hashes in egit-src.


An alternative is to create a semi-obfuscated yet space-efficient store
for all the repositories. That is, fetch *all* git repositories into
a single location.

Since git uses hashes to identify everything, this will work better
than you'd think at first. Most importantly, we can avoid fetching
duplicates with no real effort since git simply reuses local objects
with the same ids.

This covers duplicates from repos used by multiple ebuilds, from
forked repos, and from identical files that are used by different
projects. I doubt you could make git more space-efficient than that.
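
The deduplication described above follows from how git names objects:
in a SHA-1 repository, a blob's id is the SHA-1 of the header
"blob <size>\0" followed by the file content. The Python sketch below
models only that part. ObjectStore is a toy class invented for this
example, not anything the eclass or git provides, and real git also
stores trees, commits and packs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id git assigns to a blob with this content (SHA-1 repos)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def add(self, content: bytes) -> bool:
        """Store content; return False if an object with that id exists."""
        oid = git_blob_id(content)
        if oid in self.objects:
            return False  # already present, nothing to fetch or store
        self.objects[oid] = content
        return True


if __name__ == "__main__":
    store = ObjectStore()
    license_text = b"Some license text shared by two projects\n"

    # The same file arriving from two unrelated repositories.
    print(store.add(license_text))  # -> True  (stored once)
    print(store.add(license_text))  # -> False (reused, not stored again)
    print(len(store.objects))       # -> 1
```

The same reasoning applies to whole commits and trees shared between
a repository and its forks, since their ids are derived from their
content as well.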

We no longer have to worry about EGIT_PROJECT, about submodules, about
bikesheds. However, the local store structure would no longer be
familiar to our users. We are basically switching from using git as
a VCS to using git as an efficient file-fetching tool.

There's also some increased risk wrt hash collisions but I doubt that
should be considered a problem at the moment.


What are your thoughts?

--
Best regards,
Michał Górny