Gentoo Archives: gentoo-portage-dev

From: Zac Medico <zmedico@g.o>
To: "Michał Górny" <mgorny@g.o>
Cc: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH] sync: call git prune before shallow fetch (bug 599008)
Date: Sun, 06 Nov 2016 18:46:38
Message-Id: ffa66820-d3db-8aaf-b6f2-70bc7a2a796e@gentoo.org
In Reply to: Re: [gentoo-portage-dev] [PATCH] sync: call git prune before shallow fetch (bug 599008) by "Michał Górny"
On 11/06/2016 01:59 AM, Michał Górny wrote:
> On Sat, 5 Nov 2016 15:56:20 -0700
> Zac Medico <zmedico@g.o> wrote:
>
>> On 11/05/2016 03:22 PM, Michał Górny wrote:
>>> On Sat, 5 Nov 2016 15:11:10 -0700
>>> Zac Medico <zmedico@g.o> wrote:
>>>
>>>> On 11/05/2016 02:50 PM, Michał Górny wrote:
>>>>> On Sat, 5 Nov 2016 13:43:15 -0700
>>>>> Zac Medico <zmedico@g.o> wrote:
>>>>>
>>>>>> This is necessary in order to avoid "There are too many unreachable
>>>>>> loose objects" warnings from automatic git gc calls.
>>>>>>
>>>>>> X-Gentoo-Bug: 599008
>>>>>> X-Gentoo-Bug-URL: https://bugs.gentoo.org/show_bug.cgi?id=599008
>>>>>> ---
>>>>>> pym/portage/sync/modules/git/git.py | 6 ++++++
>>>>>> 1 file changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/pym/portage/sync/modules/git/git.py b/pym/portage/sync/modules/git/git.py
>>>>>> index f288733..c90cf88 100644
>>>>>> --- a/pym/portage/sync/modules/git/git.py
>>>>>> +++ b/pym/portage/sync/modules/git/git.py
>>>>>> @@ -101,6 +101,12 @@ class GitSync(NewBase):
>>>>>> writemsg_level(msg + "\n", level=logging.ERROR, noiselevel=-1)
>>>>>> return (e.returncode, False)
>>>>>>
>>>>>> + # For shallow fetch, unreachable objects must be pruned
>>>>>> + # manually, since otherwise automatic git gc calls will
>>>>>> + # eventually warn about them (see bug 599008).
>>>>>> + subprocess.call(['git', 'prune'],
>>>>>> + cwd=portage._unicode_encode(self.repo.location))
>>>>>> +
>>>>>> git_cmd_opts += " --depth %d" % self.repo.sync_depth
>>>>>> git_cmd = "%s fetch %s%s" % (self.bin_command,
>>>>>> remote_branch.partition('/')[0], git_cmd_opts)
>>>>>
>>>>> Does it have a performance impact?
>>>>
>>>> Yes, it takes about 20 seconds on my laptop. I suppose we could make
>>>> this an optional thing, so that those people can do it manually if they
>>>> want.
>>>
>>> So we have an improvement from at most a few seconds for a normal
>>> 'git pull' to around a minute for a shallow pull?
>>
>> Well, we've got at least 3 resources to consider:
>>
>> 1) network bandwidth
>> 2) disk usage
>> 3) sync time
>>
>> For me, sync time doesn't really matter that much, but I suppose it
>> might for some people.
>
> For a common user, network bandwidth is not a problem with git (except
> maybe for the huge initial clone). Especially when syncing frequently,
> the gain from subsequent --depth=1 is negligible. When syncing rarely,
> you probably prefer snapshots anyway.
>
> I doubt this could be of benefit even to dial-up users; that is,
> that more time would be saved on fetching than lost on all the ops
> needed to make things continue to work. The additional data probably
> won't affect data-plan users much either.
>
> Especially since Gentoo is all about fetching distfiles that are huge
> compared to the git updates for the repository.
>
> As for the disk usage, again, the difference should be negligible.
> The major difference is made on the initial fetch. Of course, regularly
> pruning the repository will reduce its size. But then, pruning it with
> non-shallow fetches would probably achieve a similar effect thanks to
> delta compression.
>
> That leaves the sync time. Which is becoming worse than rsync.

Maybe this will be a reasonable default:

* add a separate clone-depth setting which defaults to 1
* set the default sync-depth setting to 0 (unlimited)

If the user enables shallow fetch by setting sync-depth to something
other than 0, then I think we should call whatever commands are
necessary to keep the repository healthy (including `git prune` if
necessary).
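
To make the proposal concrete, here is a rough sketch of how the two
settings could drive the git commands for one sync. It is only an
illustration, not Portage code: plan_git_sync() and its parameters are
hypothetical names, the remote name "origin" is an assumption, and the
step that updates the working tree after a shallow fetch is left out.

```python
def plan_git_sync(repo_exists, sync_uri, clone_depth=1, sync_depth=0,
                  git="git"):
    """Return the git argv lists that a single sync would run.

    Hypothetical helper: clone_depth and sync_depth mirror the proposed
    clone-depth / sync-depth settings, where 0 means unlimited history.
    """
    if not repo_exists:
        # Initial clone: shallow by default (clone-depth = 1).
        cmd = [git, "clone"]
        if clone_depth:
            cmd += ["--depth", str(clone_depth)]
        cmd += [sync_uri, "."]
        return [cmd]

    if not sync_depth:
        # Default (sync-depth = 0): ordinary update, no extra housekeeping.
        return [[git, "pull"]]

    # Shallow fetch explicitly requested: prune unreachable loose objects
    # first (see bug 599008), then fetch with the requested depth.
    # "origin" is an assumed remote name.
    return [
        [git, "prune"],
        [git, "fetch", "--depth", str(sync_depth), "origin"],
    ]
```

With these defaults only users who opt in to a non-zero sync-depth pay
the extra `git prune` time on each sync.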
--
Thanks,
Zac
