Gentoo Archives: gentoo-portage-dev

From: "Michał Górny" <mgorny@g.o>
To: Zac Medico <zmedico@g.o>
Cc: gentoo-portage-dev@l.g.o
Subject: Re: [gentoo-portage-dev] [PATCH] sync: call git prune before shallow fetch (bug 599008)
Date: Sun, 06 Nov 2016 09:59:18
Message-Id: 20161106105907.5440858d.mgorny@gentoo.org
In Reply to: Re: [gentoo-portage-dev] [PATCH] sync: call git prune before shallow fetch (bug 599008) by Zac Medico
On Sat, 5 Nov 2016 15:56:20 -0700
Zac Medico <zmedico@g.o> wrote:

> On 11/05/2016 03:22 PM, Michał Górny wrote:
> > On Sat, 5 Nov 2016 15:11:10 -0700
> > Zac Medico <zmedico@g.o> wrote:
> >
> >> On 11/05/2016 02:50 PM, Michał Górny wrote:
> >>> On Sat, 5 Nov 2016 13:43:15 -0700
> >>> Zac Medico <zmedico@g.o> wrote:
> >>>
> >>>> This is necessary in order to avoid "There are too many unreachable
> >>>> loose objects" warnings from automatic git gc calls.
> >>>>
> >>>> X-Gentoo-Bug: 599008
> >>>> X-Gentoo-Bug-URL: https://bugs.gentoo.org/show_bug.cgi?id=599008
> >>>> ---
> >>>> pym/portage/sync/modules/git/git.py | 6 ++++++
> >>>> 1 file changed, 6 insertions(+)
> >>>>
> >>>> diff --git a/pym/portage/sync/modules/git/git.py b/pym/portage/sync/modules/git/git.py
> >>>> index f288733..c90cf88 100644
> >>>> --- a/pym/portage/sync/modules/git/git.py
> >>>> +++ b/pym/portage/sync/modules/git/git.py
> >>>> @@ -101,6 +101,12 @@ class GitSync(NewBase):
> >>>> writemsg_level(msg + "\n", level=logging.ERROR, noiselevel=-1)
> >>>> return (e.returncode, False)
> >>>>
> >>>> + # For shallow fetch, unreachable objects must be pruned
> >>>> + # manually, since otherwise automatic git gc calls will
> >>>> + # eventually warn about them (see bug 599008).
> >>>> + subprocess.call(['git', 'prune'],
> >>>> + cwd=portage._unicode_encode(self.repo.location))
> >>>> +
> >>>> git_cmd_opts += " --depth %d" % self.repo.sync_depth
> >>>> git_cmd = "%s fetch %s%s" % (self.bin_command,
> >>>> remote_branch.partition('/')[0], git_cmd_opts)
> >>>
> >>> Does it have a performance impact?
> >>
> >> Yes, it takes about 20 seconds on my laptop. I suppose we could make
> >> this an optional thing, so that those people can do it manually if they
> >> want.
> >
> > So we have improvement from at most a few seconds for normal 'git pull'
> > to around a minute for shallow pull?
>
> Well we've got at least 3 resources to consider:
>
> 1) network bandwidth
> 2) disk usage
> 3) sync time
>
> For me, sync time doesn't really matter that much, but I suppose it
> might for some people.

For a typical user, network bandwidth is not a problem with git (except
maybe for the huge initial clone). Especially when syncing frequently,
the gain from subsequent --depth=1 fetches is negligible. When syncing
rarely, you probably prefer snapshots anyway.

I doubt this would benefit even dial-up users; that is, I doubt that
more time would be saved on fetching than lost on all the operations
needed to keep things working. The additional data probably won't
affect users on data plans much either, especially since Gentoo is all
about fetching distfiles, which are huge compared to the git updates
for the repository.

As for disk usage, again, the difference should be negligible. The
major difference comes from the initial fetch. Of course, regularly
pruning the repository will reduce its size. But then, pruning it with
non-shallow fetches would probably achieve a similar effect thanks to
delta compression.

That leaves the sync time, which is becoming worse than rsync.
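
To put numbers on the sync-time cost, the prune step from the patch
could be wrapped so that it is both optional and timed. This is only a
rough sketch, not what the patch does: the function name and the
'enabled' and 'run' parameters are hypothetical, and it passes the
repository location as a plain string rather than going through
portage._unicode_encode().

```python
import subprocess
import time


def prune_before_shallow_fetch(repo_location, enabled=True,
                               run=subprocess.call):
    """Optionally run 'git prune' in repo_location and report its cost.

    Returns None when pruning is disabled, otherwise a tuple of
    (return code, elapsed seconds). 'enabled' and 'run' are
    hypothetical knobs; the patch calls subprocess.call
    unconditionally.
    """
    if not enabled:
        return None
    start = time.monotonic()
    # Same command as in the patch, only timed.
    returncode = run(['git', 'prune'], cwd=repo_location)
    return returncode, time.monotonic() - start
```

Calling it with enabled=False skips the prune entirely, which is
roughly what "make this an optional thing" would amount to; the elapsed
time from the other path can be compared directly against the time the
fetch itself takes.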

--
Best regards,
Michał Górny
<http://dev.gentoo.org/~mgorny/>
