On 01:11 Tue 28 Oct , Robin H. Johnson wrote:
> On Tue, Oct 28, 2008 at 07:21:10AM +0000, Ciaran McCreesh wrote:
> > More likely you're just having problems with the connection lasting for
> > too long. 'git fetch' will resume (on a per-object basis, which is
> > fine so long as there aren't too many huge packs).
>
> It's a single ~910MiB pack. I'll play with it and see how we can make it
> better, maybe limiting the pack sizes to 50Mb/ea.

I did a few repacks and found that there's a linear relationship between
the log of the pack size and the repo size. If repo size is in GB and
pack size is in MB:

    repo_size = m * log(pack_size) + c, where m = -0.35, c = 3.3

Using that, it's straightforward to optimize on whichever parameter you
care about while still considering the other one. I also attached two
graphs showing this more visually, one in log form and the other not.
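
For anyone who wants to play with the numbers, here is a minimal sketch of
that fit in Python. It assumes "log" means the natural log, which isn't
stated above; that reading does give roughly 0.9 GB for a 910 MB pack,
which matches the single-pack case. The function names are made up for
illustration.

```python
import math

M = -0.35  # slope from the fit quoted above
C = 3.3    # intercept from the fit quoted above


def repo_size_gb(pack_size_mb: float) -> float:
    """Predicted repo size in GB for a pack size in MB.

    Assumption: natural log. The base is not stated in the original fit.
    """
    return M * math.log(pack_size_mb) + C


def pack_size_mb(target_repo_gb: float) -> float:
    """Inverse of the fit: pack size in MB that yields a target repo size in GB."""
    return math.exp((target_repo_gb - C) / M)


# Forward direction: pick a pack size, see what it costs in repo size.
for size in (50, 450, 910):
    print(f"{size:4d} MB packs -> ~{repo_size_gb(size):.2f} GB repo")

# Reverse direction: pick a repo size budget, see the pack size it implies.
print(f"1.5 GB repo -> ~{pack_size_mb(1.5):.0f} MB packs")
```

Either function lets you fix one parameter and read off the other, which
is all the optimization really amounts to.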

The clear preference is the largest pack size at which we rarely encounter
clone problems. Perhaps we should just cut the pack into fractions
(900/2, 900/3, etc.) until we no longer hear about problems cloning.
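
As a rough sketch of what that schedule would cost, the snippet below
lists the 900/n pack sizes next to the repo size the fit predicts for
each. It again assumes a natural log, and the helper names and the cutoff
at n = 6 are arbitrary choices for illustration.

```python
import math


def predicted_repo_gb(pack_mb: float, m: float = -0.35, c: float = 3.3) -> float:
    """Repo size in GB from the fit above (natural log is an assumption)."""
    return m * math.log(pack_mb) + c


def candidate_limits(full_pack_mb: float = 900, max_divisor: int = 6):
    """Return (divisor, pack size in MB, predicted repo size in GB) for 900/1 .. 900/max_divisor."""
    return [
        (d, full_pack_mb / d, predicted_repo_gb(full_pack_mb / d))
        for d in range(1, max_divisor + 1)
    ]


for divisor, pack_mb, repo_gb in candidate_limits():
    print(f"900/{divisor}: {pack_mb:6.1f} MB packs -> ~{repo_gb:.2f} GB repo")
```

Each step down in pack size makes the repo somewhat larger, so stopping at
the first divisor where the clone complaints go away keeps that cost as
small as possible.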

One major issue is that partial downloads of a single pack apparently
don't resume. Ideally, we should just get that fixed upstream and use a
single huge pack. Thoughts?

--
Thanks,
Donnie
Donnie Berkholz
Developer, Gentoo Linux
Blog: http://dberkholz.wordpress.com