On 01:11 Tue 28 Oct, Robin H. Johnson wrote:
> On Tue, Oct 28, 2008 at 07:21:10AM +0000, Ciaran McCreesh wrote:
> > More likely you're just having problems with the connection lasting for
> > too long. 'git fetch' will resume (on a per-object basis, which is
> > fine so long as there aren't too many huge packs).
>
> It's a single ~910MiB pack. I'll play with it and see how we can make it
> better, maybe limiting the pack sizes to 50Mb/ea.

I did a few repacks and found a linear relationship between the log of
the pack size and the repo size. With repo size in GB and pack size in
MB:

repo_size = m * log(pack_size) + c, where m = -0.35 and c = 3.3

Using that, it's straightforward to optimize for whichever parameter you
care about while still considering the other. I've also attached two
graphs showing this visually, one on a log scale and one linear.
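For the curious, here's the fit as a quick sketch (assuming the log is the
natural log, which makes the ~910 MiB single pack come out near a ~0.9 GB
repo; if the fit used log base 10, the constants would differ):

```python
import math

# Fitted relationship from the repacking experiments above:
#   repo_size (GB) = m * log(pack_size in MB) + c
M = -0.35
C = 3.3

def predicted_repo_size_gb(pack_size_mb):
    """Predicted repo size in GB for a given max pack size in MB.

    Assumes a natural-log fit; swap in math.log10 (with refit
    constants) if the original fit used base 10.
    """
    return M * math.log(pack_size_mb) + C

# Smaller packs lose cross-pack delta compression, so the repo grows:
for size in (900, 450, 300, 50):
    print(f"{size:4d} MB packs -> ~{predicted_repo_size_gb(size):.2f} GB repo")
```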

Clearly we'd prefer the largest pack size that rarely causes clone
problems. Perhaps we should just cut the pack into fractions
(900/2 MB, 900/3 MB, etc.) until we no longer hear about problems
cloning.
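If we go that route, git already supports capping pack file size via
pack.packSizeLimit (or git repack's --max-pack-size), so trying a given
fraction is just a repack away. The 300m value below is only an example:

```shell
# Cap packs at ~300 MB (roughly 900/3) and rewrite the repo's packs.
git config pack.packSizeLimit 300m
git repack -a -d

# Equivalent one-shot form without touching the repo config:
git repack -a -d --max-pack-size=300m

# Sanity check: list the resulting packs and their sizes.
ls -lh .git/objects/pack/*.pack
```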

One major issue is that apparently partial downloads of a single pack
don't resume. Ideally, we should just get that fixed upstream and use a
single huge pack. Thoughts?

--
Thanks,
Donnie

Donnie Berkholz
Developer, Gentoo Linux
Blog: http://dberkholz.wordpress.com