On Fri, Nov 30, 2012 at 09:35:07AM -0800, Zac Medico wrote:
> > However, I'm not aware of gnu tar's incremental archive. If it's much
> > faster than the above, then it should probably replace
> > emerge-delta-webrsync.
> If it has benefits over the current diffball approach used by
> emerge-delta-webrsync, then it seems like a good idea. It would be nice
> to integrate it directly into emerge-webrsync, and eventually deprecate
> emerge-delta-webrsync.
I went and did a rough comparison of Tar incrementals vs the existing
deltas.

TL;DR:
======
- Existing deltas are 8-9x smaller than the other options.
- We should consider retaining monthly snapshots, plus all the deltas.

Results:
========
1.
Using bzip2 -9 compression:
- Existing deltas are 9x smaller than tar-incremental.
- Existing deltas are 8x smaller than rsync-batch.
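
For anyone who wants to re-check those ratios, here is a minimal
sketch that recomputes them from the seven daily byte counts listed
under "Numbers" in this mail. The variable and function names are my
own, and a single week is a small sample, so treat the result as a
sanity check rather than a benchmark.

```python
# Daily byte counts copied from the "Numbers" section (bzip2 -9).
tar_incremental = [2554334, 2045216, 1936313, 2355342, 2063612, 2582600, 2720135]
rsync_batch = [2224311, 1869241, 1802648, 1936937, 1868771, 2240386, 2028207]
existing_delta = [252400, 267094, 161136, 225349, 245804, 232549, 332835]


def size_ratio(other, baseline):
    """Return how many times larger `other` is than `baseline`, by total bytes."""
    return sum(other) / sum(baseline)


tar_vs_delta = size_ratio(tar_incremental, existing_delta)
rsync_vs_delta = size_ratio(rsync_batch, existing_delta)

print(f"tar-incremental / existing delta: {tar_vs_delta:.1f}x")
print(f"rsync-batch / existing delta:     {rsync_vs_delta:.1f}x")
```

Rounded to whole numbers this gives the 9x and 8x figures above.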

2.
If you just want to save bandwidth, the average full snapshot,
compressed with bzip2, is 55M. The average delta is 269k.
55M/269k = ~209.
Ergo it is LESS bandwidth to download ~180 deltas and apply those than
it is to download the full snapshot (assuming the upstream side of the
transaction accounts for ~30 deltas' worth of overhead).
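
The break-even arithmetic can be sketched as follows. Two assumptions
on my part: "M" and "k" are read as MiB and KiB, and the ~30 of
overhead is counted in average-sized deltas. The variable names are
purely illustrative.

```python
# Averages quoted above; units assumed to be MiB and KiB.
avg_snapshot_kib = 55 * 1024
avg_delta_kib = 269

# How many average deltas add up to one average full snapshot.
break_even = avg_snapshot_kib / avg_delta_kib

# Overhead allowance from the text, assumed to be measured in deltas.
overhead_in_deltas = 30
max_worthwhile_deltas = int(break_even) - overhead_in_deltas

print(f"break-even: ~{break_even:.0f} deltas")
print(f"still cheaper than a full snapshot up to ~{max_worthwhile_deltas} deltas")
```

This lands on roughly 209 deltas at break-even and 179 after the
overhead allowance, which is where the ~180 figure comes from.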

Notes:
======
1.
When extracting tar incrementals, you must be VERY careful to apply
them in the correct order (full archive first, then each incremental
in sequence); otherwise removed files will not actually be deleted.
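
As a rough illustration of the ordering problem, the sketch below
sorts archives into a safe extraction order and builds (without
running) one GNU tar command per archive. It assumes the file naming
used under "Numbers" in this mail, where ".0" marks the full archive
and ".1" an incremental; the helper names and the destination path are
hypothetical. Note that a plain lexicographic sort puts the first
incremental ahead of the baseline, because "-" sorts before ".".

```python
import re

# Assumed naming scheme (taken from the file names in this mail):
#   portage-YYYYMMDD.0.tar.bz2            full (level-0) archive
#   portage-YYYYMMDD-YYYYMMDD.1.tar.bz2   incremental between two dates
ARCHIVE_RE = re.compile(r"^portage-(\d{8})(?:-(\d{8}))?\.(\d+)\.tar\.bz2$")


def extraction_order(names):
    """Sort archives: full archive first, then incrementals by end date."""
    def key(name):
        match = ARCHIVE_RE.match(name)
        if match is None:
            raise ValueError(f"unrecognised archive name: {name}")
        start, end, level = match.groups()
        return (int(level), end or start)
    return sorted(names, key=key)


def extract_commands(names, dest):
    """Build, but do not run, one GNU tar command per archive, in order."""
    return [
        ["tar", "--extract", "--bzip2", "--listed-incremental=/dev/null",
         "--directory", dest, "--file", name]
        for name in extraction_order(names)
    ]


archives = [
    "portage-20121124-20121125.1.tar.bz2",
    "portage-20121123.0.tar.bz2",
    "portage-20121123-20121124.1.tar.bz2",
]
for command in extract_commands(archives, "/tmp/portage-test"):
    print(" ".join(command))
```

Passing --listed-incremental=/dev/null on extraction is what tells GNU
tar to treat the archive as an incremental and remove files that were
deleted between dumps.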

2.
When the Git repo goes live, we should tag at the point we take the
daily snapshot, and use those tags to evaluate git bundles as well.
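
To make the bundle idea concrete, here is a small sketch that builds
the command for a bundle covering the commits between two daily tags.
The tag and file naming ("snapshot-YYYYMMDD") is invented for
illustration; nothing like it exists yet. The command is only
constructed, not executed.

```python
def bundle_command(prev_date, cur_date):
    """Build a `git bundle create` command for one day's worth of commits.

    The snapshot-YYYYMMDD tag names are hypothetical.
    """
    prev_tag = f"snapshot-{prev_date}"
    cur_tag = f"snapshot-{cur_date}"
    out_file = f"snapshot-{prev_date}-{cur_date}.bundle"
    # A revision range makes the bundle incremental: it carries only the
    # objects reachable from cur_tag that are not reachable from prev_tag.
    return ["git", "bundle", "create", out_file, f"{prev_tag}..{cur_tag}"]


print(" ".join(bundle_command("20121123", "20121124")))
```

A client that already has the earlier tag could then fetch from the
bundle file as if it were a remote.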

Numbers:
========

Baseline tarball:
57919736 portage-20121123.0.tar.bz2

Tar incrementals, daily:
2554334 portage-20121123-20121124.1.tar.bz2
2045216 portage-20121124-20121125.1.tar.bz2
1936313 portage-20121125-20121126.1.tar.bz2
2355342 portage-20121126-20121127.1.tar.bz2
2063612 portage-20121127-20121128.1.tar.bz2
2582600 portage-20121128-20121129.1.tar.bz2
2720135 portage-20121129-20121130.1.tar.bz2

Rsync incrementals, daily:
2224311 portage-20121123-20121124.rsync-batch.bz2
1869241 portage-20121124-20121125.rsync-batch.bz2
1802648 portage-20121125-20121126.rsync-batch.bz2
1936937 portage-20121126-20121127.rsync-batch.bz2
1868771 portage-20121127-20121128.rsync-batch.bz2
2240386 portage-20121128-20121129.rsync-batch.bz2
2028207 portage-20121129-20121130.rsync-batch.bz2

Existing deltas, daily:
252400 snapshot-20121123-20121124.patch.bz2
267094 snapshot-20121124-20121125.patch.bz2
161136 snapshot-20121125-20121126.patch.bz2
225349 snapshot-20121126-20121127.patch.bz2
245804 snapshot-20121127-20121128.patch.bz2
232549 snapshot-20121128-20121129.patch.bz2
332835 snapshot-20121129-20121130.patch.bz2

Rsync incrementals, from baseline:
2224311 portage-20121123-20121124.rsync-batch.bz2
2536620 portage-20121123-20121125.rsync-batch.bz2
2700715 portage-20121123-20121126.rsync-batch.bz2
2986403 portage-20121123-20121127.rsync-batch.bz2
3258723 portage-20121123-20121128.rsync-batch.bz2
3824015 portage-20121123-20121129.rsync-batch.bz2
4232674 portage-20121123-20121130.rsync-batch.bz2
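
One way to read the from-baseline numbers is to compare, for each day,
a single cumulative rsync batch against a chain of daily files. The
sketch below does that with the byte counts listed above; the variable
names are my own, and it looks at download size only, not apply time.

```python
from itertools import accumulate

# Byte counts copied from the listings above.
rsync_daily = [2224311, 1869241, 1802648, 1936937, 1868771, 2240386, 2028207]
rsync_from_baseline = [2224311, 2536620, 2700715, 2986403, 3258723, 3824015, 4232674]
delta_daily = [252400, 267094, 161136, 225349, 245804, 232549, 332835]

# Running totals: bytes to download when chaining N daily files.
rsync_chain = list(accumulate(rsync_daily))
delta_chain = list(accumulate(delta_daily))

print("days  rsync-cumulative  rsync-chained  delta-chained")
for day, (cumulative, r_chain, d_chain) in enumerate(
        zip(rsync_from_baseline, rsync_chain, delta_chain), start=1):
    print(f"{day:>4}  {cumulative:>16}  {r_chain:>13}  {d_chain:>13}")
```

Over this week a single from-baseline rsync batch is far smaller than
the chained daily rsync batches, but the chained existing deltas stay
below it on every day.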

--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@g.o
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85