Gentoo Archives: gentoo-user

From: Alan McKinnon <alan.mckinnon@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Mirroring part of tinderbox.dev.gentoo.org
Date: Thu, 24 Oct 2013 06:12:11
Message-Id: 5268B8F4.2090101@gmail.com
In Reply to: [gentoo-user] Mirroring part of tinderbox.dev.gentoo.org by Noah McNallie
On 23/10/2013 20:40, Noah McNallie wrote:
> I'm trying to mirror
> 'http://tinderbox.dev.gentoo.org/default-linux/sparc' because I run a
> sparcv9 machine that is quite old and not good for compiling. This is
> the only gentoo sparc binary repo that I know of and I'd like to have a
> local copy that I would make available on a webserver.
>
> I'd like to know the best way to mirror it. I have been trying with the
> following command:
>
> 'wget -mkN -np http://tinderbox.dev.gentoo.org/default-linux/sparc/'
>
> -m to mirror -N to check timestamps and only update and -np not to
> access the parent directory.
>
> This will get the files but the second time around it does not update to
> only changed files but tries to grab everything again.
>
> Could someone point me in the direction for mirroring this lovely repo?
>
> Noah McNallie

You are downloading tbz files, not html files, and every file downloaded
produces this in the console output:

Reusing existing connection to tinderbox.dev.gentoo.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 104635 (102K) [application/octet-stream]
Last-modified header missing -- time-stamps turned off.
^^^^^^^^^^^^^^^^^^^^^^

This leaves wget only one option - it cannot confirm that the file is
unchanged, so it has to download it again just to be sure. I don't know
of any wget option that assumes existing local files with the _same
size_ as the remote files are identical and skips them. That would be
very dodgy and unsafe anyway.
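
If you want to check whether a given file on the server comes with a
Last-Modified header before mirroring, something like the sketch below
should do. The helper name has_last_modified is made up for this
example; it only inspects header text fed to it on stdin, so it makes no
assumptions about the server.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# (exit status 0) only if a Last-Modified header is present.
# wget -S indents each header line, hence the optional leading whitespace.
has_last_modified() {
    grep -qi '^[[:space:]]*Last-Modified:'
}

# Example use (needs network access, so it is left commented out;
# replace "$url" with the file you want to check):
#   wget --spider -S "$url" 2>&1 | has_last_modified \
#       && echo "timestamps available" \
#       || echo "no Last-Modified header"
```

wget -S prints the server's response headers on stderr, which is why the
example redirects it with 2>&1. Note that --spider does not fetch the
file body, and a server may in principle answer such a request
differently from a normal download.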

rsync was developed, amongst other things, to work around this kind of
problem - the protocol transmits the information needed to make this
decision instead of relying on HTTP headers. Unfortunately,
tinderbox.dev is not running rsyncd :-(

Re-downloading everything every time seems to be your only option. I
don't imagine that repo changes all that much over time though, so why
don't you just re-sync infrequently, say once a week?
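
A weekly re-sync could be as simple as the sketch below. The script
path, the MIRROR_DIR location and the function name mirror_sparc are
assumptions made for this example; the wget flags are the ones from your
command (-N is already implied by -m), plus -nv for quieter logs and -P
to choose the target directory.

```shell
#!/bin/sh
# Hypothetical weekly mirror script, e.g. saved as
# /usr/local/bin/mirror-sparc.sh (path is an assumption).
MIRROR_URL="http://tinderbox.dev.gentoo.org/default-linux/sparc/"
MIRROR_DIR="${MIRROR_DIR:-/var/www/sparc-mirror}"   # assumed web root

mirror_sparc() {
    # -m mirror, -k convert links, -np never ascend to the parent,
    # -nv less verbose output, -P directory to store the mirror in
    mkdir -p "$MIRROR_DIR" &&
        wget -m -k -np -nv -P "$MIRROR_DIR" "$MIRROR_URL"
}

# Only start the download when called as: mirror-sparc.sh run
if [ "${1:-}" = "run" ]; then
    mirror_sparc
fi

# Example crontab entry (Sundays at 03:00):
#   0 3 * * 0 /usr/local/bin/mirror-sparc.sh run
```

This still fetches everything on each run, but once a week that should
be tolerable.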

--
Alan McKinnon
alan.mckinnon@×××××.com