On 23/10/2013 20:40, Noah McNallie wrote:
> I'm trying to mirror
> 'http://tinderbox.dev.gentoo.org/default-linux/sparc' because I run a
> sparcv9 machine that is quite old and not good for compiling. This is
> the only gentoo sparc binary repo that I know of and I'd like to have a
> local copy that I would make available on a webserver.
>
> I'd like to know the best way to mirror it. I have been trying with the
> following command:
>
> 'wget -mkN -np http://tinderbox.dev.gentoo.org/default-linux/sparc/'
>
> -m to mirror -N to check timestamps and only update and -np not to
> access the parent directory.
>
> This will get the files but the second time around it does not update to
> only changed files but tries to grab everything again.
>
> Could someone point me in the direction for mirroring this lovely repo?
>
> Noah McNallie
|
You are downloading tbz files, not HTML files, so every file downloaded
has this in the output to the console:
|
Reusing existing connection to tinderbox.dev.gentoo.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 104635 (102K) [application/octet-stream]
Last-modified header missing -- time-stamps turned off.
^^^^^^^^^^^^^^^^^^^^^^
|
This leaves wget only one option: it cannot confirm that the file is
unchanged, so it has to download it again just to be sure. I don't know
of any wget option that assumes existing local files with the _same
size_ as the remote files must be identical and skips them. That would
indeed be very dodgy and unsafe.
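
As a rough sketch, the missing header can be checked for a single file
without downloading it. The helper name has_last_modified is made up for
illustration, and the file name in the usage comment is a placeholder.
wget's --spider and -S options are used only to print the response
headers.

```sh
#!/bin/sh
# Reads HTTP response headers on stdin and succeeds (exit status 0)
# only if a Last-Modified header is among them.
# The function name is hypothetical, chosen for this example.
has_last_modified() {
    grep -qi '^[[:space:]]*Last-Modified:'
}

# Possible usage (needs network access; <some-file> is a placeholder):
#   wget --spider -S \
#       "http://tinderbox.dev.gentoo.org/default-linux/sparc/<some-file>" 2>&1 \
#       | has_last_modified \
#       && echo "Last-Modified present, -N can work" \
#       || echo "no Last-Modified, wget will re-download"
```

If the check fails for the package files, -N has nothing to compare
against, which matches the console output above.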
|
rsync was developed to, amongst other things, work around this kind of
problem: the protocol transmits the information needed to make this
decision instead of trying to rely on HTTP headers. Unfortunately,
tinderbox.dev is also not running rsyncd :-(
|
Re-downloading everything every time seems to be your only option. I
don't imagine that repo changes all that much over time though, so why
don't you just re-sync infrequently, like once a week maybe?
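
A weekly re-sync could look something like the sketch below. The script
name mirror-sparc.sh, its install path and the MIRROR_DIR default are
all assumptions, so adjust them to your setup. Note that -m already
turns on -N, so the extra -N in the original command is not needed.

```sh
#!/bin/sh
# Hypothetical wrapper script; the name mirror-sparc.sh and the default
# MIRROR_DIR are assumptions made for this example.
MIRROR_URL="http://tinderbox.dev.gentoo.org/default-linux/sparc/"
MIRROR_DIR="${MIRROR_DIR:-/var/www/mirror}"

sync_mirror() {
    # -m: mirror (recursive, unlimited depth); -np: never ascend to the
    # parent directory; -q: quiet; -P: directory to save files under.
    wget -m -np -q -P "$MIRROR_DIR" "$MIRROR_URL"
}

# Only start the download when invoked as "mirror-sparc.sh run".
if [ "${1:-}" = "run" ]; then
    sync_mirror
fi

# Example crontab entry (Sundays at 03:00, install path is an assumption):
#   0 3 * * 0  /usr/local/bin/mirror-sparc.sh run
```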
|
--
Alan McKinnon
alan.mckinnon@×××××.com