Gentoo Archives: gentoo-user

From: Walter Dnes <waltdnes@××××××××.org>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] [OT] Differences between wget and browser file retrieval?
Date: Fri, 15 Jan 2021 08:24:41
Message-Id: YAFROwwK2O45Rktx@waltdnes.org
In Reply to: Re: [gentoo-user] [OT] Differences between wget and browser file retrieval? by David Haller
On Thu, Jan 14, 2021 at 11:00:38PM +0100, David Haller wrote

> So, try:
>
> wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
> https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf

No luck. For DNS, I use my ISP's servers (Teksavvy) with fallback to
Google 8.8.8.8.

########################################################################
[i3][waltdnes][/dev/shm] wget -S --no-check-certificate -U 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0' https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
--2021-01-15 02:15:30-- https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
Resolving files.ontario.ca... 13.33.160.117, 13.33.160.123, 13.33.160.45, ...
Connecting to files.ontario.ca|13.33.160.117|:443... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Content-Type: application/pdf
Content-Length: 0
Connection: keep-alive
Date: Thu, 14 Jan 2021 15:15:50 GMT
Last-Modified: Thu, 14 Jan 2021 15:15:50 GMT
ETag: "d41d8cd98f00b204e9800998ecf8427e"
x-amz-meta-ctime: 1610637349
x-amz-meta-mode: 33188
x-amz-meta-gid: 500
x-amz-meta-uid: 500
x-amz-meta-mtime: 1610637349
Accept-Ranges: bytes
Server: AmazonS3
X-Cache: Hit from cloudfront
Via: 1.1 47dbad48e25df8c5ccf2822e46c2aaa6.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: YTO50-C3
X-Amz-Cf-Id: ARgHfF6QMVfUtkxqkr0AL5ljxIfE7Yd5xPmA4eDMx46NdPXOwIftnQ==
Age: 57573
Length: 0 [application/pdf]
Saving to: 'moh-covid-19-report-en-2021-01-14.pdf'

moh-covid-19-report [ <=> ] 0 --.-KB/s in 0s

2021-01-15 02:15:30 (0.00 B/s) - 'moh-covid-19-report-en-2021-01-14.pdf' saved [0/0]
########################################################################
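
A side note on the headers above: the ETag is the MD5 digest of zero
bytes of input, and for simple (non-multipart) S3 uploads the ETag is
normally the MD5 of the object body. Together with "Content-Length: 0"
and the CloudFront cache hit, that suggests the object stored upstream
was itself empty when it was cached, rather than wget truncating
anything. A minimal check, assuming GNU coreutils (md5sum, date -d);
the variable names are only for illustration:

```shell
# Values copied from the response headers above.
etag='d41d8cd98f00b204e9800998ecf8427e'
mtime=1610637349   # x-amz-meta-mtime, seconds since the Unix epoch

# MD5 of zero bytes of input.
empty_md5=$(printf '' | md5sum | cut -d' ' -f1)

if [ "$etag" = "$empty_md5" ]; then
    echo "ETag is the MD5 of an empty file"
fi

# When the object was last modified at the origin, in UTC (GNU date syntax).
uploaded=$(date -u -d "@$mtime" '+%Y-%m-%d %H:%M:%S')
echo "origin mtime: $uploaded UTC"
```

The mtime works out to 15:15:49 UTC on Jan 14, one second before the
Date and Last-Modified headers, and 15:15 UTC is 10:15 AM Eastern.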


> BTW: you know that you can let date format that URL? e.g.:
>
> wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
> "$(date '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf')"

Nice, but civil servants get stat holidays off. I downloaded Dec 25th
and 26th PDFs on the 26th. Monday Dec 28th was a lieu day for Boxing
Day, so I downloaded the 28th and 29th PDFs on the 29th. And of course
Jan 1st and 2nd PDFs on Jan 2nd. That's why I can't automate the date.
I have a script "getone"...

[i3][waltdnes][~/covid] cat getone
#!/bin/bash
wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-${1}.pdf

On the 14th it was invoked as "../getone 14" (called from the working
directory, one level below the main "covid" directory). I tweak the
script once a month to match year+month. In a worst-case scenario, I
can go to
https://covid-19.ontario.ca/covid-19-epidemiologic-summaries-public-health-ontario#daily
to manually retrieve a daily PDF. Note that on this page, they list
the date that the report is up to. The report issued 10:15 AM on the
14th shows up in the listing as "COVID-19 in Ontario: January 13, 2021".
That's because it contains data up to the 13th.
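
For illustration only, here is one way "getone" could avoid the monthly
edit while still taking the day by hand. This is a sketch, not the
script actually in use: the function names report_url and full_date are
hypothetical, a bare day number is assumed to mean the current month,
and a full YYYY-MM-DD argument covers catching up across a month
boundary. The -s test flags a zero-byte download like the one above.

```shell
#!/bin/bash
# Hypothetical helper: build the report URL for an ISO date (YYYY-MM-DD).
report_url() {
    printf 'https://files.ontario.ca/moh-covid-19-report-en-%s.pdf\n' "$1"
}

# Hypothetical helper: pass a full YYYY-MM-DD date through unchanged;
# expand a bare day number (7, 07, 14) to a date in the current month.
full_date() {
    case "$1" in
        ????-??-??) printf '%s\n' "$1" ;;
        # ${1#0} drops one leading zero so printf does not read 08 as octal.
        *) printf '%s-%02d\n' "$(date +%Y-%m)" "${1#0}" ;;
    esac
}

# Fetch one report into the current directory and complain if it is empty.
getone() {
    local day out
    day=$(full_date "$1") || return 1
    out="moh-covid-19-report-en-${day}.pdf"
    wget -O "$out" "$(report_url "$day")" || return 1
    if [ ! -s "$out" ]; then
        echo "warning: $out is empty" >&2
        return 1
    fi
}
```

With these definitions, "getone 14" fetches the 14th of the current
month and "getone 2020-12-28" fetches a specific earlier date.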

--
Walter Dnes <waltdnes@××××××××.org>
I don't run "desktop environments"; I run useful applications