Hello,

On Thu, 14 Jan 2021, Walter Dnes wrote:
> I'm bored, so I do a regular daily report at the DSL Reports "CanChat"
>sub-forum, on the Covid-19 case counts for Ontario, using provincial
>data. I download 2 files daily as source data. One of them is a PDF
>file, which is run through "pdftotext" and then parsed by a bash script
>(don't ask). Today, the command...
>
> wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
>
>...returns a zero-byte file. *BUT*, sticking the URL into the URL bar
>of Pale Moon and Google Chrome (and I assume Firefox/etc) brings up the
>PDF file just fine. Is "wget" being blocked?
[..]
> I've tried setting --user-agent= with my browser's string as shown by
>https://www.whatismybrowser.com/detect/what-is-my-user-agent but no
>luck. Is there some way to get around this? I have not updated this
>past week, so I don't think the problem is at my end.

I could download that file just fine just now[1]. Try running 'wget'
with the '-S' option, which prints the server's response headers. Oh and:

[..]
WARNING: cannot verify files.ontario.ca's certificate, issued by
[..]

If you sent stderr to /dev/null, you will not have seen that warning.

So, try:

  wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
    https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf

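A minimal sketch of how that could go into a script, so that wget's
messages are kept in a log instead of being thrown away and an empty
download gets noticed. The function name 'fetch_report' and the default
log name 'wget.log' are made up for illustration, and the
'Mozilla/5.0 ...' string is left elided as above:

```shell
#!/bin/bash
# Hypothetical helper: fetch_report URL OUTFILE [LOGFILE]
# Keeps wget's stderr (headers from -S, certificate warnings) in a log
# instead of sending it to /dev/null, then checks the result is non-empty.
fetch_report() {
    local url=$1 out=$2 log=${3:-wget.log}

    # -S prints the server response headers, -O names the output file.
    # wget writes its messages to stderr, hence the 2> redirect.
    wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
        -O "$out" "$url" 2> "$log"
    local rc=$?

    # [ -s FILE ] is true only if FILE exists and is larger than zero bytes.
    if [ "$rc" -ne 0 ] || [ ! -s "$out" ]; then
        echo "download failed or empty (wget exit $rc), see $log" >&2
        return 1
    fi
}
```

Called as e.g. 'fetch_report "$url" report.pdf || exit 1', a zero-byte
file stops the script before pdftotext ever sees it, and the log should
show what the server actually answered.
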
BTW: you know that you can let 'date' format that URL? E.g.:

  wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
    "$(date '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf')"

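If the script ever needs a report for a day other than today, the same
trick could be wrapped up roughly like this. It is only a sketch: the
function name 'report_url' is invented, and '-d' assumes GNU date:

```shell
#!/bin/bash
# Hypothetical helper: report_url [DATE]
# Prints the report URL for DATE (anything GNU date -d accepts),
# defaulting to today.
report_url() {
    local day=${1:-today}
    # The whole URL is the format string; only %Y, %m and %d get expanded.
    # TZ is pinned so that "today" means today in Ontario.
    TZ=America/Toronto date -d "$day" \
        '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf'
}

report_url 2021-01-14
# prints https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
```

Something like 'report_url yesterday' would then cover the case where
today's file is not up yet.
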
The only catch: 'date' treats every '%' in the format string as the
start of a date/time conversion. So if a URL contains a literal '%',
you need to escape it with another '%', as in e.g.

    $(date '+foo%%20bar-%Y-%m-%d.pdf')
                ^^ this fella

In your case, the URL is clean ;)

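For URLs that are not clean, the doubling could be left to sed. Again
just a sketch, with a made-up function name ('dated_name') and GNU
date's '-d' assumed:

```shell
#!/bin/bash
# Hypothetical helper: dated_name PREFIX DATE
# Doubles every '%' in PREFIX so that date prints it literally, then
# appends a date pattern. DATE is anything GNU date -d accepts.
dated_name() {
    local prefix=$1 day=$2 escaped
    # Turn each literal '%' into '%%' before it goes into the format string.
    escaped=$(printf '%s' "$prefix" | sed 's/%/%%/g')
    date -d "$day" "+${escaped}-%Y-%m-%d.pdf"
}

dated_name 'foo%20bar' 2021-01-14
# prints foo%20bar-2021-01-14.pdf
```
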
HTH,
-dnh

[1] $ TZ=America/Toronto date
    Thu Jan 14 16:50:15 EST 2021

-- 
"Airplane travel is nature's way of making you look like your passport
photo." -- Al Gore