Gentoo Archives: gentoo-user

From: David Haller <gentoo@×××××××.de>
To: Gentoo Users List <gentoo-user@l.g.o>
Subject: Re: [gentoo-user] [OT] Differences between wget and browser file retrieval?
Date: Thu, 14 Jan 2021 22:01:30
Message-Id: 20210114220038.sg4lcwaew3xzt67t@grusum.endjinn.de
In Reply to: [gentoo-user] [OT] Differences between wget and browser file retrieval? by Walter Dnes
Hello,

On Thu, 14 Jan 2021, Walter Dnes wrote:
> I'm bored, so I do a regular daily report at the DSL Reports "CanChat"
> sub-forum, on the Covid-19 case counts for Ontario, using provincial
> data. I download 2 files daily as source data. One of them is a PDF
> file, which is run through "pdftotext" and then parsed by a bash script
> (don't ask). Today, the command...
>
>     wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
>
> ...returns a zero-byte file. *BUT*, sticking the URL into the URL bar
> of Pale Moon and Google Chrome (and I assume Firefox/etc) brings up the
> PDF file just fine. Is "wget" being blocked?
[..]
> I've tried setting --user-agent= with my browser's string as shown by
> https://www.whatismybrowser.com/detect/what-is-my-user-agent but no
> luck. Is there some way to get around this? I have not updated this
> past week, so I don't think the problem is at my end.

I could download that file just fine just now[1]. Try running 'wget'
with the '-S' option, which prints the server's response headers. Oh, and:

[..]
WARNING: cannot verify files.ontario.ca's certificate, issued by
[..]

If you sent stderr to /dev/null, you would not have seen that warning.

So, try:

    wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
        https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
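
Since a silent zero-byte file was the original symptom, a small check
after the download may help. This is only a sketch: check_download and
the file names report.pdf and wget.log are made up for illustration, and
the commented usage line assumes wget's -o (log file) and -O (output
file) options.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE exists and is non-empty;
# otherwise show the saved wget messages from LOG on stderr.
check_download() {
    file=$1
    log=$2
    if [ -s "$file" ]; then
        return 0
    fi
    echo "download failed or empty: $file" >&2
    # Print the log only if one was actually written.
    if [ -f "$log" ]; then
        cat "$log" >&2
    fi
    return 1
}

# Usage (not run here): keep wget's messages instead of discarding them.
#   wget -S -o wget.log -O report.pdf "$url"
#   check_download report.pdf wget.log
```

That way the certificate warning ends up in the log and is shown when
the result is empty, instead of vanishing into /dev/null.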

BTW: did you know that you can let 'date' format that URL? E.g.:

    wget -S --no-check-certificate -U 'Mozilla/5.0 ...' \
        "$(date '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf')"
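
The same idea can be wrapped in a function, so that a report for another
day can be requested too. A minimal sketch: report_url is a made-up
name, and the optional argument relies on GNU date's -d option.

```shell
#!/bin/sh
# Hypothetical helper: print the report URL for a given day (default: today).
# Assumes GNU date, whose -d option parses a date string such as 2021-01-14.
report_url() {
    day=${1:-today}
    date -d "$day" '+https://files.ontario.ca/moh-covid-19-report-en-%Y-%m-%d.pdf'
}

# Prints https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
report_url 2021-01-14
```

The result can then be passed to wget as "$(report_url)".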

Just note that no unescaped '%' is allowed apart from the date/time
format sequences. So if a URL contains one, you need to escape it
with another '%', as in e.g.

    $(date '+foo%%20bar-%Y-%m-%d.pdf')
                ^^ this fella

In your case, the URL is clean ;)
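
For a URL that does contain a literal '%', the doubling can be left to
sed. Again just a sketch: escape_pct is a made-up name, and the fixed
date via -d assumes GNU date.

```shell
#!/bin/sh
# Hypothetical helper: double every '%' so date(1) treats it as a literal.
escape_pct() {
    printf '%s\n' "$1" | sed 's/%/%%/g'
}

# The fixed part of the name goes through escape_pct; the date format
# sequences are appended unescaped.
prefix=$(escape_pct 'foo%20bar-')
# Prints foo%20bar-2021-01-14.pdf (the -d option assumes GNU date).
date -d 2021-01-14 "+${prefix}%Y-%m-%d.pdf"
```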

HTH,
-dnh

[1] $ TZ=America/Toronto date
    Thu Jan 14 16:50:15 EST 2021

--
"Airplane travel is nature's way of making you look like your passport
photo." -- Al Gore

Replies

Subject Author
Re: [gentoo-user] [OT] Differences between wget and browser file retrieval? Philip Webb <purslow@××××××××.net>
Re: [gentoo-user] [OT] Differences between wget and browser file retrieval? Walter Dnes <waltdnes@××××××××.org>