I'm bored, so I do a regular daily report at the DSL Reports "CanChat"
sub-forum, on the Covid-19 case counts for Ontario, using provincial
data. I download 2 files daily as source data. One of them is a PDF
file, which is run through "pdftotext" and then parsed by a bash script
(don't ask). Today, the command...

wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf

...returns a zero-byte file. *BUT*, sticking the URL into the URL bar
of Pale Moon and Google Chrome (and I assume Firefox, etc.) brings up the
PDF file just fine. Is "wget" being blocked? Going through the browser
means extra steps to get the PDF file saved to the standard work area
where my script expects it to be, so it can work its magic and parse
out the daily breakdown by PHU (Public Health Unit).
BTW, today's posts requiring the PDF file are...
https://www.dslreports.com/forum/r33002718-
https://www.dslreports.com/forum/r33002752-

I've tried setting --user-agent= to my browser's string, as shown by
https://www.whatismybrowser.com/detect/what-is-my-user-agent but no
luck. Is there some way to get around this? I have not updated anything
this past week, so I don't think the problem is at my end.
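
A minimal sketch of how the download step could at least notice the empty file and show what the server answered. The function name fetch_pdf and the user-agent string below are placeholders for illustration, not what the actual script uses; only standard wget options (--server-response, --user-agent, --output-document) are assumed.

```shell
#!/bin/bash
# Placeholder user-agent string (an assumption; substitute the one the
# browser actually reports).
ua="Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0"

# fetch_pdf URL OUTFILE
# Hypothetical helper: download URL to OUTFILE and succeed only if the
# result is a non-empty file.
fetch_pdf() {
    local url="$1" out="$2"
    # --server-response prints the HTTP response headers on stderr, which
    # shows whether the server sent an error status, a redirect, or an
    # empty reply.
    wget --server-response --user-agent="$ua" --output-document="$out" "$url"
    # wget's own exit status is deliberately ignored here; with
    # --output-document the file is created even when the download fails,
    # so the size test below is what decides success.
    # -s is true only when the file exists and is larger than zero bytes.
    [ -s "$out" ]
}
```

It could be called as `fetch_pdf "$url" "$workdir/report.pdf" || echo "download came back empty" >&2`, where $url and $workdir are whatever the surrounding script already defines. This does not explain why the server hands wget an empty file; it only makes the failure visible before pdftotext runs.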

--
Walter Dnes <waltdnes@××××××××.org>
I don't run "desktop environments"; I run useful applications