On 2021.01.14 15:49, Walter Dnes wrote:
> I'm bored, so I do a regular daily report at the DSL Reports "CanChat"
> sub-forum, on the Covid-19 case counts for Ontario, using provincial
> data. I download 2 files daily as source data. One of them is a PDF
> file, which is run through "pdftotext" and then parsed by a bash script
> (don't ask). Today, the command...
>
> wget https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf
>
> ...returns a zero-byte file. *BUT*, sticking the URL into the URL bar
> of Pale Moon and Google Chrome (and I assume Firefox/etc) brings up the
> PDF file just fine. Is "wget" being blocked? I have to do extra steps
> to get from the browser-invoked PDF to get the PDF file saved to the
> standard work area where my script expects it to be, so it can work its
> magic and parse out the daily breakdown by PHU (Public Health Unit).
> BTW, today's posts requiring the PDF file are...
> https://www.dslreports.com/forum/r33002718-
> https://www.dslreports.com/forum/r33002752-
>
> I've tried setting --user-agent= with my browser's string as shown by
> https://www.whatismybrowser.com/detect/what-is-my-user-agent but no
> luck. Is there some way to get around this? I have not updated this
> past week, so I don't think the problem is at my end.

I just copy/pasted that wget command into my terminal, and it got me a
1.7M PDF doc. I'm in the US, but I have no idea if location/IP is an
issue or not.
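
If it helps narrow things down, here is a minimal sketch of how a script
could detect the zero-byte case itself and show what the server actually
answered. The helper names looks_like_pdf and fetch_report are made up for
illustration, and the output filename is an assumption; only standard wget
options (-S, --tries, -O) are used. It does not explain why the download
came back empty, it only makes the failure visible.

```shell
#!/bin/sh
# Sketch only: looks_like_pdf and fetch_report are hypothetical helper names.

# Succeeds only if the file is non-empty and starts with the PDF magic bytes.
looks_like_pdf() {
    [ -s "$1" ] && [ "$(head -c 4 "$1")" = "%PDF" ]
}

# Download URL $1 to file $2. -S prints the server's response headers, so a
# refusal or redirect shows up on the terminal; --tries=3 retries on failure.
fetch_report() {
    url=$1
    out=$2
    wget -S --tries=3 -O "$out" "$url" || return 1
    looks_like_pdf "$out"
}

# Example call (not run here; report.pdf is an assumed output name):
#   fetch_report "https://files.ontario.ca/moh-covid-19-report-en-2021-01-14.pdf" report.pdf \
#       || echo "download failed or did not return a PDF" >&2
```

Checking the file after the download matters because -O creates the output
file before any data arrives, so an empty file can be left behind even when
the transfer fails.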

Jack