Hello,

With a little searching, I found this (on MARC, software used by Gentoo [1]):

"Robot policy

In theory, we don't mind people snarfing down some MARC pages for
off-line reading. (I travel a lot, and sometimes want to pull down long
threads before hitting the road to read locally, etc.)

On the other hand... first, if we think you are a spam-bot
address-harvester, no death is slow or painful enough. Also, even if
well-intentioned, a robot crawling MARC can sometimes create a DoS; if
the robot sustains many parallel requests (or we happen to be hit by
multiple different robots at the same time) and doesn't back off if the
site starts to slow down, it can bog down the server. In a perfect world
MARC would scale better, and would automatically recognize abusive
robots 100% accurately, 100% of the time. But since it's not a perfect
world... we may throttle traffic from you if your IP, user-agent, or
IP/user-agent combination have misbehaved in the past.

If you want to crawl MARC, please be sure you have a delay between
requests, say one or two seconds. If you think we've mis-identified you
as a robot, please feel free to contact us. Please include information
we'll need to find your activity in our logs, such as the time you get
this message, the IP address(es) you are browsing from, and the
user-agent (web browser) you are using."
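
The delay and back-off that the policy asks for could be sketched
roughly as follows. This is only an illustration in Python (standard
library only), not anything MARC provides: the function names, the
User-Agent string, and the retry and delay values are all my own
assumptions. It fetches one page at a time, waits between requests,
and doubles the wait after an error.

```python
import time
import urllib.request


def fetch(url, timeout=30):
    """Fetch one page and return its body as text."""
    # The User-Agent below is a made-up placeholder; use one that identifies you.
    req = urllib.request.Request(
        url, headers={"User-Agent": "offline-reader-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def polite_fetch_all(urls, delay=2.0, max_delay=60.0, retries=3,
                     fetch_page=fetch, sleep=time.sleep):
    """Fetch URLs strictly one at a time, pausing between requests.

    After an error the pause is doubled (capped at max_delay) before the
    retry, so a slow or overloaded server gets less traffic, not more.
    URLs that still fail after `retries` attempts are skipped.
    """
    pages = {}
    pause = delay
    for url in urls:
        for _ in range(retries):
            try:
                pages[url] = fetch_page(url)
                pause = delay      # success: go back to the base delay
                break
            except OSError:        # URLError and timeouts are OSError subclasses
                pause = min(pause * 2, max_delay)
                sleep(pause)
        sleep(pause)               # wait before moving on to the next URL
    return pages
```

The fetch and sleep functions are passed in as parameters only so the
loop can be tried out without touching the network.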

With that warning in mind, you can use the MARC interface to dump what
you want. Start from the mailing list page [2]. Each message on MARC
includes the date and other mail headers. libcurl and sed should be
sufficient for your needs.
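
As a rough illustration of the extraction step (the part sed would do),
here is a small Python sketch that pulls message links out of an index
page that has already been downloaded. It assumes the links follow the
?l=<list>&m=<digits>&w=2 pattern of the MARC message URL quoted further
down in this thread; I have not checked the actual page markup, and the
function name is made up.

```python
import re
from html import unescape


def message_urls(index_html, list_name, base="https://marc.info/"):
    """Return message URLs for list_name found in an index page, in page order."""
    # Assumed link shape: ?l=<list>&m=<digits>&w=2. In HTML the '&' may be
    # written as '&amp;', so the page is unescaped before matching.
    pattern = re.compile(r"\?l=" + re.escape(list_name) + r"&m=(\d+)")
    seen = []
    for msg_id in pattern.findall(unescape(index_html)):
        if msg_id not in seen:     # keep only the first occurrence of each id
            seen.append(msg_id)
    return [f"{base}?l={list_name}&m={m}&w=2" for m in seen]
```

The resulting URLs can then be fetched one by one, with a pause of one
or two seconds between requests as the policy asks.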

[1] : https://marc.info/?q=about
[2] : https://marc.info/?l=gentoo-announce&r=1&w=2

Good luck :)

Hogren

On 04/01/2017 08:58, Floyd Anderson wrote:
> On Tue, 03 Jan 20:12:05 -0500
> Philip Webb <purslow@××××××××.net> wrote:
>> 170104 Floyd Anderson wrote:
>>> Is it possible (and if so, how) to retrieve bounced mailing list
>>> messages (e.g. from <gentoo-announce@l.g.o> or this list)?
>>
>> You can recover everything from the Gentoo lists' archive:
>> http://archives.gentoo.org/ .
>
> Thanks for your response. I’ve already found [1] but it’s hard (even
> impossible) to figure out the bounced message(s) from there. Note that
> the bounce notification from the mailing list manager (mlmmj)
> looks like:
>
>> Some messages to you could not be delivered. If you're seeing this
>> message it means things are back to normal, and it's merely for your
>> information.
>>
>> Here is the list of the bounced messages:
>> - 174956
>> - 174958
>
> I see no chance to find any message on [1] by its message number.
> Even if it were possible, my goal is to have the messages stored
> locally and searchable while offline. Also, [2] doesn’t help here
> even though it offers a message download link: it’s not the raw email
> (with header fields).
>
> Anyway, you pushed me in the right direction. After digging somewhat
> deeper I found the ability to send a message request to e.g.
> <gentoo-user+get-N@l.g.o> (where N is the message number).
>
> Now there is still one thing. How to get messages (better whole
> threads) for offline usage from a period before the list subscription
> when their message numbers are unknown?
>
>
> [1] <https://archives.gentoo.org/gentoo-user/>
> [2] <https://marc.info/?l=gentoo-user&m=148349234522442&w=2>
>