Gentoo Archives: gentoo-dev

From: Alec Warner <antarus@g.o>
To: Gentoo Dev <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] overlays.gentoo.org restoration & post-mortem
Date: Sat, 18 Jan 2014 05:58:26
Message-Id: CAAr7Pr9GGy3KsDQJ8bzcJ95NpyUM-q-DMHd2a4FVZ=+UcoDXgw@mail.gmail.com
In Reply to: Re: [gentoo-dev] overlays.gentoo.org restoration & post-mortem by Kent Fredric
1 On Fri, Jan 17, 2014 at 9:23 PM, Kent Fredric <kentfredric@×××××.com> wrote:
2
3 >
4 > On 18 January 2014 18:02, Robin H. Johnson <robbat2@g.o> wrote:
5 >
6 >> - More people need to use the infra-status page to learn about the state
7 >> of Gentoo services.
8 >>
9 >
10 >
11 > A service middle layer like fastly or cloudflare which could link to the
12 > infra page would be good here perhaps, so when an outage occurred ( at
13 > least on the web side ) appropriate links to infra could be given.
14 >
15
16 Cloud stuff aside (most of infra is not super experienced with, or trusting
17 of, cloud stuff), I think there was a lot of indecision during the outage.
18 Do we wait for the sponsor or restore from backup?
19 How good are the backups? (Turns out, they were decent.)
20 How much work is it to rebuild from them? (Turns out, one evening of Robin's
21 time plus incidentals.)
22
23 Once we got the data back on the new machine, why did we post the
24 all-clear? Then we knew there was corruption, but it took a long time to
25 disable git and HTTP access. Some repos were missing, some were corrupt,
26 etc.
27
28 We don't have procedures for these sorts of things. I think we were
29 conservative in the changes we made. How do you disable a service like
30 gitolite? We deployed two fixes. One was to disable SSH for the 'git' user;
31 the second was to move the authorized keys files out of the way. We pursued
32 these avenues independently, and we did not check them into configuration
33 management, which I wish had happened. Later, when we disabled the HTTP part
34 (to make overlays throw 503s), that was checked in, which was nice.
35 Certainly I was afraid of breaking stuff for Robin, so I really tried to
36 avoid doing anything unless I was confident it would not impact him.
37
38
39 > And the infra status page is not exactly obvious. It's not listed on the
40 > "gentoo sites" list on the top right, and perhaps it ought to be.
41 >
42
43 I consider the page a great success in this story. I'm really happy about
44 it, and while you can always say 'hey, we could have done better here', I
45 think we did pretty well.
46
47 -A
48
49
50 >
51 >
52 >
53 >
54 > --
55 > Kent
56 >
57 > perl -e "print substr( \"edrgmaM SPA NOcomil.ic\\@tfrken\", \$_ * 3, 3 )
58 > for ( 9,8,0,7,1,6,5,4,3,2 );"
59 >
60 > http://kent-fredric.fox.geek.nz
61 >