Gentoo Archives: gentoo-dev

From: Alec Warner <antarus@g.o>
To: Gentoo Dev <gentoo-dev@l.g.o>
Subject: Re: [gentoo-dev] overlays.gentoo.org restoration & post-mortem
Date: Sat, 18 Jan 2014 06:02:02
Message-Id: CAAr7Pr8N1bHVVYihpjVntZ_AdED-8Fh30e=Aknh8=-rFDkBi9g@mail.gmail.com
In Reply to: [gentoo-dev] overlays.gentoo.org restoration & post-mortem by "Robin H. Johnson"
1 On Fri, Jan 17, 2014 at 9:02 PM, Robin H. Johnson <robbat2@g.o> wrote:
2
3 > overlays.gentoo.org service has been restored on a new system.
4 > Some statistics and a post-mortem follow.
5 >
6 > Special thanks to antarus and a3li for all their interactions with our
7 > sponsor,
8 > and managing most of the details. I just did the final data recovery and
9 > this
10 > writeup.
11 >
12 > Please resume using the service, and if you see something weird that you
13 > think is different from before, please file a bug for Infrastructure.
14 >
15 > In the process, the service moved to a new machine. The SSH keys have
16 > changed
17 > as follows:
18 > DSA: d6:71:99:1f:46:c9:42:95:e1:9d:be:8e:f7:76:51:b5
19 > RSA: 92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9
20 > ECDSA: 96:f0:29:e6:d4:85:58:46:31:ba:0e:17:0b:8c:fa:d8
21 >
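The fingerprints above are in the colon-separated MD5 format. As a rough sketch of how that format could be recomputed from a public key line (for example, one fetched with ssh-keyscan) before comparing it against the values listed, assuming the usual OpenSSH layout of "type base64-blob comment". The function name and the truncated sample key are illustrative only and are not the real host keys:

```python
import base64
import hashlib

def md5_fingerprint(pubkey_line: str) -> str:
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line."""
    # The second whitespace-separated field is the base64-encoded key blob.
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    # Group the 32 hex digits into 16 colon-separated pairs.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

# Illustrative, truncated blob (NOT a real host key).
sample = "ssh-rsa AAAAB3NzaC1yc2E= example"
print(md5_fingerprint(sample))
```

The printed string has the same shape as the DSA/RSA/ECDSA values quoted above, so it can be compared character by character.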
22 > At this time, we will NOT be restoring Trac due to low demand. If you
23 > still require a web-based SVN browser for old SVN repos, please contact
24 > us at infra@g.o.
25 >
26
27 For Trac wiki users: the recommendation is to move to wiki.gentoo.org. If
28 you haven't migrated, and you need a copy of your Trac wiki pages from
29 overlays.gentoo.org, please file a bug against infra and someone (me) will
30 restore them on a request-by-request basis. I think the deal is that I
31 can pretty trivially give you a tarball of markup files (one per wiki page).
32
33 -A
34
35
36 >
37 > If you have a dev/ repo in the 'IMPORTANT' list below, you MUST push
38 > to the server again.
39 >
40 > IMPORTANT: The following repos were damaged beyond repair, and were not
41 > available in backups. You'll need to push again; I have reset the repos to
42 > empty:
43 > dev/anarchy.git
44 > dev/dberkholz.git
45 > dev/dev-zero.git
46 > dev/dilfridge.git
47 > dev/fordfrog.git
48 > dev/graaff.git
49 > dev/maekke.git
50 > dev/mschiff.git
51 > dev/quantumsummers.git
52 > dev/zorry.git
53 >
54 > FYI: The following repos appeared to be empty:
55 > dev/b33fc0d3.git
56 > dev/moult.git
57 > dev/tomwij.git
58 > user/blueicefield.git
59 > user/disinbox.git
60 > user/palatis.git
61 > user/paragon.git
62 > user/vmalov.git
63 > user/xray.git
64 >
65 > FYI: The following repos contained dangling commits/tags/blobs, and this
66 > should not be considered new breakage; if you have a newer copy, you are
67 > encouraged to push again:
68 > dev/blueness.git
69 > dev/maksbotan.git
70 > dev/mgorny.git
71 > dev/qiaomuf.git
72 > dev/xmw.git
73 > proj/betagarden.git
74 > proj/catalyst.git (+tags)
75 > proj/devmanual.git
76 > proj/dotnet.git
77 > proj/elfix.git (+tags)
78 > proj/emacs-tools.git
79 > proj/gamerlay.git
80 > proj/hardened-dev.git
81 > proj/hardened-patchset.git
82 > proj/kde.git
83 > proj/lisp.git
84 > proj/openrc.git (+tags)
85 > proj/portage.git
86 > proj/ruby-overlay.git
87 > proj/sci.git
88 > proj/sunrise.git
89 > proj/webapp-config.git
90 > proj/x11.git
91 > user/gmt.git
92 > user/mv.git (+blobs)
93 > user/palmer.git
94 >
95 > Statistics:
96 > -----------
97 > 354 repos total
98 > - 10 repos unrecoverable (all in dev/)
99 > = 344 repos recovered/available
100 >
101 > 9 repos that seem to be empty
102 > 26 repos with dangling commits/tags/blobs
103 > 2 repos recovered from external sources.
104 >
105 > Breakdown by path:
106 > ------------------
107 > 193 proj/ repos
108 > 69 dev/ repos
109 > 91 user/ repos
110 > 1 other repo
111 >
112 > Post-mortem
113 > -----------
114 > Hornbill went offline around: 2014-01-10 13:13 UTC
115 > Hornbill last started a backup of VCS: 2014-01-10 07:59:04 UTC
116 > Hornbill last completed a backup of VCS: 2014-01-10 08:20:54 UTC
117 >
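For reference, the window of potentially unbacked-up work follows directly from the three timestamps above. A small worked calculation, assuming the outage happened at exactly 13:13:00 UTC (the post only gives it to the minute):

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"
backup_started = datetime.strptime("2014-01-10 07:59:04", FMT)
backup_completed = datetime.strptime("2014-01-10 08:20:54", FMT)
# Assumption: seconds set to :00, since the outage time is only approximate.
went_offline = datetime.strptime("2014-01-10 13:13:00", FMT)

# Worst case: anything written after the backup began may be missing from it.
exposure_from_start = went_offline - backup_started
# Best case: only writes after the backup finished are missing.
exposure_from_completion = went_offline - backup_completed

print(exposure_from_start)       # 5:13:56
print(exposure_from_completion)  # 4:52:06
```

Either way the exposure is roughly five hours.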
118 > Between the backup starting, and the server going offline, we were able
119 > to confirm writes to the following Git repos:
120 > dev/fordfrog.git
121 > proj/kde.git
122 > gitolite-admin.git
123 >
124 > We believe that there were no writes to user/ repos, but are not 100%
125 > certain, as the logging was insufficient for this purpose.
126 >
127 > Hornbill went offline just over a week ago: Mid-afternoon on a Friday
128 > for the timezone where it's located. Due to staff turnover and business
129 > changes at the previous sponsor, we were not able to contact anybody
130 > until regular office hours on Monday, January 13th.
131 >
132 > The server in question, while previously functioning, was not
133 > recoverable after a remote hands reboot on Monday afternoon (UTC).
134 > On Tuesday, the sponsor was able to examine it in more depth, and
135 > it was not recoverable. More concerningly, it turned out to be one of
136 > the few remaining Gentoo infrastructure systems with IDE drives. The
137 > data was recovered; however, it seemed to have a lot of corruption.
138 >
139 > It was noted that our backups were missing all of the dev/ repos, due to
140 > a system-wide rule to exclude /dev/ from backups (the rule should only
141 > match the real /dev, not any directory simply named "dev"). For this
142 > reason, we decided to try and get the data from the old server.
143 >
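The difference between the rule as it behaved and the rule as intended can be illustrated in a few lines. This is only a sketch: the post does not name the backup tool or its pattern syntax, so the logic is written in plain Python, and the repository path used below is hypothetical.

```python
from pathlib import PurePosixPath

def excluded_unanchored(path: str) -> bool:
    """Rule as it behaved: skip any path with a component named 'dev'."""
    return "dev" in PurePosixPath(path).parts

def excluded_anchored(path: str) -> bool:
    """Rule as intended: skip only the real /dev at the filesystem root."""
    parts = PurePosixPath(path).parts
    return len(parts) >= 2 and parts[0] == "/" and parts[1] == "dev"

# Hypothetical on-disk location of a dev/ repo; the real layout is not given.
repo = "/var/gitolite/repositories/dev/fordfrog.git"

print(excluded_unanchored("/dev/sda"), excluded_anchored("/dev/sda"))
print(excluded_unanchored(repo), excluded_anchored(repo))
```

With the unanchored check, the device node and the repository are both skipped; with the anchored check, only the device node is.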
144 > Verification/recovery of the remaining data was also hampered by
145 > having to confirm that some of the Git repos in the backup were not
146 > entirely clean, containing either legacy errors from their CVS/SVN
147 > conversions (which turned out to be false positives) or dangling commits/blobs/tags.
148 >
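The post does not say which tool produced these findings. "git fsck" is the usual way to surface dangling objects, and by default it prints one line per object in the form "dangling <type> <sha>". Assuming that output format, a small sketch for summarizing a report by object type (the function name and the sample report are made up for illustration):

```python
from collections import Counter

def summarize_dangling(fsck_output: str) -> Counter:
    """Count dangling objects per type in text shaped like `git fsck` output."""
    counts = Counter()
    for line in fsck_output.splitlines():
        fields = line.split()
        # Expected shape: "dangling <type> <sha>"; other lines are ignored.
        if len(fields) == 3 and fields[0] == "dangling":
            counts[fields[1]] += 1
    return counts

# Made-up report, not taken from any of the repos listed above.
sample_report = """\
dangling commit 1111111111111111111111111111111111111111
dangling blob 2222222222222222222222222222222222222222
dangling commit 3333333333333333333333333333333333333333
"""
print(summarize_dangling(sample_report))
```

In practice the input would come from running "git fsck" inside each bare repository; lines that are not dangling reports would need separate handling.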
149 > What could we do better next time:
150 > ----------------------------------
151 > - Have backups of all repos!
152 > - Compare the age of the backup immediately, and consider going live
153 > with the backup. Only 5 hours of work would have been lost, and even
154 > then possibly only temporarily, due to the distributed nature of Git.
155 > - More people need to use the infra-status page to learn about the state
156 > of Gentoo services.
157 >
158 > Actions for Infra
159 > -----------------
160 > - Include the dev/ repos that were not in the backup
161 > - Set up Gitolite mirroring
162 > - Review gitolite logging (needs to be easier to confirm when writes
163 > took place)
164 >
165 > --
166 > Robin Hugh Johnson
167 > Gentoo Linux: Developer, Infrastructure Lead
168 > E-Mail : robbat2@g.o
169 > GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
170 >