Gentoo Archives: gentoo-dev

From: "Robin H. Johnson" <robbat2@g.o>
To: gentoo-dev@l.g.o
Cc: gentoo-dev-announce@l.g.o
Subject: [gentoo-dev] overlays.gentoo.org restoration & post-mortem
Date: Sat, 18 Jan 2014 05:03:11
Message-Id: 20140118050256.GF3378@orbis-terrarum.net
1 overlays.gentoo.org service has been restored on a new system.
2 Some statistics and a post-mortem follow.
3
4 Special thanks to antarus and a3li for all their interactions with our sponsor,
5 and managing most of the details. I just did the final data recovery and this
6 writeup.
7
8 Please resume using the service, and if you see something weird that you
9 think is different from before, please file a bug for Infrastructure.
10
11 In the process, the service moved to a new machine. The SSH keys have changed
12 as follows:
13 DSA: d6:71:99:1f:46:c9:42:95:e1:9d:be:8e:f7:76:51:b5
14 RSA: 92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9
15 ECDSA: 96:f0:29:e6:d4:85:58:46:31:ba:0e:17:0b:8c:fa:d8
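
For anyone who wants to check the new host keys by script rather than by eye, here is a minimal sketch. It assumes the fingerprints above are in the classic colon-separated MD5 format (an MD5 digest of the decoded public key blob), and that you already have the server's public key line, e.g. from your known_hosts file. The function names are made up for illustration.

```python
import base64
import hashlib


def md5_fingerprint(pubkey_line: str) -> str:
    """Return the colon-separated MD5 fingerprint of an OpenSSH public key line."""
    # A public key line looks like: "<type> <base64 blob> [comment]".
    blob_b64 = pubkey_line.split()[1]
    digest = hashlib.md5(base64.b64decode(blob_b64)).hexdigest()
    # Group the 32 hex digits into 16 byte pairs joined by colons.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def matches_announced(pubkey_line: str, announced: str) -> bool:
    """Compare a key's fingerprint with one copied from this announcement."""
    return md5_fingerprint(pubkey_line) == announced.strip().lower()
```

Pass the key line and one of the fingerprints listed above to matches_announced(); a False result means the key you were offered is not the one announced here.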
16
17 At this time, we will NOT be restoring Trac due to low demand. If you
18 still require a web-based SVN browser for old SVN repos, please contact
19 us at infra@g.o.
20
21 If you have a dev/ repo under the list 'IMPORTANT' below, you MUST push
22 to the server again.
23
24 IMPORTANT: The following repos were damaged beyond repair, and were not
25 available in backups. You'll need to push again; I have reset the repos to
26 empty:
27 dev/anarchy.git
28 dev/dberkholz.git
29 dev/dev-zero.git
30 dev/dilfridge.git
31 dev/fordfrog.git
32 dev/graaff.git
33 dev/maekke.git
34 dev/mschiff.git
35 dev/quantumsummers.git
36 dev/zorry.git
37
38 FYI: The following repos appeared to be empty:
39 dev/b33fc0d3.git
40 dev/moult.git
41 dev/tomwij.git
42 user/blueicefield.git
43 user/disinbox.git
44 user/palatis.git
45 user/paragon.git
46 user/vmalov.git
47 user/xray.git
48
49 FYI: The following repos contained dangling commits/tags/blobs, and this
50 should not be considered new breakage; if you have a newer copy, you are
51 encouraged to push again:
52 dev/blueness.git
53 dev/maksbotan.git
54 dev/mgorny.git
55 dev/qiaomuf.git
56 dev/xmw.git
57 proj/betagarden.git
58 proj/catalyst.git (+tags)
59 proj/devmanual.git
60 proj/dotnet.git
61 proj/elfix.git (+tags)
62 proj/emacs-tools.git
63 proj/gamerlay.git
64 proj/hardened-dev.git
65 proj/hardened-patchset.git
66 proj/kde.git
67 proj/lisp.git
68 proj/openrc.git (+tags)
69 proj/portage.git
70 proj/ruby-overlay.git
71 proj/sci.git
72 proj/sunrise.git
73 proj/webapp-config.git
74 proj/x11.git
75 user/gmt.git
76 user/mv.git (+blobs)
77 user/palmer.git
78
79 Statistics:
80 -----------
81 354 repos total
82 - 10 repos unrecoverable (all in dev/)
83 = 344 repos recovered/available
84
85 9 repos that seem to be empty
86 26 repos with dangling commits/tags/blobs
87 2 repos recovered from external sources.
88
89 Breakdown by path:
90 ------------------
91 193 proj/ repos
92 69 dev/ repos
93 91 user/ repos
94 1 other repo
95
96 Post-mortem
97 -----------
98 Hornbill went offline around: 2014-01-10 13:13 UTC
99 Hornbill last started a backup of VCS: 2014-01-10 07:59:04 UTC
100 Hornbill last completed a backup of VCS: 2014-01-10 08:20:54 UTC
101
102 Between the backup starting, and the server going offline, we were able
103 to confirm writes to the following Git repos:
104 dev/fordfrog.git
105 proj/kde.git
106 gitolite-admin.git
107
108 We believe that there were no writes to user/ repos, but are not 100%
109 certain, as the logging was insufficient for this purpose.
110
111 Hornbill went offline just over a week ago: Mid-afternoon on a Friday
112 for the timezone where it's located. Due to staff turnover and business
113 changes at the previous sponsor, we were not able to contact anybody
114 until regular office hours on Monday, January 13th.
115
116 The server in question, while previously functioning, was not
117 recoverable after a remote hands reboot on Monday afternoon (UTC).
118 On Tuesday, the sponsor was able to examine it in more depth, and
119 it was not recoverable. More concerningly, it turned out to be one of
120 the few remaining Gentoo infrastructure systems with IDE drives. The
121 data was recovered; however, it seemed to have a lot of corruption.
122
123 It was noted that our backups were missing all of the dev/ repos, due to
124 a system-wide rule to exclude /dev/ from backups (the rule should only
125 match the real /dev, not any directory simply named "dev"). For this
126 reason, we decided to try and get the data from the old server.
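
To illustrate the kind of rule mistake described above, here is a minimal sketch. It does not reproduce our actual backup configuration; the repository path used in the usage note is hypothetical, and the two functions only model the difference between an exclude rule that matches any directory named "dev" and one anchored at the filesystem root.

```python
from pathlib import PurePosixPath


def excluded_unanchored(path: str) -> bool:
    """Model of the faulty rule: skip any path with a component named 'dev'."""
    return "dev" in PurePosixPath(path).parts


def excluded_anchored(path: str) -> bool:
    """Model of the intended rule: skip only the real /dev and its contents."""
    # parts for "/dev/sda" is ("/", "dev", "sda"), so check the first two.
    return PurePosixPath(path).parts[:2] == ("/", "dev")
```

With a hypothetical repository path such as /var/gitroot/dev/fordfrog.git, excluded_unanchored() returns True (the repo is silently skipped), while excluded_anchored() returns False and still skips /dev/sda.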
127
128 Verification/recovery of the remaining data was also hampered by
129 the discovery that some of the Git repos in the backup were not entirely
130 clean: they contained either legacy errors from their CVS/SVN conversions
131 that turned out to be false positives, or dangling commits/blobs/tags.
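
If you want to check your own copy of a repo for dangling objects before pushing again, something like the following sketch may help. It is not the procedure Infra used; it assumes a git binary on PATH and that "git fsck" reports such objects as lines of the form "dangling <type> <sha>". The function names are made up for illustration.

```python
import subprocess
from collections import Counter


def count_dangling(fsck_output: str) -> Counter:
    """Count 'dangling <type> <sha>' lines per object type."""
    counts = Counter()
    for line in fsck_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "dangling":
            counts[fields[1]] += 1
    return counts


def fsck_repo(repo_path: str) -> Counter:
    """Run git fsck in repo_path and summarise its dangling objects."""
    result = subprocess.run(
        ["git", "-C", repo_path, "fsck"],
        capture_output=True, text=True, check=False,
    )
    # Scan both streams so the summary does not depend on where git prints.
    return count_dangling(result.stdout + result.stderr)
```

An empty Counter from fsck_repo() means no dangling objects were reported; dangling objects on their own are usually harmless leftovers rather than corruption.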
132
133 What could we do better next time:
134 ----------------------------------
135 - Have backups of all repos!
136 - Compare the age of the backup immediately, and consider going live
137 with the backup. Only 5 hours of work would have been lost, and even
138 then possibly only temporarily, due to the distributed nature of Git.
139 - More people need to use the infra-status page to learn about the state
140 of Gentoo services.
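
The "5 hours" figure above can be checked from the post-mortem timestamps. This is only a sketch: the offline time is approximate ("around 13:13 UTC"), and the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def exposure_window(backup_started: datetime, went_offline: datetime) -> timedelta:
    """Time during which writes could have happened without being backed up."""
    return went_offline - backup_started


backup_started = datetime(2014, 1, 10, 7, 59, 4, tzinfo=timezone.utc)
went_offline = datetime(2014, 1, 10, 13, 13, tzinfo=timezone.utc)
window = exposure_window(backup_started, went_offline)
print(window)  # 5:13:56
```

Measuring from the start of the backup rather than its completion is the conservative choice, since a write made while the backup was running may or may not have been captured.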
141
142 Actions for Infra
143 -----------------
144 - Include the dev/ repos in the backup (they were previously excluded)
145 - Set up Gitolite mirroring
146 - Review gitolite logging (needs to be easier to confirm when writes
147 took place)
148
149 --
150 Robin Hugh Johnson
151 Gentoo Linux: Developer, Infrastructure Lead
152 E-Mail : robbat2@g.o
153 GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
