overlays.gentoo.org service has been restored on a new system.
Some statistics and a post-mortem follow.

Special thanks to antarus and a3li for all their interactions with our sponsor,
and managing most of the details. I just did the final data recovery and this
writeup.

Please resume using the service, and if you see something weird that you
think is different from before, please file a bug for Infrastructure.

In the process, the service moved to a new machine. The SSH host keys have
changed; the new fingerprints are:
DSA: d6:71:99:1f:46:c9:42:95:e1:9d:be:8e:f7:76:51:b5
RSA: 92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9
ECDSA: 96:f0:29:e6:d4:85:58:46:31:ba:0e:17:0b:8c:fa:d8
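
A minimal sketch of how fingerprints in this colon-separated form can be
checked, assuming they are the legacy MD5 format: the fingerprint is the MD5
digest of the base64-decoded key blob from a public key line. The function
name and the sample key line are made up for illustration; the blob is
placeholder data, not a real overlays.gentoo.org host key.

```python
import base64
import hashlib


def md5_fingerprint(pubkey_line: str) -> str:
    """Return the legacy MD5 fingerprint of an OpenSSH public key line.

    Expects the usual "<type> <base64-blob> [comment]" layout.
    """
    blob_b64 = pubkey_line.split()[1]
    digest = hashlib.md5(base64.b64decode(blob_b64)).hexdigest()
    # Group the 32 hex digits into 16 colon-separated pairs.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


# Hypothetical key line; the blob is placeholder bytes, not a real host key.
sample_line = (
    "ssh-rsa "
    + base64.b64encode(b"placeholder key blob").decode()
    + " example"
)
fingerprint = md5_fingerprint(sample_line)
print(fingerprint)
```

Running the same function on the key the new server presents, and comparing
the result with the values listed above, is one way to check the host key
before accepting it.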

At this time, we will NOT be restoring Trac due to low demand. If you
still require a web-based SVN browser for old SVN repos, please contact
us at infra@g.o.

If you have a dev/ repo under the list 'IMPORTANT' below, you MUST push
to the server again.

IMPORTANT: The following repos were damaged beyond repair, and were not
available in backups. You'll need to push again; I have reset the repos to
empty:
dev/anarchy.git
dev/dberkholz.git
dev/dev-zero.git
dev/dilfridge.git
dev/fordfrog.git
dev/graaff.git
dev/maekke.git
dev/mschiff.git
dev/quantumsummers.git
dev/zorry.git
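
For the repos listed above, pushing again means sending every local branch
and tag to the now-empty repo. A minimal sketch, using only the standard
`git push --all` and `git push --tags` options; the helper names and the
remote URL are placeholders for illustration, not a confirmed address for
this service.

```python
import subprocess


def repush_commands(remote_url: str) -> list[list[str]]:
    """Build the git commands that push all branches, then all tags."""
    return [
        ["git", "push", "--all", remote_url],
        ["git", "push", "--tags", remote_url],
    ]


def repush(local_repo: str, remote_url: str) -> None:
    """Run the push commands inside local_repo, stopping on the first failure."""
    for cmd in repush_commands(remote_url):
        subprocess.run(cmd, cwd=local_repo, check=True)


# Placeholder URL for illustration only; substitute your repo's real remote.
EXAMPLE_REMOTE = "ssh://git@example.invalid/dev/yourname.git"
for cmd in repush_commands(EXAMPLE_REMOTE):
    print(" ".join(cmd))
```

The example only prints the commands; calling `repush()` with the path to
your local clone would actually run them.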

FYI: The following repos appeared to be empty:
dev/b33fc0d3.git
dev/moult.git
dev/tomwij.git
user/blueicefield.git
user/disinbox.git
user/palatis.git
user/paragon.git
user/vmalov.git
user/xray.git

FYI: The following repos contained dangling commits/tags/blobs, and this
should not be considered new breakage; if you have a newer copy, you are
encouraged to push again:
dev/blueness.git
dev/maksbotan.git
dev/mgorny.git
dev/qiaomuf.git
dev/xmw.git
proj/betagarden.git
proj/catalyst.git (+tags)
proj/devmanual.git
proj/dotnet.git
proj/elfix.git (+tags)
proj/emacs-tools.git
proj/gamerlay.git
proj/hardened-dev.git
proj/hardened-patchset.git
proj/kde.git
proj/lisp.git
proj/openrc.git (+tags)
proj/portage.git
proj/ruby-overlay.git
proj/sci.git
proj/sunrise.git
proj/webapp-config.git
proj/x11.git
user/gmt.git
user/mv.git (+blobs)
user/palmer.git

Statistics:
-----------
354 repos total
- 10 repos unrecoverable (all in dev/)
= 344 repos recovered/available

9 repos that seem to be empty
26 repos with dangling commits/tags/blobs
2 repos recovered from external sources.

Breakdown by path:
------------------
193 proj/ repos
69 dev/ repos
91 user/ repos
1 other repo
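
As a quick consistency check, the figures above can be recomputed. This
sketch only restates the numbers from the Statistics and Breakdown sections;
the variable names are illustrative.

```python
# Figures copied from the Statistics and Breakdown sections above.
total_repos = 354
unrecoverable = 10
breakdown = {"proj/": 193, "dev/": 69, "user/": 91, "other": 1}

recovered = total_repos - unrecoverable
breakdown_total = sum(breakdown.values())

print(recovered)        # 344
print(breakdown_total)  # 354
```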

Post-mortem
-----------
Hornbill went offline around: 2014-01-10 13:13 UTC
Hornbill last started a backup of VCS: 2014-01-10 07:59:04 UTC
Hornbill last completed a backup of VCS: 2014-01-10 08:20:54 UTC
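
The window of potentially lost work follows from these timestamps. A small
sketch of the arithmetic, treating the approximate offline time as exact;
the variable names are illustrative.

```python
from datetime import datetime

# All timestamps are UTC, copied from the timeline above.
offline = datetime(2014, 1, 10, 13, 13)
backup_started = datetime(2014, 1, 10, 7, 59, 4)
backup_completed = datetime(2014, 1, 10, 8, 20, 54)

# Writes after the backup started may be missing from that backup.
exposure = offline - backup_started
backup_duration = backup_completed - backup_started

print(exposure)         # 5:13:56
print(backup_duration)  # 0:21:50
```

This is where the "5 hours of work" figure later in this writeup comes from.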

Between the backup starting, and the server going offline, we were able
to confirm writes to the following Git repos:
dev/fordfrog.git
proj/kde.git
gitolite-admin.git

We believe that there were no writes to user/ repos, but are not 100%
certain, as the logging was insufficient for this purpose.

Hornbill went offline just over a week ago, mid-afternoon on a Friday
for the timezone where it's located. Due to staff turnover and business
changes at the previous sponsor, we were not able to contact anybody
until regular office hours on Monday, January 13th.

The server in question, while previously functioning, was not
recoverable after a remote hands reboot on Monday afternoon (UTC).
On Tuesday, the sponsor was able to examine it in more depth, and
it was not recoverable. More concerningly, it turned out to be one of
the few remaining Gentoo infrastructure systems with IDE drives. The
data was recovered; however, it seemed to have a lot of corruption.

It was noted that our backups were missing all of the dev/ repos, due to
a system-wide rule to exclude /dev/ from backups (the rule should only
exclude the real /dev, not any directory simply named "dev"). For this
reason, we decided to try and get the data from the old server.
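
The difference between the two rule behaviours can be sketched as follows.
This is an illustration only: the backup tool and its rule syntax are not
named here, so both functions and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def excluded_by_name(path: str) -> bool:
    """Faulty behaviour: exclude anything with a path component named 'dev'."""
    return "dev" in PurePosixPath(path).parts


def excluded_by_root(path: str) -> bool:
    """Intended behaviour: exclude only the real /dev and what lives under it."""
    parts = PurePosixPath(path).parts
    return parts[:2] == ("/", "dev")


# Hypothetical paths for illustration.
repo_path = "/srv/git/dev/example.git"
device_path = "/dev/sda1"

for p in (repo_path, device_path):
    print(p, excluded_by_name(p), excluded_by_root(p))
```

With the name-based rule the repo path is skipped along with the device
path; with the root-anchored rule only the device path is skipped.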

Verification/recovery of the remaining data was also hampered by the
need to confirm that some of the Git repos in the backup were not
entirely clean: they contained legacy errors from their CVS/SVN
conversions (which turned out to be false positives), or dangling
commits/blobs/tags.
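
`git fsck` reports unreachable objects with lines of the form
`dangling <type> <hash>`. A minimal sketch of tallying such lines for one
repo, assuming the output has already been captured as text. The function
name and the sample output (with placeholder hashes) are illustrative and
not taken from the actual recovery.

```python
from collections import Counter


def count_dangling(fsck_output: str) -> Counter:
    """Count 'dangling <type> <hash>' lines by object type."""
    counts = Counter()
    for line in fsck_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "dangling":
            counts[fields[1]] += 1
    return counts


# Illustrative output with placeholder hashes, not real recovery data.
sample_output = """\
dangling commit 1111111111111111111111111111111111111111
dangling blob 2222222222222222222222222222222222222222
dangling commit 3333333333333333333333333333333333333333
dangling tag 4444444444444444444444444444444444444444
"""
summary = count_dangling(sample_output)
print(dict(summary))
```

A repo with a non-empty tally would fall into the "dangling
commits/tags/blobs" category used in this writeup.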

What could we do better next time:
----------------------------------
- Have backups of all repos!
- Compare the age of the backup immediately, and consider going live
  with the backup. Only 5 hours of work would have been lost, and even
  then possibly only temporarily, due to the distributed nature of Git.
- More people need to use the infra-status page to learn about the state
  of Gentoo services.

Actions for Infra
-----------------
- Include the dev/ repos that were missing from the backup
- Set up Gitolite mirroring
- Review gitolite logging (needs to be easier to confirm when writes
  took place)

--
Robin Hugh Johnson
Gentoo Linux: Developer, Infrastructure Lead
E-Mail : robbat2@g.o
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85