On Fri, Jan 17, 2014 at 9:02 PM, Robin H. Johnson <robbat2@g.o> wrote:

> overlays.gentoo.org service has been restored on a new system.
> Some statistics and a post-mortem follow.
>
> Special thanks to antarus and a3li for all their interactions with our
> sponsor, and for managing most of the details. I just did the final data
> recovery and this writeup.
>
> Please resume using the service, and if you see something weird that you
> think is different from before, please file a bug for Infrastructure.
>
> In the process, the service moved to a new machine. The SSH keys have
> changed as follows:
> DSA:   d6:71:99:1f:46:c9:42:95:e1:9d:be:8e:f7:76:51:b5
> RSA:   92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9
> ECDSA: 96:f0:29:e6:d4:85:58:46:31:ba:0e:17:0b:8c:fa:d8
>
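The fingerprints above are in the legacy colon-separated MD5 form. With OpenSSH 6.8 or later, `ssh-keygen -l -E md5 -f <file>` prints that form directly. As a minimal Python sketch for checking a key line by hand, the fingerprint is the MD5 digest of the base64-decoded key blob. The email does not say which ECDSA curve the server uses, so all three standard curve names are accepted. The helper names `md5_fingerprint` and `matches` are illustrative, not part of any tool.

```python
import base64
import hashlib

# Key type names as they appear in OpenSSH public key and known_hosts lines.
KEY_TYPES = {
    "ssh-dss",
    "ssh-rsa",
    "ssh-ed25519",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
}


def md5_fingerprint(line: str) -> str:
    """Legacy MD5 fingerprint of an OpenSSH public key line.

    Accepts either "<type> <base64> [comment]" (.pub file) or
    "<host> <type> <base64>" (known_hosts / ssh-keyscan output).
    The fingerprint is the MD5 digest of the decoded key blob,
    printed as lowercase hex byte pairs joined by colons.
    """
    fields = line.split()
    for i, field in enumerate(fields[:-1]):
        if field in KEY_TYPES:
            blob = base64.b64decode(fields[i + 1], validate=True)
            digest = hashlib.md5(blob).hexdigest()
            return ":".join(digest[j:j + 2] for j in range(0, len(digest), 2))
    raise ValueError("no recognised key type followed by a key blob")


def matches(line: str, published: str) -> bool:
    """Compare a key line against a published fingerprint (case-insensitive)."""
    return md5_fingerprint(line) == published.strip().lower()
```

For example, take an `ssh-rsa` line from `ssh-keyscan overlays.gentoo.org` and check it with `matches(line, "92:b5:40:16:63:a3:61:9f:d7:63:64:ba:d5:51:41:b9")`.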
> At this time, we will NOT be restoring Trac due to low demand. If you
> still require a web-based SVN browser for old SVN repos, please contact
> us at infra@g.o.
>

For Trac wiki users: the recommendation is to move to wiki.gentoo.org. If
you haven't migrated and you need a copy of your Trac wiki pages from
overlays.gentoo.org, please file a bug against infra and someone (me) will
restore them for you on a request-by-request basis. I think the deal is that
I can pretty trivially give you a tarball of markup files (one per wiki page).

-A

>
> If you have a dev/ repo under the list 'IMPORTANT' below, you MUST push
> to the server again.
>
> IMPORTANT: The following repos were damaged beyond repair and were not
> available in backups. You'll need to push again; I have reset the repos to
> empty:
> dev/anarchy.git
> dev/dberkholz.git
> dev/dev-zero.git
> dev/dilfridge.git
> dev/fordfrog.git
> dev/graaff.git
> dev/maekke.git
> dev/mschiff.git
> dev/quantumsummers.git
> dev/zorry.git
>
> FYI: The following repos appeared to be empty:
> dev/b33fc0d3.git
> dev/moult.git
> dev/tomwij.git
> user/blueicefield.git
> user/disinbox.git
> user/palatis.git
> user/paragon.git
> user/vmalov.git
> user/xray.git
>
> FYI: The following repos contained dangling commits/tags/blobs, and this
> should not be considered new breakage; if you have a newer copy, you are
> encouraged to push again:
> dev/blueness.git
> dev/maksbotan.git
> dev/mgorny.git
> dev/qiaomuf.git
> dev/xmw.git
> proj/betagarden.git
> proj/catalyst.git (+tags)
> proj/devmanual.git
> proj/dotnet.git
> proj/elfix.git (+tags)
> proj/emacs-tools.git
> proj/gamerlay.git
> proj/hardened-dev.git
> proj/hardened-patchset.git
> proj/kde.git
> proj/lisp.git
> proj/openrc.git (+tags)
> proj/portage.git
> proj/ruby-overlay.git
> proj/sci.git
> proj/sunrise.git
> proj/webapp-config.git
> proj/x11.git
> user/gmt.git
> user/mv.git (+blobs)
> user/palmer.git
>
> Statistics:
> -----------
> 354 repos total
> - 10 repos unrecoverable (all in dev/)
> = 344 repos recovered/available
>
> 9 repos that seem to be empty
> 26 repos with dangling commits/tags/blobs
> 2 repos recovered from external sources
>
> Breakdown by path:
> ------------------
> 193 proj/ repos
> 69 dev/ repos
> 91 user/ repos
> 1 other repo
>
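One way to cross-check the statistics is to tally the repo lists published in the message. The following is a minimal Python sketch that uses only the numbers and repo names given above, with the `.git` suffixes dropped for brevity. The function name `check_statistics` is illustrative.

```python
# Counts as published in the statistics section.
TOTAL = 354
UNRECOVERABLE = 10
BY_PATH = {"proj/": 193, "dev/": 69, "user/": 91, "other": 1}

# Repo names as published in the three lists (".git" suffix dropped).
UNRECOVERABLE_REPOS = """
dev/anarchy dev/dberkholz dev/dev-zero dev/dilfridge dev/fordfrog
dev/graaff dev/maekke dev/mschiff dev/quantumsummers dev/zorry
""".split()
EMPTY_REPOS = """
dev/b33fc0d3 dev/moult dev/tomwij user/blueicefield user/disinbox
user/palatis user/paragon user/vmalov user/xray
""".split()
DANGLING_REPOS = """
dev/blueness dev/maksbotan dev/mgorny dev/qiaomuf dev/xmw
proj/betagarden proj/catalyst proj/devmanual proj/dotnet proj/elfix
proj/emacs-tools proj/gamerlay proj/hardened-dev proj/hardened-patchset
proj/kde proj/lisp proj/openrc proj/portage proj/ruby-overlay proj/sci
proj/sunrise proj/webapp-config proj/x11 user/gmt user/mv user/palmer
""".split()


def check_statistics() -> int:
    """Cross-check the published counts; return the number of recovered repos."""
    assert sum(BY_PATH.values()) == TOTAL, "path breakdown should sum to the total"
    assert len(UNRECOVERABLE_REPOS) == UNRECOVERABLE
    assert all(r.startswith("dev/") for r in UNRECOVERABLE_REPOS)
    assert len(EMPTY_REPOS) == 9
    assert len(DANGLING_REPOS) == 26
    return TOTAL - UNRECOVERABLE


print(check_statistics())  # 344
```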
> Post-mortem
> -----------
> Hornbill went offline around:            2014-01-10 13:13 UTC
> Hornbill last started a backup of VCS:   2014-01-10 07:59:04 UTC
> Hornbill last completed a backup of VCS: 2014-01-10 08:20:54 UTC
>
> Between the backup starting and the server going offline, we were able
> to confirm writes to the following Git repos:
> dev/fordfrog.git
> proj/kde.git
> gitolite-admin.git
>
> We believe that there were no writes to user/ repos, but are not 100%
> certain, as the logging was insufficient for this purpose.
>
> Hornbill went offline just over a week ago: mid-afternoon on a Friday
> in the timezone where it's located. Due to staff turnover and business
> changes at the previous sponsor, we were not able to contact anybody
> until regular office hours on Monday, January 13th.
>
> The server in question, while previously functioning, was not
> recoverable after a remote-hands reboot on Monday afternoon (UTC).
> On Tuesday, the sponsor was able to examine it in more depth and
> confirmed that it was not recoverable. More concerningly, it turned out
> to be one of the few remaining Gentoo infrastructure systems with IDE
> drives. The data was recovered, but it seemed to have a lot of corruption.
>
> It was noted that our backups were missing all of the dev/ repos, due to
> a system-wide rule to exclude /dev/ from backups (the rule should only
> match the real /dev, not any directory simply named "dev"). For this
> reason, we decided to try to get the data from the old server.
>
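The message does not say which backup tool or pattern syntax was involved. The sketch below is a hedged illustration only. It assumes the rule behaved like a plain substring match on `/dev/`, which is a hypothetical model and not the actual backup configuration. It contrasts that rule with one anchored at the filesystem root. `REPO_ROOT` is a made-up path.

```python
# Hypothetical repository root, for illustration only.
REPO_ROOT = "/var/gitroot"


def excluded_substring(path: str, rule: str = "/dev/") -> bool:
    """Over-broad matching: the rule hits any path with a component named "dev"."""
    return rule in path.rstrip("/") + "/"


def excluded_anchored(path: str, root: str = "/dev") -> bool:
    """Anchored matching: only the real /dev and paths beneath it are excluded."""
    return path == root or path.startswith(root + "/")


for p in ["/dev", "/dev/sda", f"{REPO_ROOT}/dev/fordfrog.git", f"{REPO_ROOT}/proj/kde.git"]:
    print(p, "substring:", excluded_substring(p), "anchored:", excluded_anchored(p))
```

Under this model, the substring rule also drops the dev/ repos, while the anchored rule excludes only the device tree.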
> Verification/recovery of the remaining data was also hampered by the
> discovery that some of the Git repos in the backup were not entirely
> clean: they contained legacy errors that turned out to be false positives
> from their CVS/SVN conversions, or dangling commits/blobs/tags.
>
> What could we do better next time:
> ----------------------------------
> - Have backups of all repos!
> - Check the age of the backup immediately, and consider going live
>   with the backup. Only 5 hours of work would have been lost, and even
>   then possibly only temporarily, due to the distributed nature of Git.
> - More people need to use the infra-status page to learn about the state
>   of Gentoo services.
>
> Actions for Infra
> -----------------
> - Include the dev/ repos, which were not in the backup
> - Set up Gitolite mirroring
> - Review Gitolite logging (it needs to be easier to confirm when writes
>   took place)
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Developer, Infrastructure Lead
> E-Mail     : robbat2@g.o
> GnuPG FP   : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85