Firstly, I'll say I'm not experienced, but I know a fair bit about raid
and recovering corrupted arrays ...

On 01/03/2021 22:25, John Blinka wrote:
> HI, Gentooers!
>
> So, I typed dd if=/dev/zero of=/dev/sd<wrong letter>, and despite
> hitting ctrl-c quite quickly, zeroed out some portion of the initial
> part of a disk. Which did this to my zfs raidz3 array:
>
>   NAME                                         STATE     READ WRITE CKSUM
>   zfs                                          DEGRADED     0     0     0
>     raidz3-0                                   DEGRADED     0     0     0
>       ata-HGST_HUS724030ALE640_PK1234P8JJJVKP  ONLINE       0     0     0
>       ata-HGST_HUS724030ALE640_PK1234P8JJP3AP  ONLINE       0     0     0
>       ata-ST4000NM0033-9ZM170_Z1Z80P4C         ONLINE       0     0     0
>       ata-ST4000NM0033-9ZM170_Z1ZAZ8F1         ONLINE       0     0     0
>       14296253848142792483                     UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000NM0033-9ZM170_Z1ZAZDJ0-part1
>       ata-ST4000NM0033-9ZM170_Z1Z80KG0         ONLINE       0     0     0
>
> Could have been worse. I do have backups, and it is raidz3, so all
> I've injured is my pride, but I do want to fix things. I'd
> appreciate some guidance before I attempt doing this - I have no
> experience at it myself.
>
> The steps I envision are
>
> 1) zpool offline zfs 14296253848142792483 (What's that number?)
> 2) do something to repair the damaged disk
> 3) zpool online zfs <repaired disk>
>
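That outline looks right. I believe the long number is the GUID zfs
assigned to that vdev; zpool status falls back to showing the GUID when
it can't find the device. A rough sketch of steps 1 and 3 (the pool name
"zfs" and the GUID are copied from your zpool status output above, so
double-check both before running anything):

```shell
# Sketch only -- pool name "zfs" and the vdev GUID are taken from the
# zpool status output quoted above; verify both first.

# 1) Take the damaged vdev offline, referring to it by its GUID:
zpool offline zfs 14296253848142792483

# 2) Repair the partition table on the damaged disk with gdisk, then
# 3) bring the vdev back online; zfs resilvers it automatically:
zpool online zfs 14296253848142792483
```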
> Right now, the device name for the damaged disk is /dev/sda. Gdisk
> says this about it:
>
> Caution: invalid main GPT header, but valid backup; regenerating main header
> from backup!

The GPT table is stored at least twice; this is telling you the primary
copy is trashed, but the backup seems okay ...
>
> Warning: Invalid CRC on main header data; loaded backup partition table.
> Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
> on the recovery & transformation menu to examine the two tables.
>
> Warning! Main partition table CRC mismatch! Loaded backup partition table
> instead of main partition table!
>
> Warning! One or more CRCs don't match. You should repair the disk!
> Main header: ERROR
> Backup header: OK
> Main partition table: ERROR
> Backup partition table: OK
>
> Partition table scan:
>   MBR: not present
>   BSD: not present
>   APM: not present
>   GPT: damaged
>
> Found invalid MBR and corrupt GPT. What do you want to do? (Using the
> GPT MAY permit recovery of GPT data.)
>  1 - Use current GPT
>  2 - Create blank GPT
>
> Your answer: (I haven't given one yet)
>
> I'm not exactly sure what this is telling me. But I'm guessing it
> means that the main partition table is gone, but there's a good
> backup.
Yup. I don't understand that prompt, but I THINK it's saying that if you
choose option 1, it will recover your partition table for you.

> In addition, some, but not all disk id info is gone:
> 1) /dev/disk/by-id still shows ata-ST4000NM0033-9ZM170_Z1ZAZDJ0 (the
> damaged disk) but none of its former partitions

This entry is for the disk itself; you've only damaged the contents, so
it is completely unaffected.

> 2) /dev/disk/by-partlabel shows entries for the undamaged disks in the
> pool, but not the damaged one
> 3) /dev/disk/by-partuuid similar to /dev/disk/by-partlabel

For both of these, "part" is short for partition, and you've just
trashed the partitions ...

> 4) /dev/disk/by-uuid does not show the damaged disk
>
Because the uuid is part of the partition table.

> This particular disk is from a batch of 4 I bought with the same make
> and specification and very similar ids (/dev/disk/by-id). Can I
> repair this disk by copying something off one of those other disks
> onto this one?

GOD NO! You'll start copying uuids, so they'll no longer be unique, and
things really will be broken!
> Is repair just repartitioning - as in the Gentoo
> handbook? Is it as simple as running gdisk and typing 1 to accept
> gdisk's attempt at recovering the gpt? Is running gdisk's recovery
> and transformation facilities the way to go (the b option looks like
> it's made for exactly this situation)?
>
> Anybody experienced at this and willing to guide me?
>
Make sure that option 1 really does recover the GPT, then use it. Of
course, the question then becomes what further damage will rear its head.

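For what it's worth, the b option you spotted on gdisk's recovery &
transformation menu is made for exactly this. A sketch, assuming the
damaged disk is still /dev/sda (given how this started, verify that
first with something like lsblk):

```shell
# Sketch only -- make absolutely sure /dev/sda is the damaged disk first.
gdisk /dev/sda
# Then, at gdisk's prompt:
#   r   open the recovery & transformation menu
#   b   use the backup GPT header to rebuild the main header
#   p   print the rebuilt partition table and sanity-check it
#   w   write the repaired table to disk
```

Either way, nothing on disk changes until you write with 'w'; answering
1 at the initial prompt only loads the backup table into memory.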
You need to make sure that your raidz3 array can recover from a corrupt
disk. THIS IS IMPORTANT. If you tried to recover an md-raid-5 array from
this situation you'd almost certainly trash it completely.

Actually, since your setup is raid, I'd just blow away the trashed disk
completely. Take it out of your system, replace it, and let zfs repair
itself onto the new disk.

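If you go that route, zfs does the whole thing with one command. A
sketch (the replacement disk's by-id name here is invented; substitute
the real one):

```shell
# Sketch only -- "ata-NEW_DISK_SERIAL" is a placeholder for the real
# by-id name of the replacement disk.
# Replace the unavailable vdev (named by its GUID) with the new disk;
# zfs resilvers onto it automatically:
zpool replace zfs 14296253848142792483 /dev/disk/by-id/ata-NEW_DISK_SERIAL

# Keep an eye on the resilver:
zpool status zfs
```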
You can then zero out the old disk and it's now a spare.

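When you do zero it, it's worth addressing the disk by its stable by-id
name rather than /dev/sdX, which is exactly the mistake that started
this thread. A sketch, to be run only once the disk is out of the pool:

```shell
# Sketch only -- run this against the OLD disk, after it has been
# removed from the pool; the by-id path is the one quoted above.
# Erase filesystem/partition signatures:
wipefs -a /dev/disk/by-id/ata-ST4000NM0033-9ZM170_Z1ZAZDJ0

# Or zero the whole disk (slow):
dd if=/dev/zero of=/dev/disk/by-id/ata-ST4000NM0033-9ZM170_Z1ZAZDJ0 bs=1M status=progress
```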
Just be careful here, because I don't know what zfs does, but btrfs by
default mirrors metadata but not data, so with btrfs you'd think a
mirrored filesystem could repair itself, but it can't ... if you want to
repair the filesystem without rebuilding from scratch, you need to know
rather more about zfs than I do ...

Cheers,
Wol