On 3/1/21 3:25 PM, John Blinka wrote:
> HI, Gentooers!

Hi,

> So, I typed dd if=/dev/zero of=/dev/sd<wrong letter>, and despite
> hitting ctrl-c quite quickly, zeroed out some portion of the initial
> part of a disk. Which did this to my zfs raidz3 array:

OOPS!!!

>   NAME                                          STATE     READ WRITE CKSUM
>   zfs                                           DEGRADED     0     0     0
>     raidz3-0                                    DEGRADED     0     0     0
>       ata-HGST_HUS724030ALE640_PK1234P8JJJVKP   ONLINE       0     0     0
>       ata-HGST_HUS724030ALE640_PK1234P8JJP3AP   ONLINE       0     0     0
>       ata-ST4000NM0033-9ZM170_Z1Z80P4C          ONLINE       0     0     0
>       ata-ST4000NM0033-9ZM170_Z1ZAZ8F1          ONLINE       0     0     0
>       14296253848142792483                      UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000NM0033-9ZM170_Z1ZAZDJ0-part1
>       ata-ST4000NM0033-9ZM170_Z1Z80KG0          ONLINE       0     0     0

Okay. So the pool is online and the data is accessible. That's
actually better than I originally thought. -- I thought you had
accidentally damaged part of the ZFS partition that existed on a single
disk. -- I've been able to repair this with minimal data loss (zeros)
with Oracle's help on Solaris in the past.

Aside: My understanding is that ZFS stores multiple copies of its
metadata on the disk (assuming a single disk) and that it is possible to
recover a pool if any one copy (or maybe two, for consistency checks) is
viable. Though doing so is further into the weeds than you normally
want to be.

> Could have been worse. I do have backups, and it is raid3, so all I've
> injured is my pride, but I do want to fix things. I'd appreciate
> some guidance before I attempt doing this - I have no experience at
> it myself.

First, your pool / its raidz3 is only 'DEGRADED', which means that the
data is still accessible. 'OFFLINE' would be more problematic.
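
If you want a one-line confirmation of that state, something like this
should do it (a sketch; 'zfs' is your pool name, per the status output
above):

# zpool list -H -o name,health zfs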
|
> The steps I envision are
>
> 1) zpool offline zfs 14296253848142792483 (What's that number?)

I'm guessing it's the GUID that ZFS assigned to that disk (vdev)
internally. You will probably need to reference it.
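
If you want to see where that number comes from, I believe current
OpenZFS can print the vdev GUIDs in place of the device names (a sketch):

# zpool status -g zfs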

I see no reason to take the pool offline.

> 2) do something to repair the damaged disk

I don't think you need to do anything at the individual disk level yet.

> 3) zpool online zfs <repaired disk>

I think you can fix this with the pool online.
|
> Right now, the device name for the damaged disk is /dev/sda.
> Gdisk says this about it:
>
> Caution: invalid main GPT header,

This is to be expected.

> but valid backup; regenerating main header from backup!

This looks promising.
|
> Warning: Invalid CRC on main header data; loaded backup partition table.
> Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
> on the recovery & transformation menu to examine the two tables.

I'm assuming that the main partition table is at the start of the disk
and that it's what got wiped out.

So I'd think that you can look at the 'c' and 'e' options on the
recovery & transformation menu for options to repair the main partition
table.
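
In gdisk, I'd expect the repair to look roughly like this (a sketch from
memory; read each prompt and check the gdisk docs before writing anything
to the disk):

# gdisk /dev/sda
r   (open the recovery & transformation menu)
b   (rebuild the main GPT header from the backup header)
c   (load the backup partition table over the damaged main one)
v   (verify the disk)
w   (write the repaired tables to disk and exit)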
|
> Warning! Main partition table CRC mismatch! Loaded backup partition table
> instead of main partition table!

I know. Thank you for using the backup partition table.

> Warning! One or more CRCs don't match. You should repair the disk!

I'm guessing that this is a direct result of the dd oops. I would want
more evidence to support it being a larger problem.

The CRC may be calculated over a partially zeroed chunk of disk. ("Chunk"
because I don't know what term is best here, and I want to avoid implying
anything specific or incorrect.)
|
> Main header: ERROR
> Backup header: OK
> Main partition table: ERROR
> Backup partition table: OK

ACK
|
> Partition table scan:
> MBR: not present
> BSD: not present
> APM: not present
> GPT: damaged
>
> Found invalid MBR and corrupt GPT. What do you want to do? (Using the
> GPT MAY permit recovery of GPT data.)
> 1 - Use current GPT
> 2 - Create blank GPT
>
> Your answer: ( I haven't given one yet)

I'd assume #1, Use current GPT.
|
> I'm not exactly sure what this is telling me. But I'm guessing it
> means that the main partition table is gone, but there's a good
> backup.

That's my interpretation too.

It jibes with the description of what happened.
|
> In addition, some, but not all disk id info is gone:
> 1) /dev/disk/by-id still shows ata-ST4000NM0033-9ZM170_Z1ZAZDJ0
> (the damaged disk) but none of its former partitions

The disk ID still being there may be a symptom / side effect of when
udev creates the links. I would expect it to not be there post-reboot.

Well, maybe. The disk serial number is independent of any data on the disk.

Partitions by ID would probably be gone post reboot (or eject and
re-insertion).
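
(Once the partition table is repaired, something like this may coax udev
into recreating those links without a reboot. A sketch; partprobe comes
from parted:)

# partprobe /dev/sda
# udevadm settle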
|
> 2) /dev/disk/by-partlabel shows entries for the undamaged disks in
> the pool, but not the damaged one

Okay. That means that udev is recognizing the change faster than I
would have expected.

That probably means that the ID in #1 has survived any such update.

> 3) /dev/disk/by-partuuid similar to /dev/disk/by-partlabel

Given #2, I'm not surprised at #3.
|
> 4) /dev/disk/by-uuid does not show the damaged disk

Hum.

> This particular disk is from a batch of 4 I bought with the same make
> and specification and very similar ids (/dev/disk/by-id). Can I
> repair this disk by copying something off one of those other disks
> onto this one?

Maybe. But I would not bother. (See below.)
|
> Is repair just repartitioning - as in the Gentoo handbook? Is it
> as simple as running gdisk and typing 1 to accept gdisk's attempt at
> recovering the gpt? Is running gdisk's recovery and transformation
> facilities the way to go (the b option looks like it's made for
> exactly this situation)?

gdisk will address the partition problem. But that doesn't do anything
for ZFS.

> Anybody experienced at this and willing to guide me?

I've not dealt with this particular problem. But I have dealt with a
few different things.
|
My course of action would be:

0) Copy the entire disk to another disk if possible and if you are
sufficiently paranoid. (See the sketch just after this list.)
1) Let gdisk repair the main partition table using the data from the
backup partition table.
2) Leverage ZFS's RAIDZ functionality to recover the ZFS data.
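
(For step 0, a raw dd copy is one option. A sketch, assuming a spare disk
at /dev/sdX that is at least as large as the damaged one; given how this
all started, triple-check both device names first:)

# dd if=/dev/sda of=/dev/sdX bs=1M conv=noerror,sync status=progress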

I /think/ that #2 can be done with one command. Do your homework to
understand, check, and validate this. You are responsible for your own
actions, despite what some random on the Internet says. ;-)

# zpool replace zfs 14296253848142792483 sda
|
Assuming that /dev/sda is the corrupted disk.

This will cause ZFS to remove the 14296253848142792483 disk from the
pool and rebuild onto the (/dev/)sda disk. -- ZFS doesn't care that
they are the same disk.
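
(If you would rather have the pool record a stable name for the new disk,
like the other members use, you could point the replace at the by-id path
instead of sda. A sketch, using the id shown earlier:)

# zpool replace zfs 14296253848142792483 /dev/disk/by-id/ata-ST4000NM0033-9ZM170_Z1ZAZDJ0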
|
You can keep track of the resilver with something like the following:

# while true; do zpool status zfs; sleep 60; done
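
(watch(1) does the same thing with less typing, if it's installed:)

# watch -n 60 zpool status zfs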
|
Since your pool is only 'DEGRADED', you are probably in an okay
position. It's just a matter of not making things worse while trying to
make them better.

Given that you have a RAIDZ3 and all of the other disks are ONLINE, your
data should currently be safe.
|

--
Grant. . . .
unix || die