Re: [gentoo-user] zfs repair needed (due to fingers being faster than brain) - gentoo-user

From:	antlists <antlists@××××××××××××.uk>
To:	gentoo-user@l.g.o
Subject:	Re: [gentoo-user] zfs repair needed (due to fingers being faster than brain)
Date:	Tue, 02 Mar 2021 01:15:18
Message-Id:	`f633e7bf-0e8c-16a5-cb57-d21311be9415@youngman.org.uk`
In Reply to:	[gentoo-user] zfs repair needed (due to fingers being faster than brain) by John Blinka

Firstly, I'll say I'm not experienced, but I do know a fair bit about raid
and recovering corrupted arrays ...

On 01/03/2021 22:25, John Blinka wrote:

> HI, Gentooers!
>
> So, I typed dd if=/dev/zero of=/dev/sd<wrong letter>, and despite
> hitting ctrl-c quite quickly, zeroed out some portion of the initial
> part of a disk.  Which did this to my zfs raidz3 array:
>
>      NAME                                         STATE     READ WRITE CKSUM
>      zfs                                          DEGRADED     0     0     0
>        raidz3-0                                   DEGRADED     0     0     0
>          ata-HGST_HUS724030ALE640_PK1234P8JJJVKP  ONLINE       0     0     0
>          ata-HGST_HUS724030ALE640_PK1234P8JJP3AP  ONLINE       0     0     0
>          ata-ST4000NM0033-9ZM170_Z1Z80P4C         ONLINE       0     0     0
>          ata-ST4000NM0033-9ZM170_Z1ZAZ8F1         ONLINE       0     0     0
>          14296253848142792483                     UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000NM0033-9ZM170_Z1ZAZDJ0-part1
>          ata-ST4000NM0033-9ZM170_Z1Z80KG0         ONLINE       0     0     0
>
> Could have been worse.  I do have backups, and it is raid3, so all
> I've injured is my pride, but I do want to fix things.    I'd
> appreciate some guidance before I attempt doing this - I have no
> experience at it myself.
>
> The steps I envision are
>
> 1) zpool offline zfs 14296253848142792483 (What's that number?)
> 2) do something to repair the damaged disk
> 3) zpool online zfs <repaired disk>
>
> Right now, the device name for the damaged disk is /dev/sda.  Gdisk
> says this about it:
>
> Caution: invalid main GPT header, but valid backup; regenerating main header
> from backup!

The GPT is stored twice, at the start and at the end of the disk. This is
telling you the primary copy is trashed, but the backup seems okay ...
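Because dd started at sector 0, it would have zeroed the protective MBR (hence gdisk's "MBR: not present") and the primary GPT header at sector 1, while the backup header in the disk's last sector survived. A read-only sketch for checking that, written as bash functions; it assumes 512-byte logical sectors and the standard layout (primary header at LBA 1, backup header at the last LBA), and the function names are made up for illustration:

```shell
# Read-only sketch: check whether the primary and backup GPT headers still
# start with the 8-byte "EFI PART" signature. Assumes 512-byte logical
# sectors (hypothetical helper names).

# Print the first 8 bytes of sector LBA of FILE, dropping NUL bytes.
gpt_signature_at() {
    dd if="$1" bs=512 skip="$2" count=1 2>/dev/null | head -c 8 | tr -d '\0'
}

# Usage: gpt_check FILE_OR_DEVICE TOTAL_512_BYTE_SECTORS
gpt_check() {
    local file=$1 last=$(($2 - 1)) label lba
    for label in primary backup; do
        # Primary header is at LBA 1; backup header is in the last sector.
        if [ "$label" = primary ]; then lba=1; else lba=$last; fi
        if [ "$(gpt_signature_at "$file" "$lba")" = "EFI PART" ]; then
            echo "$label header (LBA $lba): present"
        else
            echo "$label header (LBA $lba): missing"
        fi
    done
}
```

Run as root, e.g. `gpt_check /dev/sda "$(blockdev --getsz /dev/sda)"` (`blockdev --getsz` reports the size in 512-byte sectors). It only reads from the disk.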

>
> Warning: Invalid CRC on main header data; loaded backup partition table.
> Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
> on the recovery & transformation menu to examine the two tables.
>
> Warning! Main partition table CRC mismatch! Loaded backup partition table
> instead of main partition table!
>
> Warning! One or more CRCs don't match. You should repair the disk!
> Main header: ERROR
> Backup header: OK
> Main partition table: ERROR
> Backup partition table: OK
>
> Partition table scan:
>    MBR: not present
>    BSD: not present
>    APM: not present
>    GPT: damaged
>
> Found invalid MBR and corrupt GPT. What do you want to do? (Using the
> GPT MAY permit recovery of GPT data.)
>   1 - Use current GPT
>   2 - Create blank GPT
>
> Your answer: ( I haven't given one yet)
>
> I'm not exactly sure what this is telling me.  But I'm guessing it
> means that the main partition table is gone, but there's a good
> backup.

Yup. I don't understand that prompt, but I THINK it's saying that if you
choose option 1, it will recover your partition table for you.

>  In addition, some, but not all disk id info is gone:
> 1) /dev/disk/by-id still shows ata-ST4000NM0033-9ZM170_Z1ZAZDJ0 (the
> damaged disk) but none of its former partitions

That link names the disk itself (model and serial number), not anything
stored on it, so damaging the contents leaves it completely unaffected.

> 2) /dev/disk/by-partlabel shows entries for the undamaged disks in the
> pool, but not the damaged one
> 3) /dev/disk/by-partuuid similar to /dev/disk/by-partlabel

For both of these, "part" is short for partition: the labels and uuids
live in the partition table, and you've just trashed it ...

> 4) /dev/disk/by-uuid does not show the damaged disk
>
Because that uuid is read from inside the partition, and with the
partition table gone there's no partition left to read it from.
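All four directories are just udev symlinks, so one way to see which names still point at the damaged disk (or at any partition of it) is to resolve them. A small sketch as a bash function, assuming sdX-style device names whose partitions are sdX1, sdX2, ... (NVMe-style "p1" names are not handled); the function name is hypothetical:

```shell
# Sketch: list /dev/disk/by-* symlinks that resolve to DEVICE or to one of its
# partitions. Assumes sdX-style names (partitions sdX1, sdX2, ...); NVMe-style
# "p1" partition names are not matched.
# Usage: links_for_device /dev/sda [BASE_DIR, default /dev/disk]
links_for_device() {
    local dev base kind link target
    dev=$(readlink -f "$1")
    base=${2:-/dev/disk}
    for kind in by-id by-partlabel by-partuuid by-uuid; do
        [ -d "$base/$kind" ] || continue
        for link in "$base/$kind"/*; do
            [ -L "$link" ] || continue
            target=$(readlink -f "$link")
            case $target in
                "$dev" | "$dev"[0-9]*) echo "$kind/${link##*/} -> ${target##*/}" ;;
            esac
        done
    done
}
```

Going by the description above, `links_for_device /dev/sda` would be expected to show only by-id entries for the whole disk (no "-part" names) until the partition table is restored.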

> This particular disk is from a batch of 4 I bought with the same make
> and specification and very similar ids (/dev/disk/by-id).  Can I
> repair this disk by copying something off one of those other disks
> onto this one?

GOD NO! You'll start copying uuids, so they'll no longer be unique, and
things really will be broken!
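Cloning the partition table from one of the sibling disks would also clone its partition GUIDs. If you suspect that has happened, a quick check is to look for any PARTUUID that appears on more than one device. A sketch as a bash function fed from `lsblk -rno NAME,PARTUUID` (raw output, no header, chosen columns); the function name and the sample names in the comments are hypothetical:

```shell
# Sketch: read "NAME UUID" pairs on stdin and print every UUID that appears
# on more than one device, followed by the devices carrying it.
# Lines with no UUID column (e.g. whole disks without a PARTUUID) are skipped.
# Usage: lsblk -rno NAME,PARTUUID | find_duplicate_uuids
find_duplicate_uuids() {
    awk '
        NF >= 2 { count[$2]++; devs[$2] = devs[$2] " " $1 }
        END { for (u in count) if (count[u] > 1) print u ":" devs[u] }
    '
}
```

It prints nothing when every PARTUUID is unique, which is what you want to see.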

> Is repair just repartitioning - as in the Gentoo
> handbook?  Is it as simple as running gdisk and typing 1 to accept
> gdisk's attempt at recovering the gpt?  Is running gdisk's recovery
> and transformation facilities the way to go (the b option looks like
> it's made for exactly this situation)?
>
> Anybody experienced at this and willing to guide me?
>
Make sure that option 1 really does recover the GPT, then use it. Of
course, the question then becomes what further damage will rear its head.
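One cheap way to make sure option 1 can't make things worse: before letting gdisk write anything, copy the surviving backup GPT (the last 33 sectors, i.e. 32 sectors of partition entries plus the header) off the disk to a file. Inside gdisk, 'p' prints the table it loaded, 'v' verifies it, and nothing is written until 'w'. A read-only sketch, assuming 512-byte sectors and the default 128-entry table; the function name and output path are hypothetical:

```shell
# Read-only sketch: copy the backup GPT (32 sectors of entries + 1 header
# sector at the very end of the disk) to OUT_FILE before any repair.
# Assumes 512-byte sectors and the default 128-entry partition array.
# Usage: save_backup_gpt DEVICE TOTAL_512_BYTE_SECTORS OUT_FILE
save_backup_gpt() {
    local dev=$1 sectors=$2 out=$3
    # Only reads from DEV; writes only to OUT_FILE.
    dd if="$dev" of="$out" bs=512 skip=$((sectors - 33)) count=33 2>/dev/null
}
```

For example, `save_backup_gpt /dev/sda "$(blockdev --getsz /dev/sda)" /root/sda-backup-gpt.bin`, keeping the file somewhere other than the damaged disk.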

You need to make sure that your raidz3 array can recover from a corrupt
disk. THIS IS IMPORTANT. If you tried to recover an md-raid-5 array from
this situation you'd almost certainly trash it completely.
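For raidz the margin is simple to work out: a raidzN vdev survives up to N missing or faulted members, so a raidz3 with one UNAVAIL disk can still lose two more. A sketch that reads `zpool status` text on stdin and reports that margin per raidz vdev; it assumes a layout like the one quoted above (leaf devices listed directly under each raidz line, no log, cache or spare sections), and the function name is hypothetical:

```shell
# Sketch: read `zpool status` text on stdin and, for each raidz1/2/3 vdev,
# count members that are not ONLINE and how many more it could lose.
# Assumes leaves follow their raidz line until a blank line, with no
# log/cache/spare sections (hypothetical helper name).
raidz_margin() {
    awk '
        function report() {
            if (vdev == "") return
            if (bad > parity)
                printf "%s: %d of %d devices not ONLINE, beyond raidz%d parity\n", vdev, bad, total, parity
            else
                printf "%s: %d of %d devices not ONLINE, can lose %d more\n", vdev, bad, total, parity - bad
        }
        # A raidz vdev line such as raidz3-0; character 6 is the parity level.
        $1 ~ /^raidz[123]-[0-9]+$/ {
            report(); vdev = $1; parity = substr($1, 6, 1) + 0; bad = 0; total = 0; next
        }
        # A blank line ends the current vdev.
        vdev != "" && NF == 0 { report(); vdev = ""; next }
        # Any other line inside a vdev is a leaf device: NAME STATE ...
        vdev != "" && NF >= 2 { total++; if ($2 != "ONLINE") bad++ }
        END { report() }
    '
}
```

Fed the status table quoted above, it should report one of six devices not ONLINE and two more that can be lost.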

Actually, if your setup is raid, I'd just blow out the trashed disk
completely. Take it out of your system, replace it, and let zfs repair
itself (resilver) onto the new disk.
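As for "What's that number?": 14296253848142792483 is the GUID ZFS assigned to that pool member. zpool status shows it because the device path the member used to have can no longer be opened, and zpool replace accepts the GUID in place of the old device name. A dry-run sketch that prints, rather than runs, a replace command for each missing member; the function name and the new disk's by-id name are hypothetical:

```shell
# Dry-run sketch: read `zpool status` text on stdin and print (never run) a
# `zpool replace POOL OLD_GUID NEW_DEV` line for every member that is shown
# only by its numeric GUID and is UNAVAIL.
# Usage: zpool status POOL | plan_replace POOL /dev/disk/by-id/NEW_DISK
plan_replace() {
    local pool=$1 new_dev=$2
    awk -v pool="$pool" -v new_dev="$new_dev" '
        $1 ~ /^[0-9]+$/ && $2 == "UNAVAIL" {
            printf "zpool replace %s %s %s\n", pool, $1, new_dev
        }
    '
}
```

Review the printed line, run it by hand, then watch `zpool status zfs` until the resilver finishes.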

You can then zero out the old disk and it's now a spare.
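Given how this started, it may be worth putting a guard in front of that dd. A dry-run sketch: it refuses if the disk's by-id name still appears anywhere in saved `zpool status` output (the damaged disk's name does, in the "was ..." line, until the replace completes), and otherwise only prints the dd command. It doesn't check mounts, md arrays or anything else, and the function and disk names are hypothetical:

```shell
# Dry-run sketch: refuse to plan a wipe of a disk whose by-id name still
# appears in saved `zpool status` output; otherwise print (never run) the dd
# command. Substring match, so it errs on the side of refusing.
# Usage: zpool status > status.txt; plan_wipe status.txt ata-MODEL_SERIAL
plan_wipe() {
    local status_file=$1 disk_id=$2
    if grep -qF -- "$disk_id" "$status_file"; then
        echo "refusing: $disk_id still appears in zpool status" >&2
        return 1
    fi
    echo "dd if=/dev/zero of=/dev/disk/by-id/$disk_id bs=1M status=progress"
}
```

Naming the target by its by-id path also puts the serial number in the command, which is easier to double-check than a bare /dev/sdX letter.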

Just be careful here. I don't know what zfs does, but btrfs by default
mirrors metadata but not data, so you'd think a mirrored btrfs filesystem
could repair itself, but it can't ... If you want to repair the filesystem
without rebuilding from scratch, you need to know rather more about zfs
than I do ...

Cheers,
Wol


Gentoo Archives: gentoo-user