Gentoo Archives: gentoo-user

From: "Stefan G. Weichinger" <lists@×××××.at>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] sg_map etc
Date: Thu, 02 Mar 2017 10:16:42
Message-Id: 0d89ad11-c110-c4b1-8f73-1e08cf570864@xunil.at
In Reply to: Re: [gentoo-user] sg_map etc by Daniel Frey
On 2017-03-01 at 22:42, Daniel Frey wrote:

> I'm not sure how the sg? -> sd? mapping is supposed to work. I find it
> odd that there seems to be two nodes reported for each sd? entry.
> However, this could be the way the controller driver reports it to the
> kernel...
>
>> 07:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030
>> PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
>> 0a:0e.0 RAID bus controller: Adaptec AAC-RAID
>>
>
> Well, if you are using a hw raid card in jbod mode the controller will
> generally not report that info. You'd have to install the controller's
> cli management tools and use that. You'd have to figure out which
> controller your drives are attached to.
>
> Adaptec uses sys-block/arcconf
> LSI uses sys-block/megacli
> 3ware uses sys-block/tw_cli

Yes, thanks.
arcconf doesn't do much here: I tried some commands, but the controller
doesn't return any info.

Maybe it is not the disks themselves that are dying but the controller
that is getting flaky. It is quite old already, and lately I had issues
at warm boot that were only solved by removing power completely.

See these lines in dmesg:

[74403.796012] aacraid: Host adapter abort request (1,0,0,0)
[74403.804011] aacraid: Host adapter abort request (1,0,1,0)
[74403.804033] aacraid: Host adapter reset request. SCSI hang ?
[74403.804040] AAC: Host adapter BLINK LED 0x7
[74403.804056] AAC0: adapter kernel panic'd 7.
[74509.788015] aacraid: Host adapter abort request (1,0,0,0)
[74511.804015] aacraid: Host adapter abort request (1,0,1,0)
[74511.804041] aacraid: Host adapter reset request. SCSI hang ?
[74511.804044] AAC: Host adapter BLINK LED 0x7
[74511.804068] AAC0: adapter kernel panic'd 7.

And sdi throws errors:

[31529.901711] md/raid:md3: read error corrected (8 sectors at 11190152 on sdi1)
[31529.901713] md/raid:md3: read error corrected (8 sectors at 11190160 on sdi1)
[31529.901715] md/raid:md3: read error corrected (8 sectors at 11190168 on sdi1)
[31529.901717] md/raid:md3: read error corrected (8 sectors at 11190176 on sdi1)
[31529.901718] md/raid:md3: read error corrected (8 sectors at 11190184 on sdi1)
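
To see whether the corrected read errors really cluster on one member
device, the md messages can be tallied per array and device. The following
is only a minimal sketch: the function name tally_corrected_reads is made up
for illustration, and the pattern assumes the message wording shown in the
log excerpt above (other kernel versions may phrase it differently).

```python
import re
from collections import Counter

# Matches md messages such as:
#   md/raid:md3: read error corrected (8 sectors at 11190152 on sdi1)
# \s+ is used between fields so that lines wrapped by a mail client still match.
PATTERN = re.compile(
    r"md/raid:(?P<array>md\d+):\s+read error corrected\s+"
    r"\((?P<count>\d+) sectors at (?P<sector>\d+)\s+on\s+(?P<dev>\w+)\)"
)


def tally_corrected_reads(log_text):
    """Return a Counter mapping (array, device) to the number of corrected sectors."""
    totals = Counter()
    for match in PATTERN.finditer(log_text):
        key = (match.group("array"), match.group("dev"))
        totals[key] += int(match.group("count"))
    return totals
```

Feeding it the text of the dmesg output is enough; a device that dominates
the tally is the first candidate to pull.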

I wonder if one or more disks put some kind of electrical "noise" on the
SATA bus and confuse the controller that way.

This is why I would like to remove sdi, and why I want to identify that
specific HDD.

In the past I used the trick of stressing that specific disk with dd or
something similar (reading the whole disk, for example) and having a
person spot it by watching the LEDs on the drive cages ;-)

Maybe that is the faster way in this case.
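
The LED trick boils down to reading the suspect device from start to end so
that its activity light stays on. A minimal sketch of that idea follows; the
function name read_through and its parameters are made up for illustration,
and it is a stand-in for dd, not a replacement for it. It assumes the device
node (for example /dev/sdi) is readable, which normally requires root.

```python
def read_through(path, chunk_size=1024 * 1024, max_bytes=None):
    """Read `path` sequentially and return the number of bytes read.

    The data is discarded; the only purpose is to keep the drive busy.
    An OSError from a failing disk is not caught, so the caller sees it.
    """
    total = 0
    # buffering=0 gives an unbuffered raw file object.
    with open(path, "rb", buffering=0) as dev:
        while max_bytes is None or total < max_bytes:
            if max_bytes is None:
                want = chunk_size
            else:
                want = min(chunk_size, max_bytes - total)
            chunk = dev.read(want)
            if not chunk:  # end of device or file
                break
            total += len(chunk)
    return total
```

Note that repeated runs may be served partly from the page cache, in which
case the LED will not blink for long; limiting max_bytes is handy for a
short first test.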

> The management tools for the other cards should provide this sort of
> functionality.
>
> If you had used the raid card to create an array the management cli
> tools will show that a specific port is dead and you query it for the
> serial number.
>
> This doesn't help you with the sg mapping. The problem for you now will
> be figuring out why sg_map is reporting the way it is.
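
As a cross-check for the sg mapping, the kernel's own view can be read from
sysfs. The sketch below assumes the usual layout in which each entry under
/sys/class/scsi_generic has a device/block/ directory naming its block
device; that layout is an assumption here and may differ on other kernels.
The function name sg_to_block is made up for illustration.

```python
from pathlib import Path


def sg_to_block(sysfs_root="/sys"):
    """Return a dict mapping each sg node name to a list of block device names.

    Entries without a block device (controllers, enclosures, tapes) map to
    an empty list.
    """
    mapping = {}
    base = Path(sysfs_root) / "class" / "scsi_generic"
    if not base.is_dir():
        return mapping
    for sg in sorted(base.iterdir()):
        block_dir = sg / "device" / "block"
        if block_dir.is_dir():
            names = sorted(p.name for p in block_dir.iterdir())
        else:
            names = []
        mapping[sg.name] = names
    return mapping
```

sg nodes that come back with an empty list are not disks, which may explain
extra nodes in the sg_map output.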

The disks were originally configured via StorMan under SLES10 or so; the
server ran SLES back then and I moved it to Gentoo later on.

I could boot into SLES to get StorMan again, but that would trigger the
boot failure mentioned above, so I want to avoid that for now.

Something is wrong with this box and I have to find out whether it's the
disk(s) or the controller. All this while I am >600 km away from the server.

Thanks, Stefan