On 2017-03-01 at 22:42, Daniel Frey wrote:

> I'm not sure how the sg? -> sd? mapping is supposed to work. I find it
> odd that there seem to be two nodes reported for each sd? entry.
> However, this could be the way the controller driver reports it to the
> kernel...
>
>> 07:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030
>> PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
>> 0a:0e.0 RAID bus controller: Adaptec AAC-RAID
>>
>
> Well, if you are using a hw raid card in jbod mode the controller will
> generally not report that info. You'd have to install the controller's
> cli management tools and use that. You'd have to figure out which
> controller your drives are attached to.
>
> Adaptec uses sys-block/arcconf
> LSI uses sys-block/megacli
> 3ware uses sys-block/tw_cli

Yes, thanks.
arcconf doesn't do much here: I tried some commands, but the controller
doesn't return any info.

Maybe it is not the disks themselves that are dying but the controller
that is getting flaky. It is quite old already, and lately I had issues
at warm boot that were only solved by removing power completely.

See these lines in dmesg:

[74403.796012] aacraid: Host adapter abort request (1,0,0,0)
[74403.804011] aacraid: Host adapter abort request (1,0,1,0)
[74403.804033] aacraid: Host adapter reset request. SCSI hang ?
[74403.804040] AAC: Host adapter BLINK LED 0x7
[74403.804056] AAC0: adapter kernel panic'd 7.
[74509.788015] aacraid: Host adapter abort request (1,0,0,0)
[74511.804015] aacraid: Host adapter abort request (1,0,1,0)
[74511.804041] aacraid: Host adapter reset request. SCSI hang ?
[74511.804044] AAC: Host adapter BLINK LED 0x7
[74511.804068] AAC0: adapter kernel panic'd 7.

And sdi throws errors:

[31529.901711] md/raid:md3: read error corrected (8 sectors at 11190152 on sdi1)
[31529.901713] md/raid:md3: read error corrected (8 sectors at 11190160 on sdi1)
[31529.901715] md/raid:md3: read error corrected (8 sectors at 11190168 on sdi1)
[31529.901717] md/raid:md3: read error corrected (8 sectors at 11190176 on sdi1)
[31529.901718] md/raid:md3: read error corrected (8 sectors at 11190184 on sdi1)
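
To put numbers on the md messages above, here is a small Python sketch that groups the corrected read errors by array and device and merges adjacent sector ranges. The function name `summarize_md_read_errors` and the `SAMPLE` text are made up for illustration, and the regular expression only targets the message format quoted in this mail.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   md/raid:md3: read error corrected (8 sectors at 11190152 on sdi1)
# The \s+ before "on" tolerates the line wrap a mail client may introduce.
MD_READ_ERR = re.compile(
    r"md/raid:(?P<array>md\d+): read error corrected "
    r"\((?P<count>\d+) sectors at (?P<start>\d+)\s+on (?P<dev>\w+)\)"
)


def summarize_md_read_errors(log_text):
    """Return {(array, device): [(first_sector, sector_count), ...]}.

    Ranges that touch or overlap are merged into one entry.
    """
    hits = defaultdict(list)
    for m in MD_READ_ERR.finditer(log_text):
        hits[(m["array"], m["dev"])].append((int(m["start"]), int(m["count"])))

    summary = {}
    for key, ranges in hits.items():
        merged = []
        for start, count in sorted(ranges):
            if merged and merged[-1][0] + merged[-1][1] >= start:
                prev_start, prev_count = merged[-1]
                end = max(prev_start + prev_count, start + count)
                merged[-1] = (prev_start, end - prev_start)
            else:
                merged.append((start, count))
        summary[key] = merged
    return summary


# Illustrative input: the five messages from this mail, wrapped as mailed.
SAMPLE = """\
[31529.901711] md/raid:md3: read error corrected (8 sectors at 11190152
on sdi1)
[31529.901713] md/raid:md3: read error corrected (8 sectors at 11190160
on sdi1)
[31529.901715] md/raid:md3: read error corrected (8 sectors at 11190168
on sdi1)
[31529.901717] md/raid:md3: read error corrected (8 sectors at 11190176
on sdi1)
[31529.901718] md/raid:md3: read error corrected (8 sectors at 11190184
on sdi1)
"""
```

For the five lines quoted here, `summarize_md_read_errors(SAMPLE)` gives a single range of 40 sectors starting at 11190152 on sdi1. Assuming 512-byte sectors, that is one contiguous 20 KiB region rather than errors scattered across the disk.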

I wonder whether one or more disks produce some kind of electrical
"noise" on the SATA bus that confuses the controller.

This is why I would like to remove sdi, and why I need to identify
that specific HDD.
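
One way to tie a kernel name like sdi to something printed on the drive label is the set of udev symlinks under /dev/disk/by-id. The following Python sketch lists the link names that resolve to a given device. The function name `ids_for_device` is hypothetical, and it is an assumption that the links carry the drive's model and serial on this box: behind a RAID controller they may only show what the controller reports.

```python
import os


def ids_for_device(dev_name, by_id_dir="/dev/disk/by-id"):
    """Return the by-id link names whose target is /dev/<dev_name>.

    dev_name is the bare kernel name, e.g. "sdi" (not "sdi1").
    """
    matches = []
    for name in sorted(os.listdir(by_id_dir)):
        # Each entry is a symlink such as ../../sdi; resolve it and
        # compare only the final path component.
        target = os.path.realpath(os.path.join(by_id_dir, name))
        if os.path.basename(target) == dev_name:
            matches.append(name)
    return matches
```

Calling `ids_for_device("sdi")` on the server would return whatever names udev created for that disk, which can then be compared with the labels on the drives.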

In the past I used the trick of stressing that specific disk with dd or
something similar (reading the whole disk, for example) and letting a
person spot it by watching the LEDs on the drive cages ;-)

That may be the faster way in this case.
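
The LED trick can be sketched as a read-only loop in Python instead of dd. This is only an illustration: the function name `stress_read` and its parameters are made up here, reading a raw device such as /dev/sdi normally needs root, and the loop never writes to the disk.

```python
import time


def stress_read(path, seconds=60.0, chunk_size=1024 * 1024):
    """Read `path` sequentially for about `seconds`; return bytes read.

    When the end is reached the read position wraps to the start, so
    the activity LED keeps blinking for the whole interval.
    """
    total = 0
    deadline = time.monotonic() + seconds
    # buffering=0 disables Python-level buffering. The kernel page cache
    # can still serve repeated reads of a small file from RAM, so a
    # whole-disk pass is the more reliable way to keep the drive busy.
    with open(path, "rb", buffering=0) as dev:
        while time.monotonic() < deadline:
            chunk = dev.read(chunk_size)
            if not chunk:
                if dev.tell() == 0:
                    break  # nothing readable at all
                dev.seek(0)  # end reached, start over
                continue
            total += len(chunk)
    return total
```

Something like `stress_read("/dev/sdi", seconds=120)` would keep the disk busy for two minutes while someone on site watches the drive cages.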

> The management tools for the other cards should provide this sort of
> functionality.
>
> If you had used the raid card to create an array the management cli
> tools will show that a specific port is dead and you query it for the
> serial number.
>
> This doesn't help you with the sg mapping. The problem for you now will
> be figuring out why sg_map is reporting the way it is.

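
As a cross-check for the sg mapping discussed above, the relationship can also be read from sysfs. The sketch below assumes the usual Linux layout, in which each /sys/class/scsi_generic/sgN/device link points at a SCSI device directory named host:channel:target:lun, and disks expose a block/ subdirectory containing their sd name. The function name `map_sg_to_block` is made up for illustration.

```python
import os


def map_sg_to_block(sysfs_root="/sys/class/scsi_generic"):
    """Return {sg_name: (hctl, [block_device_names])}.

    hctl is the host:channel:target:lun string taken from the resolved
    "device" link; the list is empty for sg nodes without a block device.
    """
    mapping = {}
    for sg in sorted(os.listdir(sysfs_root)):
        device_dir = os.path.join(sysfs_root, sg, "device")
        hctl = os.path.basename(os.path.realpath(device_dir))
        block_dir = os.path.join(device_dir, "block")
        if os.path.isdir(block_dir):
            blocks = sorted(os.listdir(block_dir))
        else:
            blocks = []
        mapping[sg] = (hctl, blocks)
    return mapping
```

An sg node that comes back with an empty list has no sd counterpart, which can help when the sg listing does not line up with the sd names.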
The disks were originally configured via StorMan under SLES10 or so;
the server ran SLES back then and I moved it to Gentoo later on.

I could boot into SLES to get StorMan back, but that leads to the
boot failure mentioned above, so I want to avoid it for now.

Something is wrong with this box, and I have to find out whether it's
the disk(s) or the controller. All this while I am >600km away from the
server.

Thanks, Stefan