"Brian Budge" writes:
>
> Hi all -
>
> I'm new to infiniband and still getting my feet wet. I am administering a very
> small cluster of 5 nodes, and have recently installed infiniband HCAs. I
> have the infiniband modules built into the kernel, and I am using the
> openib-userspace package in the gentoo-science overlay.
>
> The strange thing with my situation is that I have infiniband working with
> openmpi on 4 of my 5 nodes, but the 5th one is a mystery.
>
> All 4 working nodes have a /dev/infiniband directory that looks roughly like
> this:
>
> crw-rw---- 1 root root 231, 64 Dec 31 09:13 issm0
> crw-rw-rw- 1 root root 231, 224 Dec 31 09:13 ucm0
> crw-rw---- 1 root root 231, 0 Dec 31 09:13 umad0
> crw-rw-rw- 1 root root 231, 192 Dec 31 09:13 uverbs0
>
> But the 5th node doesn't, which could indicate the problem (it isn't
> completely the problem, as I tried making those nodes myself to match, but
> it doesn't help). I'm just not sure what the difference is, because I
> installed them all the same way, they all have the same hardware, and they
> are all running the same kernel.
|
The '/dev/infiniband' subdir is created by the udev rules in '/etc/udev/rules.d/40-ib.rules'.

Does the '/sys/class/infiniband' directory exist?
If so, what does it contain? Which loaded modules with an 'ib_' prefix does
lsmod report?
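
If it helps, those two checks can be scripted so the output from the broken
node can be compared with a working one. This is only a rough sketch: the
function names are made up for illustration, and it reads /proc/modules, which
is the same list lsmod prints. Since the modules are built into the kernel on
these nodes, the module list may well come back empty, in which case the
contents of /sys/class/infiniband are the more telling part.

```python
import os


def ib_sysfs_devices(sysfs_dir="/sys/class/infiniband"):
    """Return the sorted entries under sysfs_dir, or None if it is missing."""
    if not os.path.isdir(sysfs_dir):
        return None
    return sorted(os.listdir(sysfs_dir))


def loaded_ib_modules(proc_modules="/proc/modules"):
    """Return the sorted names of loaded modules that start with 'ib_'.

    Built-in drivers are not listed in /proc/modules, so an empty result
    does not by itself mean the drivers are absent.
    """
    try:
        with open(proc_modules) as f:
            # The module name is the first whitespace-separated field.
            names = [line.split()[0] for line in f if line.strip()]
    except OSError:
        return []
    return sorted(n for n in names if n.startswith("ib_"))


if __name__ == "__main__":
    devices = ib_sysfs_devices()
    if devices is None:
        print("/sys/class/infiniband does not exist")
    else:
        print("/sys/class/infiniband contains:", devices)
    print("loaded ib_ modules:", loaded_ib_modules())
```

Running it on the 5th node and on one of the working nodes and comparing the
two outputs should show where they differ.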
|
-bryan

--
gentoo-cluster@g.o mailing list