Hi Bryan -

I don't seem to have a 40-ib.rules in /etc/udev/rules.d on any
node.

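In case it helps anyone repeat the check, here is a rough sketch of how the rules directory can be searched on each node. The function name `find_ib_rules` is made up for illustration. Matching on the string "infiniband" is only an assumption about what such a rules file would contain, since the rules could live under a file name other than 40-ib.rules.

```shell
#!/bin/sh
# find_ib_rules DIR: list rules files under DIR that mention "infiniband".
# Hypothetical helper; the search string is an assumption, not taken from
# any particular udev package.
find_ib_rules() {
    dir=${1:-/etc/udev/rules.d}
    # -r recurse, -i ignore case, -l print only the names of matching files
    grep -ril 'infiniband' "$dir" 2>/dev/null
}

find_ib_rules /etc/udev/rules.d || echo "no udev rules mention infiniband"
```

grep exits non-zero when nothing matches, so the fallback message is printed on a node with no matching rules file.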
My /sys/class/infiniband directory contains mthca0, which contains:
> ls -la /sys/class/infiniband/mthca0/
total 0
drwxr-xr-x 3 root root 0 Jan 2 20:54 .
drwxr-xr-x 3 root root 0 Jan 2 20:54 ..
-r--r--r-- 1 root root 4096 Jan 2 21:07 board_id
lrwxrwxrwx 1 root root 0 Jan 3 00:01 device -> ../../../devices/pci0000:20/0000:20:0a.0/0000:21:00.0
-r--r--r-- 1 root root 4096 Jan 2 21:07 fw_ver
-r--r--r-- 1 root root 4096 Jan 2 21:07 hca_type
-r--r--r-- 1 root root 4096 Jan 2 21:07 hw_rev
-rw-r--r-- 1 root root 4096 Jan 2 21:07 node_desc
-r--r--r-- 1 root root 4096 Jan 2 21:07 node_guid
-r--r--r-- 1 root root 4096 Jan 2 21:06 node_type
drwxr-xr-x 3 root root 0 Jan 2 21:07 ports
lrwxrwxrwx 1 root root 0 Jan 3 00:01 subsystem -> ../../../class/infiniband
-r--r--r-- 1 root root 4096 Jan 2 21:07 sys_image_guid
--w------- 1 root root 4096 Jan 2 20:54 uevent

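To compare the nodes more easily, something like the following sketch could print a few of those attributes for every HCA in the class directory. The helper name `ib_sysfs_summary` is hypothetical. The attribute names are simply the ones that appear in the listing above, and I am assuming each is a small readable text file.

```shell
#!/bin/sh
# ib_sysfs_summary DIR: print a few attributes for every HCA found under DIR.
# Hypothetical helper; the attribute names come from the directory listing.
ib_sysfs_summary() {
    class_dir=${1:-/sys/class/infiniband}
    for dev in "$class_dir"/*; do
        [ -d "$dev" ] || continue      # skip if the glob matched nothing
        for attr in hca_type fw_ver node_guid; do
            if [ -r "$dev/$attr" ]; then
                # ${dev##*/} strips the directory part, leaving e.g. mthca0
                printf '%s %s: %s\n' "${dev##*/}" "$attr" "$(cat "$dev/$attr")"
            fi
        done
    done
}

ib_sysfs_summary /sys/class/infiniband
```

Running it on a working node and on the broken one and diffing the output should show whether the driver sees the card the same way on both.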
I don't have any ib modules loaded at all on any node. All of the
infiniband drivers are built into the kernel rather than built as modules:

CONFIG_INFINIBAND=y
CONFIG_INFINIBAND_USER_MAD=y
CONFIG_INFINIBAND_USER_ACCESS=y
CONFIG_INFINIBAND_USER_MEM=y
CONFIG_INFINIBAND_ADDR_TRANS=y
CONFIG_INFINIBAND_MTHCA=y
CONFIG_INFINIBAND_MTHCA_DEBUG=y
# CONFIG_INFINIBAND_IPATH is not set
CONFIG_INFINIBAND_AMSO1100=y
# CONFIG_INFINIBAND_AMSO1100_DEBUG is not set
CONFIG_MLX4_INFINIBAND=y
CONFIG_INFINIBAND_IPOIB=y
# CONFIG_INFINIBAND_IPOIB_CM is not set
CONFIG_INFINIBAND_IPOIB_DEBUG=y
# CONFIG_INFINIBAND_IPOIB_DEBUG_DATA is not set
# CONFIG_INFINIBAND_SRP is not set
# CONFIG_INFINIBAND_ISER is not set

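Since lsmod only lists loadable modules, built-in drivers will not show up there, so the kernel config is the place to look. A minimal sketch of pulling out a list like the one above follows. The helper name `ib_kernel_config` is made up, and reading /proc/config.gz assumes the kernel was built with CONFIG_IKCONFIG_PROC; otherwise pass the path to the .config used for the build.

```shell
#!/bin/sh
# ib_kernel_config FILE: print the InfiniBand-related options from a kernel
# config file, which may be plain text or gzip-compressed.
# Hypothetical helper; /proc/config.gz only exists when the kernel was built
# with CONFIG_IKCONFIG_PROC, which is an assumption here.
ib_kernel_config() {
    cfg=${1:-/proc/config.gz}
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
    # Decompress only when the file name ends in .gz, then filter.
    case "$cfg" in
        *.gz) gzip -dc "$cfg" ;;
        *)    cat "$cfg" ;;
    esac | grep 'INFINIBAND'
}

ib_kernel_config /proc/config.gz || true
```

The plain grep also keeps the "is not set" comment lines, which makes it easy to diff the output between nodes.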

Thanks,
Brian

On Jan 2, 2008 2:11 PM, Bryan Green <bryan.d.green@××××.gov> wrote:

> "Brian Budge" writes:
> >
> > Hi all -
> >
> > I'm new to infiniband and still getting my feet wet. I am administering
> > a very small cluster of 5 nodes, and have recently installed infiniband
> > HCAs. I have the infiniband modules built into the kernel, and I am
> > using the openib-userspace package in the gentoo-science overlay.
> >
> > The strange thing with my situation is that I have infiniband working
> > with openmpi on 4 of my 5 nodes, but the 5th one is a mystery.
> >
> > All 4 working nodes have a /dev/infiniband directory that looks roughly
> > like this:
> >
> > crw-rw---- 1 root root 231, 64 Dec 31 09:13 issm0
> > crw-rw-rw- 1 root root 231, 224 Dec 31 09:13 ucm0
> > crw-rw---- 1 root root 231, 0 Dec 31 09:13 umad0
> > crw-rw-rw- 1 root root 231, 192 Dec 31 09:13 uverbs0
> >
> > But the 5th node doesn't, which could indicate the problem (it isn't
> > completely the problem, as I tried making those device nodes myself to
> > match, but it doesn't help). I'm just not sure what the difference is,
> > because I installed them all the same way, they all have the same
> > hardware, and they are all running the same kernel.
>
> The '/dev/infiniband' subdir is created by the udev rules in
> '/etc/udev/rules.d/40-ib.rules'
>
> Does the '/sys/class/infiniband' directory exist?
> If so, what does it contain? What loaded modules with an 'ib_' prefix
> does lsmod report?
>
> -bryan
>
> --
> gentoo-cluster@g.o mailing list