Hello!
This is my first message on the list, so I hope I'm not about to ask a
silly or already-answered question ^^

I'm a student porting an electromagnetic field simulator to a parallel,
distributed Linux cluster for my final thesis; I'm using both OpenMP and
MPI over InfiniBand to improve its speed.

The OpenMP part is done, and now I'm having trouble setting up MPI over
InfiniBand. So far I have:
- correctly set up the kernel modules
- installed the right drivers for the board (a Mellanox HCA) and the
  userspace programs (os
- installed the MVAPICH2 MPI implementation (thanks to msg [1])

However, I can't get all of this to work together.
For example, ibhosts correctly finds the two connected nodes:

Ca : 0x0002c90300018b8e ports 2 " HCA-1"
Ca : 0x0002c90300018b12 ports 2 "localhost HCA-1"

but ibping doesn't receive any responses:

ibwarn: [32052] ibping: Ping..
ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
ibwarn: [32052] main: ibping to Lid 2 failed

As a result, every other operation with MPI fails as well.
Strangely enough, IPoIB works very well, and I can ping and connect over
it with no problems.

The two machines are identical and are connected point to point with a
crossover cable. The HCA in each is:
03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s] (rev a0)

What could be causing all of this? Am I forgetting something?
Any help is greatly appreciated.

For the maintainers: would it be possible to have openib-diags installed
in /usr/bin instead of /usr/sbin? Most of its files call other programs
or scripts only from /usr/bin.
I worked around it with:

for x in /usr/sbin/ib*; do ln -s "$x" "/usr/bin/$(basename "$x")"; done

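A re-runnable variant of the same workaround could look like the sketch
below. The function name link_ib_tools is made up for illustration, and it
assumes the only goal is to expose every ib* file from one directory in
another without touching anything that already exists there.

```shell
# Hypothetical helper (not part of openib-diags): symlink every ib* file
# found in the source directory into the destination directory.
link_ib_tools() {
    src_dir=$1
    dst_dir=$2
    for tool in "$src_dir"/ib*; do
        # If the glob matched nothing, the literal pattern remains; skip it.
        [ -e "$tool" ] || continue
        name=$(basename "$tool")
        # Leave anything already present in the destination alone.
        if [ -e "$dst_dir/$name" ] || [ -L "$dst_dir/$name" ]; then
            continue
        fi
        ln -s "$tool" "$dst_dir/$name"
    done
}

# On the real system this would be run as root:
#   link_ib_tools /usr/sbin /usr/bin
```

Skipping names that already exist means the function can be run again
after a package update without ln complaining about existing links.
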
[1] http://archives.gentoo.org/gentoo-science/msg_7c030de6ea7ce8673ab90061a066df28.xml

Thanks a lot,
Vittorio