On Saturday 14 February 2009 06:28:44 Vittorio wrote:
> Hello!
> This is my first message on the list, so I hope I'm not going to ask a
> silly or already answered question ^^
>
> I'm a student and I'm porting an electromagnetic field simulator to a
> parallel and distributed Linux cluster for my final thesis; I'm using both
> OpenMP and MPI over InfiniBand to achieve speed improvements.
>
> The OpenMP part is done, and now I'm facing problems setting up MPI over
> InfiniBand.
> I have correctly set up the kernel modules,
> installed the right drivers for the board (a Mellanox HCA) and userspace
> programs (os
> installed the mvapich2 MPI implementation (thanks to msg [1])
>
> However, I fail to run all of this together:
> for example, ibhosts correctly finds the two connected nodes
>
> Ca : 0x0002c90300018b8e ports 2 " HCA-1"
> Ca : 0x0002c90300018b12 ports 2 "localhost HCA-1"
>
> but ibping doesn't receive responses
>
> ibwarn: [32052] ibping: Ping..
> ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
> ibwarn: [32052] main: ibping to Lid 2 failed
>
> Subsequently, any other operation with MPI fails.
> Strangely enough, however, IPoIB works very well and I can ping and connect
> with no problems.
>
> The two machines are identical and they use a crossover cable (point to
> point)
> 03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe
> 2.0 2.5GT/s] (rev a0)
>
> What can be the cause of all of this? Am I forgetting something?
> Any help is greatly appreciated.
>
>
> For the maintainers: is it possible to have openib-diags installed in
> /usr/bin instead of /usr/sbin? Most of the files call other programs or
> scripts only from /usr/bin.
> I resolved it by doing
>
> for x in /usr/sbin/ib*; do ln -s "$x" /usr/bin/"$(basename "$x")"; done
>
> [1]
> http://archives.gentoo.org/gentoo-science/msg_7c030de6ea7ce8673ab90061a066df28.xml
>
>
> thanks a lot
> Vittorio

Hi,

First of all, make sure that opensm is running.
Then you can use the IB subnet.
For MPI transport I use openmpi.
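
A quick way to check both points might look like the sketch below. It is only an illustration: the helper names port_is_active and opensm_running are made up for this example, and it assumes that ibstat (from the diagnostics package) prints a "State: Active" line for a port that the subnet manager has brought up. A port that stays in "Initializing" usually means no subnet manager has configured the fabric yet.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): reads ibstat-style text on
# stdin and succeeds only if at least one port reports "State: Active".
# The "Physical state:" line is not matched because it does not start
# with "State:".
port_is_active() {
    grep -q '^[[:space:]]*State:[[:space:]]*Active'
}

# Hypothetical helper: succeeds if a process named exactly "opensm" is
# running on this node.
opensm_running() {
    pgrep -x opensm >/dev/null 2>&1
}

# Run the live checks only when the diagnostics are installed here.
if command -v ibstat >/dev/null 2>&1; then
    if opensm_running; then
        echo "opensm: running on this node"
    else
        echo "opensm: not running on this node"
    fi
    if ibstat | port_is_active; then
        echo "port state: Active"
    else
        echo "port state: no Active port found"
    fi
fi
```

With a point-to-point cable there is no managed switch, so opensm has to run on at least one of the two hosts; "not running on this node" is only a problem if it is not running on the other node either.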
--
Alexey 'Alexxy' Shvetsov
Gentoo/KDE
Gentoo/MIPS
Gentoo Team Ru