Gentoo Archives: gentoo-science

From: Alexey Shvetsov <alexxy@g.o>
To: gentoo-science@l.g.o
Subject: Re: [gentoo-science] [sys-cluster] help with infiniband and mpi
Date: Sat, 14 Feb 2009 10:52:09
Message-Id: 200902141352.05112.alexxy@gentoo.org
In Reply to: [gentoo-science] [sys-cluster] help with infiniband and mpi by Vittorio
1 On Saturday 14 February 2009 06:28:44 Vittorio wrote:
2 > Hello!
3 > This is my first message on the list so i hope that i'm not going to ask
4 > a silly or already answered question ^^
5 >
6 > i'm a student and i'm porting an electromagnetic field simulator to a
7 > parallel and distributed linux cluster for final thesis; i'm using both
8 > OpenMP and MPI over Infiniband to achieve speed improvements
9 >
10 > the openmp part is done and now i'm facing problems with setting up MPI over
11 > Infiniband
12 > i have correctly set up the kernel modules
13 > installed the right drivers for the board (a mellanox hca) and userspace
14 > programs (os
15 > installed the mvapich2 mpi implementation (thanks to msg [1])
16 >
17 > however i fail to run all of this together:
18 > for example ibhosts correctly finds the two connected nodes
19 >
20 > Ca : 0x0002c90300018b8e ports 2 " HCA-1"
21 > Ca : 0x0002c90300018b12 ports 2 "localhost HCA-1"
22 >
23 > but ibping doesn't receive responses
24 >
25 > ibwarn: [32052] ibping: Ping..
26 > ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
27 > ibwarn: [32052] main: ibping to Lid 2 failed
28 >
29 > subsequently any other operation with MPI fails
30 > strangely enough however IPoIB works very well and i can ping and connect
31 > with no problems
32 >
33 > the two machines are identical and they use a crossover cable (point to
34 > point)
35 > 03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe
36 > 2.0 2.5GT/s] (rev a0)
37 >
38 > what can be the cause of all of this? am i forgetting something?
39 > any help is greatly appreciated
40 >
41 >
42 > for the maintainers, is it possible to have openib-diags installed in
43 > /usr/bin instead of /usr/sbin? most of the files call other programs or
44 > scripts only from /usr/bin
45 > i worked around it by doing
46 >
47 > for x in /usr/sbin/ib*; do ln -s "$x" "/usr/bin/$(basename "$x")"; done
49 >
50 > [1]
51 > http://archives.gentoo.org/gentoo-science/msg_7c030de6ea7ce8673ab90061a066df28.xml
53 >
54 >
55 > thanks a lot
56 > Vittorio
57 Hi
58 First of all, you should make sure that opensm has been started.
59 Then you can use the IB subnet.
60 For MPI transport I use openmpi.
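
One way to check the opensm point is sketched below. This is only a rough
sketch, not a tested procedure for this cluster: it assumes that pgrep is
available, that the subnet manager process is named opensm, and that ibstat
prints a "State: Active" line for a port that the subnet manager has brought
up. The function names opensm_status and ib_port_status are made up for this
example.

```sh
#!/bin/sh
# Hypothetical helpers (names are not part of any InfiniBand package).

# Report whether a process named exactly "opensm" is running on this host.
# Assumption: the subnet manager runs locally under that process name.
opensm_status() {
    if pgrep -x opensm >/dev/null 2>&1; then
        echo running
    else
        echo stopped
    fi
}

# Read ibstat-style text on stdin and report whether at least one port
# shows "State: Active". The pattern is anchored so that the separate
# "Physical state: ..." line does not match.
ib_port_status() {
    if grep -Eq '^[[:space:]]*State:[[:space:]]*Active'; then
        echo active
    else
        echo not-active
    fi
}

# Typical use on a node (left commented out so the sketch has no side effects):
#   opensm_status
#   ibstat | ib_port_status
```

With a point-to-point cable there is no switch to provide a subnet manager, so
opensm only needs to report "running" on one of the two nodes, while
ib_port_status should report "active" on both.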
61 --
62 Alexey 'Alexxy' Shvetsov
63 Gentoo/KDE
64 Gentoo/MIPS
65 Gentoo Team Ru

Attachments

File name MIME type
signature.asc application/pgp-signature