Gentoo Archives: gentoo-science

From: Vittorio <vitto.giova@×××××.it>
To: gentoo-science@l.g.o
Subject: [gentoo-science] [sys-cluster] help with infiniband and mpi
Date: Sat, 14 Feb 2009 03:28:46
Message-Id: 4de51c660902131928v1f054463r6e655aeb2af361e5@mail.gmail.com
Hello!
This is my first message on the list, so I hope I'm not asking a
silly or already answered question ^^

I'm a student and I'm porting an electromagnetic field simulator to a
parallel, distributed Linux cluster for my final thesis; I'm using both
OpenMP and MPI over InfiniBand to improve speed.

The OpenMP part is done, and now I'm having problems setting up MPI over
InfiniBand.
I have correctly set up the kernel modules,
installed the right drivers for the board (a Mellanox HCA) and userspace
programs (os
installed the mvapich2 MPI implementation (thanks to msg [1])

However, I fail to get all of this running together.
For example, ibhosts correctly finds the two connected nodes:

Ca : 0x0002c90300018b8e ports 2 " HCA-1"
Ca : 0x0002c90300018b12 ports 2 "localhost HCA-1"

but ibping doesn't receive any responses:

ibwarn: [32052] ibping: Ping..
ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
ibwarn: [32052] main: ibping to Lid 2 failed

Subsequently, every other operation with MPI fails.
Strangely enough, however, IPoIB works very well, and I can ping and connect
with no problems.

The two machines are identical and they use a crossover cable (point to
point):
03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0 2.5GT/s] (rev a0)

What could be the cause of all this? Am I forgetting something?
Any help is greatly appreciated.


For the maintainers: is it possible to have openib-diags installed in
/usr/bin instead of /usr/sbin? Most of the files call other programs or
scripts only from /usr/bin.
I worked around it with:

for x in `ls -l /usr/sbin/ib*|awk '{print $9}'`; do ln -s $x /usr/bin/`basename $x`; done
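
A slightly more defensive version of the same workaround, as a sketch only: it globs instead of parsing ls output, quotes every path, and leaves alone any name that already exists in the target directory. The function name link_ib_tools and its two arguments are made up here for illustration; they are not part of openib-diags.

```shell
#!/bin/sh
# Link every ib* tool from a source directory into a target directory.
# link_ib_tools is a hypothetical helper name; the defaults mirror the
# paths used in the one-liner above.
link_ib_tools() {
    src=${1:-/usr/sbin}
    dst=${2:-/usr/bin}
    for tool in "$src"/ib*; do
        # If nothing matches, the pattern stays unexpanded: skip it.
        [ -e "$tool" ] || continue
        name=$(basename "$tool")
        # Never overwrite a file or symlink that is already in dst.
        if [ -e "$dst/$name" ] || [ -L "$dst/$name" ]; then
            continue
        fi
        ln -s "$tool" "$dst/$name"
    done
}
```

Calling link_ib_tools with no arguments (as root) should have the same effect as the loop above; passing two directories, e.g. link_ib_tools /tmp/sbin /tmp/bin, allows a dry run somewhere harmless first.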

[1]
http://archives.gentoo.org/gentoo-science/msg_7c030de6ea7ce8673ab90061a066df28.xml


Thanks a lot,
Vittorio

Replies

Subject Author
Re: [gentoo-science] [sys-cluster] help with infiniband and mpi Alexey Shvetsov <alexxy@g.o>