Gentoo Archives: gentoo-dev

From: David Seifert <soap@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] RFC: BLAS and LAPACK runtime switching
Date: Tue, 28 May 2019 12:05:59
Message-Id: d8630ad99775233b33da0be5063b177b560246ba.camel@gentoo.org
In Reply to: [gentoo-dev] RFC: BLAS and LAPACK runtime switching by Mo Zhou
On Tue, 2019-05-28 at 01:37 -0700, Mo Zhou wrote:
> Hi Gentoo devs,
>
> Classical numerical linear algebra libraries, BLAS[1] and LAPACK[2],
> play important roles in scientific computing: much software, such as
> NumPy, SciPy, Julia, Octave and R, is built upon them.
>
> There is a standard implementation of BLAS and LAPACK, named netlib
> or simply the "reference implementation". This implementation has
> been provided by Gentoo's main repo. However, it has a major problem:
> performance. On the other hand, a number of well-optimized BLAS/LAPACK
> implementations exist, including OpenBLAS (free), BLIS (free) and
> MKL (non-free), but none of them has been properly integrated into
> the Gentoo distribution.
>
> I'm writing to propose a good solution to this problem. If no Gentoo
> developer objects to this proposal, I'll keep moving forward and
> start submitting PRs to the Gentoo main repo.
>
> Historical Obstacle
> -------------------
>
> Different BLAS/LAPACK implementations are expected to be compatible
> with each other at both the API and the ABI level, so they can be
> used as drop-in replacements for one another. This sounds nice, but
> differences in SONAME have hampered the Gentoo integration of the
> well-optimized ones.
>
> Assume a Gentoo user compiled a pile of packages on top of the
> reference BLAS and LAPACK, i.e. these reverse dependencies are linked
> against libblas.so.3 and liblapack.so.3. When the user discovers that
> OpenBLAS provides much better performance, they have to recompile
> the whole reverse dependency tree in order to take advantage of
> OpenBLAS, because the SONAME of OpenBLAS is libopenblas.so.0 and the
> linker records the SONAME, not the file name, in each program's list
> of needed libraries. When the user wants to try MKL (libmkl_rt.so),
> they have to recompile the whole reverse dependency tree again.
>
> This is not friendly to our earth.
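The SONAME mismatch described above can be inspected directly: a shared library declares its SONAME in a DT_SONAME entry of its ELF dynamic section, and a linked program lists the names it needs in DT_NEEDED entries, which the dynamic loader later looks up by name. `readelf -d` prints both. The following is a minimal Python sketch of the same lookup, assuming a 64-bit little-endian ELF file; `read_dynamic` is a hypothetical helper, not part of any Gentoo tool.

```python
import struct

# ELF constants (System V ABI / elf.h).
PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB, DT_SONAME = 0, 1, 5, 14


def _vaddr_to_offset(loads, vaddr):
    """Translate a virtual address into a file offset using the PT_LOAD segments."""
    for seg_vaddr, seg_offset, seg_filesz in loads:
        if seg_vaddr <= vaddr < seg_vaddr + seg_filesz:
            return vaddr - seg_vaddr + seg_offset
    raise ValueError(f"address {vaddr:#x} is not backed by the file")


def read_dynamic(path):
    """Return (soname, needed) for a 64-bit little-endian ELF file.

    soname is None when there is no DT_SONAME entry (typical for programs);
    needed lists the DT_NEEDED names, e.g. ["libblas.so.3", "libc.so.6"].
    """
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("only 64-bit little-endian ELF is handled in this sketch")

    # ELF64 header: e_phoff at 0x20, then e_phentsize and e_phnum at 0x36.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)

    loads, dynamic = [], None
    for i in range(e_phnum):
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz, _memsz, _align = (
            struct.unpack_from("<IIQQQQQQ", data, e_phoff + i * e_phentsize))
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return None, []  # statically linked: no dynamic section at all

    # Each Elf64_Dyn entry is a signed 64-bit tag followed by a 64-bit value.
    entries = []
    dyn_offset, dyn_size = dynamic
    for pos in range(dyn_offset, dyn_offset + dyn_size, 16):
        tag, value = struct.unpack_from("<qQ", data, pos)
        if tag == DT_NULL:
            break
        entries.append((tag, value))

    # String-table indices are relative to DT_STRTAB, which is a virtual address.
    strtab_vaddr = next(value for tag, value in entries if tag == DT_STRTAB)
    strtab = _vaddr_to_offset(loads, strtab_vaddr)

    def string_at(index):
        end = data.index(b"\0", strtab + index)
        return data[strtab + index:end].decode()

    soname = next((string_at(v) for t, v in entries if t == DT_SONAME), None)
    needed = [string_at(v) for t, v in entries if t == DT_NEEDED]
    return soname, needed
```

Under these assumptions, running it on a program built against the reference BLAS would be expected to list `libblas.so.3` among the needed names, while `libopenblas.so.0` would report its own SONAME, which that program never asks for.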
>
> Goal
> ----
>
> * When a program is linked against libblas.so or liblapack.so
>   provided by any BLAS/LAPACK provider, the eselect-based solution
>   will allow the user to switch the underlying library without
>   recompiling anything.
>
> * When a program is linked against a specific implementation, e.g.
>   libmkl_rt.so, the solution doesn't break anything.
>
> Solution
> --------
>
> Similar to Debian's update-alternatives mechanism, Gentoo's eselect
> is good at dealing with drop-in replacements as well. My preliminary
> investigation suggests that eselect is enough to enable BLAS/LAPACK
> runtime switching. Hence, the proposed solution is eselect-based:
>
> * Every BLAS/LAPACK implementation should provide a generic library
>   and eselect candidate libraries at the same time. Taking netlib,
>   BLIS and OpenBLAS as examples:
>
>   reference:
>
>     usr/lib64/blas/reference/libblas.so.3 (SONAME=libblas.so.3)
>       -- default BLAS provider
>       -- candidate of the eselect "blas" unit
>       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
>
>     usr/lib64/lapack/reference/liblapack.so.3 (SONAME=liblapack.so.3)
>       -- default LAPACK provider
>       -- candidate of the eselect "lapack" unit
>       -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
>
>   blis (doesn't provide LAPACK):
>
>     usr/lib64/libblis.so.2 (SONAME=libblis.so.2)
>       -- general purpose
>
>     usr/lib64/blas/blis/libblas.so.3 (SONAME=libblas.so.3)
>       -- candidate of the eselect "blas" unit
>       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
>       -- compiled from the same set of object files as libblis.so.2
>
>   openblas:
>
>     usr/lib64/libopenblas.so.0 (SONAME=libopenblas.so.0)
>       -- general purpose
>
>     usr/lib64/blas/openblas/libblas.so.3 (SONAME=libblas.so.3)
>       -- candidate of the eselect "blas" unit
>       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
>       -- compiled from the same set of object files as libopenblas.so.0
>
>     usr/lib64/lapack/openblas/liblapack.so.3 (SONAME=liblapack.so.3)
>       -- candidate of the eselect "lapack" unit
>       -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
>       -- compiled from the same set of object files as libopenblas.so.0
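The symlink step in the layout above can be sketched as follows. This is a minimal Python illustration assuming the layout from the quoted proposal (candidates under `usr/lib64/<unit>/<provider>/`); the real eselect modules are shell scripts, and `candidates` and `set_provider` are hypothetical names. The new link is created under a temporary name and renamed over the old one, so a program starting at that moment sees either the old or the new provider, never a missing file.

```python
import os
from pathlib import Path


def candidates(root, unit):
    """List provider names under <root>/usr/lib64/<unit>/ (layout assumed from the RFC)."""
    base = Path(root, "usr/lib64", unit)
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if p.is_dir())


def set_provider(root, unit, provider, soname):
    """Point <root>/usr/lib64/<soname> at <unit>/<provider>/<soname>.

    The link is first created under a temporary name and then renamed over
    the old one, so the generic name never disappears during the switch.
    """
    libdir = Path(root, "usr/lib64")
    target = Path(unit, provider, soname)  # relative target, e.g. blas/openblas/libblas.so.3
    if not (libdir / target).is_file():
        raise FileNotFoundError(f"no candidate {target} under {libdir}")
    tmp = libdir / f".{soname}.new"
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    os.symlink(target, tmp)
    os.replace(tmp, libdir / soname)  # rename(2): atomic replacement on POSIX
```

With these assumptions, `set_provider("/", "blas", "openblas", "libblas.so.3")` would make every program whose needed-library list says `libblas.so.3` load the OpenBLAS build at its next start, with no relinking.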
>
> This solution is similar to Debian's[3]. It achieves our goal, and it
> requires us to patch upstream build systems (as Debian does). A
> preliminary demonstration of this solution is available; see below.
>
> Is this solution reliable?
> --------------------------
>
> * A similar solution has been used by Debian for many years.
> * Many projects, including Julia, call BLAS/LAPACK libraries through
>   FFI (see Julia's standard library: LinearAlgebra).
>
> Proposed Changes
> ----------------
>
> 1. Deprecate sci-libs/{blas,cblas,lapack,lapacke}-reference from the
>    Gentoo main repo. They use exactly the same source tarball, so it is
>    not very helpful to package these components in such a fine-grained
>    manner. A single sci-libs/lapack package is enough.
>
> 2. Merge the "cblas" eselect unit into the "blas" unit. It is
>    potentially harmful when "blas" and "cblas" point to different
>    implementations. That means "app-eselect/eselect-cblas" should be
>    deprecated.
>
> 3. Update virtual/{blas,cblas,lapack,lapacke}. BLAS/LAPACK providers
>    will be registered in their dependency information.
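The hazard in item 2 can be made concrete: with two independent eselect units, nothing stops `libblas.so.3` and `libcblas.so.3` from resolving into different providers. A minimal sketch of such a consistency check follows, assuming a hypothetical layout `<libdir>/<unit>/<provider>/<soname>` for both units; `provider_of` and `blas_cblas_mismatch` are illustrative names only.

```python
import os
from pathlib import Path


def provider_of(libdir, soname):
    """Provider directory that <libdir>/<soname> resolves into, or None if it is not a symlink.

    Assumes the hypothetical layout <libdir>/<unit>/<provider>/<soname>.
    """
    link = Path(libdir, soname)
    if not link.is_symlink():
        return None
    # realpath follows the link even if the target is missing (non-strict mode).
    return Path(os.path.realpath(link)).parent.name


def blas_cblas_mismatch(libdir):
    """True when libblas.so.3 and libcblas.so.3 resolve into different providers."""
    blas = provider_of(libdir, "libblas.so.3")
    cblas = provider_of(libdir, "libcblas.so.3")
    return blas is not None and cblas is not None and blas != cblas
```

Merging the two units, as proposed, removes the possibility of this state instead of detecting it after the fact.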
>
> Note that ebuilds for BLAS/LAPACK reverse dependencies are expected
> to work correctly with these changes without modification. For
> example, my local numpy-1.16.1 compilation was successful without
> any change.
>
> Preliminary Demonstration
> -------------------------
>
> The preliminary implementation is available in my personal
> overlay[4]. A simple sanity test script, `check-cpp.sh`, is provided
> to illustrate the effectiveness of the proposed solution.
>
> The script `check-cpp.sh` compiles two C++ programs: one calls
> general matrix-matrix multiplication from BLAS, while the other calls
> general singular value decomposition from LAPACK. Once they are
> compiled, the script switches between different BLAS/LAPACK
> implementations and runs the C++ programs without recompilation.
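A rough Python analogue of the first test program, calling BLAS through FFI in the way the quoted reliability section attributes to Julia, is sketched below. It assumes that the library found by `ctypes.util.find_library("blas")` exports the conventional Fortran symbol `sgemm_` with 32-bit integer arguments, and that trailing hidden string-length arguments are accepted, as in the gfortran calling convention. Because it loads the library by name at run time, it would pick up whichever provider the `libblas.so.3` symlink currently points to. `sgemm_via_blas` and `naive_gemm` are hypothetical helpers, not part of `check-cpp.sh`.

```python
import ctypes
import ctypes.util


def naive_gemm(a, b, m, n, k):
    """C = A @ B for column-major lists: A is m x k, B is k x n, C is m x n."""
    c = [0.0] * (m * n)
    for j in range(n):
        for p in range(k):
            b_pj = b[p + j * k]
            for i in range(m):
                c[i + j * m] += a[i + p * m] * b_pj
    return c


def sgemm_via_blas(a, b, m, n, k, libname=None):
    """Same product via the Fortran-ABI sgemm_ of whatever BLAS is installed.

    Returns None when no BLAS library can be located.  Assumes 32-bit integer
    arguments and the conventional lowercase, underscore-suffixed symbol.
    """
    name = libname or ctypes.util.find_library("blas")  # e.g. "libblas.so.3"
    if name is None:
        return None
    blas = ctypes.CDLL(name)  # the dynamic loader resolves this name at run time
    blas.sgemm_.restype = None
    f, i = ctypes.c_float, ctypes.c_int
    A = (f * (m * k))(*a)
    B = (f * (k * n))(*b)
    C = (f * (m * n))()
    no_trans = ctypes.c_char(b"N")
    # Fortran passes every scalar by reference.  The two trailing size_t values
    # are the hidden lengths of the two character arguments (gfortran convention).
    blas.sgemm_(ctypes.byref(no_trans), ctypes.byref(no_trans),
                ctypes.byref(i(m)), ctypes.byref(i(n)), ctypes.byref(i(k)),
                ctypes.byref(f(1.0)), A, ctypes.byref(i(m)),
                B, ctypes.byref(i(k)),
                ctypes.byref(f(0.0)), C, ctypes.byref(i(m)),
                ctypes.c_size_t(1), ctypes.c_size_t(1))
    return list(C)
```

Comparing `sgemm_via_blas` against `naive_gemm` after each provider switch is one way to confirm that the swapped-in library still computes the same product.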
>
> The preliminary results are available here[5] (CPU=Power9,
> ARCH=ppc64le). From the experimental results, we find that:
>
> For (512x512) single precision matrix multiplication:
> * reference BLAS takes ~360 ms
> * BLIS takes ~70 ms
> * OpenBLAS takes ~10 ms
>
> For (512x512) single precision singular value decomposition:
> * reference LAPACK takes ~1900 ms
> * BLIS (+reference LAPACK) takes ~1500 ms
> * OpenBLAS takes ~1100 ms
>
> The difference in computation speed illustrates the effectiveness of
> the proposed solution. Theoretically, any other package could take
> advantage of this solution without any recompilation, as long as it
> is linked against a library with the generic SONAME.
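Expressed as speedups over the reference implementation, the quoted timings work out as computed below. This is only arithmetic on the approximate numbers reported above (a single Power9 run), not a new measurement; the task labels are shorthand chosen here.

```python
# Approximate timings in milliseconds, copied from the quoted results.
TIMINGS_MS = {
    "GEMM 512x512": {"reference": 360, "BLIS": 70, "OpenBLAS": 10},
    "SVD 512x512": {"reference": 1900, "BLIS (+reference LAPACK)": 1500, "OpenBLAS": 1100},
}


def speedups(timings, baseline="reference"):
    """Return {provider: baseline_time / provider_time} for every non-baseline provider."""
    base = timings[baseline]
    return {name: base / ms for name, ms in timings.items() if name != baseline}


if __name__ == "__main__":
    for task, timings in TIMINGS_MS.items():
        for name, factor in speedups(timings).items():
            print(f"{task}: {name} is {factor:.1f}x faster than reference")
```

On these numbers OpenBLAS is about 36x faster than the reference BLAS for the matrix multiplication but only about 1.7x faster for the SVD, and BLIS gives about 5.1x and 1.3x respectively.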
>
> Acknowledgement
> ---------------
> This is an ongoing GSoC-2019 project:
> https://summerofcode.withgoogle.com/projects/?sp-page=2#6268942782300160
>
> Mentor: Benda Xu
>
> [1] BLAS = Basic Linear Algebra Subprograms. It's a set of API + ABI.
> [2] LAPACK = Linear Algebra PACKage. It's a set of API + ABI.
> [3] https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
> [4] https://github.com/cdluminate/my-overlay
> [5] https://gist.github.com/cdluminate/0cfeab19b89a8b5ac4ea2c5f942d8f64
>

We already have such a solution in the sci-overlay. It has proven
extremely brittle and shaky. The plan is to do this via USE flags
similar to python-single-r1 flags. Optionally, we can leave a "virtual"
USE flag, where users can switch implementation at runtime, but this
will not be a supported configuration.

While I understand that runtime switching sounds like a great feature,
it has proven too difficult to get right and too hard to enforce
invariants on correct symlinks. People who want this can take the
virtual+eselect approach in the overlay, but 99% of Gentoo users will
be happy just linking against OpenBLAS/reference-lapack and not
having to fix weird stale symlinks that eselect-alternatives somehow
lost track of.
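The symlink invariants mentioned above can be spelled out as a check. The following minimal Python sketch, under the same hypothetical `<libdir>/<unit>/<provider>/<soname>` layout as the quoted proposal, flags a managed link that is missing, dangling, or resolving outside its candidate directory; `check_links` is an illustrative name, and nothing here is taken from eselect or eselect-alternatives.

```python
import os
from pathlib import Path


def check_links(libdir, managed):
    """Return human-readable problems with eselect-style links in libdir.

    managed maps a generic name (e.g. "libblas.so.3") to the candidate
    directory its target must live under (e.g. "blas").
    """
    libdir = Path(libdir)
    problems = []
    for name, unit in sorted(managed.items()):
        link = libdir / name
        if not link.is_symlink():
            problems.append(f"{name}: missing or not a symlink")
            continue
        target = Path(os.path.realpath(link))
        if not target.is_file():
            # The link exists but its provider was removed or never installed.
            problems.append(f"{name}: dangling, points to {os.readlink(link)}")
            continue
        unit_dir = Path(os.path.realpath(libdir / unit))
        if unit_dir not in target.parents:
            problems.append(f"{name}: resolves outside {unit}/ to {target}")
    return problems
```

Such a check only reports broken state; keeping it from arising in the first place is the harder part that the paragraph above refers to.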

David

See also:
https://bugs.gentoo.org/632624
https://bugs.gentoo.org/348843#c30

Replies

Subject Author
Re: [gentoo-dev] RFC: BLAS and LAPACK runtime switching Benda Xu <heroxbd@g.o>