Gentoo Archives: gentoo-dev

From: Mo Zhou <lumin@××××××.org>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] RFC: BLAS and LAPACK runtime switching
Date: Tue, 28 May 2019 08:37:05
Message-Id: 2d3636f5bd6a738f30a4ad2e697b1ddb@debian.org
1 Hi Gentoo devs,
2
3 Classical numerical linear algebra libraries, BLAS[1] and LAPACK[2]
4 play important roles in scientific computing, as many software
5 packages such as NumPy, SciPy, Julia, Octave, and R are built upon them.
6
7 There is a standard implementation of BLAS and LAPACK, named netlib
8 or simply the "reference implementation". This implementation has been
9 provided by Gentoo's main repo. However, it has a major problem:
10 performance. On the other hand, a number of well-optimized BLAS/LAPACK
11 implementations exist, including OpenBLAS (free), BLIS (free),
12 MKL (non-free), etc., but none of them has been properly integrated
13 into the Gentoo distribution.
14
15 I'm writing to propose a good solution to this problem. If no Gentoo
16 developer objects to this proposal, I'll keep moving forward and
17 start submitting PRs to the Gentoo main repo.
18
19 Historical Obstacle
20 -------------------
21
22 Different BLAS/LAPACK implementations are expected to be compatible
23 with each other at both the API and ABI level. They can be used as
24 drop-in replacements for one another. This sounds nice, but the difference
25 in SONAME has hampered the Gentoo integration of the well-optimized ones.
26
27 Assume a Gentoo user compiled a pile of packages on top of the reference
28 BLAS and LAPACK; that is, these reverse dependencies are linked against
29 libblas.so.3 and liblapack.so.3. When the user discovers that
30 OpenBLAS provides much better performance, they'll have to recompile
31 the whole reverse dependency tree in order to take advantage of
32 OpenBLAS,
33 because the SONAME of OpenBLAS is libopenblas.so.0. When the user
34 wants to try MKL (libmkl_rt.so), they'll have to recompile the whole
35 reverse dependency tree again.
36
37 This is not friendly to our earth.
38
39 Goal
40 ----
41
42 * When a program is linked against libblas.so or liblapack.so
43 provided by any BLAS/LAPACK provider, the eselect-based solution
44 will allow the user to switch the underlying library without recompiling
45 anything.
46
47 * When a program is linked against a specific implementation, e.g.
48 libmkl_rt.so, the solution doesn't break anything.
49
50 Solution
51 --------
52
53 Similar to Debian's update-alternatives mechanism, Gentoo's eselect
54 is good at dealing with drop-in replacements as well. My preliminary
55 investigation suggests that eselect is enough for enabling BLAS/LAPACK
56 runtime switching. Hence, the proposed solution is eselect-based:
57
58 * Every BLAS/LAPACK implementation should provide a generic library
59 and eselect candidate libraries at the same time. Taking netlib,
60 BLIS and OpenBLAS as examples:
61
62 reference:
63
64 usr/lib64/blas/reference/libblas.so.3 (SONAME=libblas.so.3)
65 -- default BLAS provider
66 -- candidate of the eselect "blas" unit
67 -- will be symlinked to usr/lib64/libblas.so.3 by eselect
68
69 usr/lib64/lapack/reference/liblapack.so.3 (SONAME=liblapack.so.3)
70 -- default LAPACK provider
71 -- candidate of the eselect "lapack" unit
72 -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
73
74 blis (doesn't provide LAPACK):
75
76 usr/lib64/libblis.so.2 (SONAME=libblis.so.2)
77 -- general purpose
78
79 usr/lib64/blas/blis/libblas.so.3 (SONAME=libblas.so.3)
80 -- candidate of the eselect "blas" unit
81 -- will be symlinked to usr/lib64/libblas.so.3 by eselect
82 -- compiled from the same set of object files as libblis.so.2
83
84 openblas:
85
86 usr/lib64/libopenblas.so.0 (SONAME=libopenblas.so.0)
87 -- general purpose
88
89 usr/lib64/blas/openblas/libblas.so.3 (SONAME=libblas.so.3)
90 -- candidate of the eselect "blas" unit
91 -- will be symlinked to usr/lib64/libblas.so.3 by eselect
92 -- compiled from the same set of object files as
93 libopenblas.so.0
94
95 usr/lib64/lapack/openblas/liblapack.so.3 (SONAME=liblapack.so.3)
96 -- candidate of the eselect "lapack" unit
97 -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
98 -- compiled from the same set of object files as
99 libopenblas.so.0
100
101 This solution is similar to Debian's[3]. It achieves our
102 goal,
103 and it requires us to patch upstream build systems (as Debian does).
104 A preliminary demonstration of this solution is available; see below.
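
The switching idea in the layout above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not the actual
eselect module: the function name `select_provider` and the use of a temporary
directory with placeholder files instead of real shared objects are
hypothetical. It shows that switching providers amounts to re-pointing one
symlink named after the generic SONAME.

```python
import os
import tempfile


def select_provider(libdir, unit, soname, provider):
    """Point libdir/soname at libdir/unit/provider/soname (hypothetical helper)."""
    target = os.path.join(unit, provider, soname)  # relative symlink target
    if not os.path.exists(os.path.join(libdir, target)):
        raise FileNotFoundError(f"no candidate {provider!r} in unit {unit!r}")
    link = os.path.join(libdir, soname)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename over the old symlink in one step
    return os.readlink(link)


def demo():
    """Build a fake usr/lib64 tree and switch the 'blas' unit twice."""
    with tempfile.TemporaryDirectory() as root:
        libdir = os.path.join(root, "usr", "lib64")
        for provider in ("reference", "blis", "openblas"):
            candidate_dir = os.path.join(libdir, "blas", provider)
            os.makedirs(candidate_dir)
            # Placeholder file standing in for the real libblas.so.3.
            with open(os.path.join(candidate_dir, "libblas.so.3"), "w") as f:
                f.write(provider)
        seen = []
        for provider in ("reference", "openblas"):
            select_provider(libdir, "blas", "libblas.so.3", provider)
            # A consumer only ever opens the generic name.
            with open(os.path.join(libdir, "libblas.so.3")) as f:
                seen.append(f.read())
        return seen
```

Because consumers resolve only usr/lib64/libblas.so.3, nothing on their side
changes when the link is re-pointed.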
105
106 Is this solution reliable?
107 --------------------------
108
109 * A similar solution has been used by Debian for many years.
110 * Many projects call BLAS/LAPACK libraries through FFI, including Julia.
111 (See Julia's standard library: LinearAlgebra)
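
As a rough illustration of the FFI point, the sketch below loads whatever
library the system resolves for "blas" and calls the double-precision dot
product through Python's ctypes. It assumes the common Fortran symbol name
`ddot_` and 32-bit integer arguments (LP64 interface); the function name
`ddot_via_ffi` is made up for this example. If no suitable library is
installed, it returns None instead of failing.

```python
import ctypes
import ctypes.util


def ddot_via_ffi(x, y):
    """Dot product via the system BLAS, or None if no usable BLAS is found."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    name = ctypes.util.find_library("blas")  # resolves the generic library
    if name is None:
        return None
    try:
        blas = ctypes.CDLL(name)
        ddot = blas.ddot_  # assumption: Fortran name with trailing underscore
    except (OSError, AttributeError):
        return None
    int_p = ctypes.POINTER(ctypes.c_int)
    dbl_p = ctypes.POINTER(ctypes.c_double)
    ddot.restype = ctypes.c_double
    ddot.argtypes = [int_p, dbl_p, int_p, dbl_p, int_p]
    n = ctypes.c_int(len(x))
    inc = ctypes.c_int(1)  # unit stride for both vectors
    xs = (ctypes.c_double * len(x))(*x)
    ys = (ctypes.c_double * len(y))(*y)
    # Fortran passes every argument by reference.
    return ddot(ctypes.byref(n), xs, ctypes.byref(inc), ys, ctypes.byref(inc))
```

The caller names only the generic library, so whichever provider currently
backs it is the one that runs.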
112
113 Proposed Changes
114 ----------------
115
116 1. Deprecate sci-libs/{blas,cblas,lapack,lapacke}-reference from the Gentoo
117 main repo. They use exactly the same source tarball, so it's not very
118 helpful to package these components in a fine-grained manner. A
119 single
120 sci-libs/lapack package is enough.
121
122 2. Merge the "cblas" eselect unit into the "blas" unit. It is potentially
123 harmful when "blas" and "cblas" point to different implementations.
124 That means "app-eselect/eselect-cblas" should be deprecated.
125
126 3. Update virtual/{blas,cblas,lapack,lapacke}. BLAS/LAPACK providers
127 will be registered in their dependency information.
128
129 Note that ebuilds for BLAS/LAPACK reverse dependencies are expected to work
130 correctly with these changes, without modification. For example, my local
131 numpy-1.16.1 compilation was successful without change.
132
133 Preliminary Demonstration
134 -------------------------
135
136 The preliminary implementation is available in my personal overlay[4].
137 A simple sanity test script `check-cpp.sh` is provided to illustrate
138 the effectiveness of the proposed solution.
139
140 The script `check-cpp.sh` compiles two C++ programs: one calls general
141 matrix-matrix multiplication from BLAS, while the other calls general
142 singular value decomposition from LAPACK. Once they are compiled, the script
143 switches between different BLAS/LAPACK implementations and runs the C++
144 programs
145 without recompilation.
146
147 The preliminary result is available here[5]. (CPU=Power9, ARCH=ppc64le)
148 From the experimental results, we find that
149
150 For (512x512) single precision matrix multiplication:
151 * reference BLAS takes ~360 ms
152 * BLIS takes ~70 ms
153 * OpenBLAS takes ~10 ms
154
155 For (512x512) single precision singular value decomposition:
156 * reference LAPACK takes ~1900 ms
157 * BLIS (+reference LAPACK) takes ~1500 ms
158 * OpenBLAS takes ~1100 ms
159
160 The difference in computation speed illustrates the effectiveness of
161 the proposed solution. In principle, any other package could take
162 advantage of this solution without any recompilation as long as
163 it's linked against a library with the generic SONAME (libblas.so.3 or liblapack.so.3).
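
To put the approximate timings above in perspective, the following sketch
computes speedups relative to the reference implementation and a rough
throughput for the matrix multiplication. It uses only the rounded figures
quoted in this message, and it assumes the usual 2*n^3 operation count for a
dense n x n matrix product; the helper names are made up for this example.

```python
# Approximate timings in milliseconds, copied from the results above.
TIMINGS_MS = {
    "gemm": {"reference": 360, "BLIS": 70, "OpenBLAS": 10},
    "svd": {"reference": 1900, "BLIS": 1500, "OpenBLAS": 1100},
}


def speedups(kernel):
    """Speedup of each provider over the reference implementation."""
    base = TIMINGS_MS[kernel]["reference"]
    return {name: base / ms for name, ms in TIMINGS_MS[kernel].items()}


def gemm_gflops(n, ms):
    """Rough GFLOP/s for an n x n product, assuming 2*n^3 operations."""
    return 2 * n**3 / (ms * 1e-3) / 1e9


if __name__ == "__main__":
    for kernel in TIMINGS_MS:
        for name, factor in speedups(kernel).items():
            print(f"{kernel:5s} {name:10s} {factor:5.1f}x")
    for name, ms in TIMINGS_MS["gemm"].items():
        print(f"gemm  {name:10s} {gemm_gflops(512, ms):6.2f} GFLOP/s")
```

With these rounded inputs, OpenBLAS comes out about 36 times faster than the
reference BLAS for the multiplication, while the gap for the decomposition is
much smaller (under 2 times).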
164
165 Acknowledgement
166 ---------------
167 This is an ongoing GSoC 2019 project:
168 https://summerofcode.withgoogle.com/projects/?sp-page=2#6268942782300160
169
170 Mentor: Benda Xu
171
172 [1] BLAS = Basic Linear Algebra Subprograms. It's a set of API + ABI.
173 [2] LAPACK = Linear Algebra PACKage. It's a set of API + ABI.
174 [3] https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
175 [4] https://github.com/cdluminate/my-overlay
176 [5] https://gist.github.com/cdluminate/0cfeab19b89a8b5ac4ea2c5f942d8f64
