On Tue, 2019-05-28 at 01:37 -0700, Mo Zhou wrote:
> Hi Gentoo devs,
>
> Classical numerical linear algebra libraries, BLAS[1] and LAPACK[2],
> play important roles in the scientific computing field, as many
> software packages such as NumPy, SciPy, Julia, Octave, and R are
> built upon them.
>
> There is a standard implementation of BLAS and LAPACK, named netlib
> or simply the "reference implementation". This implementation is
> provided by Gentoo's main repo. However, it has a major problem:
> performance. On the other hand, a number of well-optimized
> BLAS/LAPACK implementations exist, including OpenBLAS (free),
> BLIS (free), MKL (non-free), etc., but none of them has been properly
> integrated into the Gentoo distribution.
>
> I'm writing to propose a good solution to this problem. If no Gentoo
> developer objects to this proposal, I'll keep moving forward and
> start submitting PRs to the Gentoo main repo.
>
> Historical Obstacle
> -------------------
>
> Different BLAS/LAPACK implementations are expected to be compatible
> with each other at both the API and ABI level, so each can be used
> as a drop-in replacement for the others. This sounds nice, but the
> difference in SONAME has hampered the Gentoo integration of the
> well-optimized ones.
>
> Assume a Gentoo user has compiled a pile of packages on top of the
> reference BLAS and LAPACK, so that these reverse dependencies are
> linked against libblas.so.3 and liblapack.so.3. When the user
> discovers that OpenBLAS provides much better performance, they have
> to recompile the whole reverse dependency tree in order to take
> advantage of OpenBLAS, because the SONAME of OpenBLAS is
> libopenblas.so.0. When the user wants to try MKL (libmkl_rt.so),
> they have to recompile the whole reverse dependency tree again.
>
> This is not friendly to our earth.
>
> Goal
> ----
>
> * When a program is linked against libblas.so or liblapack.so
>   provided by any BLAS/LAPACK provider, the eselect-based solution
>   will allow the user to switch the underlying library without
>   recompiling anything.
>
> * When a program is linked against a specific implementation, e.g.
>   libmkl_rt.so, the solution doesn't break anything.
>
> Solution
> --------
>
> Similar to Debian's update-alternatives mechanism, Gentoo's eselect
> is good at dealing with drop-in replacements as well. My preliminary
> investigation suggests that eselect is enough for enabling
> BLAS/LAPACK runtime switching. Hence, the proposed solution is
> eselect-based:
>
> * Every BLAS/LAPACK implementation should provide a generic library
>   and eselect candidate libraries at the same time. Taking netlib,
>   BLIS and OpenBLAS as examples:
>
>   reference:
>
>     usr/lib64/blas/reference/libblas.so.3 (SONAME=libblas.so.3)
>       -- default BLAS provider
>       -- candidate of the eselect "blas" unit
>       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
>
>     usr/lib64/lapack/reference/liblapack.so.3 (SONAME=liblapack.so.3)
>       -- default LAPACK provider
>       -- candidate of the eselect "lapack" unit
>       -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
>
>   blis (doesn't provide LAPACK):
>
>     usr/lib64/libblis.so.2 (SONAME=libblis.so.2)
>       -- general purpose
>
>     usr/lib64/blas/blis/libblas.so.3 (SONAME=libblas.so.3)
>       -- candidate of the eselect "blas" unit
>       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
>       -- compiled from the same set of object files as libblis.so.2
>
>   openblas:
>
>     usr/lib64/libopenblas.so.0 (SONAME=libopenblas.so.0)
>       -- general purpose
>
>     usr/lib64/blas/openblas/libblas.so.3 (SONAME=libblas.so.3)
>       -- candidate of the eselect "blas" unit
>       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
>       -- compiled from the same set of object files as
>          libopenblas.so.0
>
>     usr/lib64/lapack/openblas/liblapack.so.3 (SONAME=liblapack.so.3)
>       -- candidate of the eselect "lapack" unit
>       -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
>       -- compiled from the same set of object files as
>          libopenblas.so.0
>
> This solution is similar to Debian's[3]. It achieves our goal, and
> it requires us to patch upstream build systems (as Debian does).
> A preliminary demonstration of this solution is available; see below.
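The symlink layout described above can be modelled with a small, self-contained Python sketch. It is not eselect code and not the implementation from the overlay: `select_provider` and `make_demo_tree` are hypothetical helpers, the "libraries" are empty stand-in files in a temporary directory, and only the paths are taken from the listing.

```python
import os
import tempfile


def select_provider(libdir, unit, provider, soname):
    """Point libdir/<soname> at libdir/<unit>/<provider>/<soname>.

    Hypothetical helper: it only mimics the symlink switch described in
    the proposal and does not call eselect.
    """
    target = os.path.join(unit, provider, soname)  # relative to libdir
    if not os.path.exists(os.path.join(libdir, target)):
        raise FileNotFoundError(f"no {unit} candidate from {provider!r}")
    link = os.path.join(libdir, soname)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    # Rename over the old link so the generic name is never missing.
    os.replace(tmp, link)
    return os.readlink(link)


def make_demo_tree(root):
    """Create empty stand-in files for the candidate libraries."""
    libdir = os.path.join(root, "usr", "lib64")
    candidates = [
        ("blas", "reference", "libblas.so.3"),
        ("blas", "openblas", "libblas.so.3"),
        ("lapack", "reference", "liblapack.so.3"),
        ("lapack", "openblas", "liblapack.so.3"),
    ]
    for unit, provider, soname in candidates:
        directory = os.path.join(libdir, unit, provider)
        os.makedirs(directory, exist_ok=True)
        open(os.path.join(directory, soname), "w").close()
    return libdir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        libdir = make_demo_tree(root)
        for provider in ("reference", "openblas"):
            link = select_provider(libdir, "blas", provider, "libblas.so.3")
            print("libblas.so.3 ->", link)
```

A consumer linked against the generic name only ever opens usr/lib64/libblas.so.3, so repointing that one link is what changes the implementation it loads at the next start.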
>
> Is this solution reliable?
> --------------------------
>
> * A similar solution has been used by Debian for many years.
> * Many projects call BLAS/LAPACK libraries through FFI, including
>   Julia. (See Julia's standard library: LinearAlgebra)
>
> Proposed Changes
> ----------------
>
> 1. Deprecate sci-libs/{blas,cblas,lapack,lapacke}-reference from the
>    Gentoo main repo. They use exactly the same source tarball, and it
>    is not very helpful to package these components in such a
>    fine-grained manner. A single sci-libs/lapack package is enough.
>
> 2. Merge the "cblas" eselect unit into the "blas" unit. It is
>    potentially harmful when "blas" and "cblas" point to different
>    implementations. That means "app-eselect/eselect-cblas" should be
>    deprecated.
>
> 3. Update virtual/{blas,cblas,lapack,lapacke}. BLAS/LAPACK providers
>    will be registered in their dependency information.
>
> Note that ebuilds for BLAS/LAPACK reverse dependencies are expected
> to work correctly with these changes without modification. For
> example, my local numpy-1.16.1 compilation was successful without
> change.
>
> Preliminary Demonstration
> -------------------------
>
> The preliminary implementation is available in my personal
> overlay[4]. A simple sanity test script `check-cpp.sh` is provided to
> illustrate the effectiveness of the proposed solution.
>
> The script `check-cpp.sh` compiles two C++ programs: one calls
> general matrix-matrix multiplication from BLAS, while the other
> calls general singular value decomposition from LAPACK. Once they are
> compiled, the script switches between different BLAS/LAPACK
> implementations and runs the C++ programs without recompilation.
>
> The preliminary result is available here[5] (CPU=Power9,
> ARCH=ppc64le). From the experimental results, we find:
>
> For (512x512) single precision matrix multiplication:
> * reference BLAS takes ~360 ms
> * BLIS takes ~70 ms
> * OpenBLAS takes ~10 ms
>
> For (512x512) single precision singular value decomposition:
> * reference LAPACK takes ~1900 ms
> * BLIS (+reference LAPACK) takes ~1500 ms
> * OpenBLAS takes ~1100 ms
>
> The difference in computation speed illustrates the effectiveness of
> the proposed solution. In principle, any other package can take
> advantage of this solution without recompilation, as long as it is
> linked against a library with the generic SONAME.
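For reference, the speedups implied by the timings above can be worked out in a few lines of Python. The numbers are copied from the figures quoted above; the names `GEMM_MS`, `SVD_MS` and `speedups` are made up for this sketch.

```python
# Approximate timings in milliseconds, as quoted above (Power9, ppc64le).
GEMM_MS = {"reference": 360, "BLIS": 70, "OpenBLAS": 10}
SVD_MS = {"reference": 1900, "BLIS": 1500, "OpenBLAS": 1100}


def speedups(timings, baseline="reference"):
    """Return how many times faster each entry is than the baseline."""
    base = timings[baseline]
    return {name: base / ms for name, ms in timings.items()}


if __name__ == "__main__":
    for label, timings in (("GEMM", GEMM_MS), ("SVD", SVD_MS)):
        for name, factor in speedups(timings).items():
            print(f"{label:<5}{name:<10}{factor:5.1f}x")
```

With these figures OpenBLAS comes out 36 times faster than the reference for matrix multiplication, but only about 1.7 times faster for the singular value decomposition.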
>
> Acknowledgement
> ---------------
> This is an ongoing GSoC 2019 project:
> https://summerofcode.withgoogle.com/projects/?sp-page=2#6268942782300160
>
> Mentor: Benda Xu
>
> [1] BLAS = Basic Linear Algebra Subprograms. It's a set of API + ABI.
> [2] LAPACK = Linear Algebra PACKage. It's a set of API + ABI.
> [3] https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
> [4] https://github.com/cdluminate/my-overlay
> [5] https://gist.github.com/cdluminate/0cfeab19b89a8b5ac4ea2c5f942d8f64
>

We already have such a solution in the sci-overlay. It has proven
extremely brittle and shaky. The plan is to do this via USE flags
similar to python-single-r1 flags. Optionally, we can leave a "virtual"
USE flag, where users can switch implementation at runtime, but this
will not be a supported configuration.

While I understand that runtime-switching sounds like a great feature,
it has proven too difficult to get right and too hard to enforce
invariants on correct symlinks. People that want this can go the
virtual+eselect approach in the overlay, but 99% of Gentoo users will
be happy with just linking against OpenBLAS/reference-lapack and not
having to fix weird stale symlinks that eselect-alternatives somehow
lost track of.
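
To make the symlink invariant concrete, here is a minimal Python sketch of the kind of check one ends up running by hand: list the links in a library directory whose target no longer exists. `find_stale_symlinks` is a hypothetical helper, not part of eselect or eselect-alternatives, and the demo uses empty stand-in files in a temporary directory.

```python
import os
import tempfile


def find_stale_symlinks(libdir):
    """Return the names of symlinks in libdir whose target is missing."""
    stale = []
    for name in sorted(os.listdir(libdir)):
        path = os.path.join(libdir, name)
        # islink() inspects the link itself; exists() follows it.
        if os.path.islink(path) and not os.path.exists(path):
            stale.append(name)
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as libdir:
        # Stand-in for a provider that is still installed.
        os.makedirs(os.path.join(libdir, "lapack", "reference"))
        real = os.path.join(libdir, "lapack", "reference", "liblapack.so.3")
        open(real, "w").close()
        os.symlink(os.path.join("lapack", "reference", "liblapack.so.3"),
                   os.path.join(libdir, "liblapack.so.3"))
        # Stand-in for a link left behind after its provider was removed.
        os.symlink(os.path.join("blas", "openblas", "libblas.so.3"),
                   os.path.join(libdir, "libblas.so.3"))
        print(find_stale_symlinks(libdir))
```

A check like this only finds dangling links; it cannot tell whether "blas" and "lapack" point at a consistent pair of providers, which is the harder invariant.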

David

See also:
https://bugs.gentoo.org/632624
https://bugs.gentoo.org/348843#c30