Hi Gentoo devs,

The classical numerical linear algebra libraries BLAS[1] and LAPACK[2]
play an important role in scientific computing: much software, such as
NumPy, SciPy, Julia, Octave and R, is built upon them.

There is a standard implementation of BLAS and LAPACK, known as netlib
or simply the "reference implementation". This implementation has been
provided by Gentoo's main repo. However, it has a major problem:
performance. A number of well-optimized BLAS/LAPACK implementations
exist, including OpenBLAS (free), BLIS (free) and MKL (non-free), but
none of them has been properly integrated into the Gentoo distribution.

I'm writing to propose a solution to this problem. If no Gentoo
developer objects to this proposal, I'll move forward and start
submitting PRs to the Gentoo main repo.

Historical Obstacle
-------------------

Different BLAS/LAPACK implementations are expected to be compatible
with each other at both the API and ABI level, so each can be used as a
drop-in replacement for the others. This sounds nice, but the difference
in SONAME has hampered the Gentoo integration of the well-optimized ones.

Suppose a Gentoo user has compiled a pile of packages on top of the
reference BLAS and LAPACK, so that these reverse dependencies are linked
against libblas.so.3 and liblapack.so.3. When the user discovers that
OpenBLAS provides much better performance, they have to recompile the
whole reverse dependency tree in order to take advantage of OpenBLAS,
because the SONAME of OpenBLAS is libopenblas.so.0. If the user then
wants to try MKL (libmkl_rt.so), they have to recompile the whole
reverse dependency tree again.

This is not friendly to our earth.

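To make the cost concrete, here is a toy model of the situation above.
It is not Portage code: the package names and their recorded library
dependencies are made up for illustration. The only point is that a
package has to be relinked exactly when the SONAME it was linked
against is no longer the one being provided.

```python
# Toy model: which packages must be rebuilt when the BLAS provider changes?
# Package names and dependency sets below are hypothetical examples.

def packages_to_rebuild(needed_by_package, old_soname, new_soname):
    """Return the packages linked against old_soname if the SONAME changes."""
    if old_soname == new_soname:
        # Same SONAME: the dynamic linker simply finds the new library.
        return []
    return sorted(pkg for pkg, needed in needed_by_package.items()
                  if old_soname in needed)

installed = {
    "numpy": {"libblas.so.3", "liblapack.so.3"},
    "octave": {"libblas.so.3", "liblapack.so.3"},
    "some-mkl-app": {"libmkl_rt.so"},
}

# Moving from reference BLAS to OpenBLAS under its own SONAME:
print(packages_to_rebuild(installed, "libblas.so.3", "libopenblas.so.0"))
# Moving between two providers that both install libblas.so.3:
print(packages_to_rebuild(installed, "libblas.so.3", "libblas.so.3"))
```

In the first case every package recorded against libblas.so.3 is
listed; in the second case the list is empty.
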
Goal
----

* When a program is linked against the libblas.so or liblapack.so
  provided by any BLAS/LAPACK provider, the eselect-based solution
  allows the user to switch the underlying library without recompiling
  anything.

* When a program is linked against a specific implementation, e.g.
  libmkl_rt.so, the solution doesn't break anything.

Solution
--------

Like Debian's update-alternatives mechanism, Gentoo's eselect is good at
dealing with drop-in replacements. My preliminary investigation suggests
that eselect is sufficient for enabling BLAS/LAPACK runtime switching.
Hence, the proposed solution is eselect-based:

* Every BLAS/LAPACK implementation should provide both a generic
  library and eselect candidate libraries. Taking netlib, BLIS and
  OpenBLAS as examples:

  reference:

    usr/lib64/blas/reference/libblas.so.3 (SONAME=libblas.so.3)
      -- default BLAS provider
      -- candidate of the eselect "blas" unit
      -- will be symlinked to usr/lib64/libblas.so.3 by eselect

    usr/lib64/lapack/reference/liblapack.so.3 (SONAME=liblapack.so.3)
      -- default LAPACK provider
      -- candidate of the eselect "lapack" unit
      -- will be symlinked to usr/lib64/liblapack.so.3 by eselect

  blis (doesn't provide LAPACK):

    usr/lib64/libblis.so.2 (SONAME=libblis.so.2)
      -- general purpose

    usr/lib64/blas/blis/libblas.so.3 (SONAME=libblas.so.3)
      -- candidate of the eselect "blas" unit
      -- will be symlinked to usr/lib64/libblas.so.3 by eselect
      -- compiled from the same set of object files as libblis.so.2

  openblas:

    usr/lib64/libopenblas.so.0 (SONAME=libopenblas.so.0)
      -- general purpose

    usr/lib64/blas/openblas/libblas.so.3 (SONAME=libblas.so.3)
      -- candidate of the eselect "blas" unit
      -- will be symlinked to usr/lib64/libblas.so.3 by eselect
      -- compiled from the same set of object files as libopenblas.so.0

    usr/lib64/lapack/openblas/liblapack.so.3 (SONAME=liblapack.so.3)
      -- candidate of the eselect "lapack" unit
      -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
      -- compiled from the same set of object files as libopenblas.so.0

This solution is similar to Debian's[3]. It achieves our goal, but, as
in Debian, it requires us to patch upstream build systems. A preliminary
demonstration of this solution is available; see below.

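The switching step can be sketched as a small simulation. This is not
the eselect module itself: it works in a temporary directory, uses small
placeholder files instead of real shared objects, and the function names
install_candidate and select are hypothetical. The use of a relative
symlink and an atomic rename is also an assumption made for the sketch,
not a statement about how eselect does it.

```python
import os
import tempfile

def install_candidate(root, unit, provider, soname):
    """Create a placeholder at usr/lib64/<unit>/<provider>/<soname>."""
    directory = os.path.join(root, "usr", "lib64", unit, provider)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, soname)
    with open(path, "w") as f:
        f.write(provider)  # stand-in for the real shared object
    return path

def select(root, unit, provider, soname):
    """Point usr/lib64/<soname> at the chosen provider's candidate."""
    link = os.path.join(root, "usr", "lib64", soname)
    target = os.path.join(unit, provider, soname)  # relative to usr/lib64
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link)  # rename the new symlink over the old one
    return link

root = tempfile.mkdtemp()
for provider in ("reference", "blis", "openblas"):
    install_candidate(root, "blas", provider, "libblas.so.3")

link = select(root, "blas", "reference", "libblas.so.3")
print(os.readlink(link))
link = select(root, "blas", "openblas", "libblas.so.3")
print(os.readlink(link))
```

A program that was linked against libblas.so.3 keeps opening the same
path; only the symlink target changes between the two calls.
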
Is this solution reliable?
--------------------------

* A similar solution has been used by Debian for many years.
* Many projects, including Julia, call BLAS/LAPACK libraries through
  FFI. (See Julia's standard library: LinearAlgebra)

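As an illustration of the FFI point, the sketch below uses Python's
ctypes rather than Julia. It assumes the generic library is installed
under the SONAME libblas.so.3 and exports the usual Fortran-style symbol
ddot_ with all arguments passed by reference; the helper names load_blas
and ddot are made up for this example. If the library is missing, the
script just says so.

```python
import ctypes

def load_blas(soname="libblas.so.3"):
    """Load BLAS by its generic SONAME; return None if it is not installed."""
    try:
        return ctypes.CDLL(soname)
    except OSError:
        return None

def ddot(lib, x, y):
    """Dot product through the Fortran-ABI routine ddot_."""
    n = len(x)
    vec = ctypes.c_double * n
    one = ctypes.c_int(1)  # stride of 1 for both vectors
    lib.ddot_.restype = ctypes.c_double
    return lib.ddot_(ctypes.byref(ctypes.c_int(n)),
                     vec(*x), ctypes.byref(one),
                     vec(*y), ctypes.byref(one))

blas = load_blas()
if blas is None:
    print("libblas.so.3 not found; nothing to demonstrate")
else:
    # Mathematically, 1*4 + 2*5 + 3*6 = 32.
    print(ddot(blas, [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The caller never names a particular implementation, so whichever
provider currently sits behind libblas.so.3 is the one that runs.
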
Proposed Changes
----------------

1. Deprecate sci-libs/{blas,cblas,lapack,lapacke}-reference from the
   Gentoo main repo. They use exactly the same source tarball, so
   packaging these components in such a fine-grained manner is not very
   helpful. A single sci-libs/lapack package is enough.

2. Merge the "cblas" eselect unit into the "blas" unit. It is
   potentially harmful when "blas" and "cblas" point to different
   implementations. This means "app-eselect/eselect-cblas" should be
   deprecated.

3. Update virtual/{blas,cblas,lapack,lapacke}. BLAS/LAPACK providers
   will be registered in their dependency information.

Note that ebuilds for BLAS/LAPACK reverse dependencies are expected to
work correctly with these changes without modification. For example, my
local numpy-1.16.1 compilation succeeded without any change.

Preliminary Demonstration
-------------------------

The preliminary implementation is available in my personal overlay[4].
A simple sanity test script, `check-cpp.sh`, is provided to illustrate
the effectiveness of the proposed solution.

The script compiles two C++ programs: one calls general matrix-matrix
multiplication from BLAS, and the other calls general singular value
decomposition from LAPACK. Once they are compiled, the script switches
between the different BLAS/LAPACK implementations and runs the C++
programs without recompilation.

The preliminary results are available here[5] (CPU=Power9,
ARCH=ppc64le). The experimental results show the following.

For (512x512) single precision matrix multiplication:
* reference BLAS takes ~360 ms
* BLIS takes ~70 ms
* OpenBLAS takes ~10 ms

For (512x512) single precision singular value decomposition:
* reference LAPACK takes ~1900 ms
* BLIS (+reference LAPACK) takes ~1500 ms
* OpenBLAS takes ~1100 ms

The difference in computation speed illustrates the effectiveness of
the proposed solution. In theory, any other package can take advantage
of this solution without recompilation, as long as it is linked against
a library with the generic SONAME (libblas.so.3 or liblapack.so.3).

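For a quick sense of scale, the timings quoted above can be turned into
speedups relative to the reference implementation. The numbers are the
approximate ("~") values from this mail, so the ratios are rough as
well; the dictionary and function names are only for this sketch.

```python
# Approximate timings in milliseconds, copied from the results above.
gemm_ms = {"reference": 360, "BLIS": 70, "OpenBLAS": 10}
svd_ms = {"reference": 1900, "BLIS + reference LAPACK": 1500, "OpenBLAS": 1100}

def speedups(timings_ms, baseline="reference"):
    """Return each provider's speedup over the baseline, to one decimal."""
    base = timings_ms[baseline]
    return {name: round(base / t, 1) for name, t in timings_ms.items()}

print(speedups(gemm_ms))
# {'reference': 1.0, 'BLIS': 5.1, 'OpenBLAS': 36.0}
print(speedups(svd_ms))
# {'reference': 1.0, 'BLIS + reference LAPACK': 1.3, 'OpenBLAS': 1.7}
```
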
Acknowledgement
---------------
This is an ongoing GSoC 2019 project:
https://summerofcode.withgoogle.com/projects/?sp-page=2#6268942782300160

Mentor: Benda Xu

[1] BLAS = Basic Linear Algebra Subprograms. It's a set of API + ABI.
[2] LAPACK = Linear Algebra PACKage. It's a set of API + ABI.
[3] https://wiki.debian.org/DebianScience/LinearAlgebraLibraries
[4] https://github.com/cdluminate/my-overlay
[5] https://gist.github.com/cdluminate/0cfeab19b89a8b5ac4ea2c5f942d8f64