On Wed, 2019-05-29 at 22:33 +0800, Benda Xu wrote:
> Hi Michał,
>
> Michał Górny <mgorny@g.o> writes:
>
> > On Tue, 2019-05-28 at 01:37 -0700, Mo Zhou wrote:
> > > Different BLAS/LAPACK implementations are expected to be compatible
> > > with each other at both the API and ABI level. They can be used as
> > > drop-in replacements for one another. This sounds nice, but the
> > > difference in SONAME has hampered the Gentoo integration of the
> > > well-optimized ones.
> >
> > If SONAMEs are different, then they are not compatible by definition.
>
> This BLAS/LAPACK SONAME difference is a special case. They are
> partially compatible in the sense that every alternative
> implementation of BLAS is a superset of the reference one.
>
> Therefore linking against the reference at build time ensures
> compatibility with the alternative implementations, even with a
> different SONAME.
>
> > > [...]
> > >
> > > Similar to Debian's update-alternatives mechanism, Gentoo's eselect
> > > is good at dealing with drop-in replacements as well. My preliminary
> > > investigation suggests that eselect is enough for enabling
> > > BLAS/LAPACK runtime switching. Hence, the proposed solution is
> > > eselect-based:
> > >
> > > * Every BLAS/LAPACK implementation should provide generic library
> > >   and eselect candidate libraries at the same time. Taking netlib,
> > >   BLIS and OpenBLAS as examples:
> > >
> > >   reference:
> > >
> > >     usr/lib64/blas/reference/libblas.so.3 (SONAME=libblas.so.3)
> > >       -- default BLAS provider
> > >       -- candidate of the eselect "blas" unit
> > >       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
> >
> > /usr/lib64 is not supposed to be modified by eselect, it's package
> > manager area. Yes, I know a lot of modules still do that but that's no
> > reason to make things worse when people are putting significant effort
> > to actually improve things.
>
> Sorry, I didn't see your reply before mine.
>
> We are going to use the LDPATH and ld.so.conf mechanism suggested by
> you.
>
> > >     usr/lib64/lapack/reference/liblapack.so.3 (SONAME=liblapack.so.3)
> > >       -- default LAPACK provider
> > >       -- candidate of the eselect "lapack" unit
> > >       -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
> > >
> > >   blis (doesn't provide LAPACK):
> > >
> > >     usr/lib64/libblis.so.2 (SONAME=libblis.so.2)
> > >       -- general purpose
> > >
> > >     usr/lib64/blas/blis/libblas.so.3 (SONAME=libblas.so.3)
> > >       -- candidate of the eselect "blas" unit
> > >       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
> > >       -- compiled from the same set of object files as libblis.so.2
> > >
> > >   openblas:
> > >
> > >     usr/lib64/libopenblas.so.0 (SONAME=libopenblas.so.0)
> > >       -- general purpose
> > >
> > >     usr/lib64/blas/openblas/libblas.so.3 (SONAME=libblas.so.3)
> > >       -- candidate of the eselect "blas" unit
> > >       -- will be symlinked to usr/lib64/libblas.so.3 by eselect
> > >       -- compiled from the same set of object files as
> > >          libopenblas.so.0
> > >
> > >     usr/lib64/lapack/openblas/liblapack.so.3 (SONAME=liblapack.so.3)
> > >       -- candidate of the eselect "lapack" unit
> > >       -- will be symlinked to usr/lib64/liblapack.so.3 by eselect
> > >       -- compiled from the same set of object files as
> > >          libopenblas.so.0
> > >
> > > This solution is similar to Debian's[3]. It achieves our goal, and
> > > it requires us to patch upstream build systems (as Debian does).
> > > A preliminary demonstration of this solution is available; see
> > > below.
> >
> > So basically the three walls of text say in a roundabout way that
> > you're going to introduce custom hacks to recompile libraries with
> > a different SONAME. Ok.
> > > Is this solution reliable?
> > > --------------------------
> > >
> > > * A similar solution has been used by Debian for many years.
> > > * Many projects call BLAS/LAPACK libraries through FFI, including
> > >   Julia. (See Julia's standard library: LinearAlgebra)
> > >
> > > Proposed Changes
> > > ----------------
> > >
> > > 1. Deprecate sci-libs/{blas,cblas,lapack,lapacke}-reference from
> > >    Gentoo main repo. They use exactly the same source tarball. It's
> > >    not quite helpful to package these components in a fine-grained
> > >    manner. A single sci-libs/lapack package is enough.
> >
> > Where's the gain in that?
> > > 2. Merge the "cblas" eselect unit into "blas" unit. It is
> > >    potentially harmful when "blas" and "cblas" point to different
> > >    implementations. That means "app-eselect/eselect-cblas" should
> > >    be deprecated.
> > >
> > > 3. Update virtual/{blas,cblas,lapack,lapacke}. BLAS/LAPACK providers
> > >    will be registered in their dependency information.
> > >
> > > Note, ebuilds for BLAS/LAPACK reverse dependencies are expected to
> > > work with these changes correctly without change. For example, my
> > > local numpy-1.16.1 compilation was successful without change.
> > >
> > > Preliminary Demonstration
> > > -------------------------
> > >
> > > The preliminary implementation is available in my personal
> > > overlay[4]. A simple sanity test script `check-cpp.sh` is provided
> > > to illustrate the effectiveness of the proposed solution.
> > >
> > > The script `check-cpp.sh` compiles two C++ programs -- one calls
> > > general matrix-matrix multiplication from BLAS, while another one
> > > calls general singular value decomposition from LAPACK. Once
> > > compiled, this script will switch different BLAS/LAPACK
> > > implementations and run the C++ programs without recompilation.
> > >
> > > The preliminary result is available here[5]. (CPU=Power9,
> > > ARCH=ppc64le) From the experimental results, we find that
> > >
> > > For (512x512) single precision matrix multiplication:
> > > * reference BLAS takes ~360 ms
> > > * BLIS takes ~70 ms
> > > * OpenBLAS takes ~10 ms
> > >
> > > For (512x512) single precision singular value decomposition:
> > > * reference LAPACK takes ~1900 ms
> > > * BLIS (+reference LAPACK) takes ~1500 ms
> > > * OpenBLAS takes ~1100 ms
> > >
> > > The difference in computation speed illustrates the effectiveness
> > > of the proposed solution. Theoretically, any other package could
> > > take advantage of this solution without any recompilation as long
> > > as it's linked against a library with SONAME.
> >
> > An actual ABI compliance test, e.g. done using
> > abi-compliance-checker would be more interesting.
>
> As said above, the symbols don't need to be 1-1 copy of each other.
> Any library which is a superset of the reference one will work.

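The superset claim quoted above can at least be spot-checked by comparing exported dynamic symbols. What follows is only a sketch and not something taken from the thread: the helper names (`parse_nm_output`, `exported_symbols`, `missing_symbols`) are made up for illustration, and it assumes binutils `nm` is installed. A matching symbol name says nothing about argument types or calling conventions, so this is weaker than a real ABI check.

```python
import subprocess


def parse_nm_output(text):
    """Return the set of symbol names from `nm -D --defined-only` output."""
    symbols = set()
    for line in text.splitlines():
        fields = line.split()
        # Defined symbols are printed as "<address> <type> <name>".
        if len(fields) == 3:
            # Drop a symbol-version suffix such as "name@@VERS" if present.
            symbols.add(fields[2].split("@")[0])
    return symbols


def exported_symbols(library_path):
    """Run nm on a shared library and collect its defined dynamic symbols."""
    result = subprocess.run(
        ["nm", "-D", "--defined-only", library_path],
        check=True, capture_output=True, text=True,
    )
    return parse_nm_output(result.stdout)


def missing_symbols(reference, candidate):
    """Symbols the reference exports that the candidate lacks.

    An empty list means the candidate is a superset by symbol name.
    """
    return sorted(reference - candidate)
```

Calling `missing_symbols(exported_symbols(ref), exported_symbols(alt))` on the reference `libblas.so.3` and one of the candidate `libblas.so.3` files from the layout quoted earlier should return an empty list if the alternative really is a superset.
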
Again, I'm willing to accept this under a USE="lapack_targets_virtual"
configuration, but wholesale editing of DT_NEEDED entries is definitely
too scary and too invasive for most non-sci/hpc users of Gentoo. Again,
for 99% of users, OpenBLAS will be the right trade-off between
performance and customizability. Every recompilation of libreoffice or
chromium will devour more CPU cycles than switching between USE-flag
implementations.
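
To see what is at stake with DT_NEEDED, it helps to look at which libraries a binary actually records. A rough sketch, assuming binutils `readelf` is available; the function names are hypothetical, and the regular expression relies on the line format that GNU `readelf -d` prints for NEEDED entries:

```python
import re
import subprocess

# Matches lines such as:
#  0x0000000000000001 (NEEDED)   Shared library: [libblas.so.3]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[(.+?)\]")


def parse_needed(readelf_output):
    """Extract DT_NEEDED library names, in order, from `readelf -d` text."""
    return NEEDED_RE.findall(readelf_output)


def needed_libraries(path):
    """Run `readelf -d` on an ELF file and return its DT_NEEDED entries."""
    result = subprocess.run(
        ["readelf", "-d", path],
        check=True, capture_output=True, text=True,
    )
    return parse_needed(result.stdout)
```

A binary linked against the reference library would list `libblas.so.3` here, while one linked directly against OpenBLAS would list `libopenblas.so.0`; only the former picks up whichever provider of `libblas.so.3` the loader finds first.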