Hello,

We've hit a brick wall with distutils-r1. TL;DR: the core issue is that
some packages using setuptools may fail to install some files (package
data files) when built with distutils-r1. Therefore, I'm asking you to
pay special attention to whether all files are installed when migrating
packages, and to defer switching to distutils-r1 if you notice any
missing files. We're working on fixing the issue but it's just hard.


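To make the 'are all files installed?' check less tedious, something
like the following could be used. It is only a rough sketch: the
function names are made up, and it assumes you have two install images
of the same package (one built with the old eclass, one with
distutils-r1) with the same directory layout.

```python
import os


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_files(old_image, new_image):
    """List files present in old_image but absent from new_image.

    Both arguments are hypothetical paths to install images of the same
    package, e.g. the site-packages directories of each build.
    """
    return sorted(list_files(old_image) - list_files(new_image))
```

A non-empty result from missing_files(old, new) would be a hint that the
package relies on the automatic file list and should not be migrated
yet.
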
The problem occurs with so-called 'package data files', that is, files
that are not Python modules but are expected to be installed in
Python's site-packages directory.

By default, distutils installs only .py files when installing packages.
However, some packages expect additional files to be located relative
to those .py files. Normally, those files can be specified via the
MANIFEST file.

However, the setuptools folks decided that explicitly listing installed
files was too cumbersome and added some 'smart' logic to find them
automatically. As a result, setuptools queries the VCS to check which
files are part of the repository, and installs those files
automatically.

As you may guess, this logic works only when the sources are installed
from a VCS checkout. What happens with tarballs then? That's where
egg-info files come into play.

When a distribution tarball is created, setuptools puts an egg-info
directory into it, with the SOURCES.txt file listing all the files.
Then, when the package is installed from the tarball, setuptools uses
the same egg-info directory, notices the existing files and reuses the
file list.

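To see what a tarball's egg-info actually promises, one can read
SOURCES.txt directly; it lists one path per line, relative to the top
of the source tree. A minimal sketch (the helper names are mine, not
part of setuptools):

```python
import os


def read_sources(egg_info_dir):
    """Return the paths listed in SOURCES.txt, one per non-empty line."""
    path = os.path.join(egg_info_dir, "SOURCES.txt")
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def non_python_sources(egg_info_dir):
    """Return listed files that are not .py modules.

    These are only candidates for package data: the list also contains
    files such as README or setup.cfg that are never installed.
    """
    return [p for p in read_sources(egg_info_dir) if not p.endswith(".py")]
```

If non_python_sources() returns paths inside a package directory, that
package is likely to be affected by the problem described here.
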
Of course, it all falls apart when we override --egg-base, that is, the
directory where egg-info files are located. Setuptools no longer finds
the pre-created SOURCES.txt, and since the VCS is not available, the
additional files are not installed.

If you believe that the solution is simple, you're wrong. We simply
have to override --egg-base, since setuptools uses that directory both
for reading and writing. Using the same directory for multiple parallel
builds means two things:

1) if the installed files differ per Python version, we're going to get
a mess,

2) there's an awful race condition: one implementation may read an
incomplete (or even empty) SOURCES.txt that another implementation has
just started writing.

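For illustration, the per-implementation override boils down to giving
every build its own egg-base. The sketch below only builds the setup.py
argument lists; the directory layout and implementation names are
made-up examples, and the exact arguments the eclass passes may differ.

```python
import os


def egg_info_args(build_root, impl):
    """Build setup.py arguments with a private egg-base for one implementation."""
    # Hypothetical layout: one subdirectory of build_root per implementation.
    egg_base = os.path.join(build_root, impl)
    return ["egg_info", "--egg-base", egg_base, "build"]


def commands_for(build_root, impls):
    """Map each implementation name to its own setup.py argument list."""
    return {impl: egg_info_args(build_root, impl) for impl in impls}
```

Since no two implementations share an egg-base, neither of the two
problems above can occur, but the SOURCES.txt shipped in the tarball is
no longer seen either.
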
So how do we fix it properly? I have no idea.

The 'best' thing to do seems to be to convince upstreams that relying
on the awfully fragile setuptools auto-adding of files is bad and that
they should list them explicitly in MANIFEST.in instead. But then we'd
have to be quite convincing, since we're basically saying that one of
the 'awesome' and documented features of setuptools must never be used.

The other solution is to put more hackery in the eclass. That is, find
all .egg-info directories in ${S} and copy them to the future
'egg-base' for each implementation. Then we could both keep egg-info
separate per implementation and respect the initial contents.

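Roughly, the copying step could look like the sketch below. It is
written in Python purely for illustration (the real thing would be bash
in the eclass), it assumes that what needs copying are the *.egg-info
directories shipped in the tarball, and all names in it are invented.

```python
import os
import shutil


def find_egg_info_dirs(source_dir):
    """Return paths of all *.egg-info directories under source_dir."""
    result = []
    for dirpath, dirnames, _filenames in os.walk(source_dir):
        for name in list(dirnames):
            if name.endswith(".egg-info"):
                result.append(os.path.join(dirpath, name))
                dirnames.remove(name)  # do not descend into it
    return sorted(result)


def seed_egg_bases(source_dir, egg_bases):
    """Copy each shipped egg-info directory into every egg-base.

    egg_bases is a list of hypothetical per-implementation directories
    located outside source_dir. Existing copies are left untouched.
    """
    egg_infos = find_egg_info_dirs(source_dir)
    for base in egg_bases:
        os.makedirs(base, exist_ok=True)
        for src in egg_infos:
            dest = os.path.join(base, os.path.basename(src))
            if not os.path.exists(dest):
                shutil.copytree(src, dest)
    return egg_infos
```

Each implementation would then start from its own copy of the shipped
SOURCES.txt and could rewrite it without disturbing the others.
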
However, I'm not really happy to play that kind of game. Probably the
most proper place to put that logic would be the 'esetup.py' function,
but that would be just awful. Then there's
distutils-r1_python_prepare_all(), which people still forget to call.

Any suggestions, thoughts? I will probably try to contact the upstreams
of some random packages and see what their stance is on using
MANIFEST.in instead of the auto-logic.

--
Best regards,
Michał Górny