Gentoo Archives: gentoo-python

From: "Michał Górny" <mgorny@g.o>
To: gentoo-python@l.g.o
Cc: python@g.o
Subject: [gentoo-python] distutils-r1, setuptools and egg_info mess
Date: Fri, 16 Aug 2013 20:12:14
Message-Id: 20130816221158.043c03d5@gentoo.org
Hello,

We've kinda hit a brick wall with distutils-r1. TL;DR: the core issue
is that some packages using setuptools may not install some files
(package data files) when using distutils-r1. Therefore, I'm asking you
to pay special attention to whether all files are installed when
migrating packages, and to defer switching to distutils-r1 if you notice
some missing files. We're working on fixing the issue but it's just
hard.



The problem occurs with so-called 'package data files'. That is, files
that are expected to be installed in Python's site-packages directory
that are not Python modules.

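As a rough aid for spotting missing package data files, here is a
minimal Python sketch that lists the non-.py files present in a source
package directory but absent from the installed copy. The function
names and the decision to ignore bytecode files are assumptions made
for illustration; none of this is part of the eclass or of setuptools.

```python
import os


def data_files(root):
    """Return the set of non-Python files under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into bytecode caches.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for name in filenames:
            # Skip modules and compiled bytecode; everything else is
            # treated as a candidate package data file.
            if name.endswith((".py", ".pyc", ".pyo")):
                continue
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def missing_data_files(source_pkg, installed_pkg):
    """List data files found in source_pkg but not in installed_pkg."""
    return sorted(data_files(source_pkg) - data_files(installed_pkg))
```

Pointing source_pkg at the package directory in the unpacked sources
and installed_pkg at the matching directory in the install image gives
a quick list of files to look at. It may report files that were never
meant to be installed, so treat the output as a hint only.
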
By default, when installing packages, distutils installs only .py files.
However, some packages expect additional files to be located relative
to those .py files. Normally, those files can be specified via the
MANIFEST file.

However, the setuptools folk decided that explicitly listing installed
files was too cumbersome and added some 'smart' logic to find them
automatically. As a result, setuptools interacts with the VCS to check
which files are part of the repository, and installs those files
automatically.

As you may guess, this logic works only when sources are installed
from a VCS checkout. What happens with tarballs then? That's where
egg-info files come into play.

When a distribution tarball is created, setuptools puts egg-info into
it, with the SOURCES.txt file listing all the files. Then, when the
package is installed from the tarball, setuptools uses the same egg-info
directory, notices the existing files and reuses the file list.

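To make the lookup concrete, here is a small Python sketch that reads
the file list from a *.egg-info/SOURCES.txt under a given egg-base
directory. It only illustrates the idea and is not setuptools' actual
code; the function names are hypothetical, and filtering on the .py
suffix is a simplification.

```python
import glob
import os


def read_sources(egg_base):
    """Return the paths listed in the first *.egg-info/SOURCES.txt
    found directly under egg_base, or None if there is none."""
    pattern = os.path.join(egg_base, "*.egg-info", "SOURCES.txt")
    matches = sorted(glob.glob(pattern))
    if not matches:
        return None
    with open(matches[0], encoding="utf-8") as f:
        # SOURCES.txt holds one path per line.
        return [line.strip() for line in f if line.strip()]


def listed_data_files(egg_base):
    """Return the non-.py entries of SOURCES.txt, or None if missing."""
    sources = read_sources(egg_base)
    if sources is None:
        return None
    return [path for path in sources if not path.endswith(".py")]
```

When egg_base is the unpacked tarball, the list is found. When it is
an empty directory, the result is None, which is the situation where
no pre-created file list is available.
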
Of course, it all falls apart when we override --egg-base, that is,
the directory where egg-info files are located. Setuptools no longer
finds the pre-created file and, since the VCS is not available, the
additional files are not installed.

If you believe that the solution is simple, you're wrong. We simply
need to override --egg-base since setuptools uses that directory both
for reading and writing. Using the same directory for multiple parallel
builds means two things:

1) if installed files differ per Python version, we're going to get
a mess,

2) there's an awful race condition where one implementation may read
an incomplete (or even empty) SOURCES.txt that another implementation
has just started writing.

So how to fix it properly? I have no idea.

The 'best' thing to do seems to be to convince upstreams that relying
on the awfully fragile setuptools auto-adding of files is bad and they
should instead list them explicitly in MANIFEST.in. But then we've got
to be more convincing since we're basically saying that one
of the 'awesome' and documented features of setuptools must never be
used.

The other solution is to put more hackery in the eclass. That is, find
all .egg-info directories in ${S} and copy them to the future 'egg-base'
for each implementation. Then, we could both keep the egg-info separate
per implementation and respect the initial contents.

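The copying idea could look roughly like the Python sketch below. The
eclass itself is bash, so this only shows the logic. The function name,
the build_root/<implementation> layout and the implementation names are
all assumptions made for illustration.

```python
import os
import shutil


def seed_egg_bases(source_dir, build_root, implementations):
    """Copy every *.egg-info directory found under source_dir into
    build_root/<impl>/ for each implementation.

    Returns the list of per-implementation egg-base directories.
    """
    egg_infos = []
    for dirpath, dirnames, _ in os.walk(source_dir):
        for d in list(dirnames):
            if d.endswith(".egg-info"):
                egg_infos.append(os.path.join(dirpath, d))
                dirnames.remove(d)  # no need to descend into it

    bases = []
    for impl in implementations:
        base = os.path.join(build_root, impl)
        os.makedirs(base, exist_ok=True)
        for src in egg_infos:
            dest = os.path.join(base, os.path.basename(src))
            # Keep an existing copy so a rerun does not clobber it.
            if not os.path.exists(dest):
                shutil.copytree(src, dest)
        bases.append(base)
    return bases
```

Each returned directory would then be passed as --egg-base for the
matching implementation, so every build starts from the upstream
SOURCES.txt but writes only to its own copy.
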
However, I'm not really happy to play that kind of game. Probably
the most proper place to put that logic would be the 'esetup.py'
function but that would be just awful. Then there's
distutils-r1_python_prepare_all() which people still forget to call.

Any suggestions, thoughts? I will probably try to contact upstreams of
some random packages and see what their stance is on using MANIFEST.in
instead of the auto-logic.

--
Best regards,
Michał Górny
