Gentoo Archives: gentoo-python

From: "Michał Górny" <mgorny@g.o>
To: gentoo-python@l.g.o
Cc: python@g.o
Subject: [gentoo-python] distutils-r1, setuptools and egg_info mess
Date: Fri, 16 Aug 2013 20:12:14
Message-Id: 20130816221158.043c03d5@gentoo.org

Hello,

We've hit a brick wall with distutils-r1. TL;DR: the core issue
is that some packages using setuptools may fail to install some files
(package data files) when built with distutils-r1. Therefore, I'm asking
you to pay special attention to whether all files are installed when
migrating packages, and to defer switching to distutils-r1 if you notice
any missing files. We're working on fixing the issue, but it's hard.



The problem occurs with so-called 'package data files'. That is, files
that are expected to be installed in Python's site-packages directory
that are not Python modules.

By default, distutils installs only .py files when installing packages.
However, some packages expect additional files to be located relative
to those .py files. Normally, those files can be specified via the
MANIFEST file.
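
To make the term concrete, here is a minimal Python sketch that lists
the files in a package directory which are not Python modules, i.e. the
files that would be left behind if only .py files were installed. The
function name find_package_data is made up for this illustration; it is
not part of distutils or setuptools.

```python
import os


def find_package_data(package_dir):
    """Return sorted paths (relative to package_dir) of non-module files.

    Hypothetical helper for illustration only, not a distutils API.
    """
    data_files = []
    for root, dirs, files in os.walk(package_dir):
        # Skip bytecode caches; they are build products, not package data.
        dirs[:] = [d for d in dirs if d != "__pycache__"]
        for name in files:
            # Python sources and compiled bytecode are not package data.
            if name.endswith((".py", ".pyc", ".pyo")):
                continue
            full = os.path.join(root, name)
            data_files.append(os.path.relpath(full, package_dir))
    return sorted(data_files)
```

Running it on the unpacked sources and comparing against the installed
image is one way to spot missing files when migrating a package.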

However, setuptools folk decided that explicitly listing installed
files was too cumbersome and added some 'smart' logic to find them
automatically. As a result, setuptools interacts with the VCS to check
which files are part of the repository, and installs those files
automatically.

As you may guess, this logic works only when the sources are installed
from a VCS checkout. What happens with tarballs, then? That's where
egg-info files come into play.

When a distribution tarball is created, setuptools puts an egg-info
directory into it, with the SOURCES.txt file listing all the files.
Then, when the package is installed from the tarball, setuptools uses
the same egg-info directory, notices the existing files and reuses the
file list.
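
As a rough illustration of what that file list looks like to a consumer,
the sketch below reads SOURCES.txt (one path per line) from an egg-info
directory. It is not how setuptools implements this internally; the
function names read_sources and non_module_sources are invented here.

```python
import os


def read_sources(egg_info_dir):
    """Return the paths recorded in egg_info_dir/SOURCES.txt.

    Returns an empty list when the file does not exist, which mirrors
    the case where no pre-created file list can be found.
    """
    path = os.path.join(egg_info_dir, "SOURCES.txt")
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        # One path per line; ignore blank lines.
        return [line.strip() for line in f if line.strip()]


def non_module_sources(egg_info_dir):
    """Return the recorded paths that are not .py files."""
    return [p for p in read_sources(egg_info_dir) if not p.endswith(".py")]
```

An empty result from read_sources() on the directory that setuptools is
actually told to use is the symptom described in the next paragraph.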

Of course, it all falls apart when we override --egg-base, that is,
the directory where the egg-info files are located. Setuptools no longer
finds the pre-created file list and, since the VCS is not available, the
additional files are not installed.

If you believe that the solution is simple, you're wrong. We simply
have to override --egg-base, since setuptools uses that directory both
for reading and writing. Using the same directory for multiple parallel
builds means two things:

1) if the installed files differ per Python version, we're going to get
a mess,

2) there's an awful race condition in which one implementation may read
an incomplete (or even empty) SOURCES.txt that another implementation
has just started writing.

So how to fix it properly? I have no idea.

The 'best' thing to do seems to be to convince upstreams that relying
on the awfully fragile setuptools auto-adding of files is bad and that
they should instead list the files explicitly in MANIFEST.in. But then
we'd have to be very convincing, since we're basically saying that one
of the 'awesome' and documented features of setuptools must never be
used.

The other solution is to put more hackery in the eclass. That is, find
all .egg-info directories in ${S} and copy them to the future 'egg-base'
for each implementation. Then, we could both keep the egg-info separate
per implementation and respect the initial contents.
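
A minimal sketch of that idea follows, written in Python rather than
the bash an eclass would actually use. Everything in it is an
assumption made for illustration: the function name seed_egg_bases, the
build_root/<implementation> layout, and the restriction to top-level
*.egg-info directories (a real version would have to search
subdirectories too).

```python
import glob
import os
import shutil


def seed_egg_bases(source_dir, build_root, implementations):
    """Copy each top-level *.egg-info directory into every egg-base.

    Hypothetical helper: returns a dict mapping implementation name to
    the egg-base directory that was prepared for it.
    """
    egg_infos = [
        p for p in glob.glob(os.path.join(source_dir, "*.egg-info"))
        if os.path.isdir(p)
    ]
    egg_bases = {}
    for impl in implementations:
        # Assumed layout: one private egg-base per implementation.
        egg_base = os.path.join(build_root, impl)
        os.makedirs(egg_base, exist_ok=True)
        for src in egg_infos:
            dest = os.path.join(egg_base, os.path.basename(src))
            # Do not clobber a copy that a previous run already made.
            if not os.path.exists(dest):
                shutil.copytree(src, dest)
        egg_bases[impl] = egg_base
    return egg_bases
```

Because each implementation gets its own copy, no build reads a
SOURCES.txt that another build is writing, yet each one still starts
from the file list shipped in the tarball.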

However, I'm not really happy to play that kind of game. Probably
the most proper place to put that logic would be the 'esetup.py'
function, but that would be just awful. Then there's
distutils-r1_python_prepare_all(), which people still forget to call.

Any suggestions or thoughts? I will probably try to contact the
upstreams of some random packages and see what their stance is on using
MANIFEST.in instead of the auto-logic.

-- 
Best regards,
Michał Górny
