Anders Thøgersen <anderslt@...> posted
20060520223006.GA9058@..., excerpted below, on Sun, 21 May
2006 00:30:06 +0200:
> On 04:52 Fri 12 May 2006, Duncan wrote:
>> Anders posted as summarized on 12 May 2006:
>> > [Repeatable segfault doing emerge --sync at 51%. Portage-2.0.54]
>> [That's almost certainly a portage cache corruption issue. Try emerge
>> --metadata. That should just update the cache without doing the sync
>> part first. If that fails, delete the cache and run emerge --metadata
>> again, to rebuild it.]
> Sorry for the late reply,...
Don't worry about the timing. The problem's yours, not mine, so it runs on
your schedule. From the other side, that's one reason I prefer
newsgroups or mailing lists to private help -- if one person doesn't
reply in a timely manner, someone else likely will. (The other big reason
is that no single person always guesses the problem right or has the
experience to fix it, and a list or newsgroup gives more folks a chance to
look at it than private mail would.)
> I backed up /var/cache/edb as you suggested and began emerge --metadata,
> ... First segfault occurred at 31%. Feeling bold, I restarted the
> command and this time it went all the way to the magic 51% where it
> segfaulted as before. From here every emerge --metadata results in a
> segfault at 51% :-/
> If I understand you correctly, this segfault is caused by a
> specific file in the portage tree. To correct the problem, must I then
> locate that file?
Well, locating it would help, but it may not be necessary, as
there are other ways to tackle the problem.
A couple things to keep in mind: (1) Portage /can/ operate without that
cache -- it's just /very/ slow. Thus, if the problem turns out to be in
the portage version you are running, you should still be able to merge a
different version. (2) We now know the problem regenerates from a clear
cache.
Given that, the next thing I'd want to establish is that it's not a
filesystem problem. Delete the cache again. If you have /var or /var/cache
on its own mount, umount it and do a full fsck on it. (Depending on
whether /var/log is on the same mount and on which services you are
running, you may have to switch to single-user mode, or at least shut down
syslog and perhaps other services, before you can umount /var.) Remount
and start your services again, or simply reboot, and try emerge --metadata
again. If the problem still isn't gone, delete the cache again and
continue...
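If you'd rather script the delete-and-retry loop than type it each time, here's a minimal Python sketch. It isn't a portage tool, and the function names are mine. It also assumes nothing about which directory holds the cache on your install, so check that yourself before pointing it anywhere. The one thing it does rely on is the POSIX convention, exposed by Python's subprocess module, that a child killed by a signal reports a negative return code. A segfault therefore shows up as -SIGSEGV rather than as an ordinary error status.

```python
import shutil
import signal
import subprocess
from pathlib import Path
from typing import Sequence


def clear_cache(cache_dir: str) -> int:
    """Delete everything inside cache_dir, keeping the directory itself.

    Returns the number of top-level entries removed.
    """
    removed = 0
    for entry in Path(cache_dir).iterdir():
        # Check for symlinks first so a link to a directory is unlinked,
        # not followed and recursively deleted.
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


def describe_exit(returncode: int) -> str:
    """Translate a subprocess return code into a short verdict."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGSEGV:
        return "segfault"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


def regenerate(cmd: Sequence[str]) -> str:
    """Run the regeneration command (e.g. emerge --metadata) and classify how it ended."""
    result = subprocess.run(list(cmd))
    return describe_exit(result.returncode)
```

Calling clear_cache() on your cache directory, followed by regenerate(["emerge", "--metadata"]), should come back as "segfault" if the run dies the way you describe. Any other answer tells you something has changed.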
The next item on the checklist is the filesystem containing the portage
tree itself. The tree can be redownloaded, so in general it's safe to
delete. If you run FEATURES=buildpkg, as I've often recommended on this
list (a different topic, but something to look at once you're up and
running again, if you haven't already), and your $PKGDIR is in the portage
tree as it is by default (/usr/portage/packages, IIRC), you'll want to
copy or move it elsewhere first. Depending on your internet speed and
whether you are charged per byte downloaded, you may wish to do the same
with $DISTDIR (/usr/portage/distfiles by default), which holds all the
source tarballs portage has downloaded. Then delete the portage tree, and
if it's on a non-root filesystem, unmount and fsck that as well. See below
for refetching, as there's an easier way than emerge --sync when you are
fetching the entire tree.
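For the save-then-delete step, here's a minimal Python sketch. It assumes the default layout described above, with packages/ and distfiles/ sitting directly inside the tree. The function names and the destination directory are illustrative only and aren't anything portage provides. If you've pointed $PKGDIR or $DISTDIR elsewhere in make.conf, they aren't inside the tree and there's nothing to move.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable


def evacuate_subdirs(tree: str, subdirs: Iterable[str], dest: str) -> Dict[str, str]:
    """Move each named subdirectory of `tree` into `dest` before the tree is deleted.

    Subdirectories that don't exist are skipped. Returns {old_path: new_path}.
    """
    tree_path = Path(tree).resolve()
    dest_path = Path(dest).resolve()
    # Saving things into a directory that is about to be deleted would defeat the purpose.
    if dest_path == tree_path or tree_path in dest_path.parents:
        raise ValueError(f"{dest_path} is inside the tree {tree_path}")
    dest_path.mkdir(parents=True, exist_ok=True)
    moved: Dict[str, str] = {}
    for name in subdirs:
        src = tree_path / name
        if not src.is_dir():
            continue  # e.g. no packages/ because FEATURES=buildpkg isn't set
        target = dest_path / name
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        shutil.move(str(src), str(target))  # copies then deletes if across filesystems
        moved[str(src)] = str(target)
    return moved


def delete_tree(tree: str) -> None:
    """Remove the whole portage tree; it can be refetched afterwards."""
    shutil.rmtree(tree)
```

For example, you could call evacuate_subdirs("/usr/portage", ["packages", "distfiles"], "/root/portage-save") and only then run delete_tree("/usr/portage"). The /root/portage-save location is just an example, and anywhere outside the tree with enough space will do. Once the new tree is in place, you can move the directories back.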
If either or both of the above are on your root filesystem, do the
deletes, then reboot into your rescue solution (the liveCD, alternate
boot volume, or whatever you use) and run the fsck from there. The
deletes aren't absolutely necessary, but they're worthwhile. The data can
be redownloaded or rebuilt anyway, and if the problem /is/ a filesystem
error, it's easier to simply renew the data than to try to rebuild files
from the fragments fsck leaves in lost+found. Additionally, if other
errors on the filesystem land other files in lost+found, it's easier to
find the files you really /do/ need to recover there when there's less
noise from files that would more easily be refetched or recached.
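To decide which case applies, you can compare device numbers. A directory on its own mount has a different st_dev from /. Here's a minimal Python sketch of that check, using only os.stat and os.path.ismount. The helper names are hypothetical, and it doesn't try to handle bind mounts or other unusual setups, so treat its answer as a hint and confirm it against your own fstab.

```python
import os


def mount_point(path: str) -> str:
    """Walk up from `path` until reaching the directory that is a mount point."""
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached "/", nothing further up
            break
        current = parent
    return current


def on_root_filesystem(path: str) -> bool:
    """True if `path` lives on the same filesystem (same st_dev) as "/"."""
    return os.stat(path).st_dev == os.stat("/").st_dev


def fsck_plan(path: str) -> str:
    """Suggest how to fsck the filesystem holding `path`."""
    if on_root_filesystem(path):
        return f"{path} is on the root filesystem: fsck it from a liveCD or rescue boot"
    mp = mount_point(path)
    return f"{path} is on {mp}: umount {mp}, fsck it, then remount"
```

To check both cases, call fsck_plan("/var/cache") and fsck_plan("/usr/portage") on the machine in question.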
Once you know it's not a bad filesystem, the next step is getting a new
copy of the portage tree. Since we deleted the tree we had, emerge --sync
isn't the most efficient option, though it would normally do the job.
Instead, use emerge-webrsync. This kills two birds with one stone, since
it's the next thing to try anyway. It fetches a verified snapshot tarball
of the tree, taken daily, so it's not quite as up-to-date as a live sync
would be (it could be up to 24 hours old). But when you aren't starting
from a mostly up-to-date tree that needs only a few changes, it's more
efficient than emerge --sync. Doing it this way, we test another sync
method and ensure that we get a complete copy of the tree, bypassing
rsync and any possibly broken files that had been causing problems in
your local copy.
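As for "verified", snapshot tarballs are published alongside a checksum file, and checking one by hand is easy. Here's a minimal Python sketch. It assumes an md5sum-style checksum file whose first field is the hex digest. Check what your mirror actually provides, and pass a different hashlib algorithm name if it isn't MD5. emerge-webrsync does its own checking, so this is only relevant if you fetch a snapshot manually.

```python
import hashlib
from pathlib import Path


def file_digest(path: str, algorithm: str = "md5", chunk_size: int = 1 << 20) -> str:
    """Hash a (potentially large) file in chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_snapshot(tarball: str, checksum_file: str, algorithm: str = "md5") -> bool:
    """Compare the tarball's digest with the first field of an md5sum-style line."""
    expected = Path(checksum_file).read_text().split()[0].lower()
    return file_digest(tarball, algorithm) == expected
```

Reading the file in chunks keeps memory use flat regardless of how large the snapshot is.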
emerge-webrsync performs an emerge --metadata after completing the tree
sync, so if it goes fine, you should be back in business. Try another
emerge --sync and see.
If you are still having problems at /that/ point, having verified that
it's not a filesystem issue and tried a completely new copy of the tree
fetched with emerge-webrsync, /then/ things start getting interesting.
There are still some things that can be tried, but better to wait until we
know they are needed before getting worried. The output of
emerge-webrsync, or of the next sync where the problem recurs, would be
interesting as well, so post it. At that point it may also be useful to
file a portage bug and get the opinion of the real experts. Hopefully,
though, that won't be necessary, as a clean filesystem and a fresh copy
of the tree should have eliminated the issue.
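For capturing that output to post, here's a minimal Python sketch. It runs a command, shows the output as usual, writes a full copy to a log file, and keeps the last few lines handy for pasting into a reply or bug. The log path and the number of tail lines are arbitrary choices. Merging stderr into stdout is also an assumption: it suits posting, but it loses the distinction between the two streams.

```python
import subprocess
from collections import deque
from typing import List, Sequence, Tuple


def run_and_log(cmd: Sequence[str], log_path: str, tail_lines: int = 40) -> Tuple[int, List[str]]:
    """Run cmd, echo and log its combined output, and return (returncode, last lines)."""
    tail: deque = deque(maxlen=tail_lines)  # only the most recent lines are kept
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            list(cmd), stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        assert proc.stdout is not None
        for line in proc.stdout:
            print(line, end="")  # still show progress on the terminal
            log.write(line)
            tail.append(line.rstrip("\n"))
        returncode = proc.wait()
    return returncode, list(tail)
```

For example, run_and_log(["emerge-webrsync"], "/tmp/webrsync.log") leaves the full log on disk and hands back the closing lines, which are usually the interesting part when something crashes.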
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
email@example.com mailing list