Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Re: Segfault in emerge
Date: Sun, 21 May 2006 14:35:14
Message-Id: e4ptk1$muh$1@sea.gmane.org
In Reply to: Re: [gentoo-amd64] Re: Segfault in emerge by "Anders Thøgersen"
1 Anders Thøgersen <anderslt@×××××.com> posted
2 20060520223006.GA9058@××××××.mydomain, excerpted below, on Sun, 21 May
3 2006 00:30:06 +0200:
4
5 > On 04:52 Fri 12 May 2006, Duncan wrote:
6 >> Anders posted as summarized on 12 May 2006:
7 >>
8 >> > [Repeatable segfault doing emerge sync at 51%. Portage-2.0.54]
9 >>
10 >> [That's almost certainly a portage cache corruption issue. Try emerge
11 >> --metadata. That should just update the cache without doing the sync
12 >> part first. If that fails, delete the cache and run emerge --metadata
13 >> again, to rebuild it.]
14 >
15 > Sorry for the late reply,...
16
17 Don't worry too much about the timeliness; the problem's yours, not
18 mine, so it runs on your schedule. From the other side, that's one reason I prefer
19 newsgroups or mailing lists to private help -- if one person doesn't get
20 in a timely reply, someone else likely will. (The other big reason is
21 that no single person always guesses the problem right or has the
22 experience to fix it, and a list/newsgroup allows more folks a chance to
23 look at it than private mail would.)
24
25 > I backed up /var/cache/edb as you suggested and began emerge --metadata,
26 > ... First segfault occurred at 31%. Feeling bold i restarted the
27 > command and this time it went all the way to the magic 51% where it
28 > segfaulted as before. From here every emerge --metadata results in a
29 > segfault at 51% :-/
30 >
31 > If I understand you correctly the problem of this segfault is due to a
32 specific file in the portage tree. To correct this problem must I then
33 > locate this file?
34
35 Well, locating it would help, but it may be that it isn't necessary, as
36 there are other ways to tackle the problem.
37
38 A couple things to keep in mind: (1) Portage /can/ operate without that
39 cache -- it's just /very/ slow. Thus, if it comes to being a problem with
40 the portage you are running, you should still be able to merge a different
41 version. (2) We now know the problem regenerates from a clear cache.
42
43 At this point, with the problem regenerating from a clear cache, the next
44 thing I'd want to establish is that it's not a file system problem.
45 Delete the cache again. If you have /var or /var/cache on its own mount,
46 umount it (depending on whether you have /var/log on the same mount, and
47 on the services you are running, you may have to switch to single user
48 mode or at least shut down your syslog and perhaps other services in order
49 to umount /var) and do a full fsck on it. Remount and start up your
50 services again or simply reboot, and try the emerge --metadata again. If
51 the problem isn't yet gone, delete the cache again and continue...
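
As a rough sketch only, the cache-and-fsck round above might look like the following. Several things here are assumptions rather than anything stated in the post: that the regenerable cache lives in /var/cache/edb/dep, that /var is its own mount with an fstab entry (so umount, fsck and mount can all be given the mount point), and that the filesystem's checker accepts -f to force a full check (true for ext2/ext3, not necessarily for others). The run helper and the function name are made up for illustration. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
# Assumed values, not taken from the post; adjust them to your system.
CACHE_DIR="${CACHE_DIR:-/var/cache/edb/dep}"   # assumed cache location
VAR_MOUNT="${VAR_MOUNT:-/var}"                 # assumed separate mount
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Delete the cache, check the filesystem, then rebuild the cache.
# The chain stops at the first failing step. fsck also exits non-zero
# when it found or fixed errors, which is worth a look before going on.
check_var_and_rebuild_cache() {
    run rm -rf "$CACHE_DIR" &&
    run umount "$VAR_MOUNT" &&
    run fsck -f "$VAR_MOUNT" &&
    run mount "$VAR_MOUNT" &&
    run emerge --metadata
}

check_var_and_rebuild_cache
```

Read the printed list first; only once it matches your layout, and with syslog and friends shut down, would it make sense to run it again with DRY_RUN=0.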
52
53 The next item on the checklist is the file system containing the portage
54 tree itself. The tree can be redownloaded, so in general, it's safe to
55 delete. If you run FEATURES=buildpkg, as I've often recommended on this
56 list (different topic but something to look at once you get up and running
57 again, if you haven't already), and your $PKGDIR is in the portage tree as
58 it is by default (/usr/portage/packages, IIRC), you'll want to copy or
59 move that elsewhere. Depending on your internet speed and whether you are
60 charged per byte downloaded, you may wish to do the same thing with
61 $DISTDIR (/usr/portage/distfiles by default), which contains all the
62 source tarballs portage has downloaded. Then delete the portage tree, and
63 if it's on a non-root filesystem, unmount and fsck it as well. See below
64 for refetching, as there's an easier way than emerge --sync when you are
65 fetching the entire thing.
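
A minimal sketch of the save-then-delete step, under stated assumptions: the default locations are the ones mentioned above, SAFE_DIR is a hypothetical holding area on a different filesystem, and the helper and function names are invented for illustration. It only prints the commands unless DRY_RUN=0 is set, and it leaves out the unmount and fsck of the tree's filesystem.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
PORTDIR="${PORTDIR:-/usr/portage}"
PKGDIR="${PKGDIR:-$PORTDIR/packages}"
DISTDIR="${DISTDIR:-$PORTDIR/distfiles}"
SAFE_DIR="${SAFE_DIR:-/home/portage-keep}"   # hypothetical holding area
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Succeeds when path $1 lies below directory $2.
is_inside() {
    case "$1" in
        "$2"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Move binary packages and distfiles out of the tree, then delete the tree.
save_and_remove_tree() {
    run mkdir -p "$SAFE_DIR" || return 1
    for dir in "$PKGDIR" "$DISTDIR"; do
        # Only directories that live inside the tree need rescuing.
        if is_inside "$dir" "$PORTDIR"; then
            # In a real run mv fails on a missing directory and we stop
            # before the delete; drop that directory from the list then.
            run mv "$dir" "$SAFE_DIR/" || return 1
        fi
    done
    run rm -rf "$PORTDIR"
}

save_and_remove_tree
```

If PKGDIR or DISTDIR already points outside the tree, the sketch leaves that directory alone, since deleting the tree would not touch it anyway.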
66
67 If either or both of the above are on your root filesystem, after the
68 deletes, reboot or boot to your rescue solution (the liveCD or
69 alternate boot volume or whatever) and do the fsck from there. The
70 deletes aren't absolutely necessary, but are worthwhile since the data is
71 redownloadable/rebuildable anyway, and if the problem /is/ a filesystem
72 error, it's easier just renewing the data than it is trying to rebuild the
73 file from incomplete data in lost&found. Additionally, if there happen to
74 be other errors on the filesystem and thus other files end up in
75 lost&found, it's easier to find the files you really /do/ need to recover
76 there if there's less noise from files that are easier simply to
77 refetch or recache.
78
79 Now that you know it's not a problem with a bad filesystem, the next step
80 is getting a new copy of the portage tree. Since we deleted the tree we
81 had, emerge --sync isn't the most efficient option, tho it would normally
82 do the job. Rather, and this kills two birds with one stone as it's the
83 next thing to try as well, use emerge-webrsync. This fetches a verified
84 snapshot tarball of the tree taken daily, so it's not quite as up to date as
85 a live sync would be (it could be up to 24 hours old), but it's more
86 efficient than emerge --sync when you aren't starting from a mostly
87 up-to-date tree that needs only a few changes. Doing it this way, we
88 test another sync method and ensure that we get a complete copy of the
89 tree, as well, bypassing the rsync and any possibly broken files that had
90 been causing problems in your local copy of the tree.
91
92 emerge-webrsync performs an emerge --metadata after completing the tree
93 sync, so if it goes fine, you should be back in business. Try another
94 emerge --sync and see.
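
The refetch-and-verify order can be sketched as below. Only the two commands come from the text above; the run helper and the function name are invented for illustration, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fetch the snapshot first; only if that succeeds, try a normal sync
# to see whether the original segfault is gone.
refetch_tree() {
    run emerge-webrsync || return 1
    run emerge --sync
}

refetch_tree
```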
95
96 If you are still having problems at /that/ point, having verified that
97 it's not a filesystem issue, and trying a completely new copy of the tree
98 fetched with emerge-webrsync, /then/ things start getting interesting.
99 There are still some things that can be tried, but better to wait until we
100 know they are needed before getting worried. The output of
101 emerge-webrsync or the next sync where the problem reoccurs would be
102 interesting as well, so post it. Also, at this point, it may be useful to
103 file a portage bug and get the opinion of the real experts. However,
104 hopefully, that's not necessary, as a clean filesystem and copy of the
105 tree will have eliminated the issue.
106
107
108
109 --
110 Duncan - List replies preferred. No HTML msgs.
111 "Every nonfree program has a lord, a master --
112 and if you use the program, he is your master." Richard Stallman
113
114 --
115 gentoo-amd64@g.o mailing list