Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: urgent: Segfaults after synchronously emerging|downloading 30GB|burnng a DVD iso image
Date: Sat, 27 May 2006 13:40:18
Message-Id: e59knt$89f$1@sea.gmane.org
In Reply to: [gentoo-amd64] urgent: Segfaults after synchronously emerging|downloading 30GB|burnng a DVD iso image by Dieter Ries
1 Dieter Ries <Clip2@×××.de> posted 20060527083416.76720@×××.net, excerpted
2 below, on Sat, 27 May 2006 10:34:16 +0200:
3
4 > i have a very severe and urgent problem:
5 >
6 > Then the emerge stopped with some error i dont remember, and after that,
7 > everything got somehow slow. because i didnt want the dvd to be ruined i
8 > waited till it was burned and the 30G were downloade. during my waiting i
9 > tried $top or $ps -A to see the running processes, but i got
10 > "Speicherzugriffsfehler", which is AFAIK the same as segmentation fault.
11 >
12 > i got it for everything i tried, and when i tried something from the KDE
13 > menu, nothing happened.
14
15 > when the data was downloaded and the dvd burned, i tried to shutdown from
16 > KDE, no success. the i tried to shutdown from the console, no success
17 > either. in the end i had to use the reset button.
18 >
19 > then, just after "freeing unused kernel memory" there are many errors, all
20 > looking quite the same[.] init hung at that state, nothing worked.
21 >
22 > so i got my livecd, botted from it, then i ran fsck for all the
23 > partitions, without any errors or anything, everything seemed fine.
24 >
25 > i then mounted my system and home partition and proc and typed chroot
26 > /mnt/gentoo /bin/bash, which was followed by, guess it: segmentation
27 > fault.
28 >
29 > the data on all my partitions from sda5 to 10 is still there and i can
30 > mount them all, but i cant chroot and i cant boot.
31 >
32 > so no gentoo anymore[,] i am now using knoppix 4.0 to write for help.
33 >
34 > is there any chance to get my gentoo back to life without completely
35 > install it again? and why does the system break when doing some things
36 > simultaneously?
37 >
38 > can this be a hardware issue?
39
40 OUCH!
41
42 It /could/ be a hardware issue, but as you can boot from LiveCD and the
43 fscks all come out fine, it wouldn't appear to be.
44
45 I think the problem is much more likely a glibc update gone bad.
46 Virtually /everything/ on a system links to glibc, so when it goes bad,
47 you end up as they say "Up a creek without a paddle!"
48
49 I've actually had it happen once, when a portage bug was triggered by an
50 obscure series of events that happened to all come together in a glibc
51 update. I was able to recover, however, as the problem in that case was a
52 bunch of missing symlinks, and I happened to have mc open at the time and
53 just didn't close it. I restored symlinks by hand, trying to run
54 something, reading the error, fixing that symlink and trying again,
55 until mc gave me enough of a working system to finish the recovery. I
56 then opened a binpkged version of glibc (thanks to FEATURES=buildpkg,
57 that's one of the times it saved my butt!) and restored the rest of the
58 symlinks with a mass copy from there. (I had to do the manual error,
59 rebuild symlink cycle several times, until I got enough of them rebuilt to
60 at least run bzip2 so I could untar the appropriate glibc tbz2 binpkg.)
61
62 So anyway, yeah, I know the feeling!
63
64 Assuming the problem is indeed glibc:
65
66 If you have been using FEATURES=buildpkg, recovery shouldn't be too
67 difficult. Simply boot the LiveCD, mount the hard drive root and /usr and
68 /var partitions if you have them, and untar the last correctly working
69 glibc package over the hard drive root. Don't chroot to it until after
70 the untar, so you don't kill functionality; just untar the package to the
71 mounted hard drive root with any other partitions it might write to
72 mounted to the correct place on top of that root.
73
74 Note that you'll probably want to save copies of any of the following
75 files in /etc that you've modified, as the untarring will overwrite them.
76 You can restore them afterward. host.conf, init.d/nscd, nscd.conf,
77 nsswitch.conf, rpc.
78
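A minimal sketch of the untar step described above, meant to be run from the LiveCD shell and not from a chroot. The function name restore_glibc, the backup directory and the example paths are assumptions made for illustration, not anything portage provides; the list of /etc files is the one given above.

```shell
#!/bin/sh
# Sketch only: untar a glibc binpkg over a mounted (NOT chrooted) root,
# keeping local copies of the /etc files the package would overwrite.
# restore_glibc and the etc-backup directory are hypothetical names.

restore_glibc() {
    pkg=$1                          # glibc .tbz2 made by buildpkg or quickpkg
    root=$2                         # where the hard drive root is mounted
    backup="$root/root/etc-backup"  # assumed backup location

    [ -f "$pkg" ] && [ -d "$root" ] || return 1
    mkdir -p "$backup/init.d" || return 1

    # Save any of the listed /etc files that exist.
    for f in host.conf init.d/nscd nscd.conf nsswitch.conf rpc; do
        if [ -f "$root/etc/$f" ]; then
            cp -p "$root/etc/$f" "$backup/$f" || return 1
        fi
    done

    # Unpack over the mounted root, preserving permissions. bzip2 may warn
    # about trailing data after the archive in a portage binpkg.
    tar -xjpf "$pkg" -C "$root" || return 1

    # Put the saved /etc files back.
    for f in host.conf init.d/nscd nscd.conf nsswitch.conf rpc; do
        if [ -f "$backup/$f" ]; then
            cp -p "$backup/$f" "$root/etc/$f" || return 1
        fi
    done
}

# Example (hypothetical paths):
#   restore_glibc /path/to/glibc-<version>.tbz2 /mnt/gentoo
```

Any separate /usr or /var partitions need to be mounted under the root mount point before calling it, so the files land where they belong.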
79 If you haven't been using FEATURES=buildpkg, the process is a bit more
80 complicated, but still nothing to panic over. You'll have to use the
81 quickpkg feature on the CD to build a copy of the glibc package on the CD,
82 then untar it over the mounted hard drive root as above (saving backups of
83 the /etc files as above too).
84
85 After this and recovery of the backed up /etc files, if the problem was
86 indeed glibc, you should again have a working system. Since you bypassed
87 portage by untarring the glibc directly, however, the version of glibc
88 that portage thinks is installed will probably be wrong. Thus, you'll
89 want to remerge a known working version using portage. Again, that won't
90 be a big deal if you've been using FEATURES=buildpkg, since you can just
91 emerge -K the version you untarred. If not, you'll need to recompile a
92 new version, which of course will take a while. You may wish to wait until
93 after tonight's gaming thing, if you won't have time to recompile it before
94 then.
95
96 After you have your system back up and running, consider a couple things
97 that might make life easier next time.
98
99 Obviously, I'm going to recommend adding buildpkg to your features if you
100 haven't got it there already. It really /can/ help. To jumpstart the
101 binary package store then, consider using quickpkg to package up all your
102 vital packages, gcc, glibc, portage, python, binutils, etc, at a minimum.
103 If you want to get everything packaged right away, use emerge --pretend
104 --emptytree to get a list, and package all those up using quickpkg. (You
105 can automate the process if you wish using tools such as cut to get the
106 appropriate fields out of the emerge --pretend output, then feed that
107 to a file for further editing if desired, and then into quickpkg as the
108 list of packages it needs to package. I did it this way when I
109 jumpstarted my binpkg cache.) Alternatively, you can just add the
110 buildpkg feature and emerge --emptytree world, but that will of course
111 take a while.
112
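The list-building step might look something like the sketch below. It uses awk where the text above mentions cut, simply because the status field in the emerge --pretend output varies in width. The helper name ebuild_atoms and the /tmp/pkglist file are assumptions for illustration, and the exact output format depends on the portage version, so check the list before feeding it to quickpkg.

```shell
#!/bin/sh
# Sketch only: turn `emerge --pretend` output (on stdin) into a list of
# exact-version atoms, one per line. ebuild_atoms is a hypothetical helper.

ebuild_atoms() {
    awk '/^\[ebuild/ {
        sub(/^\[[^]]*\][ \t]*/, "")   # drop the leading "[ebuild ... ]" status field
        atom = $1                     # category/package-version
        sub(/::.*$/, "", atom)        # drop a "::repo" suffix if one is printed
        print "=" atom
    }'
}

# Example:
#   emerge --pretend --emptytree world | ebuild_atoms > /tmp/pkglist
#   (edit /tmp/pkglist if desired)
#   quickpkg $(cat /tmp/pkglist)
```

Writing the list to a file first keeps the editing step in between, so anything that should not be packaged can be dropped by hand.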
113 Second suggestion, and again something I do here: consider creating a
114 second copy of your root partition, with /var and /usr as well if you have
115 them separate. Then, periodically, when you know you have a stable
116 running system, erase the copy and recopy everything over from your known
117 stable running system. The idea here is that if your system goes haywire
118 for whatever reason, you can simply boot the backup root partition, which
119 will have a complete working system on it as of the time you did the
120 backup. Thus, no worries about this happening again, as you can just boot
121 the backup system (provided you keep the snapshot fairly close to your
122 working system so you aren't trying to use something terribly outdated).
123
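The erase-and-recopy step could be sketched as below, assuming GNU find and cp and assuming the backup partition is already mounted. The helper name refresh_mirror and the example mount point are hypothetical. It deletes everything under the destination, so double-check the arguments before running anything like it.

```shell
#!/bin/sh
# Sketch only: refresh a backup copy of a filesystem. refresh_mirror is a
# hypothetical helper; both arguments are mount points (directories).

refresh_mirror() {
    src=$1    # known-good running filesystem, e.g. /
    dst=$2    # where the backup partition is mounted

    [ -d "$src" ] && [ -d "$dst" ] || return 1
    [ "$src" != "$dst" ] || return 1

    # Erase the old copy, staying on the backup filesystem (-xdev).
    find "$dst" -xdev -mindepth 1 -delete || return 1

    # Recopy everything, preserving ownership, permissions and links (-a)
    # without descending into other mounted filesystems (-x).
    cp -ax "$src/." "$dst/" || return 1
}

# Example (hypothetical mount point):
#   refresh_mirror / /mnt/rootmirror
```

The backup also needs its own boot loader entry, and its /etc/fstab has to name the backup root, or booting it will just mount the broken root again.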
124 I actually do this with most of my system. The root partition has /usr
125 and /var on it as well, so the portage database (stored in /var/db) is
126 current with what's on that partition, and I keep a copy of that
127 partition, which I refer to as my rootmirror. Likewise, I keep a copy of
128 /home, a copy of my media partition, a copy of my packages (the result of
129 FEATURES=buildpkg) partition, etc. I don't worry about a copy of /var/log
130 (which is on a separate partition from /var), or about the portage tree
131 (which I can simply resync if it's lost), or /tmp (since the stuff in
132 there by definition need not survive a reboot). I make sure I keep the
133 backup copies updated to the point where if I lose everything on the
134 working copy, I am comfortable resuming from the backup copy, knowing that
135 I can redo anything changed between them in a reasonable time, should it
136 come to that.
137
138 If you had been doing this, then you wouldn't be sweating it now, as you'd
139 just have booted your backup copy and resumed from there. Thus, consider
140 setting up your system that way once you are back up and running, so you
141 aren't left in that sort of situation ever again. (Of course, if your
142 hard drive dies, that's another matter. Here, I use a 4-disk RAID-6 to
143 address that problem -- I can lose any two of the four hard drives
144 without losing anything vital. It's software RAID, so if the board goes,
145 I can buy another board, install the drives and CPUs, rebuild my kernel
146 for the new board using an emergency CD, and be up and running once again.
147 That is, however, about the only case where I'd have to use the emergency
148 CD, as in the other cases, I should still be able to boot to the backup
149 root snapshot and recover from there.)
150
151 Good luck! I hope it /is/ just glibc, as that's scary to recover from
152 when the problem occurs, but not the end of the world. If it's not glibc,
153 things get rather more complex, but all evidence so far says that's what
154 it is.
155
156 --
157 Duncan - List replies preferred. No HTML msgs.
158 "Every nonfree program has a lord, a master --
159 and if you use the program, he is your master." Richard Stallman
160
161 --
162 gentoo-amd64@g.o mailing list
