Gentoo Archives: gentoo-amd64

From: Wil Reichert <wil.reichert@×××××.com>
To: gentoo-amd64@l.g.o
Subject: Re: [gentoo-amd64] Re: Gentoo crashing?
Date: Mon, 14 May 2007 14:01:37
Message-Id: 7a329d910705140657wa8f2a85w1bae25cf872ef857@mail.gmail.com
In Reply to: Re: [gentoo-amd64] Re: Gentoo crashing? by Isidore Ducasse
1 mobo == motherboard
2
3 I always use matched RAM. I also stick to well-known name brands
4 (Corsair, Kingston, OCZ, etc). With today's dual-channel RAM
5 controllers you _really_ want your RAM to have identical timings,
6 voltages, etc. If all your sticks are following JEDEC standards it
7 shouldn't matter, but I've been building my own & other people's
8 machines long enough to be superstitious.
9
10 Wil
11
12 On 5/14/07, Isidore Ducasse <ducasse.isidore@×××××.com> wrote:
13 > Very interesting post!
14 > Could you explain what "mobo" means?
15 > And BTW (_almost_ off-topic...) I've heard that RAM sticks should be identical when plugged on the same motherboard, but it was some "good vendor advice" so I'd rather rely on some experienced user's answer.
16 > So is there an issue if two RAM sticks of different brands are plugged on the same motherboard? What if, whilst of the same brand, they don't have the same capacity? Could Peter's issue be related to this kind of problem?
17 >
18 > On Mon, 14 May 2007 10:50:31 +0000 (UTC)
19 > Duncan <1i5t5.duncan@×××.net> wrote:
20 >
21 > > "Peter Davoust" <worldgnat@×××××.com> posted
22 > > 7c08b4dd0705132304h5eccea49k22513343959aff52@××××××××××.com, excerpted
23 > > below, on Mon, 14 May 2007 02:04:30 -0400:
24 > >
25 > > > I agree, it could be the heat, and that was the first thing that came to
26 > > > my mind, but Vista boots and runs for long periods of time with no
27 > > > issues. I'll check it out with the new kernel in the morning and see
28 > > > what it does.
29 > >
30 > > Note that Gentoo tends to use hardware to its limits rather more than
31 > > most OSs, MSWormOS and other Linux distributions alike. Vista is so new,
32 > > and /does/ stress at least the video hardware rather more (if aero is on,
33 > > anyway), so I don't know if anyone can rightly say with it, but certainly
34 > > with older MS platforms, it hasn't been uncommon at /all/ for Gentoo to
35 > > cause problems where MS didn't, and even other Linux distributions didn't.
36 > >
37 > > Part of the reason is that Gentoo tends to be compiled/optimized for the
38 > > specific CPU it's running on, so it makes more efficient use of it,
39 > > including use of functionality distributions (and MS) compiled for use on
40 > > generic hardware simply don't use, plus simply the fact that when the CPU
41 > > is busy, it's often getting more done in the same time, so it IS working
42 > > harder and therefore stressing out the hardware more.
43 > >
44 > > Anyway, just because another OS doesn't have problems on a computer
45 > > doesn't mean Gentoo won't, and there are quite a number of folks on the
46 > > forums and on the gentoo-user list that will tell you the same thing --
47 > > learned from hard experience.
48 > >
49 > > Meanwhile, you mention specifically that one of the crashes was during a
50 > > bz2 decompress. As someone who has HAD memory issues in the past, I can
51 > > DEFINITELY tell you that bz2 DOES often trigger memory errors, if
52 > > ANYTHING will! If the issues with BZ2 turn out to be common, CHECK THAT
53 > > MEMORY, and check it again! You mentioned you have 2 gigs. Hopefully
54 > > it's in the form of 2 or more sticks. If so, you should be able to take
55 > > part of it out and see if the problem persists. Then test the other
56 > > memory. If the problem happens with one set but not the other, you have
57 > > your problem. Do note, however, that just because the problem continues
58 > > to occur with either memory set doesn't necessarily mean it's not the
59 > > memory, particularly if they are the same brand and size, purchased from
60 > > the same place at the same time, so are likely in the same lot.
61 > >
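The bz2 observation above suggests a cheap software check to run alongside swapping sticks. What follows is only a rough sketch, assuming Python's standard bz2 module is an acceptable stand-in for the bzip2 workload described; the function name and the default round and buffer sizes are made up for illustration. It compresses and decompresses random buffers and counts round trips that come back corrupted. A clean run proves nothing about the RAM, but repeated failures on otherwise idle hardware would point the same way the crashes do.

```python
import bz2
import hashlib
import os


def bz2_roundtrip_failures(rounds=5, size=1 << 20):
    """Count bz2 compress/decompress round trips that corrupt the data.

    'rounds' and 'size' are illustrative defaults, not tuned values.
    """
    failures = 0
    for _ in range(rounds):
        data = os.urandom(size)
        expected = hashlib.sha256(data).hexdigest()
        try:
            restored = bz2.decompress(bz2.compress(data))
        except (OSError, ValueError):
            # A corrupted stream can make decompression itself fail.
            failures += 1
            continue
        if hashlib.sha256(restored).hexdigest() != expected:
            failures += 1
    return failures


if __name__ == "__main__":
    print("failed round trips:", bz2_roundtrip_failures())
```

Raising rounds and size makes the run longer and touches more memory; on healthy hardware the count should stay at zero.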
62 > > In my case, I had purchased generic memory that couldn't quite do its
63 > > rated PC3200 (clock at 200 MHz x 2, since it was DDR). I ran memtest and
64 > > it passed with flying colors, because the memory worked fine, and memtest
65 > > apparently doesn't really stress the memory timings, only testing the
66 > > memory cells. However, I was crashing in operation, sometimes just the
67 > > app, sometimes the entire kernel would panic. I turned on the kernel's
68 > > MCE (machine check exception) reporting, and the memory was indeed the
69 > > problem (google MCEs, there's an app available that you can run, feeding
70 > > it the numbers, and it'll spit out the error in English), only I wasn't
71 > > quite sure whether it was the memory itself, or the mobo, causing
72 > > perfectly good memory to generate errors upon data delivery because it
73 > > couldn't reliably get the data to the CPU.
74 > >
75 > > While I didn't have the necessary BIOS settings at the time, sometime
76 > > later a BIOS update gave me additional memory settings, and I found that
77 > > reducing the memory clock by a single notch, to 183 MHz (DDR doubled to
78 > > 366), effectively PC3000 memory, did the trick. I was even able to tweak
79 > > some of the individual wait-state settings to get back a bit of the
80 > > performance I lost with the under-clocking. The memory and entire
81 > > machine was rock-stable at the 183 MHz PC3000 memory setting.
82 > >
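For anyone wondering where the PC3200 and "effectively PC3000" labels come from, here is the arithmetic as a small sketch. It assumes the usual 64-bit (8-byte) DIMM data bus and two transfers per clock for DDR; the function name is just for illustration.

```python
def ddr_rating(clock_mhz, bus_bytes=8, transfers_per_clock=2):
    """Return (transfer rate in MT/s, peak bandwidth in MB/s) for a DDR module.

    Assumes a 64-bit (8-byte) module bus and two transfers per clock.
    """
    transfers = clock_mhz * transfers_per_clock
    bandwidth = transfers * bus_bytes
    return transfers, bandwidth


print(ddr_rating(200))  # (400, 3200)
print(ddr_rating(183))  # (366, 2928)
```

So 200 MHz gives 400 MT/s and 3200 MB/s, hence PC3200, while 183 MHz gives 366 MT/s and roughly 2900 MB/s, which is where the approximate PC3000 figure comes from.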
83 > > Later I upgraded from my then two 512 MB sticks to four 2 GB sticks, 8
84 > > gigs memory total. It was indeed the memory, not the board, as the new
85 > > memory was just as stable at PC3200 as the old memory had been at the
86 > > under-clocked PC3000 speed.
87 > >
88 > > Anyway, the way bzip2 works is apparently extremely stressful on memory,
89 > > as more than anything else, that would trigger the errors. Compiles were
90 > > frustrating too, but sometimes I could compile for quite some time
91 > > without issues. That's why I didn't think it was the CPUs even before I
92 > > got the program to read the MCE numbers and tell me what they were. They
93 > > confirmed it was memory related: the errors were on data as the CPU got
94 > > it. I just didn't know until I actually changed memory whether it was
95 > > the mobo generating errors on the data in transit, or the memory itself.
96 > > It turned out to be the memory.
97 > >
98 > > --
99 > > Duncan - List replies preferred. No HTML msgs.
100 > > "Every nonfree program has a lord, a master --
101 > > and if you use the program, he is your master." Richard Stallman
102 > >
103 > > --
104 > > gentoo-amd64@g.o mailing list
105 > >