Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Initial install issues
Date: Wed, 31 May 2006 16:58:37
Message-Id: e5khlh$lc4$1@sea.gmane.org
In Reply to: Re: [gentoo-amd64] Initial install issues by Lance Jacobs
Lance Jacobs <lance@×××××××.net> posted
Pine.LNX.4.64.0605311051370.17278@×××××××××××××××.net, excerpted below, on
Wed, 31 May 2006 10:58:16 -0400:

> Just FYI, I have this fixed now, and you were right on the money. I
> wouldn't have believed it if I hadn't seen it, but it was the RAM. I
> replaced the OCZ memory with equivalent parts from Crucial, and the
> system is fine now. It still seems strange that many things seemed to
> run fine with the old RAM, except for bunzip2 and md5sum, and now
> everything is good -- only some code is intolerant of bad bits? It just
> seems wrong.... Anyway, the system is rock solid now.

It's all down to the application (as in what the software does, rather
than which specific executable it is). md5sum and bunzip2 just happen to
belong to a class of applications that is more sensitive to this sort of
thing than others, because part of what they do is verify integrity, and
even a single bit-flip anywhere in the data will cause that verification
to fail. Most applications aren't that sensitive. In a normal program, a
single random bit-flip won't make a lot of difference. If it lands in the
lower-order bits of an image bitmap or sound sample, you won't notice it
at all (see steganography), and the result can certainly still be viewed
or played without error. If it lands in the wrong place in an executable,
you'll get bad results, but likely not bad enough to crash immediately;
rather, you'll see output that gets worse and worse, and executables that
get less and less stable, over time. If you don't tend to run executables
for days or weeks at a time, if you shut down your computer when not in
use, and if you never use integrity-verification applications, such
memory unreliability may go entirely undetected and unsuspected.
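To make that difference in sensitivity concrete, here is a minimal Python
sketch (not something from the original thread) that flips a single bit in
a buffer and compares checksums, then flips the lowest bit of a 16-bit
sample value. hashlib.md5 computes the same MD5 digest md5sum prints;
zlib.crc32 is only a stand-in for the CRCs bzip2 checks (same CRC-32
polynomial, but not bzip2's exact variant). The helper flip_bit, the buffer
contents, and the sample value are hypothetical choices for illustration.

```python
import hashlib
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted (hypothetical helper)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Arbitrary 4 KiB buffer standing in for a file being checksummed or decompressed.
original = bytes(range(256)) * 16
corrupted = flip_bit(original, 12345)  # simulate one bad bit from flaky RAM

# Integrity checks: a single flipped bit is enough to make them disagree.
print("md5 digests match:", hashlib.md5(original).digest() == hashlib.md5(corrupted).digest())
print("crc32 values match:", zlib.crc32(original) == zlib.crc32(corrupted))

# Media data: flipping the lowest bit of a signed 16-bit audio sample
# changes its value by 1 out of a full-scale range of 32768.
sample = 20000
noisy = sample ^ 1
print("relative change:", abs(noisy - sample) / 32768)
```

A CRC-32 is guaranteed to catch any single-bit error, and an MD5 match after
a one-bit change would be a collision, so both checks report a mismatch. The
audio sample, by contrast, moves by one part in 32768 of full scale, which
nobody would hear.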

You mention that you have the memory in something else now, for further
testing. Note that, depending on the exact nature of the problem, the
memory may come up clean on a different mobo. Hardware tolerances and
resistance to data-signal noise being what they are, it's entirely
possible the memory was just at one end of the spec and the board at the
other, in terms of tolerances that would work. They would thus be
incompatible with each other even though each remains within spec (or
only slightly out of spec) and will work with 90% of what's out there;
they just wouldn't work when that particular memory was in that particular
board.

Based on the experience I had (which I posted earlier), you may also find
that the memory is perfectly fine under most conditions but subject to
errors in certain corner conditions. If you've ever seen the complete set
of non-auto memory parameters available in some BIOS setups, there's
quite a list of them, ten or so. Suppose one of the rare corner-case ones
misses the on-stick memory rating by even a single clock, but that state
transition doesn't happen very frequently. Even when it does, all but one
of the chips on the stick are fine with it, and even then, that single
exception only shows up in a certain temperature range, on the third
memory access in a row of a specific pattern. A fault like that could be
extremely difficult to find or verify, yet cause annoying problems just
often enough to be a real frustration!
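The arithmetic behind "extremely difficult to find or verify" can be
sketched roughly as follows, assuming (loosely) that the conditions are
independent and evaluated per memory access. Every probability and the
accesses-per-pass figure below are hypothetical numbers chosen only to show
how quickly a chain of rare conditions drives the failure rate down; none of
them is a measurement of any real module or memory tester.

```python
import math

# Hypothetical, illustrative probabilities per memory access (not measured).
P_RARE_TRANSITION = 1e-3  # the marginal timing transition is exercised
P_HOT_ZONE = 0.1          # the module is in the troublesome temperature range
P_BAD_PATTERN = 1e-5      # the specific three-access pattern is in progress

# Assuming independence, a failure needs all three conditions at once.
p_fail = P_RARE_TRANSITION * P_HOT_ZONE * P_BAD_PATTERN

ACCESSES_PER_PASS = 10**8  # hypothetical number of accesses in one test pass

# Probability that one full pass sees zero errors: (1 - p)^N,
# computed via log1p for numerical accuracy with a tiny p.
p_clean_pass = math.exp(ACCESSES_PER_PASS * math.log1p(-p_fail))

# Passes needed before the chance of having seen at least one error reaches 99%.
accesses_for_99 = math.log(0.01) / math.log1p(-p_fail)
passes_for_99 = math.ceil(accesses_for_99 / ACCESSES_PER_PASS)

print(f"per-access failure probability: {p_fail:.3g}")
print(f"chance a single pass looks clean: {p_clean_pass:.1%}")
print(f"passes needed for 99% detection: {passes_for_99}")
```

With these made-up numbers, a single pass comes back clean roughly nine
times out of ten, and it takes dozens of passes before you can be fairly
sure of catching the fault at all, even though that same fault would still
surface now and then over weeks of normal use.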

You did mention, however, that the memory was under warranty and is
going back regardless. That's a wise decision.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

--
gentoo-amd64@g.o mailing list