Lance Jacobs <lance@×××××××.net> posted
Pine.LNX.4.64.0605311051370.17278@×××××××××××××××.net, excerpted below, on
Wed, 31 May 2006 10:58:16 -0400:

> Just FYI, I have this fixed now, and you were right on the money. I
> wouldn't have believed it if I hadn't seen it, but it was the RAM. I
> replaced the OCZ memory with equivalent parts from Crucial, and the
> system is fine now. It still seems strange that many things seemed to
> run fine with the old RAM, except for bunzip2 and md5sum, and now
> everything is good -- only some code is intolerant of bad bits? It just
> seems wrong.... Anyway, the system is rock solid now.

It's all down to the application (as in what the software does, rather
than as in the specific executable)... md5sum and bunzip2 just happen to
be in a class of application that's more sensitive to this sort of thing
than others, since part of their job is verifying integrity, and even a
single bit-flip anywhere in the data will cause that verification to
fail. Most applications aren't that sensitive. In a normal executable, a
single random bit-flip won't make a lot of difference. If it's in the
lower order bits of an image bitmap or sound sample, you'll not notice it
at all (see steganography), and certainly the result can still be played
or viewed without error. If it's in the wrong place in an executable,
you'll get bad results, but likely not bad enough to crash immediately.
Rather, you'll get output that gets worse and worse, and executables that
get less and less stable, over time. If you don't tend to run executables
for days or weeks at a time, if you shut down your computer when not in
use, and you never use integrity verification applications, such memory
unreliability may go entirely undetected and unsuspected.

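To put something concrete behind that, here's a minimal Python sketch of
why a checksum tool is so unforgiving. It is not what md5sum or bunzip2 do
internally, just an illustration: the payload, the flip_bit helper and the
bit position are all made up for the example. It flips one bit in a 4 KiB
buffer and compares the MD5 digests, then shows the same kind of flip in
the low bit of an 8-bit sample value.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


# Hypothetical payload standing in for a file read through flaky RAM.
original = bytes(range(256)) * 16        # 4096 bytes = 32768 bits
corrupted = flip_bit(original, 12345)    # one bad bit, position arbitrary

digest_ok = hashlib.md5(original).hexdigest()
digest_bad = hashlib.md5(corrupted).hexdigest()

# Only one byte of the buffer differs, yet the digests no longer match.
changed_bytes = sum(a != b for a, b in zip(original, corrupted))
print("bytes that differ:", changed_bytes)
print("digests match:", digest_ok == digest_bad)

# The same kind of flip in the lowest bit of an 8-bit sample or pixel
# value moves it by a single step, which nobody will see or hear.
sample = 200
print("sample before/after:", sample, sample ^ 1)
```

One differing byte out of 4096 is enough to make the comparison fail
outright, while the sample value only moves from 200 to 201. That's the
whole difference between the two classes of application.
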
You mention that you have the memory in something else now, for further
testing. Note that depending on the exact nature of the problem, the
memory may come up clean on a different mobo. Hardware tolerances and
resistance to data signal noise being what they are, it's entirely
possible the memory was just at one end of the spec and the board at the
other, so the two were incompatible with each other even though each
remains within spec or only slightly out of it, and each will work with
90% of what's out there -- they just wouldn't work when that particular
memory was in that particular board.

Based on the experience I had (which I posted earlier), you may also find
that the memory is perfectly fine under most conditions, but is subject to
errors in certain corner conditions. If you've ever seen the complete set
of non-auto memory parameters available in some BIOS setups, there's quite
a list of them, ten or so. Suppose one of the rare corner-case timings
misses the on-stick memory ratings by even a single clock. Suppose further
that the state transfer in question doesn't happen very frequently, that
even when it does, all but one of the chips on the stick are fine with
it, and that the single exception only fails in a certain temperature
zone, on the third memory access in a row of a specific pattern. That
could be extremely difficult to find or verify, yet cause annoying
problems just often enough to be a real frustration!

You did mention that the memory is under warranty, however, and that it's
going back regardless, and that's a wise decision.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman

-- 
gentoo-amd64@g.o mailing list