Richard Fish wrote:
> On 11/30/06, Vladimir G. Ivanovic <vgivanovic@×××××××.net> wrote:
>> I have done nothing to my hardware and I've seen this error, oh, a
>> half a dozen times, the last time 3 months (?) ago. I ran memtest when
>> I installed new memory, and it did not report problems even when run
>> for hours.
>
> memtest is basically useless these days. It can only tell you if you
> have a bad memory cell, which almost never happens today. Most memory
> problems are the result of timing issues between the processor(s) and
> DMA controllers.
>
> This script [1] seems to be a much better memory test for modern
> systems, although you may have to make some tweaks to run it on
> Gentoo.

Just for kicks I'll run the script and see what happens.

>
>> And I do not get random segfaults with other programs.
>
> Yes, compiling is very unique in this regard. The memory access
> pattern of a compiler, reading and writing to locations on different
> rows, or even different modules, under high CPU load and using lots of
> memory, with some IO thrown in for good measure, tends to reveal
> hardware problems quite nicely.
>
>> Finally, I don't think my hardware fixed itself.
>>
>> Given all of this, my suspicion is that these errors are software
>> bugs, not hardware problems.

For grins, here is part of comment #174:

Random segfaults during compilation. ... in general a sign of
hardware problems.

// No, this is in general a sign of GCC 4.1 - problem ;-)
>
> If we were talking about a driver, or an event-based GUI program, I
> might agree. But a compiler is going to take the exact same actions
> given the same input and options. The compiler isn't going to do
> something different between 2 different executions over the _exact_
> same sources because it feels like it.

You're right at the logical level, but not at the physical level.
Cache effects and different disk accesses are two physical differences
that spring to mind. Temporary files will be in different physical
sectors, or in the buffer cache or not; directories may or may not be
in the directory cache. Depending on what else is running, the pattern
of cache misses will be different.

I emerge with -j2. Plus I'm doing work while the emerges happen. The
likelihood of the memory access pattern of two compiles being the same
is precisely zero.
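
One way to test this empirically, rather than argue it, would be to rerun
the exact same compile command many times and tally the outcomes. Below is
a minimal sketch, assuming Python 3.7 or later is installed; `tally_runs`
is a hypothetical helper of my own and is not part of portage or gcc.

```python
import hashlib
import subprocess
from collections import Counter


def tally_runs(cmd, runs=20):
    """Run the same command `runs` times and count distinct outcomes.

    Each outcome is (exit status, SHA-256 of stdout + stderr).
    If every run behaves identically, the Counter has exactly one key.
    """
    outcomes = Counter()
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True)
        digest = hashlib.sha256(proc.stdout + proc.stderr).hexdigest()
        outcomes[(proc.returncode, digest)] += 1
    return outcomes


# Hypothetical usage (foo.c stands in for whatever source file failed):
#   for outcome, count in tally_runs(["gcc", "-O2", "-c", "foo.c",
#                                     "-o", "/dev/null"]).items():
#       print(count, outcome)
```

If identical input sometimes succeeds and sometimes fails, then something
outside the compiler's logic is varying between runs. That by itself does
not say whether the cause is hardware or software, but it does show
whether the failure is repeatable.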

>
>>
>> The other thing that I don't really believe is the part about "this
>> bug not being reproducible" as reported by portage/emerge/make/gcc.
>
> Then you should read the gcc sources. One of the patches applied by
> Gentoo adds a retry loop when the compiler is about to exit with an
> internal compiler error (ICE). It retries the compile twice, and if
> either of those succeeds, you get the "The bug is not reproducible"
> message.

Interesting. I did not know that. But I don't get why gcc exits with
an error when the second (or third) try succeeds. Why not just print a
warning, perhaps at the end so it is noticeable? Most people will
restart the entire emerge, which seems like a gargantuan amount of
wasted effort since the re-compilation has succeeded.
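
To check that I understand the retry behaviour Richard describes, here is
how I would sketch it. This only illustrates the logic as stated in this
thread and is not the actual Gentoo patch. `classify_ice` and
`compile_once` are hypothetical names, and the "reproducible" string is
my own placeholder.

```python
from typing import Callable


def classify_ice(compile_once: Callable[[], bool], retries: int = 2) -> str:
    """Decide what to report after a first attempt has already hit an ICE.

    `compile_once` is a hypothetical callable that reruns the same
    compile and returns True on success, False on another failure.
    """
    for _ in range(retries):
        if compile_once():
            # A rerun with identical input succeeded, so the original
            # failure did not reproduce.
            return "The bug is not reproducible"
    # Every retry failed as well (placeholder wording, not gcc's).
    return "internal compiler error (reproducible)"
```

The sketch models only which message is chosen. As described above, the
build still stops with an error in both cases.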

> It doesn't output anything because that would possibly
> obscure the original error.
>
> The gentoo devs probably added this loop to avoid more duplicates of [2].
>
> -Richard
>
> [1] http://people.redhat.com/dledford/memtest.html
> [2] http://bugs.gentoo.org/show_bug.cgi?id=20600

--
gentoo-user@g.o mailing list