On Saturday 06 June 2009, Alexander Puchmayr wrote:
> Hi there!
>
> This week I've tried to set up a home server, but the system is highly
> unstable. The first symptoms were lots of page allocation errors, which
> disappeared after switching the kernel memory allocator from SLUB to SLAB
> and increasing min_free_kbytes in /proc/sys/vm from 8MB to 20MB.
>
> The machine is an AMD Athlon64 X2 5050e on an Asus M3A78-Pro board with 2x2GB
> RAM. I'm using kernel 2.6.29.4 (vanilla, but the result is the same with
> 2.6.29-gentoo-r5), and I also upgraded the board's BIOS to the latest
> version (which is 0902).
>
> But the system still freezes after some hours. It just freezes. The console
> is dead, there is no entry in the logs, no network connectivity, and even
> sysrq doesn't seem to do anything. The worst thing is that I don't even have
> an idea what the error could be, and in the rare situations when it crashed
> and the console was not blanked, I only saw the end of a stack trace; the
> interesting parts had scrolled out (and I can't scroll back as the console is
> absolutely dead :-( ). The only button that still works is the reset
> button, and after rebooting the log doesn't tell anything (it just ends
> without any message).
>
> I inspected my dmesg output right after booting more closely, and I've
> found some strange entries which could indicate a problem. What do you
> think about them?
>
> [ 0.000000] ACPI Warning (tbfadt-0568): 32/64X length mismatch in
> Gpe0Block: 64/32 [20081204]
> [ 0.000000] FADT: X_PM1a_EVT_BLK.bit_width (16) does not match
> PM1_EVT_LEN (4)
> ...
> [ 0.000000] 4 Processors exceeds NR_CPUS limit of 2
> [ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
> ...
> [ 0.000999] Aperture pointing to e820 RAM. Ignoring.
> [ 0.000999] Your BIOS doesn't leave a aperture memory hole
> [ 0.000999] Please enable the IOMMU option in the BIOS setup
> [ 0.000999] This costs you 64 MB of RAM
> [ 0.000999] Mapping aperture over 65536 KB of RAM @ 20000000
> [ 0.000999] PM: Registered nosave memory: 0000000020000000 -
> 0000000024000000
> ...
> [ 0.099055] mtrr: your CPUs had inconsistent fixed MTRR settings
> [ 0.099059] mtrr: probably your BIOS does not setup all CPUs.
> [ 0.099116] mtrr: corrected configuration.
> ...
> [ 0.151260] PCI-DMA: Disabling AGP.
> [ 0.151260] PCI-DMA: aperture base @ 20000000 size 65536 KB
> [ 0.151260] PCI-DMA: using GART IOMMU.
> [ 0.151260] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
> ...
> [ 0.163241] system 00:09: iomem range 0xfec00000-0xfec00fff has been
> reserved
> [ 0.163305] system 00:09: iomem range 0xfee00000-0xfee00fff has been
> reserved
> [ 0.163365] system 00:0a: ioport range 0x4d0-0x4d1 has been reserved
> [ 0.163422] system 00:0a: ioport range 0x40b-0x40b has been reserved
> [ 0.163480] system 00:0a: ioport range 0x4d6-0x4d6 has been reserved
> [ 0.163537] system 00:0a: ioport range 0xc00-0xc01 has been reserved
> [ 0.163595] system 00:0a: ioport range 0xc14-0xc14 has been reserved
> [ 0.163653] system 00:0a: ioport range 0xc50-0xc51 has been reserved
> [ 0.163711] system 00:0a: ioport range 0xc52-0xc52 has been reserved
> [ 0.163769] system 00:0a: ioport range 0xc6c-0xc6c has been reserved
> [ 0.163827] system 00:0a: ioport range 0xc6f-0xc6f has been reserved
> [ 0.163885] system 00:0a: ioport range 0xcd0-0xcd1 has been reserved
> [ 0.163942] system 00:0a: ioport range 0xcd2-0xcd3 has been reserved
> [ 0.163999] system 00:0a: ioport range 0xcd4-0xcd5 has been reserved
> [ 0.164070] system 00:0a: ioport range 0xcd6-0xcd7 has been reserved
> [ 0.164127] system 00:0a: ioport range 0xcd8-0xcdf has been reserved
> [ 0.164184] system 00:0a: ioport range 0x800-0x89f has been reserved
> [ 0.164241] system 00:0a: ioport range 0xb00-0xb3f has been reserved
> [ 0.164305] system 00:0a: ioport range 0x900-0x90f has been reserved
> [ 0.164363] system 00:0a: ioport range 0x910-0x91f has been reserved
> [ 0.164421] system 00:0a: ioport range 0xfe00-0xfefe has been reserved
> [ 0.164480] system 00:0a: iomem range 0xffb80000-0xffbfffff has been
> reserved
> [ 0.164538] system 00:0a: iomem range 0xfec10000-0xfec1001f has been
> reserved
> [ 0.164598] system 00:0c: ioport range 0xe00-0xe0f has been reserved
> [ 0.164656] system 00:0c: ioport range 0xe80-0xe8f has been reserved
> [ 0.164713] system 00:0c: ioport range 0xf40-0xf4f has been reserved
> [ 0.164771] system 00:0c: ioport range 0xa30-0xa3f has been reserved
> [ 0.164830] system 00:0d: iomem range 0xe0000000-0xefffffff has been
> reserved
> [ 0.164890] system 00:0e: iomem range 0x0-0x9ffff could not be reserved
> [ 0.164947] system 00:0e: iomem range 0xc0000-0xcffff has been reserved
> [ 0.165018] system 00:0e: iomem range 0xe0000-0xfffff could not be
> reserved
> [ 0.165076] system 00:0e: iomem range 0x100000-0xdfffffff could not be
> reserved
> [ 0.165158] system 00:0e: iomem range 0xfec00000-0xffffffff could not be
> reserved
> ...
> [ 21.298450] ACPI: I/O resource piix4_smbus [0xb00-0xb07] conflicts with
> ACPI region SOR1 [0xb00-0xb0f]
> [ 21.298454] ACPI: Device needs an ACPI driver
> [ 21.298461] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00,
> revision 0
> ...
> [ 73.861479] ACPI: I/O resource it87 [0xe85-0xe86] conflicts with ACPI
> region HWRE [0xe85-0xe86]
> [ 73.861483] ACPI: Device needs an ACPI driver
>
> What does this message "4 Processors exceeds NR_CPUS" mean? The system is a
> dual-core AMD Athlon64 5050e; AFAIK it has two cores and nothing more. The
> mtrr messages later also indicate that there could be more than 2 CPUs
> available. Wondering...
>
> The next thing which seems somewhat strange to me is the AGP aperture and
> the IOMMU. The mainboard does not have an AGP port, nor does the BIOS have
> any option to enable it. The only thing I can set is the size of the memory
> reserved for the onboard video card, which I set to the smallest value of
> 32MB as the machine will usually not even have a display.
>
> What about the iomem range reservation errors at the end? Harmful or not?
>
> The last messages come after loading the hw-sensors modules it87.ko and
> i2c_piix4.
>
> Thanks in advance for suggestions
> Alex

*sigh* OK, just for starters: all AMD CPUs of the Athlon64 architecture have
a built-in AGP GART. This GART also functions as an IOMMU. It is a great
hack for getting a hardware IOMMU. Intel does not have this, so they rely on
software. The solution came up while AMD devs and Linux kernel devs worked
together.
Please read the following links:

http://en.wikipedia.org/wiki/Iommu

http://marc.info/?l=linux-kernel&m=107759901509280&w=2

http://marc.info/?l=linux-kernel&m=107764033904042&w=2

The IOMMU is needed so that 32-bit PCI devices, whose addresses stop at 4GB,
can still reach memory beyond that, plus other sweet things.

Sadly, the IOMMU needs a minimum amount of memory for itself, and it uses the
AGP aperture for that. This is fine, but mobo vendors suck and make it too
small or not available at all. In that case the kernel is forced to use real
memory for the IOMMU.
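
As a quick cross-check of the numbers in the dmesg output quoted above, the small sketch below recomputes the size of the remapped aperture from the "Registered nosave memory" range. Only the two addresses come from the log; the variable names are made up for illustration.

```python
# Addresses taken from the quoted dmesg line:
#   PM: Registered nosave memory: 0000000020000000 - 0000000024000000
aperture_start = 0x20000000
aperture_end = 0x24000000

size_bytes = aperture_end - aperture_start
size_kb = size_bytes // 1024
size_mb = size_kb // 1024

# Should agree with "Mapping aperture over 65536 KB of RAM" and
# "This costs you 64 MB of RAM" in the same log.
print(f"aperture size: {size_kb} KB = {size_mb} MB")
```

So the 64 MB the kernel complains about is exactly the chunk of real RAM it carved out for the IOMMU.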

In short, that message has nothing to do with your problem.

The NR_CPUS message is confusing. I strongly suspect that your kernel config
is really fucked up.

The iomem range messages are harmless.

Please enable:

[] Check for low memory corruption
[] Reserve low 64K of RAM on AMI/Phoenix BIOSen

in the kernel config. Also clean it up and remove stuff like the
'hyperthreading scheduler'.
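
If you would rather check the .config file than click through menuconfig, something like the sketch below could do it. It assumes that in 2.6.29 the two options above are CONFIG_X86_CHECK_BIOS_CORRUPTION and CONFIG_X86_RESERVE_LOW_64K and that the hyperthreading scheduler is CONFIG_SCHED_SMT; please verify the symbol names against the help text in your own tree. The function names and the sample config are invented for the example.

```python
import re

# Assumed symbol names (check them in your kernel tree) and the wanted values.
WANTED = {
    "CONFIG_X86_CHECK_BIOS_CORRUPTION": "y",
    "CONFIG_X86_RESERVE_LOW_64K": "y",
    "CONFIG_SCHED_SMT": "n",
}

def parse_kernel_config(text):
    """Return {symbol: value}; '# CONFIG_FOO is not set' maps to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            options[m.group(1)] = m.group(2).strip('"')
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            options[m.group(1)] = "n"
    return options

def check_config(text):
    """Return (symbol, wanted, found) for every option that differs."""
    options = parse_kernel_config(text)
    return [(sym, want, options.get(sym, "n"))
            for sym, want in WANTED.items()
            if options.get(sym, "n") != want]

# Made-up sample, NOT the real config from this thread.
sample = """\
CONFIG_NR_CPUS=2
CONFIG_SCHED_SMT=y
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
"""

for sym, want, found in check_config(sample):
    print(f"{sym}: want {want}, found {found}")
```

To run it on a real system, replace the sample with the contents of the .config in your kernel source directory.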

If the problem persists, start testing your hardware.

I would suspect the PSU.