Gentoo Archives: gentoo-user

From: Volker Armin Hemmann <volkerarmin@××××××××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Kernel freezes
Date: Sat, 06 Jun 2009 09:12:55
Message-Id: 200906061112.47241.volkerarmin@googlemail.com
In Reply to: [gentoo-user] Kernel freezes by Alexander Puchmayr
1 On Saturday 06 June 2009, Alexander Puchmayr wrote:
2 > Hi there!
3 >
4 > This week I've tried to set up a home server, but the system is highly
5 > unstable. The first symptoms were lots of page allocation errors, which
6 > disappeared after setting the internal memory allocator from SLUB to SLAB
7 > and increasing the min_free_kbytes in /proc/sys/vm from 8MB to 20MB.
8 >
9 > The machine is an AMD Athlon64 X2 5050e on an ASUS M3A78-Pro board with 2x2GB
10 > RAM. I'm using kernel 2.6.29.4 (vanilla, but the result is the same as
11 > using 2.6.29-gentoo-r5), and I also upgraded the board's BIOS to the latest
12 > version (which is 0902)
13 >
14 > But still the system freezes after some hours. It just freezes. Console is
15 > dead, no entry in the logs, no network connectivity, even sysrq doesn't
16 > seem to do anything. The worst thing is I don't even have an idea what the
17 > error could be, and in the rare situations when it crashed and the console
18 > was not blanked, I only see the end of a stack trace, and the interesting
19 > parts are scrolled out (and I can't scroll back as the console is
20 > absolutely dead :-( ) The only button that is still working is the reset
21 > button, and after rebooting the log doesn't tell anything (just ends without
22 > any message)
23 >
24 > I inspected my dmesg-output right after booting more precisely, and I've
25 > found some strange entries which could indicate a problem. What do you
26 > think about them?
27 >
28 > [ 0.000000] ACPI Warning (tbfadt-0568): 32/64X length mismatch in
29 > Gpe0Block: 64/32 [20081204]
30 > [ 0.000000] FADT: X_PM1a_EVT_BLK.bit_width (16) does not match
31 > PM1_EVT_LEN (4)
32 > ...
33 > [ 0.000000] 4 Processors exceeds NR_CPUS limit of 2
34 > [ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
35 > ...
36 > [ 0.000999] Aperture pointing to e820 RAM. Ignoring.
37 > [ 0.000999] Your BIOS doesn't leave a aperture memory hole
38 > [ 0.000999] Please enable the IOMMU option in the BIOS setup
39 > [ 0.000999] This costs you 64 MB of RAM
40 > [ 0.000999] Mapping aperture over 65536 KB of RAM @ 20000000
41 > [ 0.000999] PM: Registered nosave memory: 0000000020000000 -
42 > 0000000024000000
43 > ...
44 > [ 0.099055] mtrr: your CPUs had inconsistent fixed MTRR settings
45 > [ 0.099059] mtrr: probably your BIOS does not setup all CPUs.
46 > [ 0.099116] mtrr: corrected configuration.
47 > ...
48 > [ 0.151260] PCI-DMA: Disabling AGP.
49 > [ 0.151260] PCI-DMA: aperture base @ 20000000 size 65536 KB
50 > [ 0.151260] PCI-DMA: using GART IOMMU.
51 > [ 0.151260] PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
52 > ...
53 > [ 0.163241] system 00:09: iomem range 0xfec00000-0xfec00fff has been
54 > reserved
55 > [ 0.163305] system 00:09: iomem range 0xfee00000-0xfee00fff has been
56 > reserved
57 > [ 0.163365] system 00:0a: ioport range 0x4d0-0x4d1 has been reserved
58 > [ 0.163422] system 00:0a: ioport range 0x40b-0x40b has been reserved
59 > [ 0.163480] system 00:0a: ioport range 0x4d6-0x4d6 has been reserved
60 > [ 0.163537] system 00:0a: ioport range 0xc00-0xc01 has been reserved
61 > [ 0.163595] system 00:0a: ioport range 0xc14-0xc14 has been reserved
62 > [ 0.163653] system 00:0a: ioport range 0xc50-0xc51 has been reserved
63 > [ 0.163711] system 00:0a: ioport range 0xc52-0xc52 has been reserved
64 > [ 0.163769] system 00:0a: ioport range 0xc6c-0xc6c has been reserved
65 > [ 0.163827] system 00:0a: ioport range 0xc6f-0xc6f has been reserved
66 > [ 0.163885] system 00:0a: ioport range 0xcd0-0xcd1 has been reserved
67 > [ 0.163942] system 00:0a: ioport range 0xcd2-0xcd3 has been reserved
68 > [ 0.163999] system 00:0a: ioport range 0xcd4-0xcd5 has been reserved
69 > [ 0.164070] system 00:0a: ioport range 0xcd6-0xcd7 has been reserved
70 > [ 0.164127] system 00:0a: ioport range 0xcd8-0xcdf has been reserved
71 > [ 0.164184] system 00:0a: ioport range 0x800-0x89f has been reserved
72 > [ 0.164241] system 00:0a: ioport range 0xb00-0xb3f has been reserved
73 > [ 0.164305] system 00:0a: ioport range 0x900-0x90f has been reserved
74 > [ 0.164363] system 00:0a: ioport range 0x910-0x91f has been reserved
75 > [ 0.164421] system 00:0a: ioport range 0xfe00-0xfefe has been reserved
76 > [ 0.164480] system 00:0a: iomem range 0xffb80000-0xffbfffff has been
77 > reserved
78 > [ 0.164538] system 00:0a: iomem range 0xfec10000-0xfec1001f has been
79 > reserved
80 > [ 0.164598] system 00:0c: ioport range 0xe00-0xe0f has been reserved
81 > [ 0.164656] system 00:0c: ioport range 0xe80-0xe8f has been reserved
82 > [ 0.164713] system 00:0c: ioport range 0xf40-0xf4f has been reserved
83 > [ 0.164771] system 00:0c: ioport range 0xa30-0xa3f has been reserved
84 > [ 0.164830] system 00:0d: iomem range 0xe0000000-0xefffffff has been
85 > reserved
86 > [ 0.164890] system 00:0e: iomem range 0x0-0x9ffff could not be reserved
87 > [ 0.164947] system 00:0e: iomem range 0xc0000-0xcffff has been reserved
88 > [ 0.165018] system 00:0e: iomem range 0xe0000-0xfffff could not be
89 > reserved
90 > [ 0.165076] system 00:0e: iomem range 0x100000-0xdfffffff could not be
91 > reserved
92 > [ 0.165158] system 00:0e: iomem range 0xfec00000-0xffffffff could not be
93 > reserved
94 > ...
95 > [ 21.298450] ACPI: I/O resource piix4_smbus [0xb00-0xb07] conflicts with
96 > ACPI region SOR1 [0xb00-0xb0f]
97 > [ 21.298454] ACPI: Device needs an ACPI driver
98 > [ 21.298461] piix4_smbus 0000:00:14.0: SMBus Host Controller at 0xb00,
99 > revision 0
100 > ...
101 > [ 73.861479] ACPI: I/O resource it87 [0xe85-0xe86] conflicts with ACPI
102 > region HWRE [0xe85-0xe86]
103 > [ 73.861483] ACPI: Device needs an ACPI driver
104 >
105 > What does this message "4 Processors exceeds NR_CPUS" mean? The system is a
106 > Dual-Core AMD Athlon64 5050e, AFAIK it has two cores and nothing more. The
107 > MTRR message later also indicates that there could be more than 2 CPUs
108 > available. Wondering...
109 >
110 > The next thing which seems somewhat strange to me is the AGP aperture and
111 > the IOMMU. The mainboard does not have an AGP port, nor does the BIOS have
112 > any such option to enable. The only thing I can set is the size of the memory
113 > reserved for the onboard video card, which I set to the smallest value of
114 > 32MB as the machine will usually not even have a display.
115 >
116 > The iomem-range reservation errors at the end? Harmful or not?
117 >
118 > The last messages come after loading the hw-sensors modules it87.ko and
119 > i2c_piix4.
120 >
121 > Thanks in advance for suggestions
122 > Alex
123
124 *sigh* OK, just for starters: all AMD CPUs of the Athlon64 architecture have
125 a built-in AGP GART, and this GART also functions as an IOMMU. It is a great
126 hack for getting a hardware IOMMU. Intel does not have this, so they rely on
127 software. The solution came out of AMD devs and Linux kernel devs working
128 together.
129 Please read the following links:
130
131 http://en.wikipedia.org/wiki/Iommu
132
133 http://marc.info/?l=linux-kernel&m=107759901509280&w=2
134
135 http://marc.info/?l=linux-kernel&m=107764033904042&w=2
136
137 The IOMMU is needed so that 32-bit PCI devices, whose DMA addresses stop at
138 4 GB, can still reach memory above 4 GB, among other sweet things.
139
140 Sadly, the IOMMU needs a minimum amount of memory for itself, and it uses the
141 AGP aperture for that. This is fine, but mobo vendors suck and make it too small
142 or not available at all. In that case the kernel is forced to use real memory for the IOMMU.
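
To put numbers on "real memory for the IOMMU": the quoted dmesg lines already say how much RAM is taken and where. A minimal sketch of that arithmetic, using only values copied from the log above (the function and variable names are made up for illustration, and nothing is read from a live system):

```python
# Values copied from the quoted dmesg output, not probed from hardware.
APERTURE_BASE = 0x20000000   # "Mapping aperture over 65536 KB of RAM @ 20000000"
APERTURE_SIZE_KB = 65536
NOSAVE_START = 0x20000000    # "PM: Registered nosave memory: 0000000020000000 -"
NOSAVE_END = 0x24000000      # "0000000024000000"


def aperture_summary(base, size_kb, nosave_start, nosave_end):
    """Compute the size and end address of the aperture carved out of RAM."""
    size_bytes = size_kb * 1024
    end = base + size_bytes
    return {
        "size_mib": size_bytes // (1024 * 1024),
        "end": end,
        # True if the nosave region is exactly the aperture region.
        "matches_nosave": (base, end) == (nosave_start, nosave_end),
    }


summary = aperture_summary(APERTURE_BASE, APERTURE_SIZE_KB, NOSAVE_START, NOSAVE_END)
print(f"aperture: {summary['size_mib']} MiB, ends at {summary['end']:#x}")
print("same range as the nosave region:", summary["matches_nosave"])
```

So the "This costs you 64 MB of RAM" line and the nosave range describe the same 64 MiB block starting at 0x20000000.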
143
144 In short, that message has nothing to do with your problem.
145
146 The NR_CPUS message is confusing - I strongly suspect that your kernel config
147 is really fucked up.
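
As far as I understand it (this is my reading, not something stated in the log), that line is printed when the firmware lists more processor entries than the CONFIG_NR_CPUS value the kernel was built with. A small sketch that pulls both numbers out of dmesg text, so they can be compared with the config; the helper name is hypothetical and the sample is the two lines quoted above:

```python
import re

# Matches the exact wording seen in the quoted dmesg output.
NR_CPUS_RE = re.compile(r"(\d+) Processors exceeds NR_CPUS limit of (\d+)")


def find_nr_cpus_mismatch(dmesg_text):
    """Return (reported_processors, nr_cpus_limit) or None if the line is absent."""
    match = NR_CPUS_RE.search(dmesg_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


SAMPLE = """\
[ 0.000000] 4 Processors exceeds NR_CPUS limit of 2
[ 0.000000] SMP: Allowing 2 CPUs, 0 hotplug CPUs
"""

result = find_nr_cpus_mismatch(SAMPLE)
if result is not None:
    reported, limit = result
    print(f"firmware reports {reported} processors, kernel built for {limit}")
```

On a real box you would feed it the output of `dmesg` instead of SAMPLE.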
148
149 The iomem-range messages are harmless.
150
151 Please enable:
152
153 [] Check for low memory corruption
154 [] Reserve low 64K of RAM on AMI/Phoenix BIOSen
155
156 in the kernel config. Also clean it up and remove stuff like 'hyperthreading
157 scheduler'.
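
A rough way to check these settings without clicking through menuconfig. The mapping from the menu entries to config symbols is my assumption (CONFIG_X86_CHECK_BIOS_CORRUPTION, CONFIG_X86_RESERVE_LOW_64K and CONFIG_SCHED_SMT are what I believe a 2.6.29 tree uses), the sample config is made up, and the function names are hypothetical:

```python
def parse_kernel_config(text):
    """Map CONFIG_* symbols to values; '# CONFIG_X is not set' becomes 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# Assumed symbol names for the options mentioned above, with the suggested state.
WANTED = {
    "CONFIG_X86_CHECK_BIOS_CORRUPTION": "y",  # Check for low memory corruption
    "CONFIG_X86_RESERVE_LOW_64K": "y",        # Reserve low 64K of RAM on AMI/Phoenix BIOSen
    "CONFIG_SCHED_SMT": "n",                  # 'hyperthreading scheduler'
}


def config_findings(options):
    """List every option whose current value differs from the suggestion."""
    findings = []
    for name, wanted in WANTED.items():
        actual = options.get(name, "n")  # a missing symbol counts as not set
        if actual != wanted:
            findings.append(f"{name}: is {actual}, suggested {wanted}")
    return findings


# Hypothetical config fragment, not taken from the original poster's system.
SAMPLE_CONFIG = """\
CONFIG_NR_CPUS=2
CONFIG_SCHED_SMT=y
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
"""

findings = config_findings(parse_kernel_config(SAMPLE_CONFIG))
for finding in findings:
    print(finding)
```

To use it for real, read the text from your kernel tree's .config (usually /usr/src/linux/.config on Gentoo) instead of SAMPLE_CONFIG.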
158
159 If the problem persists, start testing your hardware.
160
161 I would suspect the PSU.
