Hi,

I have an issue with a machine where I'm not able to find the real
root cause. It hangs completely. It seems to be running out of
memory, but why? Hopefully somebody can give me some insight. As far as I
can see right now, it hangs a few hours after an `emerge --update
--newuse --deep --with-bdeps=y @world`.

The machine is an Intel Atom with 8 GB RAM (physical, max) and 24 GB
swap (a file), so 32 GB of memory (RAM plus swap) in total. It has a
250 GB SSD. It runs gentoo-sources-4.14.83 built with genkernel. Portage
uses the stable tree only. It basically provides the hardware for a qemu
VM which does the network management: primary ns, dhcp, apache ssl proxy.
This VM uses 4 GB RAM and has an 8 GB swap file. The VM works smoothly.
The Atom machine itself also acts as a basic nfs server for an
independent dedicated server, which does the (re)exports, and as
secondary ns. For this I'm convinced that 32 GB of memory in total has
to be enough. Correct me if I'm wrong!

The issue started around February (IIRC). The update to gcc-10.2
failed. I have /var/tmp/portage on tmpfs, so I increased its size in
fstab from 8 to 16 GB. Afterwards gcc built successfully.

After two hang-ups I increased the swap from 8 to 24 GB. It didn't
help. Here is a complete log from /var/log/messages:

Apr 28 05:35:57 Syrin kernel: [1454017.499919] isc-net-0000 invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
Apr 28 05:35:57 Syrin kernel: [1454017.499925] isc-net-0000 cpuset=/ mems_allowed=0
Apr 28 05:35:57 Syrin kernel: [1454017.499933] CPU: 0 PID: 27685 Comm: isc-net-0000 Not tainted 4.14.83-gentoo #1
Apr 28 05:35:57 Syrin kernel: [1454017.499935] Hardware name: MSI MS-7877/J1900I, BIOS V1.2 03/25/2014
Apr 28 05:35:57 Syrin kernel: [1454017.499936] Call Trace:
Apr 28 05:35:57 Syrin kernel: [1454017.499948] dump_stack+0x67/0x98
Apr 28 05:35:57 Syrin kernel: [1454017.499954] dump_header+0x94/0x20c
Apr 28 05:35:57 Syrin kernel: [1454017.499958] oom_kill_process+0x24a/0x420
Apr 28 05:35:57 Syrin kernel: [1454017.499962] ? oom_badness.part.9+0xd3/0x150
Apr 28 05:35:57 Syrin kernel: [1454017.499965] out_of_memory+0xf9/0x290
Apr 28 05:35:57 Syrin kernel: [1454017.499968] __alloc_pages_nodemask+0xf48/0xff0
Apr 28 05:35:57 Syrin kernel: [1454017.499974] filemap_fault+0x294/0x4c0
Apr 28 05:35:57 Syrin kernel: [1454017.499979] ext4_filemap_fault+0x2c/0x40
Apr 28 05:35:57 Syrin kernel: [1454017.499983] __do_fault+0x1f/0xb0
Apr 28 05:35:57 Syrin kernel: [1454017.499986] __handle_mm_fault+0x3ed/0xad0
Apr 28 05:35:57 Syrin kernel: [1454017.499991] handle_mm_fault+0xaa/0x1f0
Apr 28 05:35:57 Syrin kernel: [1454017.499996] __do_page_fault+0x250/0x4f0
Apr 28 05:35:57 Syrin kernel: [1454017.500000] ? page_fault+0x2f/0x50
Apr 28 05:35:57 Syrin kernel: [1454017.500003] page_fault+0x45/0x50
Apr 28 05:35:57 Syrin kernel: [1454017.500005] RIP: 0000: (null)
Apr 28 05:35:57 Syrin kernel: [1454017.500007] RSP: 12f83750:0000000000000001 EFLAGS: 7ffa12f837a0
Apr 28 05:35:57 Syrin kernel: [1454017.500010] Mem-Info:
Apr 28 05:35:57 Syrin kernel: [1454017.500017] active_anon:1694713 inactive_anon:211859 isolated_anon:0
Apr 28 05:35:57 Syrin kernel: [1454017.500017] active_file:328 inactive_file:344 isolated_file:32
Apr 28 05:35:57 Syrin kernel: [1454017.500017] unevictable:1374 dirty:0 writeback:0 unstable:0
Apr 28 05:35:57 Syrin kernel: [1454017.500017] slab_reclaimable:4480 slab_unreclaimable:7449
Apr 28 05:35:57 Syrin kernel: [1454017.500017] mapped:1071 shmem:3 pagetables:16352 bounce:0
Apr 28 05:35:57 Syrin kernel: [1454017.500017] free:11655 free_pcp:534 free_cma:0
Apr 28 05:35:57 Syrin kernel: [1454017.500021] Node 0 active_anon:6778852kB inactive_anon:847436kB active_file:1312kB inactive_file:1376kB unevictable:5496kB isolated(anon):0kB isolated(file):128kB mapped:4284kB dirty:0kB writeback:0kB shmem:12kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Apr 28 05:35:57 Syrin kernel: [1454017.500026] DMA free:15836kB min:20kB low:32kB high:44kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15920kB managed:15836kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500027] lowmem_reserve[]: 0 2664 7647 7647
Apr 28 05:35:57 Syrin kernel: [1454017.500036] DMA32 free:23732kB min:3892kB low:6620kB high:9348kB active_anon:2319992kB inactive_anon:363260kB active_file:0kB inactive_file:92kB unevictable:0kB writepending:0kB present:2825512kB managed:2734888kB mlocked:0kB kernel_stack:140kB pagetables:22572kB bounce:0kB free_pcp:1136kB local_pcp:476kB free_cma:0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500037] lowmem_reserve[]: 0 0 4982 4982
Apr 28 05:35:57 Syrin kernel: [1454017.500045] Normal free:7052kB min:7284kB low:12384kB high:17484kB active_anon:4458860kB inactive_anon:484176kB active_file:1368kB inactive_file:1568kB unevictable:5496kB writepending:0kB present:5242880kB managed:5102420kB mlocked:5496kB kernel_stack:2276kB pagetables:42836kB bounce:0kB free_pcp:1000kB local_pcp:724kB free_cma:0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500046] lowmem_reserve[]: 0 0 0 0
Apr 28 05:35:57 Syrin kernel: [1454017.500050] DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15836kB
Apr 28 05:35:57 Syrin kernel: [1454017.500070] DMA32: 268*4kB (UE) 1562*8kB (UE) 645*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23888kB
Apr 28 05:35:57 Syrin kernel: [1454017.500084] Normal: 293*4kB (UME) 490*8kB (UME) 152*16kB (UE) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7524kB
Apr 28 05:35:57 Syrin kernel: [1454017.500100] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Apr 28 05:35:57 Syrin kernel: [1454017.500101] 1818 total pagecache pages
Apr 28 05:35:57 Syrin kernel: [1454017.500110] 86 pages in swap cache
Apr 28 05:35:57 Syrin kernel: [1454017.500112] Swap cache stats: add 13659627, delete 13659513, find 1129210/1793572
Apr 28 05:35:57 Syrin kernel: [1454017.500113] Free swap = 0kB
Apr 28 05:35:57 Syrin kernel: [1454017.500114] Total swap = 25165820kB
Apr 28 05:35:57 Syrin kernel: [1454017.500115] 2021078 pages RAM
Apr 28 05:35:57 Syrin kernel: [1454017.500116] 0 pages HighMem/MovableOnly
Apr 28 05:35:57 Syrin kernel: [1454017.500117] 57792 pages reserved
Apr 28 05:35:57 Syrin kernel: [1454017.500118] 0 pages hwpoisoned
Apr 28 05:35:57 Syrin kernel: [1454017.500119] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Apr 28 05:35:57 Syrin kernel: [1454017.500125] [ 4009] 0 4009 21207 320 7 3 20 0 apcupsd
Apr 28 05:35:57 Syrin kernel: [1454017.500128] [ 4043] 0 4043 54371 48 12 3 220 0 rsyslogd
Apr 28 05:35:57 Syrin kernel: [1454017.500131] [ 4084] 0 4084 1938 178 6 3 18 0 fcron
Apr 28 05:35:57 Syrin kernel: [1454017.500133] [ 4400] 0 4400 17733 1376 8 3 0 0 ntpd
Apr 28 05:35:57 Syrin kernel: [1454017.500136] [ 4429] 0 4429 2789 0 7 3 103 0 rsync
Apr 28 05:35:57 Syrin kernel: [1454017.500139] [ 4460] 0 4460 1689 241 6 3 112 -1000 sshd
Apr 28 05:35:57 Syrin kernel: [1454017.500142] [ 4693] 0 4693 1067352 80035 1546 7 664044 0 qemu-system-x86
Apr 28 05:35:57 Syrin kernel: [1454017.500145] [ 4863] 0 4863 1905 465 7 3 43 0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500148] [ 4864] 0 4864 1905 433 7 3 44 0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500151] [ 4865] 0 4865 1905 431 7 3 43 0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500153] [ 4866] 0 4866 1905 443 7 3 43 0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500156] [ 4867] 0 4867 1905 453 7 3 43 0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500159] [ 4868] 0 4868 1905 433 8 3 43 0 agetty
Apr 28 05:35:57 Syrin kernel: [1454017.500162] [27439] 0 27439 675 295 5 3 60 0 rpcbind
Apr 28 05:35:57 Syrin kernel: [1454017.500164] [27509] 0 27509 750 419 5 3 80 0 rpc.idmapd
Apr 28 05:35:57 Syrin kernel: [1454017.500167] [27520] 65534 27520 693 421 5 3 53 0 rpc.statd
Apr 28 05:35:57 Syrin kernel: [1454017.500170] [27583] 0 27583 819 365 5 3 108 0 rpc.mountd
Apr 28 05:35:57 Syrin kernel: [1454017.500173] [27684] 40 27684 7570889 1823291 14584 32 5626031 0 named
Apr 28 05:35:57 Syrin kernel: [1454017.500176] [10479] 0 10479 3262 449 7 3 182 0 udevd
Apr 28 05:35:57 Syrin kernel: [1454017.500179] [ 2923] 0 2923 1938 318 6 3 10 0 fcron
Apr 28 05:35:57 Syrin kernel: [1454017.500182] [ 2924] 0 2924 981 406 5 3 0 0 backup.cron
Apr 28 05:35:57 Syrin kernel: [1454017.500184] [ 2930] 0 2930 981 462 5 3 0 0 rsync.sh
Apr 28 05:35:57 Syrin kernel: [1454017.500187] [ 2965] 0 2965 13185 1261 30 3 0 0 rsync
Apr 28 05:35:57 Syrin kernel: [1454017.500190] [ 2966] 0 2966 574 158 5 3 0 0 tee
Apr 28 05:35:57 Syrin kernel: [1454017.500192] [ 2968] 0 2968 2038 482 7 3 0 0 rsync
Apr 28 05:35:57 Syrin kernel: [1454017.500195] [ 2974] 0 2974 14142 1189 31 3 0 0 rsync
Apr 28 05:35:57 Syrin kernel: [1454017.500198] [ 2995] 0 2995 1938 271 6 3 7 0 fcron
Apr 28 05:35:57 Syrin kernel: [1454017.500201] [ 2996] 0 2996 981 68 6 3 0 0 sh
Apr 28 05:35:57 Syrin kernel: [1454017.500203] Out of memory: Kill process 27684 (named) score 904 or sacrifice child
Apr 28 05:35:57 Syrin kernel: [1454017.500234] Killed process 27684 (named) total-vm:30283556kB, anon-rss:7293164kB, file-rss:0kB, shmem-rss:0kB
Apr 28 05:36:00 Syrin kernel: [1454019.937636] oom_reaper: reaped process 27684 (named), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

192 |
All the processes after udevd are just from this day (backup job). |
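
The rss and swapents columns in the process table are page counts, not kB. As a reading aid, here is a minimal sketch that converts the two largest rows into GiB. It assumes 4 KiB pages, which is consistent with the log itself: 1823291 pages * 4 kB = 7293164 kB, exactly the anon-rss reported for named. The `ROWS` list and the helper names are only illustrative.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages (matches anon-rss:7293164kB for named)

# (name, rss pages, swapents pages), copied from the OOM process table
ROWS = [
    ("named", 1823291, 5626031),
    ("qemu-system-x86", 80035, 664044),
]


def pages_to_kib(pages):
    """Convert a page count to KiB."""
    return pages * PAGE_KIB


def pages_to_gib(pages):
    """Convert a page count to GiB."""
    return pages_to_kib(pages) / (1024 * 1024)


def summarize(rows):
    """Return (name, resident GiB, swapped GiB) for each row."""
    return [(name, pages_to_gib(rss), pages_to_gib(swap))
            for name, rss, swap in rows]


if __name__ == "__main__":
    for name, rss_gib, swap_gib in summarize(ROWS):
        print(f"{name:16s} rss {rss_gib:6.2f} GiB  swap {swap_gib:6.2f} GiB")
```

Read this way, named accounts for about 7 GiB resident plus roughly 21.5 GiB in swap, and qemu for about 2.5 GiB in swap, which together is nearly all of the 25165820 kB of swap.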
193 |
|
194 |
For comparison, the last call trace before was nearly the same: |
195 |
|
Apr 15 18:26:37 Syrin kernel: [376620.710330] Call Trace:
Apr 15 18:26:37 Syrin kernel: [376620.710341] dump_stack+0x67/0x98
Apr 15 18:26:37 Syrin kernel: [376620.710347] dump_header+0x94/0x20c
Apr 15 18:26:37 Syrin kernel: [376620.710352] oom_kill_process+0x24a/0x420
Apr 15 18:26:37 Syrin kernel: [376620.710355] ? oom_badness.part.9+0xd3/0x150
Apr 15 18:26:37 Syrin kernel: [376620.710358] out_of_memory+0xf9/0x290
Apr 15 18:26:37 Syrin kernel: [376620.710361] __alloc_pages_nodemask+0xf48/0xff0
Apr 15 18:26:37 Syrin kernel: [376620.710367] filemap_fault+0x294/0x4c0
Apr 15 18:26:37 Syrin kernel: [376620.710372] ext4_filemap_fault+0x2c/0x40
Apr 15 18:26:37 Syrin kernel: [376620.710376] __do_fault+0x1f/0xb0
Apr 15 18:26:37 Syrin kernel: [376620.710380] __handle_mm_fault+0x3ed/0xad0
Apr 15 18:26:37 Syrin kernel: [376620.710385] handle_mm_fault+0xaa/0x1f0
Apr 15 18:26:37 Syrin kernel: [376620.710390] __do_page_fault+0x250/0x4f0
Apr 15 18:26:37 Syrin kernel: [376620.710394] ? page_fault+0x2f/0x50
Apr 15 18:26:37 Syrin kernel: [376620.710396] page_fault+0x45/0x50
Apr 15 18:26:37 Syrin kernel: [376620.710400] RIP: 21e50088:0x7f41aedee150
Apr 15 18:26:37 Syrin kernel: [376620.710401] RSP: 21e50070:0000000000000000 EFLAGS: 7f41aedee150
Apr 15 18:26:37 Syrin kernel: [376620.710405] Mem-Info:
Apr 15 18:26:37 Syrin kernel: [376620.710411] active_anon:1695970 inactive_anon:212020 isolated_anon:0
Apr 15 18:26:37 Syrin kernel: [376620.710411] active_file:339 inactive_file:320 isolated_file:0
Apr 15 18:26:37 Syrin kernel: [376620.710411] unevictable:1374 dirty:0 writeback:0 unstable:0
Apr 15 18:26:37 Syrin kernel: [376620.710411] slab_reclaimable:4311 slab_unreclaimable:7552
Apr 15 18:26:37 Syrin kernel: [376620.710411] mapped:1059 shmem:3 pagetables:16264 bounce:0
Apr 15 18:26:37 Syrin kernel: [376620.710411] free:11840 free_pcp:0 free_cma:0

Unfortunately I don't understand all the details. Any help is highly
appreciated.

I assume it has something to do with tmpfs memory that is not being
freed. That is just an assumption; I'm looking for clarity, not trial
and error.
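
One way to check the tmpfs assumption against the log: pages that tmpfs holds in RAM are accounted as shmem in the Mem-Info block. Below is a minimal sketch that pulls the counters out of one Mem-Info line and converts shmem to kB, again assuming 4 KiB pages. The function name is only illustrative, and tmpfs pages that had already been swapped out would not show up in this counter.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages

# One Mem-Info line from the Apr 28 log (timestamp prefix removed)
MEMINFO_LINE = "mapped:1071 shmem:3 pagetables:16352 bounce:0"


def parse_counters(line):
    """Parse 'key:value' pairs from a Mem-Info line into a dict of ints."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}


counters = parse_counters(MEMINFO_LINE)
shmem_kib = counters["shmem"] * PAGE_KIB
print(f"resident shmem (incl. tmpfs): {shmem_kib} kB")
```

For the line above this gives 12 kB, the same value as shmem:12kB in the Node 0 line, so at the moment of the OOM kill tmpfs held almost nothing in RAM.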

Thanks
Kai