Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Re: Hanging after a few days - 2.6.15-r7
Date: Tue, 14 Mar 2006 16:41:55
Message-Id: pan.2006.03.14.16.30.23.991638@cox.net
In Reply to: Re: [gentoo-amd64] Re: Hanging after a few days - 2.6.15-r7 by Brett Johnson
1 Brett Johnson posted <20060314135750.GC7383@××××.com>, excerpted below,
2 on Tue, 14 Mar 2006 07:57:51 -0600:
3
4 > I'd like to take a minute to thank Duncan, as I usually learn something
5 > new with his very informative posts! Thank you for taking the time to
6 > write such detailed posts, I for one look forward to reading them.
7
8 Thanks. Just remember that while I /try/ to make a distinction between
9 what's my opinion and what's fact as I know it, sometimes that fact as I
10 know it is wrong. =8^( That's happened a couple times recently. =8^(
11
12 I /do/ try to just as publicly acknowledge when I'm wrong, tho, so
13 hopefully if I've steered anybody wrong, they see my later nack too.
14
15 > On Mon, Mar 13, 2006 at 04:25:05PM -0700, Duncan wrote:
16 >> [With a kernel bug, first, see if it's in vanilla/mainline or not by
17 >> trying mainline, then using the instructions in the kernel BUG-HUNTING
18 >> doc to isolate it.
19 >>
20 > That is great information, and I will pursue this if I do indeed have
21 > some memory anomaly.
22
23 Interestingly enough, I just discovered BUG-HUNTING and used the
24 techniques therein to file my first kernel bug, isolated to a specific
25 line in 2.6.15-git11, that is, a change that happened between git10 and
26 git11, the 10th and 11th snapshots after 2.6.15 release, but before
27 2.6.16-rc1. So... new knowledge fresh on my mind to share! =8^) (The bug
28 is http://bugzilla.kernel.org/show_bug.cgi?id=6130 , and as reported in my
29 last comment to it, reversion of the culprit code in -rc6 has eliminated
30 the problem for now, altho they didn't fully trace the problem. It's
31 rolled back to working for .16, after which they'll tackle the problem
32 again for .17.)
33
34 >> You didn't mention this so I'm not sure if you know to make the
35 >> distinction or not -- what sort of memory usage was it? Application
36 >> usage (indicates a leak) or simply cache or buffer usage?
37
38 > I also use the free command to view memory. I have been under the
39 > assumption that the "-/+ buffers/cache" line in the free command shows
40 > the actual memory "used" by the kernel and applications, and the "free"
41 > memory available for cache/buffers. This is how I have always based my
42 > assessment of memory usage.
43 >
44 > Here is a sample from this morning with just mutt, irssi, gaim and aterm
45 > running with fluxbox as the wm:
46 >
47 >                        total       used       free    buffers     cached
48 > Mem:                 1028780     996560      32220     234996     202852
49 > -/+ buffers/cache:               558712     470068
50 > Swap:                 979956        224     979732
51 >
52 > I have thought that my "Mem: used" should always be as high as possible,
53 > meaning I am using as much ram in the system for applications as well as
54 > cache and buffers. I assumed the "-/+ buffers/cache used" is how much
55 > memory is consumed by the kernel and running applications. The "-/+ used
56 > number" is the one that I have seen grow overnight unusually high. This
57 > is also the number that conky seems to report on. Prior to "the change",
58 > and after a reboot, running the normal applications listed above, I
59 > average about 120000 on the "-/+ buffers/cache used". Maybe I am
60 > interpreting this data incorrectly, and I in fact do not have a problem.
61
62 No, you are interpreting it correctly, AFAIK. I was concerned that you might
63 be using the top numbers and thinking that "free" was what you really had
64 available to use. Common mistake, but one you obviously got past some time
65 ago, as you're interpreting the numbers correctly. =8^) but also =8^( 'cause
66 that means you have a problem!
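
As a rough illustration of the arithmetic behind that "-/+ buffers/cache" line, here is a minimal Python sketch. The function name is made up for this example, and it assumes the column layout of the free output quoted above (all values in kilobytes); it is not how free itself is implemented.

```python
def buffers_cache_line(total, used, free, buffers, cached):
    """Derive the "-/+ buffers/cache" figures from the "Mem:" row.

    Hypothetical helper for illustration only; all values in kilobytes.
    `total` is accepted to mirror the row but is not needed for the result.
    """
    # Memory held by the kernel and applications, excluding reclaimable cache.
    used_by_apps = used - buffers - cached
    # Memory that is free now or could be reclaimed from buffers and cache.
    available = free + buffers + cached
    return used_by_apps, available


# Figures taken from the sample output quoted above.
apps, avail = buffers_cache_line(1028780, 996560, 32220, 234996, 202852)
print(apps, avail)  # 558712 470068
```

The result matches the quoted "-/+ buffers/cache" row, so the second row is the one to watch when hunting a leak: it excludes memory the kernel can hand back on demand.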
67
68 > As a side bar, I saw mention in this list before, I think by Duncan, a
69 > "recommended" book on the Linux kernel. Unfortunately, it seems I
70 > deleted that thread, so if someone knows a good book on the kernel and
71 > memory management, I would appreciate any recommendations.
72
73 It wasn't me, but I remember seeing the mention as well. Unfortunately, I
74 do /not/ remember what the recommendation was. =8^(
75
76 >> Kernel 2.6 has a swappiness tweaking control ( /proc/sys/vm/swappiness
77 >> ), that determines the balance between keeping stuff cached and
78 >> swapping out more applications, once all memory is used. This is set by
79 >> default to 60.
80 >>
81 > Again, I learned something new today (and it's not even 8am here!), and
82 > I always kind of wondered why Duncan had so much swap and how he
83 > utilized it. This makes sense, and I will try tweaking this setting, as
84 > normally I don't use any swap.
85
86 I happened across that on the LWN weekly kernel page. =8^) Andrew M (IIRC)
87 had the featured quote. Apparently, there had been a lot of complaints
88 about the issue as I had explained above (apps left running overnight not
89 very responsive the first time they were used, and similar), and the featured
90 quote was Andrew saying he was going to "stick his fingers in his ears and
91 sing na-na-na" until the complainers could say they'd set swappiness to 0 and
92 were /still/ getting the problem! <g> As it happened, LWN either covered
93 the swappiness parameter that week, or had done so in one of the last couple
94 issues, so I learned what it was all about, but the reason it stuck as
95 effectively as it did was because of that quote! =8^)
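
For anyone wanting to check or change the setting from a script, here is a small Python sketch. The file /proc/sys/vm/swappiness is the control mentioned above; the function names, and the 0 to 100 range check, are assumptions made for this example (that range is what 2.6-era kernels accept, as far as I know). Writing the real file requires root.

```python
from pathlib import Path

SWAPPINESS = "/proc/sys/vm/swappiness"


def read_swappiness(path=SWAPPINESS):
    """Return the current swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def write_swappiness(value, path=SWAPPINESS):
    """Set swappiness; the 0..100 range is an assumption for 2.6 kernels."""
    if not 0 <= value <= 100:
        raise ValueError("swappiness expected in the range 0..100")
    Path(path).write_text(f"{value}\n")


# Only touch the real file if it exists (that is, on a Linux system).
if Path(SWAPPINESS).exists():
    print("current swappiness:", read_swappiness())
```

A value written this way does not survive a reboot; making it permanent is a separate step that this sketch does not cover.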
96
97 As for my own swap, for awhile I was running with swap entirely disabled.
98 With a gig of memory for desktop use, it's possible, and the swap management
99 code /is/ pretty invasive and complicated, so if it's not necessary, turning
100 it off /should/ mean a slightly more efficient kernel, and will /certainly/
101 mean less code that can go wrong. At the time, I had unstable memory (cheap
102 generic stuff, couldn't quite handle the rated pc3200 400MHz clock, and no BIOS
103 setting to declock it until a later BIOS upgrade, the stuff runs rock-stable at
104 pc3000 383MHz clock), and figured the less unnecessary code there was, the
105 better I'd run.
106
107 I thought I'd upgrade memory first, since I was having issues with it, but
108 when the BIOS upgrade and declocking to pc3000 speeds solved that issue,
109 another soon popped up -- my AC died, a VERY BAD thing to happen in the
110 middle of the summer here in Phoenix, with 110-120 F highs ( up to 48 C ).
111 I replaced it, then started having issues with my hard drive, as it had
112 overheated. So... I had been thinking about going RAID, and took the
113 opportunity to do so, 4x300 gig Seagate SATAs.
114
115 As I hadn't even formatted about a hundred gig of my 250 gig Maxtor that
116 overheated and I was worried about going bad (I got it before I began
117 losing stuff, but it had developed some bad sectors), and with the mixed
118 RAID on the 4x300s giving me > 600 gig of usable storage, I had plenty of
119 space to work with. So, while apportioning it, I decided to go ahead and
120 apportion some space for swap once again. The general rule of thumb for swap
121 is 2X physical memory, and I was still planning on a memory upgrade, so
122 I took that into account as well. If I fill all 8 slots with 2-gig sticks,
123 that will give me 16 gigs of memory. However, I couldn't figure any way in
124 my wildest dreams that I'd need 48 gig (16 physical + 32 swap), and I figured I might
125 never upgrade to the full 16 gig of physical memory anyway, so I decided
126 16 gig of swap was a decent compromise.
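
The sizing arithmetic above can be written out as a short Python sketch. The function name is invented for this example, and the 2X factor is only the rule of thumb mentioned in the text, not a requirement.

```python
def swap_by_rule_of_thumb(physical_gig, factor=2):
    """Return (swap, total) in gigs using the "swap = factor x RAM" rule."""
    swap = physical_gig * factor
    return swap, physical_gig + swap


# A fully populated board: 8 slots x 2 gig = 16 gig physical.
print(swap_by_rule_of_thumb(16))  # (32, 48)
# The 8 gig actually ordered: the rule gives the 16 gig of swap chosen above.
print(swap_by_rule_of_thumb(8))   # (16, 24)
```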
127
128 As I said, I have that evenly allocated, a 4 gig swap partition on each
129 drive, specifically set to the same priority in fstab, which the kernel
130 then handles by striping them. Thus, just as with striped raid (raid-0),
131 if there's sufficient data being transferred, it's at the higher speed
132 of four disks rather than the lower speed of one. At that speed, swapping
133 is actually not too bad! The only time it gets bad is if there's other i/o
134 going on at the same time, and even then, the kernel splits the tasks up
135 quite well so I still usually get better performance with other i/o than
136 I would with swap alone (no other i/o) on a single disk.
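
One way to confirm that all the swap partitions really ended up at the same priority is to look at /proc/swaps, where the last column is the priority. The Python sketch below groups swap areas by that column. The device names and sizes in SAMPLE are hypothetical, not my actual layout, and the function name is made up for this example. In fstab, the priority itself is normally given with the pri= option in the options field.

```python
from collections import defaultdict

# Hypothetical /proc/swaps contents for four equal-priority partitions.
SAMPLE = """\
Filename                Type            Size    Used    Priority
/dev/sda2               partition       4194296 0       1
/dev/sdb2               partition       4194296 0       1
/dev/sdc2               partition       4194296 0       1
/dev/sdd2               partition       4194296 0       1
"""


def swap_by_priority(text):
    """Group swap areas by priority, given text in /proc/swaps format."""
    groups = defaultdict(list)
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed lines
        groups[int(fields[-1])].append(fields[0])
    return dict(groups)


groups = swap_by_priority(SAMPLE)
print(groups)
# A single group holding every device means the kernel can stripe across them.
print("striped" if len(groups) == 1 else "not striped evenly")
```

On a live system, the same function could be fed the text read from /proc/swaps instead of SAMPLE.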
137
138 RAID, as with the dual CPU, was a very good choice. I'm more than satisfied
139 with the performance of both, as both the dual CPU and the 4-way RAID have
140 exceeded my performance expectations.
141
142 I was actually tempted not to worry about upgrading my memory at all, after
143 that. However, I'm still wanting to be an AT (arch tester), and being able to mount a tmpfs
144 and have it all in physical memory, large enough to run a couple emerges at a
145 time, will clear up my current biggest machine performance bottleneck.
146 Additionally, the next logical upgrade is to switch to dual-cores, thus
147 giving me 4 cores since I'm already running dual (socket) Opterons, and that
148 upgrade simply doesn't make sense unless I'm running at least a gig for each,
149 or 4 gigs total. Thus, 8 gigs should be quite decent, and serve me well
150 even after I upgrade to dual-cores, late this year or early next if I'm lucky.
151
152 As I said, I ordered that 8 gig last week. Hopefully, it will be here by
153 Wed, altho the status still was listed as in-warehouse this AM, and I ordered
154 2-day shipping, so it's now more likely to be Thursday.
155
156 So... that'll leave me with 8 gig of physical memory, and 16 gig of 4-way swap.
157
158 BTW, other than during that xorg composite memory leak I mentioned, I haven't
159 seen memory usage go much beyond a gig into swap, 2 gigs total. The other
160 six gigs should therefore allow me to run portage's tempdir in RAM, and
161 /still/ give me more cached filesystem than I have now. =8^) I'm looking
162 forward to trying it out! =8^)
163
164 It'll also give me a chance to see how that memory hole for 32-bit PCI at
165 the top of the 4-gig memory space works. I've read about it, and I see the
166 options in the BIOS and know that's one of the things the kernel's IOMMU
167 setting deals with. Now, I'll get to see how it all actually works, and see if
168 how I /think/ it works actually /does/ work that way. =8^)
169
170 --
171 Duncan - List replies preferred. No HTML msgs.
172 "Every nonfree program has a lord, a master --
173 and if you use the program, he is your master." Richard Stallman in
174 http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html
175
176
177 --
178 gentoo-amd64@g.o mailing list
