Gentoo Archives: gentoo-user

From: Michael Mol <mikemol@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Re: Advice on system monitoring
Date: Mon, 05 Dec 2011 17:25:25
Message-Id: CA+czFiAkx=dyF12OOKJApFu8JGOTd0EhDzUTRrtp=JfWY-gQsg@mail.gmail.com
In Reply to: [gentoo-user] Re: Advice on system monitoring by James
On Mon, Dec 5, 2011 at 12:01 PM, James <wireless@×××××××××××.com> wrote:
> Michael Mol <mikemol <at> gmail.com> writes:
>> Let's start with that dual-Xeon box I was using to benchmark "emerge
>> -e @world", figure I'm looking for ways to better tune my MAKEOPTS and
>> EMERGE_DEFAULT_OPTS variables, and assume I'd like to get more
>> information about the following factors:
>
> Complex and never finished, imho.....
>
>
>> * What were the 1m, 5m and 15m load averages?
>> * What were the corresponding averages for CPU time spent in user mode,
>> system mode and I/O wait?
>
> sys-process/iotop
>
>> * What was network usage like? (I have a caching proxy server on the
>> network
>
> Lots of different tools to look at network performance:
>
> wireshark, (look around /usr/portage/net-analyzer)
>
>
>> so even if distfiles are lost on-system, well, a cache hit
>> transfers at up to around 50MB/s. It'd be better, except for read
>> performance limitations on the router box, and write performance
>> limitations on the local machine)
>
>
> bonnie++ (or bonnie)
>
>
>> * What was the temperature of each CPU core, RAM module and hard
>> drive? (Not so relevant for improving system performance, but still of
>> interest.)
>
> app-admin/hddtemp (for drives)
>
> dunno on individual cpu cores...
>
>> I'd like to have a web interface I could navigate to which would show
>> graphs of these counters.
>
>
> Now all of that in one GUI tool?  Do post back when you get it working,
> as I'd like to use it too!!!!!

The approach I'd like to take is to have all the monitoring set up,
launch emerge -e @world, and see what's going on around (and just
prior to) stalls and CPU waste. I'm defining a stall as any point where
my load average falls below my number of CPU cores, and I'm defining
CPU waste as CPU time spent anywhere but 'user'. I'd like to look at
graphs of the metrics over the course of the emerge.
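
To make the stall and CPU-waste definitions concrete, here is a minimal Python sketch, assuming a Linux host. It samples the aggregate `cpu` line of /proc/stat (column order per proc(5)) and the 1-minute load from `os.getloadavg()`. Comparing the 1-minute average against the core count, counting niced time as 'user' (Portage can run builds niced via PORTAGE_NICENESS), and the 5-second interval are my assumptions, not part of the setup described here.

```python
import os
import time

# Column order of the aggregate "cpu" line in /proc/stat, per proc(5).
# guest and guest_nice are omitted: the kernel already counts them inside
# user and nice, so adding them would double count.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of cumulative ticks."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # older kernels have fewer columns
    return dict(zip(FIELDS, values))


def waste_fraction(before, after, count_nice_as_user=True):
    """Share of CPU time between two samples spent anywhere but user mode.

    Idle time counts as waste here, matching the definition in the text.
    Niced user time counts as 'user' unless count_nice_as_user is False.
    """
    delta = {k: after[k] - before[k] for k in FIELDS}
    total = sum(delta.values())
    if total <= 0:
        return 0.0
    useful = delta["user"] + (delta["nice"] if count_nice_as_user else 0)
    return 1.0 - useful / total


def is_stall(load_1m, ncpu):
    """A stall: the 1-minute load average is below the number of cores."""
    return load_1m < ncpu


def read_cpu_counters(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_line(f.readline())  # first line is the aggregate


if __name__ == "__main__":
    ncpu = os.cpu_count() or 1
    prev = read_cpu_counters()
    while True:
        time.sleep(5)
        cur = read_cpu_counters()
        load_1m = os.getloadavg()[0]
        print(f"{time.strftime('%H:%M:%S')} load1={load_1m:.2f} "
              f"stall={is_stall(load_1m, ncpu)} "
              f"waste={waste_fraction(prev, cur):.1%}")
        prev = cur
```

Redirecting the output to a file while emerge -e @world runs gives a timeline that could be graphed afterwards.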

My chief thought is this: I have both 'make' and 'emerge' trying to
reach a specific load average, which means that this particular
dynamic system is going to have feedback as they go back and forth. I
expect that I'll want to duck one of them under the other, but I don't
know which one yet, and I don't know how far.
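
One way to "duck" one scheduler under the other is to give it a lower load target. GNU make (`-l`/`--load-average`) and emerge (`--load-average`) both hold off on starting new jobs while other jobs are running and the load average is at or above their target. The hypothetical helper below only renders the two make.conf lines for a given core count; the one-core margin, `-j` of cores plus one, and `--jobs=4` are illustrative guesses, not tested values.

```python
import os


def make_conf_lines(ncpu, ducked="emerge", margin=1.0, emerge_jobs=4):
    """Render MAKEOPTS / EMERGE_DEFAULT_OPTS with one load target ducked.

    make -l N and emerge --load-average=N both hold off on new jobs while
    other jobs run and the load average is at least N. Giving one of them
    a lower N makes it back off first, which may damp the back-and-forth
    between the two schedulers.
    """
    if ducked not in ("emerge", "make"):
        raise ValueError("ducked must be 'emerge' or 'make'")
    high = float(ncpu)
    low = max(1.0, high - margin)
    make_load = low if ducked == "make" else high
    emerge_load = low if ducked == "emerge" else high
    return [
        # -j cores+1 is a common rule of thumb, not a measured optimum.
        f'MAKEOPTS="-j{ncpu + 1} -l{make_load:g}"',
        f'EMERGE_DEFAULT_OPTS="--jobs={emerge_jobs} --load-average={emerge_load:g}"',
    ]


if __name__ == "__main__":
    for ducked in ("emerge", "make"):
        print(f"# {ducked} ducked")
        print("\n".join(make_conf_lines(os.cpu_count() or 1, ducked)))
```

Trying both variants across otherwise identical emerge runs, with the monitoring in place, would be one way to see which ordering behaves better.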

I should also check whether pbzip2 supports load awareness. Having
eight cores suddenly start churning through BWT blocks is great if
your load average is something like 0.24, but not so great if it
pushes your load average up to around 12.
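
If pbzip2 turns out not to be load-aware, a wrapper could approximate it by sizing the thread count to the idle capacity before launching. This is a hypothetical sketch: `load_aware_threads` and `pbzip2_argv` are names made up here, and it assumes pbzip2's `-p#` option (number of processors) as described in pbzip2(1); check the man page for your version.

```python
import math
import os
import shlex


def load_aware_threads(ncpu, load_1m, reserve=0):
    """Threads that fit in the currently idle capacity, never fewer than one.

    With 8 cores and a load of 0.24 this gives 8 threads; with a load of
    12 it falls back to a single thread instead of piling on more work.
    """
    idle = ncpu - reserve - math.floor(load_1m)
    return max(1, idle)


def pbzip2_argv(path, ncpu=None, load_1m=None):
    """Build a pbzip2 command line sized to the current load (hypothetical helper)."""
    if ncpu is None:
        ncpu = os.cpu_count() or 1
    if load_1m is None:
        load_1m = os.getloadavg()[0]  # Unix-only
    return ["pbzip2", f"-p{load_aware_threads(ncpu, load_1m)}", path]


if __name__ == "__main__":
    # Print the command rather than running it; pass the list to
    # subprocess.run() to actually compress.
    print(shlex.join(pbzip2_argv("distfile.tar")))
```

The file name is a placeholder. Sampling the load once at launch cannot react to load that rises mid-compression, so this is only a rough approximation of real load awareness.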

--
:wq