-----Original Message-----
From: Alan McKinnon <alan.mckinnon@×××××.com>
Date: Wed, 7 Oct 2015 20:39:42
To: <gentoo-user@l.g.o>
Reply-to: gentoo-user@l.g.o
Subject: Re: [gentoo-user] strange TCP timeout errors

On 07/10/2015 17:55, Grant wrote:
>>>>>> I've attached a PNG from Munin showing the TCP timeout errors on my
>>>>>> Gentoo server over the past month. The data is expressed in timeouts
>>>>>> per second and that rate is shown to be steadily increasing over the
>>>>>> past month. That seems strange to me. Munin doesn't show any other
>>>>>> data point increasing like this over the time period. Any ideas?
>>>>>>
>>>>>> - Grant
>>>>>>
>>>>>
>>>>> weird - does it reset on an interface restart or reboot?
>>>>
>>>> this would be my test #1
>>>
>>>
>>> I rebooted and the rate of errors has dropped off to almost nothing.
>>>
>>>
>>>>> Can you verify it's not an artefact within munin (how?)
>>>>
>>>> In theory, a misconfigured graph can do this. Munin can draw many
>>>> different types of graph, including cumulative values. Even for a data
>>>> type like this which is X events per unit time, if you tell munin to add
>>>> them all up, it will do so and graph it.
>>>>
>>>> Quick test is to look at the graph config.
>>>
>>>
>>> This graph lives in the "network" section of the munin web interface.
>>> There is no matching section in /etc/munin/plugin-conf.d/munin-node so
>>> it should be using the default config.
>>>
>>> Any ideas based on this new info?
>>
>> A few :-)
>>
>>
>> I can't find the plugin that delivers that graph though. Maybe I just
>> don't have it, maybe it comes from contrib/
>>
>> What's your USE for munin?
>
>
> USE="apache cgi http mysql ssl syslog -asterisk -dhcpd -doc -ipmi
> -ipv6 -irc -java -memcached -minimal -postgres (-selinux) {-test}"
>
>
>> What do you have in "ls -al /etc/munin/plugins/" ?

It's as I thought - your data is accurate but rrd has been given a
completely wrong method to derive the graphs.

Munin graphs for section "Network" do not have to be in a file called
"network" - it's just a category and the plugin defines what web-page
section it must be in. In your case, the relevant plugin is
netstat_multi which doesn't often get installed. Its data source is
"netstat -s" so grep that output for "timeout" to see it.
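
If you would rather pull those counters out in a script than with grep, something like the following should do as a rough sketch. The function name and the SAMPLE text are made up for illustration; real "netstat -s" output differs between systems and versions, so treat the two line shapes handled here as an assumption.

```python
import re

# Two line shapes are assumed here: "<number> <description>" and
# "<Name>: <number>". Anything else is ignored.
LEADING = re.compile(r"^(\d+)\s+(.+)$")
TRAILING = re.compile(r"^(.+?):\s*(\d+)$")


def timeout_counters(netstat_text):
    """Return {label: value} for every line that mentions 'timeout'."""
    counters = {}
    for raw in netstat_text.splitlines():
        line = raw.strip()
        if "timeout" not in line.lower():
            continue
        m = LEADING.match(line)
        if m:
            counters[m.group(2)] = int(m.group(1))
            continue
        m = TRAILING.match(line)
        if m:
            counters[m.group(1)] = int(m.group(2))
    return counters


# Hypothetical sample, NOT captured from a real machine.
SAMPLE = """\
Tcp:
    120 active connection openings
TcpExt:
    37 other TCP timeouts
    TCPTimeouts: 4
"""

if __name__ == "__main__":
    for label, value in timeout_counters(SAMPLE).items():
        print(label, value)
```

On a live box you would feed it the text of "netstat -s" instead of SAMPLE, for example via subprocess.run(["netstat", "-s"], capture_output=True, text=True).stdout.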

Timeouts are cumulative counters; they never decrease until they wrap
around. So to scale them, the plugin gets the rrd file to subtract the
previous reading from the current reading and divide by the time interval to
get the timeouts/sec. This is all done inside rrd when the data files
are updated (it's quite a lot of magic).
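
The arithmetic behind that is small enough to write down. This is only a sketch of the calculation described above, not rrd's actual code; the function name and the readings are invented for the example.

```python
def rate_per_second(prev, curr):
    """Turn two (unix_time, cumulative_count) readings into events/sec."""
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0:
        raise ValueError("readings must be in increasing time order")
    # Difference of the cumulative counter divided by the elapsed time.
    return (v1 - v0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical readings taken 300 seconds (one munin run) apart.
    earlier = (0, 1000)
    later = (300, 1150)
    print(rate_per_second(earlier, later))  # 150 timeouts / 300 s = 0.5
```

Note that a reboot resets the kernel counters, so the first difference after a reboot comes out negative unless the tool treats that case specially.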

That plugin sets the graph type to DERIVE
(/etc/munin/plugins/netstat_multi around line 190). I feel it should be
GAUGE or COUNTER.
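
To compare the three types side by side, here is a simplified model of what the rrdcreate documentation says each one does with a new reading. It is an illustration only, not rrdtool's code: it ignores heartbeat, min/max limits and interpolation, and the function name and sample values are made up.

```python
def to_stored_value(ds_type, prev_value, value, seconds):
    """Simplified model of the rrd data source types GAUGE, COUNTER, DERIVE."""
    if ds_type == "GAUGE":
        # Stored as-is; no rate is computed.
        return float(value)
    delta = value - prev_value
    if ds_type == "COUNTER" and delta < 0:
        # A decrease is assumed to be a 32-bit or 64-bit counter wrap.
        delta += 2**32
        if delta < 0:
            delta += 2**64 - 2**32
    elif ds_type not in ("COUNTER", "DERIVE"):
        raise ValueError("unknown data source type: " + ds_type)
    # COUNTER and DERIVE both store a per-second rate of change;
    # DERIVE does no wrap handling, so it can go negative.
    return delta / seconds


if __name__ == "__main__":
    # Hypothetical readings 300 seconds apart.
    for ds_type in ("GAUGE", "COUNTER", "DERIVE"):
        print(ds_type, to_stored_value(ds_type, 1000, 1150, 300))
```

With a counter that drops from 1150 to 10 after a reboot, this model gives a negative rate for DERIVE and a very large positive one for COUNTER, since COUNTER reads the drop as a wrap.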

The proper reference on rrd is
http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
and the munin docs are
https://munin.readthedocs.org/en/latest/index.html

You must edit the plugin file and, IIRC, recreate the rrd; you will lose
all past info (can't be helped).


[snip ls output]


> P.S. Any other good plugins you'd recommend?

http://gallery.munin-monitoring.org/

Monitoring is highly site-specific so recommendations aren't usually
worth much, but that gallery has LOTS of contributed plugins.

--
Alan McKinnon
alan.mckinnon@×××××.com