On 07/10/2015 17:55, Grant wrote:
>>>>>> I've attached a PNG from Munin showing the TCP timeout errors on my
>>>>>> Gentoo server over the past month. The data is expressed in timeouts
>>>>>> per second and that rate is shown to be steadily increasing over the
>>>>>> past month. That seems strange to me. Munin doesn't show any other
>>>>>> data point increasing like this over the time period. Any ideas?
>>>>>>
>>>>>> - Grant
>>>>>>
>>>>>
>>>>> weird - does it reset on an interface restart or reboot?
>>>>
>>>> this would be my test #1
>>>
>>>
>>> I rebooted and the rate of errors has dropped off to almost nothing.
>>>
>>>
>>>>> Can you verify it's not an artefact within munin (how?)
>>>>
>>>> In theory, a misconfigured graph can do this. Munin can draw many
>>>> different types of graph, including cumulative values. Even for a data
>>>> type like this which is X events per unit time, if you tell munin to add
>>>> them all up, it will do so and graph it.
>>>>
>>>> Quick test is to look at the graph config.
>>>
>>>
>>> This graph lives in the "network" section of the munin web interface.
>>> There is no matching section in /etc/munin/plugin-conf.d/munin-node so
>>> it should be using the default config.
>>>
>>> Any ideas based on this new info?
>>
>> A few :-)
>>
>>
>> I can't find the plugin that delivers that graph though. Maybe I just
>> don't have it, maybe it comes from contrib/
>>
>> What's your USE for munin?
>
>
> USE="apache cgi http mysql ssl syslog -asterisk -dhcpd -doc -ipmi
> -ipv6 -irc -java -memcached -minimal -postgres (-selinux) {-test}"
>
>
>> What do you have in "ls -al /etc/munin/plugins/" ?


It's as I thought - your data is accurate but rrd has been given a
completely wrong method to derive the graphs.

Munin graphs for section "Network" do not have to be in a file called
"network" - it's just a category and the plugin defines what web-page
section it must be in. In your case, the relevant plugin is
netstat_multi which doesn't often get installed. Its data source is
"netstat -s" so grep that output for "timeout" to see it.

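If you'd rather script that check than eyeball it, here is a minimal Python
sketch of the same grep. The function names (timeout_lines, read_netstat_stats)
are my own, not anything from munin, and it assumes netstat from net-tools is
on the PATH. Unlike a plain grep it matches case-insensitively, since the
exact wording of the counters varies between kernels.

```python
import subprocess


def timeout_lines(netstat_output: str) -> list[str]:
    """Return the lines of `netstat -s` output that mention "timeout".

    Matching is case-insensitive (like `grep -i timeout`).
    """
    return [
        line.strip()
        for line in netstat_output.splitlines()
        if "timeout" in line.lower()
    ]


def read_netstat_stats() -> str:
    """Run `netstat -s` and return its text output (assumes net-tools is installed)."""
    result = subprocess.run(
        ["netstat", "-s"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Print every timeout-related counter the kernel currently reports.
    for line in timeout_lines(read_netstat_stats()):
        print(line)
```

Run it twice a few minutes apart: the numbers should only ever go up, which
is the behaviour the graph type has to account for.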

Timeouts are cumulative counters; they never decrease until they wrap
around. So to scale them, the plugin gets the rrd file to subtract the
previous reading from the current reading and divide by the time interval
to get the timeouts/sec. This is all done inside rrd when the data files
are updated (it's quite a lot of magic).

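To make the arithmetic concrete, here is a rough Python sketch of the
subtract-and-divide step. It is an illustration only, not rrdtool's code:
the function names are mine, the fixed 32-bit counter width is an assumption,
and real rrd also does interpolation and min/max checks that are left out.
The only difference shown between the two styles is what happens when the
new reading is lower than the old one (a wrap or a reset).

```python
def derive_rate(prev_value: int, cur_value: int, interval_seconds: float) -> float:
    """DERIVE-style rate: plain difference over time.

    If the counter went backwards (wrap or reset), the result is negative.
    """
    return (cur_value - prev_value) / interval_seconds


def counter_rate(
    prev_value: int,
    cur_value: int,
    interval_seconds: float,
    counter_bits: int = 32,  # assumed counter width, for illustration
) -> float:
    """COUNTER-style rate: a decrease is treated as a wrap-around."""
    delta = cur_value - prev_value
    if delta < 0:
        # Assume the counter overflowed once and add back its full range.
        delta += 2 ** counter_bits
    return delta / interval_seconds


if __name__ == "__main__":
    # Two readings taken 300 seconds apart, 600 timeouts in between.
    print(derive_rate(1000, 1600, 300))   # 2.0 timeouts/sec
    print(counter_rate(1000, 1600, 300))  # 2.0 timeouts/sec
```

With ordinary increasing readings both give the same answer; they only
diverge when the second reading is smaller than the first.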

That plugin sets the graph type to DERIVE
(/etc/munin/plugins/netstat_multi, around line 190). I feel it should be
GAUGE or COUNTER.

The proper reference on rrd is
http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
and the munin docs are
https://munin.readthedocs.org/en/latest/index.html

You must edit the plugin file and, IIRC, recreate the rrd; you will lose
all past info (can't be helped).


[snip ls output]


> P.S. Any other good plugins you'd recommend?

http://gallery.munin-monitoring.org/

Monitoring is highly site-specific so recommendations aren't usually
worth much, but that gallery has LOTS of contributed plugins

--
Alan McKinnon
alan.mckinnon@×××××.com