Gentoo Archives: gentoo-user

From: Alan McKinnon <alan.mckinnon@×××××.com>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] strange TCP timeout errors
Date: Wed, 07 Oct 2015 18:40:39
Message-Id: 561566EE.9000507@gmail.com
In Reply to: Re: [gentoo-user] strange TCP timeout errors by Grant
1 On 07/10/2015 17:55, Grant wrote:
2 >>>>>> I've attached a PNG from Munin showing the TCP timeout errors on my
3 >>>>>> Gentoo server over the past month. The data is expressed in timeouts
4 >>>>>> per second and that rate is shown to be steadily increasing over the
5 >>>>>> past month. That seems strange to me. Munin doesn't show any other
6 >>>>>> data point increasing like this over the time period. Any ideas?
7 >>>>>>
8 >>>>>> - Grant
9 >>>>>>
10 >>>>>
11 >>>>> weird - does it reset on an interface restart or reboot?
12 >>>>
13 >>>> this would be my test #1
14 >>>
15 >>>
16 >>> I rebooted and the rate of errors has dropped off to almost nothing.
17 >>>
18 >>>
18 >>>>> Can you verify it's not an artefact within munin (how?)
20 >>>>
21 >>>> In theory, a misconfigured graph can do this. Munin can draw many
22 >>>> different types of graph, including cumulative values. Even for a data
23 >>>> type like this which is X events per unit time, if you tell munin to add
24 >>>> them all up, it will do so and graph it.
25 >>>>
26 >>>> Quick test is to look at the graph config.
27 >>>
28 >>>
29 >>> This graph lives in the "network" section of the munin web interface.
30 >>> There is no matching section in /etc/munin/plugin-conf.d/munin-node so
31 >>> it should be using the default config.
32 >>>
33 >>> Any ideas based on this new info?
34 >>
35 >> A few :-)
36 >>
37 >>
38 >> I can't find the plugin that delivers that graph though. Maybe I just
39 >> don't have it, maybe it comes from contrib/
40 >>
41 >> What's your USE for munin?
42 >
43 >
44 > USE="apache cgi http mysql ssl syslog -asterisk -dhcpd -doc -ipmi
45 > -ipv6 -irc -java -memcached -minimal -postgres (-selinux) {-test}"
46 >
47 >
48 >> What do you have in "ls -al /etc/munin/plugins/" ?
49
50
51 It's as I thought - your data is accurate but rrd has been given a
52 completely wrong method to derive the graphs.
53
54 Munin graphs for section "Network" do not have to be in a file called
55 "network" - it's just a category and the plugin defines what web-page
56 section it must be in. In your case, the relevant plugin is
57 netstat_multi, which doesn't often get installed. Its data source is
58 "netstat -s", so grep that output for "timeout" to see it.
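
If you'd rather script that grep, here is a minimal Python sketch. It assumes the Linux "netstat -s" layout where each counter line starts with a number followed by a description; the function names are mine, not anything munin ships.

```python
import re
import subprocess


def timeout_counters(netstat_text):
    """Return {description: count} for every counter line mentioning 'timeout'."""
    counters = {}
    for line in netstat_text.splitlines():
        if "timeout" not in line.lower():
            continue
        # Assumption: counter lines look like "    1234 some description".
        match = re.match(r"\s*(\d+)\s+(.*\S)", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def read_netstat():
    """Run `netstat -s` and return its output as text (needs net-tools installed)."""
    result = subprocess.run(
        ["netstat", "-s"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling timeout_counters(read_netstat()) on the server should list the raw cumulative values that the plugin feeds to rrd. Lines that mention "timeout" but don't start with a number are simply skipped.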
59
60 Timeouts are cumulative counters: they never decrease until they wrap
61 around. So to scale them, the plugin gets the rrd file to subtract the
62 previous reading from the current reading and divide by the time interval
63 to get the timeouts/sec. This is all done inside rrd when the data files
64 are updated (it's quite a lot of magic).
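
The subtraction and division described above boil down to something like the sketch below. This is only an illustration of the arithmetic, not rrd's actual code; the function name and the sample numbers are made up.

```python
def rate_per_second(prev_value, prev_time, cur_value, cur_time):
    """Turn two readings of a cumulative counter into an events/sec rate."""
    interval = cur_time - prev_time
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    # Difference between the readings, spread over the elapsed seconds.
    return (cur_value - prev_value) / interval
```

For example, a counter that reads 1500 and then 1530 five minutes (300 seconds) later works out to rate_per_second(1500, 0, 1530, 300), which is 0.1 timeouts/sec.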
65
66 That plugin sets the graph type to DERIVE
67 (/etc/munin/plugins/netstat_multi, around line 190). I feel it should be
68 GAUGE or COUNTER.
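
To see what each type would do with the same readings, here is a rough Python model of the three data source types as the rrdcreate page describes them. It is a simplification under stated assumptions: real rrd also interpolates between update times, and COUNTER is modelled here with a 32-bit wrap only. The function names are hypothetical.

```python
WRAP_32 = 2 ** 32  # Assumption: treat the counter as 32 bits wide.


def gauge(prev, cur, seconds):
    """GAUGE stores the reading as-is; no rate is computed."""
    return float(cur)


def derive(prev, cur, seconds):
    """DERIVE stores the rate of change, which can go negative."""
    return (cur - prev) / seconds


def counter(prev, cur, seconds):
    """COUNTER stores the rate too, but treats any decrease as a wrap."""
    delta = cur - prev
    if delta < 0:
        delta += WRAP_32
    return delta / seconds
```

With readings of 1000 and then 1300 taken 300 seconds apart, gauge gives 1300.0 while derive and counter both give 1.0 per second. If the counter then restarts from zero (a reboot, say) and the next reading is 20, derive gives about -4.27 per second and counter gives a spike of roughly 14 million per second, because it assumes the value wrapped. A minimum of 0 on the data source makes rrd record such a negative DERIVE value as unknown instead.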
69
70 The proper reference on rrd is
71 http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html
72 and the munin docs are
73 https://munin.readthedocs.org/en/latest/index.html
74
75 You must edit the plugin file and, IIRC, recreate the rrd. You will lose
76 all past info (can't be helped).
77
78
79 [snip ls output]
80
81
82 > P.S. Any other good plugins you'd recommend?
83
84 http://gallery.munin-monitoring.org/
85
86 Monitoring is highly site-specific so recommendations aren't usually
87 worth much, but that gallery has LOTS of contributed plugins.
88
89 --
90 Alan McKinnon
91 alan.mckinnon@×××××.com
