On 15/04/2013 16:07, Christian Parpart wrote:
> Hey all,
> 
> we hit some nice traffic last night that took our main gateway down.
> Pacemaker was configured to fail over to our second one, but that one
> died as well.
> 
> In a little post-analysis, I found the following in the logs:
> |
> Apr 14 21:42:11 cesar1 kernel: [27613652.439846] BUG: soft lockup -
> CPU#4 stuck for 22s! [swapper/4:0]
> Apr 14 21:42:11 cesar1 kernel: [27613652.440319] Stack:
> Apr 14 21:42:11 cesar1 kernel: [27613652.440446] Call Trace:
> Apr 14 21:42:11 cesar1 kernel: [27613652.440595] <IRQ>
> Apr 14 21:42:12 cesar1 kernel: [27613652.440828] <EOI>
> Apr 14 21:42:12 cesar1 kernel: [27613652.440979] Code: c1 51 da 03 81 48
> c7 c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90
> 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 <89> c2
> Apr 14 21:42:12 cesar1 CRON[13599]: nss_ldap: could not connect to any
> LDAP server as cn=admin,dc=rz,dc=dawanda,dc=com - Can't contact LDAP server
> Apr 14 21:42:12 cesar1 CRON[13599]: nss_ldap: could not search LDAP
> server - Server is unavailable
> Apr 14 21:42:24 cesar1 crmd: [7287]: ERROR: process_lrm_event: LRM
> operation management-gateway-ip1_stop_0 (917) Timed Out (timeout=20000ms)
> Apr 14 21:42:48 cesar1 kernel: [27613688.611501] BUG: soft lockup -
> CPU#7 stuck for 22s! [named:32166]
> Apr 14 21:42:48 cesar1 kernel: [27613688.611914] Stack:
> Apr 14 21:42:48 cesar1 kernel: [27613688.612036] Call Trace:
> Apr 14 21:42:48 cesar1 kernel: [27613688.612200] <IRQ>
> Apr 14 21:42:48 cesar1 kernel: [27613688.612408] <EOI>
> Apr 14 21:42:48 cesar1 kernel: [27613688.612626] Code: c1 51 da 03 81 48
> c7 c2 4e da 03 81 e9 dd fe ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90
> 55 b8 00 00 01 00 48 89 e5 f0 0f c1 07 <89> c2
> Apr 14 21:42:55 cesar1 kernel: [27613695.946295] BUG: soft lockup -
> CPU#0 stuck for 21s! [ksoftirqd/0:3]
> Apr 14 21:42:55 cesar1 kernel: [27613695.946785] Stack:
> Apr 14 21:42:55 cesar1 kernel: [27613695.946917] Call Trace:
> Apr 14 21:42:55 cesar1 kernel: [27613695.947137] Code: c4 00 00 81 a8 44
> e0 ff ff ff 01 00 00 48 63 80 44 e0 ff ff a9 00 ff ff 07 74 36 65 48 8b
> 04 25 c8 c4 00 00 83 a8 44 e0 ff ff 01 <5d> c3
> 
> We're using irqbalance so that incoming traffic does not hit only the
> first CPU with ethernet card hardware interrupts (a lesson learned
> from the last, much more intensive DDoS).
|
To use irqbalance is wise. You could also try using receive packet
steering [1] [2]:
|
#!/bin/bash
# Enable receive packet steering (RPS): size the global flow table and
# allow all CPUs to process packets for every receive queue.
iface='eth*'
flow=16384
echo "$flow" > /proc/sys/net/core/rps_sock_flow_entries
queues=(/sys/class/net/${iface}/queues/rx-*)
for rx in "${queues[@]}"; do
    # Turning every 0 nibble into f yields an all-CPUs bitmask.
    echo "$(sed -e 's/0/f/g' < "$rx/rps_cpus")" > "$rx/rps_cpus"
    echo "$flow" > "$rx/rps_flow_cnt"
done
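
As an aside, the sed trick above rewrites the current mask in place;
the all-CPUs mask can also be computed directly from the CPU count. A
minimal sketch (cpus_to_mask is a name of my own choosing, not part of
any tool):

```shell
# Print a hex bitmask with the low $1 bits set, i.e. an rps_cpus mask
# enabling that many CPUs. Relies on 64-bit shell arithmetic, so it is
# only valid for up to 63 CPUs.
cpus_to_mask() {
    printf '%x\n' "$(( (1 << $1) - 1 ))"
}

cpus_to_mask "$(nproc)"   # e.g. "ff" on an 8-CPU machine
```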
|
I have found this to be beneficial on systems running networking
applications that are subject to a high load, but not for systems that
are simply forwarding packets and processing them entirely in kernel
space.
|
> However, since this has not helped, I'd like to find out what else we
> can do. Our gateway has to do NAT and has a few other iptables rules
> it needs in order to run OpenStack behind it, so I can't just drop it.
> 
> Regarding the logs, I can see that something caused the CPU cores to
> get stuck for a number of different processes. Has anyone ever
> encountered the error messages I quoted above, or knows
|
I used to encounter them but they cleared up at some point during the
3.4 (longterm) kernel series. If you also use the 3.4 series, I would
advise upgrading if running < 3.4.51. If you are not using a longterm
kernel, consider doing so unless there is a feature in a later kernel
that you cannot do without. My experience is that the more recent
'stable' kernels have a tendency to introduce serious regressions.
|
> other things one might want to do in order to prevent huge amounts of
> unsolicited incoming traffic from bringing a Linux node down?
|
If you can, talk with your upstream to see if there is a way in which
such traffic can be throttled there.
|
Be sure to use good quality NICs. In particular, they should support
multiqueue and adjustable interrupt coalescing (preferably on a dynamic
basis). For what it's worth, I'm using Intel 82576 based cards for busy
hosts. These support dynamic interrupt throttling. Even without such a
feature, some cards will allow their behaviour to be altered via
ethtool -C. Google will turn up a lot of information on this topic.
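
For instance, coalescing behaviour can be inspected and changed along
these lines (option support varies by driver, and the values here are
only illustrative, not recommendations):

```shell
ethtool -c eth0                  # show current coalescing settings
ethtool -C eth0 adaptive-rx on   # dynamic throttling, where supported
ethtool -C eth0 rx-usecs 100     # or a fixed delay before interrupting
```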
|
I should add that the stability of the driver is of paramount
importance. Though my Intel cards have been solid, the igb driver
bundled with the 3.4 kernel is not, which took me a long time to figure
out. I now use a local ebuild to compile the igb driver from upstream.
Not only did it improve performance, but it resolved all stability
issues that I had experienced up until then.
|
In the event that you are also using the igb driver, ensure that it is
configured optimally for multiqueue. Here's an example for the upstream
driver (my NIC has 4 ports, each with 8 queues):
|
# cat /etc/modprobe.d/igb.conf
options igb RSS=8,8,8,8
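
To confirm that the queues actually came up, the per-queue interrupt
vectors can be counted in /proc/interrupts. A sketch assuming the usual
igb naming scheme (eth0-TxRx-N); count_queue_irqs is a hypothetical
helper of my own:

```shell
# Count the TxRx interrupt vectors registered for an interface.
# Reads /proc/interrupts unless another file is given (handy for
# testing against a saved copy).
count_queue_irqs() {
    grep -c "$1-TxRx-" "${2:-/proc/interrupts}"
}
```

With RSS=8 per port, count_queue_irqs eth0 should report 8.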
|
Enable I/OAT if your hardware supports it. Some hardware will support it
but fail to expose a BIOS option to enable it, in which case you can try
using dca_force [3] (YMMV). Similarly, make use of x2APIC if supported,
but do not make use of the IOMMU provided by Intel as of Nehalem (boot
with intel_iommu=off if in doubt).
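
On a GRUB 2 system, the boot parameter can be made persistent along
these lines (the file path assumes a Debian-style layout):

```shell
# /etc/default/grub -- run update-grub afterwards to regenerate the
# boot configuration
GRUB_CMDLINE_LINUX="intel_iommu=off"
```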
|
Consider fine-tuning sysctl.conf, especially the settings pertaining to
buffer sizes/limits. I would consider this essential if operating at
gigabit speeds or higher. Examples are widespread, such as in section
3.1 of the Mellanox performance tuning guide [4].
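
By way of illustration, such settings might start out looking like this
(the figures are placeholders to be tuned for your workload, not
recommendations):

```shell
# /etc/sysctl.conf -- raise socket buffer ceilings and the per-CPU
# input queue backlog; apply with sysctl -p
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.netdev_max_backlog = 30000
```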
|
--Kerin
|
[1] https://lwn.net/Articles/361440/
[2] http://thread.gmane.org/gmane.linux.network/179883/focus=179976
[3] https://github.com/ice799/dca_force
[4] http://www.mellanox.com/related-docs/prod_software/Performance_Tuning_Guide_for_Mellanox_Network_Adapters_rev_1_0.pdf