On 8/6/05, William Kenworthy <billk@×××××××××.au> wrote:
> Hi, is there any guide as to what scheduler is best for what purposes?
> I have used CFQ since the early days as I was rather unhappy with the
> performance of 2.6 as against 2.4, but I am finding the latest 2.6
> kernels do not seem to be running as well as the earlier ones with this
> scheduler - so I presume there have been changes/improvements etc.
> that may mean I need to reassess what I am using.

Firstly, if you wish to compare 2.6 to 2.4 fairly, on a
"like-for-like" basis, then configure the 2.6 kernel to run at 100Hz
and boot it with the "elevator=deadline" option. Secondly, it is
unwise to base any performance conclusions upon experiences with
earlier versions of 2.6. I would definitely recommend that you
reassess the situation if need be.
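
As an aside, recent 2.6 kernels allow the elevator to be inspected and
switched per-device at runtime through sysfs, which makes this sort of
experimentation painless. A rough Python sketch follows (the device
name is my assumption, so substitute your own; writing requires root
and will fail on kernels without runtime switching):

    # equivalent to:
    #   cat /sys/block/sda/queue/scheduler
    #   echo deadline > /sys/block/sda/queue/scheduler
    dev = "sda"  # assumption: substitute your own block device
    path = "/sys/block/%s/queue/scheduler" % dev

    # the active elevator is the bracketed entry, e.g. "noop [cfq] deadline"
    print(open(path).read().strip())

    f = open(path, "w")  # requires root
    f.write("deadline")
    f.close()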
|
The topic of elevators/schedulers has been discussed in great (and
revealing) detail in various posts over at kerneltrap.org in the past.
The most informative post that I can recall was written by someone who
unfortunately elected to remain anonymous, so I cannot credit him by
name. I daresay he would not object to my quoting him in full, so
that's what I'm going to do:

<quote="Anonymous on Monday, September 20, 2004 - 09:31">
This is just an executive summary; I don't know the details.
Normally a disk scheduler tries to balance between these goals:

- fairness (let every process have its share of the access to disk)
- performance (try to serve requests close to the current disk head
position first, because seeking there is fastest)
- realtime (guarantee that a request is serviced in a given time)

A scheduler gets as input the list of sectors that processes have
recently requested and which haven't yet been served. It knows where
the disk head currently is, and whatever other data it "remembers"
from past requests.

A typical scheduler is the cyclic elevator, in which the disk head
moves from the beginning of the disk to the end in one direction.
Assume that you get some requests in a sorted queue, and you're going
to satisfy them. You first filter the list of requests by deciding
that you will not satisfy requests for sectors below your current head
position. Then you simply go to the sector closest to your current
position in your sorted list. So, crunch crunch crunch, you move from
the start of the disk to the end. When you reach the highest
unsatisfied sector number, your cycle is over and you seek to the
lowest unsatisfied sector. Rinse and repeat.

Linux 2.2, at least, used the cyclic elevator if I remember right.

So, with these approximate goals in mind, we can look at the different
schedulers available.

- deadline scheduler: this is a cyclic elevator, but with a twist:
requests are given a deadline by which they must be served. When a
request starts to look like it's going to expire, the kernel will skip
intermediate sectors and move to that request, thus giving some
realtime behaviour.

- AS scheduler: a cyclic elevator with a waiting policy: after you
service a request that looks like it might have future requests coming
nearby, you pause even if there are more sectors in your work queue.
The anticipatory scheduler literally anticipates more requests to
follow on this track or very close by. How AS decides whether to
anticipate is basically just a lot of guesswork based on typical
access patterns.

- cfq scheduler: a different sort of stab at fairness. It's different
from the two above: it doesn't use a cyclic elevator, it has realtime
guarantees, and it aggressively avoids starvation. It could be a good
scheduler for multiuser systems.

- noop scheduler: just service the next request in the queue, without
any algorithm to prefer this or that request.

You want to use the noop scheduler on devices where there is no seek
penalty, such as flash drives; that's why a USB stick wants noop.
Unfortunately, hard disks are very mechanical beasts and their
performance is highly governed by their seeking abilities. All of the
schedulers above are really trying to figure out how to extract
maximum performance from the hard disk without causing bad behaviour
in other cases.
</quote>
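
To make the quoted sweep-and-deadline description concrete, here is a
toy sketch in Python. It is purely illustrative (bare sector numbers,
made-up names); the kernel's real elevators work on request queues
with merging and batching and are far more involved:

    # toy model of the quoted algorithms; "requests" are bare sector numbers
    def cyclic_pick(head, pending):
        # cyclic elevator: nearest pending sector at or after the head,
        # wrapping to the lowest pending sector once the sweep is done
        ahead = [s for s in pending if s >= head]
        return min(ahead) if ahead else min(pending)

    def deadline_pick(head, pending, now, expiry):
        # the deadline twist: serve any expired request first, regardless
        # of head position; otherwise fall back to the plain cyclic sweep
        # (noop, by contrast, would simply serve in arrival order)
        expired = [s for s in pending if expiry[s] <= now]
        if expired:
            return min(expired, key=lambda s: expiry[s])  # most overdue first
        return cyclic_pick(head, pending)

    # one full sweep: head at 100, requests at sectors 7, 50, 120 and 300
    pending, head, order = {7, 50, 120, 300}, 100, []
    while pending:
        head = cyclic_pick(head, pending)
        pending.discard(head)
        order.append(head)
    print(order)  # [120, 300, 7, 50] - up to the end, then wrap around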

Note that if you're using a sophisticated hardware RAID device then
"noop" may also be a good choice. This is because only one logical
block device may be presented to Linux, and it makes no sense for a
fancy elevator to spend its time reordering requests according to a
model of access that is irrelevant to the underlying media (which is
abstracted from the kernel).

The explanation of CFQ there is a little scanty. Another anonymous
poster at the same site summarised it rather well:

<quote="Anonymous on Wednesday, February 26, 2003 - 15:38">
CFQ aims to provide fairness among competing tasks in that, when more
than one task requests disk access, the I/O scheduler round-robins
among all requestors rather than letting one requestor dominate. It
can result in a seek storm, however, when the various tasks want to
access different areas of the disk. Thus, it can be bad for total
throughput. (This is unlike the network stack, where the cost of
switching between two unrelated streams is zero or nearly zero.)

AS seeks to improve throughput by reducing the seek storms that occur
when you switch among requestors. It observes that most small reads
are so-called dependent reads, and so even though a given requestor
might have only one request queued at the moment, it may have several
others for areas nearby on the disk. Thus, when it picks one of these
reads to schedule, it decides whether to hold off scheduling anything
else for that same disk for "a moment" to see if more read requests
arrive. The net effect is to reduce the total number of seeks between
competing streams by allowing greater batches of nearby reads to
execute together. Overall throughput should go up as a result.

CFQ should be better for low-latency tasks *if* the resultant seek
storms don't kill throughput. If you have two streams duking it out
and no hysteresis to keep you from hopping queues continuously, then
the resulting seeks could cause CFQ to leave you crying. In the
dependent read case, since the requestor's queue will go empty (and
cause a queue switch) after each request, it seems as though these
seek storms are practically a given without aggressive readahead.
</quote>
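
The seek storm is easy to demonstrate with one more toy sketch (again
illustrative Python with made-up numbers, not CFQ's actual logic):
strict round-robin between two sequential readers at opposite ends of
the disk drags the head across nearly its full span for every pair of
requests:

    # round-robin one request at a time from each per-process queue,
    # tallying a crude seek cost of |sector - head| per dispatch
    def round_robin(queues, head=0):
        order, seeks = [], 0
        while any(queues):
            for q in queues:
                if q:
                    s = q.pop(0)
                    seeks += abs(s - head)
                    head = s
                    order.append(s)
        return order, seeks

    a = [10, 11, 12, 13]          # sequential reads near the start
    b = [9000, 9001, 9002, 9003]  # sequential reads near the end
    print(round_robin([a, b]))    # head ping-pongs: 10, 9000, 11, 9001, ...

AS's anticipation is essentially a bet against exactly this ping-pong:
by idling briefly on the queue it has just served, it gives the next
dependent read a chance to arrive and be satisfied without the long
seek.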

Personally, I'm using the deadline elevator with an md (software)
RAID5 setup in conjunction with LVM2.

Here are my sources for the above quotations:

http://kerneltrap.org/node/3851
http://kerneltrap.org/node/592/2516

Cheers,

--Kerin Millar

--
gentoo-server@g.o mailing list