[gentoo-amd64] Re: Hi and init problem - gentoo-amd64

From:	Duncan <1i5t5.duncan@×××.net>
To:	gentoo-amd64@l.g.o
Subject:	[gentoo-amd64] Re: Hi and init problem
Date:	Mon, 08 May 2006 10:16:03
Message-Id:	`pan.2006.05.08.10.13.34.966031@cox.net`
In Reply to:	Re: [gentoo-amd64] Hi and init problem by Dieter Ries

Dieter Ries posted <200605081030.04439.clip2@×××.de>, excerpted below, on
Mon, 08 May 2006 10:30:02 +0200:

> I still dont understand why
> Checking all filesystems
> is running in the boot-up process without checkfs and checkroot in one of
> the runlevels.

There are two reasons for that.

One, Gentoo has an initscript dependency system. If you had read the
Working with Gentoo section of the handbook, you'd probably understand
this a bit better. Unfortunately, many people apparently think the
handbook is only for installation, and end up missing out on a lot of
what the rest of the handbook covers. Without that understanding, they
are much less efficient at administering their Gentoo system than they'd
otherwise be: they end up doing things the hard way, and making mistakes
they'd not make had they read the documentation. Gentoo has a reputation
for some of the best documentation in the community, so it's a shame when
folks don't read it.

Anyway, what it amounts to is that other initscripts depend on checkfs and
checkroot, so the system ensures they are run before those other
initscripts run, even if checkroot and checkfs aren't themselves listed
in any runlevel. Again, this is covered in the handbook, if you want
to better understand how and why it works that way.
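To make the dependency idea concrete, here is a toy model in plain sh. It
is only a sketch, not Gentoo's actual dependency code: the deps_of table
and the start_service function are made up for the example, and the two
dependencies it models (localmount needing checkfs, checkfs needing
checkroot) are assumptions for illustration.

```shell
#!/bin/sh
# Toy model of dependency-driven startup. NOT Gentoo's real implementation.
# The dependency table below is an assumption made for illustration.

# deps_of SERVICE: print the services SERVICE needs.
deps_of() {
    case "$1" in
        localmount) echo "checkfs" ;;
        checkfs)    echo "checkroot" ;;
        *)          echo "" ;;
    esac
}

started=""

# start_service SERVICE: start everything SERVICE needs, then SERVICE.
start_service() {
    # Already started? Then there is nothing to do.
    case " $started " in
        *" $1 "*) return 0 ;;
    esac
    for dep in $(deps_of "$1"); do
        start_service "$dep"
    done
    started="${started:+$started }$1"
}

# Only localmount is requested, yet its dependencies get pulled in first.
start_service localmount
echo "$started"
```

Asking for localmount alone prints "checkroot checkfs localmount": the
services it needs are started first even though nobody listed them.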

Reason two is the mechanism actually in play here, however. Without it,
things would fall back to reason one above. Unfortunately, this one is
/not/ covered in the handbook, or wasn't last I looked, anyway. However,
it's a logical extension of reason one, so understanding reason one makes
following reason two easier.

As implemented by the /sbin/rc script (which init runs repeatedly, as
configured in /etc/inittab, during the boot process), certain services
are considered "critical" to booting. Barring a local configuration that
overrides the list, rc runs them directly, regardless of whether they
are in the boot runlevel or not.

Take a look at the "get_critical_services" routine in /sbin/rc.
Basically, unless you have an /etc/runlevels/boot/.critical file, rc sets:

CRITICAL_SERVICES="checkroot modules checkfs localmount clock"

Those services are then started in exactly that order, directly by rc,
prior to running the boot runlevel, regardless of whether they are set
to be started by the boot runlevel or not.
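The selection logic just described can be sketched roughly as follows.
This is not the code from /sbin/rc, only an approximation of the
behaviour described above. The function takes the runlevel directory as
an argument so it can be tried against a scratch directory, and it
assumes the .critical file simply holds whitespace-separated service
names; check the real routine before relying on either detail.

```shell
#!/bin/sh
# Sketch only; this is NOT the code from /sbin/rc.
# Assumption: .critical holds whitespace-separated service names.

# get_critical_services RUNLEVEL_DIR: print the critical service list.
get_critical_services() {
    critical_file="$1/.critical"
    if [ -f "$critical_file" ]; then
        # Unquoted expansion on purpose: collapses newlines and runs of
        # blanks so the names come out on one line.
        CRITICAL_SERVICES=$(echo $(cat "$critical_file"))
    else
        # Default list as quoted in this post.
        CRITICAL_SERVICES="checkroot modules checkfs localmount clock"
    fi
    echo "$CRITICAL_SERVICES"
}

# Demo against a scratch directory instead of /etc/runlevels/boot.
demo_dir=$(mktemp -d)
get_critical_services "$demo_dir"
printf 'checkroot\nclock\n' > "$demo_dir/.critical"
get_critical_services "$demo_dir"
rm -rf "$demo_dir"
```

The first call prints the default list; once a .critical file exists,
the second call prints only what that file names.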

If the drivers needed to mount your automatically mounted filesystems are
built into the kernel rather than built as modules, you can eliminate
modules from that list. You can also try eliminating checkroot and
checkfs, and localmount in some cases, but the results won't always be
quite what you expected. Certain other services might not start in the
expected order, or at all, because stuff they depend on and assume is
there is missing.

On my system, I can safely list only checkroot and clock in my
/etc/runlevels/boot/.critical file. That works, although I have checkfs
and localmount in the boot runlevel so they get run anyway -- they just
parallelize a bit better (I have RC_PARALLEL_STARTUP="yes" set in
/etc/conf.d/rc). However, if I remove checkroot or clock from the
.critical file, things don't work quite right -- they have to be there and
started by rc directly, or the rest of the services in the boot runlevel
don't work as intended.

The question then arises: why are these services considered so
critical? In general, you will find your system remains much more stable
if you run checkroot and checkfs at every boot, for your normally
mounted filesystems. The problem is that a hardware fault that would
cause only a small problem if caught by an fsck at the next boot may end
up being a HUGE problem if the system is allowed to continue writing to
that filesystem as if nothing were wrong. A single cross-linked file can
soon become hundreds or thousands, as the metadata becomes increasingly
jumbled, until it's impossible to recover without simply overwriting the
filesystem from a good backup. The problem may take weeks or months, even
years, to develop into a stability-compromising issue that's finally
noticed when something critical gets damaged. Regularly running those
at-boot fscks goes a long way toward ensuring that doesn't happen. With a
journaled filesystem, it's not as if it takes hours to run those checks
anyway. A few extra seconds or a minute at boot can save you a huge
amount of work later, when a small and initially insignificant error
would otherwise go uncaught until hundreds of files had been corrupted.

Of course, one is also expected to use fstab appropriately, turning off
fsck at boot for non-critical or not automounted filesystems. Here, I
have identical backup snapshots of all the filesystems I consider valuable
enough to want to retain. Those are not automounted, and are only written
to when I mkfs them and recopy the data from the live filesystem
periodically as part of my backup routine. As such, there's no need to
fsck them at every boot, because they've most likely not even been touched
since the last boot: not written to, not read from, not even mounted.
Likewise, for any partitions (like /tmp) that contain essentially
throwaway data, it's probably safe to skip the fsck by putting a zero in
the sixth (fsck pass) column of fstab.
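As an illustration of that fstab column, the sketch below feeds a
made-up fstab through a small awk filter that lists the mount points
with a non-zero pass number, meaning the ones that would be considered
for a boot-time fsck. The devices, mount points and options are
hypothetical, and sample_fstab and checked_at_boot are names invented for
the example.

```shell
#!/bin/sh
# Hypothetical fstab; devices, mount points and options are examples only.
sample_fstab() {
    cat <<'EOF'
# <fs>      <mountpoint>  <type>  <opts>          <dump> <pass>
/dev/sda3   /             ext3    noatime         0      1
/dev/sda5   /home         ext3    noatime         0      2
/dev/sda6   /tmp          ext3    noatime         0      0
/dev/sdb1   /mnt/backup   ext3    noauto,noatime  0      0
EOF
}

# Read fstab-style lines on stdin and print the mount points whose sixth
# field (the fsck pass number) is non-zero. Comment lines are skipped.
checked_at_boot() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 != 0 { print $2 }'
}

sample_fstab | checked_at_boot
```

Here only / and /home are printed; the throwaway /tmp and the
not-automounted backup partition carry a zero and are skipped. To check
a real system, redirect /etc/fstab into checked_at_boot instead.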

For any partitions you depend on, however, while you can probably get away
with avoiding fsck at boot in the short term, it's far safer just to do
it. As mentioned by someone else, you can set ext3 partitions to not fsck
at every boot, if desired. That's a useful option. Set it to every third
boot, or every fifth, but don't turn it off entirely, at the risk of not
catching minor damage until it's major and causes you serious issues.

Keep in mind that even a partition never written to will develop "bit
rot" over time, due to cosmic ray bitflipping and the like. The reality is
that at the single-bit level, hard drives aren't nearly as reliable as we
like to think they are. Awesome levels of automated redundant information
and error correction normally handle the problems as they develop,
correcting them behind the scenes. That's normal and good, and generally
suffices for partitions not normally written to. However, once you start
actively using a partition, writing as well as reading, if one of those
normally insignificant bitflips happens in the wrong place, a write
intended for one location on the disk might end up at quite a different
location. That's what automated fscks at boot, even after proper
shutdown, are designed to detect and correct. Catch it early, and it's
insignificant background noise, corrected by automated mechanisms such
that you likely won't notice it at all. Fail to do those automated
boot-time fscks, and you are playing the odds, risking your data.

Setting the fscks to once every third boot is still well within
reasonable safety limits. Setting one in five should be safe under normal
conditions but is playing the odds a bit more. I'd not recommend turning
it off altogether, or setting it much less frequently than one in five,
as that's just undue risk, IMO. You may well have no problems doing it
that way for years, if ever. Another person may have problems in a week
or a month. It's up to you how much risk you want to put your data at.
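For ext2/ext3, the "every Nth boot" setting is the maximum mount count,
which tune2fs -c changes. The sketch below is a simplified model of how a
count-based check comes due, not e2fsprogs code; needs_check is a name
invented for the example, and the device in the comments is a
placeholder. The real bookkeeping differs in detail, so treat it as a
picture of the idea only.

```shell
#!/bin/sh
# Simplified model of a mount-count-triggered check; an illustration of the
# idea, not e2fsprogs code.
#
# On a real ext3 filesystem the interval is set with tune2fs, as root, e.g.
#   tune2fs -c 3 /dev/sdXN      # /dev/sdXN is a placeholder device
#   tune2fs -l /dev/sdXN        # shows the current and maximum mount count

# needs_check MOUNT_COUNT MAX_COUNT: succeed if a check is due.
# A MAX_COUNT of 0 or less is treated as "count-based checking disabled".
needs_check() {
    [ "$2" -gt 0 ] && [ "$1" -ge "$2" ]
}

# With a maximum of 3, walk the counter through five boots.
max=3
count=0
for boot in 1 2 3 4 5; do
    count=$((count + 1))
    if needs_check "$count" "$max"; then
        echo "boot $boot: fsck"
        count=0    # a completed check resets the counter
    else
        echo "boot $boot: skip"
    fi
done
```

In this model the check fires on the third boot and the counter starts
over, which is the one-in-three behaviour discussed above.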

Meanwhile, back in the Gentoo init scripts, mandating checkroot and
checkfs as "critical" parts of the boot sequence remains the sanest
default. Gentoo provides the configurability to change those defaults for
those sysadmins who choose to do so, but setting anything else as the
default would simply not be the sane or responsible thing for Gentoo devs
to do.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

--
gentoo-amd64@g.o mailing list
