Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: Hi and init problem
Date: Mon, 08 May 2006 10:16:03
Message-Id: pan.2006.05.08.10.13.34.966031@cox.net
In Reply to: Re: [gentoo-amd64] Hi and init problem by Dieter Ries
1 Dieter Ries posted <200605081030.04439.clip2@×××.de>, excerpted below, on
2 Mon, 08 May 2006 10:30:02 +0200:
3
4 > I still dont understand why
5 > Checking all filesystems
6 > is running in the boot-up process without checkfs and checkroot in one of
7 > the runlevels.
8
9 There are two reasons for that.
10
11 One, Gentoo has an initscript dependency system. If you had read the
12 Working with Gentoo section of the handbook, you'd probably understand
13 this a bit better. Unfortunately, many people apparently think the
14 handbook is only for installation, and end up missing out on understanding
15 a lot of the rest of Gentoo as covered in the rest of the handbook.
16 Without that understanding, they are much less efficient at properly
17 administering their Gentoo system than they'd be otherwise, as they end
18 up doing things the hard way, and making mistakes they'd not make had they
19 read the documentation. Gentoo has a reputation for some of the best
20 documentation in the community, so it's a shame when folks don't read it
21 and end up doing things the hard way as a result.
22
23 Anyway, what it amounts to is that other initscripts depend on checkfs and
24 checkroot, so the system ensures they are run before these other
25 initscripts run, even if checkroot and checkfs aren't directly listed to
26 be run, themselves. Again, this is covered in the handbook, if you want
27 to better understand how and why it works that way.
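
To make reason one a bit more concrete, here is a toy sketch of dependency-ordered startup in plain shell. It is not the real rc code: the deps_of table, the start_service function, and the STARTED variable are all names I made up for illustration, and the dependency edges are assumptions rather than copies of the real initscripts.

```shell
# Hypothetical dependency table: which services each service needs.
# These edges are illustrative assumptions, not copied from real initscripts.
deps_of() {
    case "$1" in
        checkfs)    echo "checkroot modules" ;;
        localmount) echo "checkfs" ;;
        bootmisc)   echo "localmount" ;;
        *)          echo "" ;;
    esac
}

STARTED=""

# Record a service as started only after everything it needs has been started.
start_service() {
    case " $STARTED " in
        *" $1 "*) return 0 ;;   # already started, nothing to do
    esac
    for dep in $(deps_of "$1"); do
        start_service "$dep"
    done
    STARTED="$STARTED $1"
    STARTED="${STARTED# }"      # drop the leading space on the first entry
}

# Ask only for bootmisc; its whole dependency chain is pulled in first.
start_service bootmisc
echo "$STARTED"
```

Asking for bootmisc alone prints "checkroot modules checkfs localmount bootmisc", which is the point: checkroot and checkfs get started even though nobody listed them directly.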
28
29 Reason two is actually what's working here, however. Without it, it would
30 fall back to reason one above, but reason two is the actual mechanism in
31 play here. Unfortunately, this one is /not/ covered in the handbook, or
32 wasn't last I looked, anyway. However, it's a logical extension of reason
33 one, so understanding reason one makes following reason two easier.
34
35 As actually implemented by the /sbin/rc initscript (which is run
36 repeatedly by init, as configured in /etc/inittab, as part of the boot
37 process), certain scripts are considered "critical" to the boot process,
38 and thus, barring a local configuration that bypasses them, default to
39 being run directly by /sbin/rc as part of the boot process, regardless of
40 whether they are in the boot runlevel or not.
41
42 Take a look at the "get_critical_services" routine in /sbin/rc.
43 Basically, unless you have an /etc/runlevels/boot/.critical file, rc sets:
44
45 CRITICAL_SERVICES="checkroot modules checkfs localmount clock"
46
47 Those services are then started in exactly that order, directly by rc,
48 before running the boot runlevel, regardless of whether they are set
49 to be started by the boot runlevel or not.
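
The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the actual get_critical_services code from /sbin/rc: the critical_services function name and its file argument are my own, and I'm assuming the .critical file is simply a whitespace-separated list of service names. It runs against a scratch directory so it never touches the real /etc/runlevels/boot.

```shell
# Sketch of the fallback logic; the function name and the file argument are
# illustrative, not the real /sbin/rc code.
critical_services() {
    critical_file="$1"
    if [ -f "$critical_file" ]; then
        # Collapse whatever whitespace the file uses into single spaces.
        echo $(cat "$critical_file")
    else
        # No override file: use the default list quoted above.
        echo "checkroot modules checkfs localmount clock"
    fi
}

# Try it against a scratch directory instead of the real boot runlevel.
scratch=$(mktemp -d)
critical_services "$scratch/.critical"   # no file yet: default list
printf 'checkroot\nclock\n' > "$scratch/.critical"
critical_services "$scratch/.critical"   # file present: its contents win
rm -rf "$scratch"
```

The first call prints the five default services; the second prints only "checkroot clock", because the override file replaces the default list rather than adding to it.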
50
51 If you have the modules you need to mount your automatically mounted
52 filesystems built into the kernel, you can eliminate modules from that
53 list. You can also try eliminating checkroot and checkfs, and localmount
54 in some cases, but the results won't always be quite what you expected.
55 Certain other services might not start in the expected order, or at all,
56 because stuff is missing that they depend on and assume is there.
57
58 With my system, I can safely list only checkroot and clock in my
59 /etc/runlevels/boot/.critical file. That works, although I have checkfs and
60 localmount in the boot runlevel so they get run anyway -- they just
61 parallelize a bit better (I have RC_PARALLEL_STARTUP="yes" set in
62 /etc/conf.d/rc). However, if I remove checkroot or clock from the
63 .critical file, things don't work quite right -- they have to be there and
64 started by rc directly or the rest of the services in the boot runlevel
65 don't work as intended.
66
67
68 The question then occurs... Why are these services considered so
69 critical? In general, you will find your system remains much more stable
70 if you run checkroot and checkfs at boot every time, for your normally
71 mounted filesystems. The problem is that a hardware fault that would
72 cause a small problem, if caught by an fsck at the next boot, may end up
73 being a HUGE problem if the system is allowed to continue writing to that
74 filesystem as if nothing were wrong. A single cross-linked file can soon
75 become hundreds or thousands, as the metadata becomes increasingly
76 jumbled, until it's impossible to recover from without simply overwriting
77 it with a good backup. The problem may take weeks or months, even years,
78 to develop into a system stability compromising issue that's finally
79 noticed when something critical gets damaged. However, regularly running
80 those at-boot fscks ensures that doesn't happen. With a journaled
81 filesystem, it's not as if it takes hours to run those checks anyway. A
82 few extra seconds or a minute taken at boot, can save you a huge amount of
83 work later, because a small and initially insignificant error wasn't
84 caught until hundreds of files had been corrupted.
85
86 Of course, one is also expected to use fstab appropriately, turning off
87 fsck at boot for non-critical or not automounted filesystems. Here, I
88 have identical backup snapshots of all the filesystems I consider valuable
89 enough to want to retain. Those are not automounted, and are only written
90 to when I mkfs them and recopy over the data from the live filesystem
91 periodically as part of my backup routine. As such, there's no need to
92 fsck them at every boot, because they've most likely not even been touched
93 since the last boot, not written to, not read from, or even mounted.
94 Likewise, for any partitions (like /tmp) that contain essentially throwaway
95 data, it's probably safe to skip the fsck by putting a zero in the
96 appropriate column of fstab.
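
As a quick way to see which filesystems would get checked, here is a small sketch that reads fstab-style lines and prints the mount points whose sixth field (the fsck pass number) is non-zero. The sample entries are invented for illustration; the device names, mount points, and the checked_at_boot function name are all hypothetical, so compare against your own /etc/fstab rather than trusting these.

```shell
# Hypothetical fstab contents; device names and mount points are made up.
sample_fstab='# <fs>      <mountpoint>  <type>  <opts>    <dump> <pass>
/dev/sda1   /             ext3    noatime   0      1
/dev/sda2   /home         ext3    noatime   0      2
/dev/sda3   /tmp          ext3    noatime   0      0
/dev/sdb1   /mnt/backup   ext3    noauto    0      0'

# Print the mount points whose sixth field (the fsck pass number) is
# non-zero, skipping comment lines and lines with fewer than six fields.
checked_at_boot() {
    awk '$1 !~ /^#/ && NF >= 6 && $6 != 0 { print $2 }'
}

printf '%s\n' "$sample_fstab" | checked_at_boot
```

With the sample above, only / and /home are listed. The throwaway /tmp and the not-automounted backup partition carry a zero in the last column, so they are skipped.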
97
98 For any partitions you depend on, however, while you can probably get away
99 with avoiding fsck at boot in the short term, to be safe, it's far better
100 just to do it. As mentioned by someone else, you can set ext3 partitions
101 to not fsck at every boot, if desired. That's a useful option. Set it to
102 every third boot, or every fifth, but don't turn it off entirely, at the
103 risk of not catching minor/insignificant damage until it's major and
104 causes you serious issues. Keep in mind that even a partition never
105 written to will develop "bit rot" over time, due to cosmic ray bitflipping
106 and the like. The reality is that on the single bit level hard drives
107 aren't nearly as reliable as we like to think they are. Awesome levels of
108 automated redundant information and error correction normally handle the
109 problems as they develop, correcting them behind the scenes. That's
110 normal and good, and generally suffices for partitions not normally
111 written to. However, once you start actively using a partition, writing
112 as well as reading, if one of those normally insignificant bitflips
113 happens in the wrong place, your write intended for one location on the
114 disk might end up at quite a different location. That's what automated
115 fscks at boot, even after proper shutdown, are designed to detect and
116 correct. Catch it early, and it's insignificant, background noise,
117 corrected by automated mechanisms such that you likely won't notice it at
118 all. Fail to do those automated boot-time fscks, and you are playing the
119 odds, risking your data. Setting the fscks to once every third boot is
120 still well within reasonable safety limits. Setting one in five should be
121 safe under normal conditions but is playing the odds a bit more. I'd not
122 recommend turning it off altogether, or setting it much less frequently
123 than one in five, as that's just undue risk, IMO. You may well have no
124 problems doing it that way for years, if ever. Another person may have
125 problems in a week or a month. It's up to you how much risk you want to
126 put your data at.
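
For ext2/ext3, the usual knob for this is the maximum mount count, set with tune2fs -c. The sketch below only builds and prints the command so you can review it before running it as root; it does not touch any device. The fsck_every_n_mounts helper is a name I made up, and /dev/sdXN is a placeholder, not a real device. Note also that this counts mounts, which matches boots only for filesystems mounted once per boot.

```shell
# Build (but do not run) the tune2fs command that sets the maximum mount
# count between forced checks. The device name used below is a placeholder.
fsck_every_n_mounts() {
    device="$1"
    count="$2"
    # Reject empty or non-numeric counts.
    case "$count" in
        ''|*[!0-9]*)
            echo "count must be a positive integer" >&2
            return 1 ;;
    esac
    # Reject zero, since the point here is to keep periodic checks enabled.
    if [ "$count" -le 0 ]; then
        echo "count must be a positive integer" >&2
        return 1
    fi
    echo "tune2fs -c $count $device"
}

# Print the command for a check on every third mount.
fsck_every_n_mounts /dev/sdXN 3
```

This prints "tune2fs -c 3 /dev/sdXN". The helper deliberately refuses a count of zero, in keeping with the advice above not to turn the periodic check off entirely.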
127
128 Meanwhile, back in the Gentoo init scripts, mandating checkroot and
129 checkfs as "critical" parts of the boot sequence remains the most sane
130 default. Gentoo provides the configurability to change those defaults for
131 those sysadmins that choose to do so, but setting anything else as the
132 default would simply not be the sane or responsible thing for Gentoo devs
133 to do.
134
135 --
136 Duncan - List replies preferred. No HTML msgs.
137 "Every nonfree program has a lord, a master --
138 and if you use the program, he is your master." Richard Stallman in
139 http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html