Dieter Ries posted <200605081030.04439.clip2@×××.de>, excerpted below, on
Mon, 08 May 2006 10:30:02 +0200:

> I still dont understand why
> Checking all filesystems
> is running in the boot-up process without checkfs and checkroot in one of
> the runlevels.

There are two reasons for that.

One, Gentoo has an initscript dependency system. If you had read the
Working with Gentoo section of the handbook, you'd probably understand
this a bit better. Unfortunately, many people apparently think the
handbook is only for installation, and end up missing out on a lot of
what the rest of the handbook covers. Without that understanding, they
are much less efficient at administering their Gentoo system than they'd
otherwise be: they end up doing things the hard way and making mistakes
they'd not make had they read the documentation. Gentoo has a reputation
for some of the best documentation in the community, so it's a shame when
folks don't read it.

Anyway, what it amounts to is that other initscripts depend on checkfs and
checkroot, so the system ensures they are run before those other
initscripts, even if checkroot and checkfs aren't themselves listed in any
runlevel. Again, this is covered in the handbook, if you want to better
understand how and why it works that way.

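To make the dependency idea concrete, here's a toy sketch in shell. It is
not Gentoo's actual implementation: real initscripts declare what they
need in a depend() function, while the table below is hard-coded and
hypothetical, and deps_of and start_service are names made up purely for
illustration.

```shell
#!/bin/sh
# Toy dependency table (hypothetical; not read from real initscripts).
deps_of() {
    case "$1" in
        localmount) echo "checkfs" ;;
        checkfs)    echo "checkroot" ;;
        *)          echo "" ;;
    esac
}

# Space-separated list of services started so far.
STARTED=""

start_service() {
    # Skip anything that has already been started.
    case " $STARTED " in
        *" $1 "*) return 0 ;;
    esac
    # Start whatever this service depends on first, depth-first.
    for dep in $(deps_of "$1"); do
        start_service "$dep"
    done
    STARTED="$STARTED $1"
    echo "starting $1"
}

# Only localmount is requested, yet its dependencies get pulled in too.
start_service localmount
```

Asking for localmount alone starts checkroot and checkfs ahead of it,
which is the effect described above: a service can run at boot without
being listed in any runlevel, simply because something else needs it.
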
Reason two is what's actually at work here, however. Without it, things
would fall back to reason one above, but reason two is the mechanism
actually in play. Unfortunately, this one is /not/ covered in the
handbook, or wasn't last I looked, anyway. However, it's a logical
extension of reason one, so understanding reason one makes reason two
easier to follow.

As actually implemented by the /sbin/rc script (which init runs
repeatedly, as configured in /etc/inittab, as part of the boot process),
certain services are considered "critical" to the boot process. Barring a
local configuration that overrides the list, /sbin/rc runs them directly
as part of the boot process, regardless of whether they are in the boot
runlevel or not.

Take a look at the "get_critical_services" routine in /sbin/rc.
Basically, unless you have an /etc/runlevels/boot/.critical file, rc sets:

CRITICAL_SERVICES="checkroot modules checkfs localmount clock"

Those services are then started in exactly that order, directly by rc,
before the boot runlevel is run, regardless of whether they are set to be
started by the boot runlevel or not.

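The selection logic just described can be sketched roughly as follows.
This is NOT the code from /sbin/rc, just a minimal illustration of the
behavior as described: the function name and its file-path argument are
made up, and the sketch assumes the override file simply holds
whitespace-separated service names.

```shell
#!/bin/sh
# Sketch of the selection logic described above; NOT the actual code
# from /sbin/rc. The function name and file-path argument are made up.
DEFAULT_CRITICAL="checkroot modules checkfs localmount clock"

pick_critical_services() {
    # $1: path to an optional override file of service names
    if [ -f "$1" ]; then
        # Assumes whitespace-separated names; join them with single spaces.
        set -- $(cat "$1")
        echo "$*"
    else
        # No override file: fall back to the built-in default list.
        echo "$DEFAULT_CRITICAL"
    fi
}

pick_critical_services /etc/runlevels/boot/.critical
```

On a system without the override file this prints the default list; with
a file containing only "checkroot clock" it would print just those two.
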
If you have the modules you need to mount your automatically mounted
filesystems built into the kernel, you can eliminate modules from that
list. You can also try eliminating checkroot and checkfs, and localmount
in some cases, but the results won't always be quite what you expected.
Certain other services might not start in the expected order, or at all,
because stuff is missing that they depend on and assume is there.

With my system, I can safely list only checkroot and clock in my
/etc/runlevels/boot/.critical file. That works, although I have checkfs
and localmount in the boot runlevel so they get run anyway -- they just
parallelize a bit better (I have RC_PARALLEL_STARTUP="yes" set in
/etc/conf.d/rc). However, if I remove checkroot or clock from the
.critical file, things don't work quite right -- they have to be there and
started by rc directly or the rest of the services in the boot runlevel
don't work as intended.

The question then arises: why are these services considered so critical?
In general, you will find your system remains much more stable if you run
checkroot and checkfs at every boot, for your normally mounted
filesystems. The problem is that a hardware fault that would cause only a
small problem if caught by an fsck at the next boot may end up being a
HUGE problem if the system is allowed to continue writing to that
filesystem as if nothing were wrong. A single cross-linked file can soon
become hundreds or thousands, as the metadata becomes increasingly
jumbled, until it's impossible to recover without simply overwriting the
filesystem from a good backup. The problem may take weeks or months, even
years, to develop into a stability-compromising issue that's finally
noticed when something critical gets damaged. Regularly running those
at-boot fscks catches the damage while it's still small. With a journaled
filesystem, it's not as if it takes hours to run those checks anyway. A
few extra seconds or a minute at boot can save you a huge amount of work
later, work you'd face because a small and initially insignificant error
wasn't caught until hundreds of files had been corrupted.

Of course, one is also expected to use fstab appropriately, turning off
fsck at boot for non-critical or not-automounted filesystems. Here, I
have identical backup snapshots of all the filesystems I consider valuable
enough to want to retain. Those are not automounted, and are only written
to when I mkfs them and recopy the data from the live filesystem,
periodically, as part of my backup routine. As such, there's no need to
fsck them at every boot, because they've most likely not even been touched
since the last boot: not written to, not read from, not even mounted.
Likewise, for any partitions (like /tmp) that contain essentially
throwaway data, it's probably safe to skip the fsck by putting a zero in
the last (sixth) column of fstab.

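As a quick way to see which filesystems are exempt from the boot-time
check, here's a small sketch that lists the mount points whose sixth
fstab field is 0 or missing. The sample fstab is entirely hypothetical
(device names and mount points are made up), and unchecked_mounts is just
an illustrative helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical fstab for illustration; devices and mount points are made up.
SAMPLE_FSTAB=$(mktemp)
cat > "$SAMPLE_FSTAB" <<'EOF'
# <fs>      <mountpoint>  <type>  <opts>   <dump> <pass>
/dev/sda3   /             ext3    noatime  0      1
/dev/sda5   /home         ext3    noatime  0      2
/dev/sda6   /tmp          ext2    noatime  0      0
/dev/sdb1   /mnt/backup   ext3    noauto   0      0
EOF

# Print mount points whose sixth field (the fsck pass number) is 0 or
# absent, i.e. filesystems that are skipped by the boot-time check.
unchecked_mounts() {
    awk '$1 !~ /^#/ && NF >= 4 && (NF < 6 || $6 == 0) { print $2 }' "$1"
}

unchecked_mounts "$SAMPLE_FSTAB"
```

Pointing unchecked_mounts at the real /etc/fstab instead of the sample
gives a quick sanity check that only throwaway or not-automounted
filesystems have been exempted.
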
For any partitions you depend on, however, while you can probably get away
with skipping fsck at boot in the short term, it's far safer just to do
it. As someone else mentioned, you can set ext3 partitions not to fsck at
every boot, if desired. That's a useful option. Set it to every third
boot, or every fifth, but don't turn it off entirely, at the risk of not
catching minor damage until it's major and causes you serious issues.

Keep in mind that even a partition never written to will develop "bit
rot" over time, due to cosmic-ray bitflipping and the like. The reality
is that at the single-bit level, hard drives aren't nearly as reliable as
we like to think they are. Awesome levels of automated redundancy and
error correction normally handle the problems as they develop, correcting
them behind the scenes. That's normal and good, and generally suffices
for partitions not normally written to. However, once you start actively
using a partition, writing as well as reading, if one of those normally
insignificant bitflips happens in the wrong place, a write intended for
one location on the disk might end up at quite a different location.
That's what automated fscks at boot, even after a proper shutdown, are
designed to detect and correct. Catch it early, and it's insignificant
background noise, corrected by automated mechanisms such that you likely
won't notice it at all. Fail to do those boot-time fscks, and you are
playing the odds, risking your data.

Setting the fscks to once every third boot is still well within
reasonable safety limits. Setting one in five should be safe under normal
conditions but is playing the odds a bit more. I'd not recommend turning
it off altogether, or setting it much less frequently than one in five,
as that's just undue risk, IMO. You may well have no problems doing it
that way for years, if ever. Another person may have problems in a week
or a month. It's up to you how much risk you want to put your data at.

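For ext2/ext3, the check interval is set with tune2fs and its -c option
(the maximum mount count between checks). The sketch below only prints
the command rather than running it, since actually changing the setting
needs root and a real device. The device name is hypothetical, and
fsck_interval_cmd is a made-up helper name.

```shell
#!/bin/sh
# Dry-run helper: prints the tune2fs command rather than running it,
# since changing the setting needs root and a real ext2/ext3 device.
fsck_interval_cmd() {
    # $1: device (hypothetical here, e.g. /dev/sda3)
    # $2: number of mounts between forced checks
    printf 'tune2fs -c %s %s\n' "$2" "$1"
}

# Check roughly every third boot of an automounted filesystem.
fsck_interval_cmd /dev/sda3 3
```

Note that the count is in mounts, not boots as such; for a filesystem
mounted once per boot the two amount to the same thing. Running
tune2fs -l on the device afterward should show the new value under
"Maximum mount count".
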
Meanwhile, back in the Gentoo init scripts, mandating checkroot and
checkfs as "critical" parts of the boot sequence remains the sanest
default. Gentoo provides the configurability to change those defaults for
those sysadmins who choose to do so, but setting anything else as the
default would simply not be the sane or responsible thing for Gentoo devs
to do.

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman in
http://www.linuxdevcenter.com/pub/a/linux/2004/12/22/rms_interview.html

--
gentoo-amd64@g.o mailing list