Gentoo Archives: gentoo-dev

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-dev@l.g.o
Subject: [gentoo-dev] Re: rfc: Does OpenRC really need mount-ro
Date: Wed, 17 Feb 2016 02:20:26
Message-Id: pan$5775c$d5db16fe$36a3366$a29b8c4d@cox.net
In Reply to: Re: [gentoo-dev] rfc: Does OpenRC really need mount-ro by William Hubbs
William Hubbs posted on Tue, 16 Feb 2016 12:41:29 -0600 as excerpted:

> What I'm trying to figure out is, what to do about re-mounting file
> systems read-only.
>
> How does systemd do this? I didn't find an equivalent of the mount-ro
> service there.

For quite some time now, systemd has had a mechanism whereby, during
shutdown, the main systemd process reexecs (with a pivot-root) the initr*
systemd and returns control to it. That allows a more controlled
shutdown than traditional init systems, because the final stages actually
run from the virtual filesystem of the initr*. After everything running
on the main root is shut down, the main root itself can actually be
unmounted, not just remounted read-only, because there is literally
nothing running on it any longer.

There's still a fallback to read-only mounting if an initr* isn't used or
if reinvoking the initr* version fails for some reason, but with an
initr*, when everything's working properly, while there are still some
bits of userspace running, they're no longer actually running off of the
main root, so main root can actually be unmounted much like any other
filesystem.

The process is explained a bit better in the copious blogposted systemd
documentation. Let's see if I can find a link...

OK, this isn't where I originally read about it, which IIRC was aimed
more at admins, while this is aimed at initr* devs, but that's probably a
good thing as it includes more specific detail...

https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/
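
As a rough illustration of the decision described above: the
InitrdInterface page says systemd jumps back into the initr* on shutdown
if the executable /run/initramfs/shutdown exists, and otherwise the
read-only fallback applies. A minimal C sketch of that check follows.
The function and enum names are hypothetical, made up for illustration,
and are not systemd's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Path named on the InitrdInterface page. */
#define INITRD_SHUTDOWN_PATH "/run/initramfs/shutdown"

/* Hypothetical names for the two final-stage strategies. */
enum final_stage {
  FINAL_STAGE_PIVOT_TO_INITRD, /* main root can be fully unmounted */
  FINAL_STAGE_REMOUNT_RO       /* fallback: remount read-only in place */
};

/* True if `path` exists and is executable by the calling process. */
static bool shutdown_helper_usable(const char *path) {
  assert(path);
  return access(path, X_OK) == 0;
}

/* Pick the strategy based only on whether the helper is present. */
static enum final_stage pick_final_stage(const char *path) {
  return shutdown_helper_usable(path) ? FINAL_STAGE_PIVOT_TO_INITRD
                                      : FINAL_STAGE_REMOUNT_RO;
}
```

Calling pick_final_stage(INITRD_SHUTDOWN_PATH) on a system booted
without an initr* would be expected to select the remount-ro fallback,
since nothing populates /run/initramfs there.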

And here's some more, this time in the storage-daemon-controlled root and
initr* context...

https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/


But... all that doesn't answer the original question directly, does it?
Where there's no return to initr*, how /does/ systemd handle read-only
mounting?

First, the nice ascii-diagram flow charts in the bootup(7) manpage may
be useful, in particular here, the shutdown diagram (tho IDK if you
find such things useful or not??).

https://www.freedesktop.org/software/systemd/man/bootup.html

Here's the shutdown diagram described in words:

Initial shutdown is via two targets (as opposed to specific services):
shutdown.target, which conflicts with all (normal) system services
thereby shutting them down, and umount.target, which conflicts with file
mounts, swaps, cryptsetup devices, etc. Here, we're obviously interested
in umount.target. Then after those two targets are reached, various low
level services are run or stopped, in order to reach final.target.
After final.target, the appropriate systemd-(reboot|poweroff|halt|kexec)
service is run, to hit the ultimate (reboot|poweroff|halt|kexec).target,
which of course is never actually evaluated, since the service actually
does the intended action.

The primary takeaway is that you might not be finding a specific systemd
remount-ro service, because it might be a target, defined in terms of
conflicts with mount units, etc, rather than a specific service.

Neither shutdown.target nor umount.target has any wants or requires by
default, but the various normal services and mount units conflict with
them, either by default or explicitly, so are shut down before the
target can be reached.

final.target has the After=shutdown.target umount.target setting, so
won't be reached until they are reached.

The respective (reboot|poweroff|halt|kexec).target units Requires= and
After= their respective systemd-*.service units, and reboot and poweroff
(but not halt and kexec) have 30-minute timeouts after which they run
reboot-force or poweroff-force, respectively.

The respective systemd-(reboot|poweroff|halt|kexec).service units
Requires= and After= shutdown.target, umount.target and final.target, all
three, so won't be run until those complete. They simply
ExecStart=/usr/bin/systemctl --force their respective actions.

And here's what the systemd.special(7) manpage says about umount.target:

  umount.target
    A special target unit that umounts all mount and automount points
    on system shutdown.

    Mounts that shall be unmounted on system shutdown shall add
    Conflicts dependencies to this unit for their mount unit,
    which is implicitly done when DefaultDependencies=yes is set
    (the default).

But that /still/ doesn't reveal what actually does the remount-ro, as
opposed to umount. I don't see that either, at the unit level, nor do I
see anything related to it in, for instance, my auto-generated-from-fstab
/run/systemd/generator/-.mount file or in the systemd-fstab-generator(8)
manpage.

Thus I must conclude that it's actually resolved in the mount-unit
conflicts handling in systemd's source code itself.

And indeed... in systemd's tarball, in src/core/umount.c, we see that
mount_points_list_umount() actually remounts /everything/ (well,
everything not in a container) read-only before trying to umount it.
Indentation restandardized on two-space here, to avoid unnecessary
wrapping as posted. This is from systemd-228:

static int mount_points_list_umount(MountPoint **head, bool *changed,
                                    bool log_error) {
  MountPoint *m, *n;
  int n_failed = 0;

  assert(head);

  LIST_FOREACH_SAFE(mount_point, m, n, *head) {

    /* If we are in a container, don't attempt to
       read-only mount anything as that brings no real
       benefits, but might confuse the host, as we remount
       the superblock here, not the bind mound. */
    if (detect_container() <= 0) {
      _cleanup_free_ char *options = NULL;
      /* MS_REMOUNT requires that the data parameter
       * should be the same from the original mount
       * except for the desired changes. Since we want
       * to remount read-only, we should filter out
       * rw (and ro too, because it confuses the kernel) */
      (void) fstab_filter_options(m->options, "rw\0ro\0", NULL, NULL,
                                  &options);

      /* We always try to remount directories read-only
       * first, before we go on and umount them.
       *
       * Mount points can be stacked. If a mount
       * point is stacked below / or /usr, we
       * cannot umount or remount it directly,
       * since there is no way to refer to the
       * underlying mount. There's nothing we can do
       * about it for the general case, but we can
       * do something about it if it is aliased
       * somehwere else via a bind mount. If we
       * explicitly remount the super block of that
       * alias read-only we hence should be
       * relatively safe regarding keeping the fs we
       * can otherwise not see dirty. */
      log_info("Remounting '%s' read-only with options '%s'.", m->path,
               options);
      (void) mount(NULL, m->path, NULL, MS_REMOUNT|MS_RDONLY, options);
    }

    /* Skip / and /usr since we cannot unmount that
     * anyway, since we are running from it. They have
     * already been remounted ro. */
    if (path_equal(m->path, "/")
#ifndef HAVE_SPLIT_USR
        || path_equal(m->path, "/usr")
#endif
    )
      continue;

    /* Trying to umount. We don't force here since we rely
     * on busy NFS and FUSE file systems to return EBUSY
     * until we closed everything on top of them. */
    log_info("Unmounting %s.", m->path);
    if (umount2(m->path, 0) == 0) {
      if (changed)
        *changed = true;

      mount_point_free(head, m);
    } else if (log_error) {
      log_warning_errno(errno, "Could not unmount %s: %m", m->path);
      n_failed++;
    }
  }

  return n_failed;
}
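
The comment in the function above says the rw and ro entries have to be
filtered out of the original option string before the MS_REMOUNT call.
A minimal C sketch of that filtering step follows, assuming a plain
comma-separated option string. drop_rw_ro() is a hypothetical stand-in
written for illustration; it is not systemd's fstab_filter_options(),
which has a more general interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of the
 * comma-separated `options` with every entry that is exactly "rw" or
 * "ro" removed. Entries that merely contain those letters (such as
 * "errors=remount-ro") are kept. The caller frees the result.
 * Returns NULL on allocation failure. */
static char *drop_rw_ro(const char *options) {
  assert(options);

  /* The result is never longer than the input. */
  char *out = malloc(strlen(options) + 1);
  if (!out)
    return NULL;

  size_t used = 0;
  const char *p = options;

  while (*p) {
    size_t n = strcspn(p, ","); /* length of the current entry */
    bool skip = n == 0 ||
                (n == 2 && (memcmp(p, "rw", 2) == 0 ||
                            memcmp(p, "ro", 2) == 0));

    if (!skip) {
      if (used > 0)
        out[used++] = ',';
      memcpy(out + used, p, n);
      used += n;
    }

    p += n;
    if (*p == ',')
      p++;
  }

  out[used] = '\0';
  return out;
}
```

For example, drop_rw_ro("rw,relatime,data=ordered") yields
"relatime,data=ordered", which could then be passed as the data argument
of mount() together with MS_REMOUNT|MS_RDONLY.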


So the short answer ultimately is... Systemd has a single umount
function, which first does remount-ro, so it's actually remounting
(nearly) everything read-only, then tries umount.


Meanwhile, (semi-)answering the elsewhere implied question of why only
Linux needs the mount-ro service... I'm no BSD expert, but in my
wanderings I came across a remark that they didn't need it, because their
kernel reboot/halt/poweroff routines have a built-in kernelspace
sync-and-remount-ro routine for anything that can't be unmounted, which
Linux lacks. They obviously consider this a Linux deficiency, but while
I've not come across the Linux reason for not doing it, an educated guess
is that it's considered putting policy into the kernel, and that's
considered a no-no: policy is userspace; the kernel simply enforces it as
directed (which is why kernel 2.4's devfs was removed during the 2.6
series, to be replaced with the userspace-based udev). Additionally, not
kernel-forcing the remount-ro bit does give developers a way to test
results of an uncontrolled shutdown, say on a specific testing filesystem
only, without exposing the rest of the system, which can still be shut
down normally, to it.

So on Linux userspace must do the final umounts and force-read-onlys,
because unlike the BSDs, the Linux kernel doesn't have builtin routines
that automatically force it, regardless of userspace.

But as others have said, on Linux the remount-ro is _definitely_
required, and "bad things _will_ happen" if it's not done. (Just how bad
depends on the filesystem and its mount options, and hardware, among
other things.)


Finally, one more thing to mention. On systems with magic-sysrq in the
kernel...

echo 0x30 > /proc/sys/kernel/sysrq

... enables the sync (0x10) and remount-readonly (0x20) functions. (Of
course only do this at shutdown/reboot, as you don't want to disturb the
user's configured sysrq defaults in normal runtime.)

You can then force emergency sync (s) and remount-read-only (u) with...

echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger

As that's kernel emergency priority, it should force-sync and force
everything readonly (and quiesce mid-layer block devices such as md
and dm), even if it would normally refuse to do so due to files open for
writing. You might consider something like that as a fallback, if normal
mount-readonly fails. Of course it won't work if magic-sysrq
functionality isn't built into the kernel, but then you're no worse off
than before, and are far better off on kernels where it's supported, so
it's certainly worth considering. =:^)
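
For a C-based init, the fallback above might look roughly like the
following sketch. It only performs the two trigger writes. According to
the kernel's sysrq documentation, writes to /proc/sysrq-trigger by a
privileged user are honored regardless of the /proc/sys/kernel/sysrq
bitmask, which gates keyboard invocation, so the sketch leaves that
setting alone. The function names are hypothetical, and the trigger
path is a parameter only so the code can be tried against an ordinary
file first.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write one SysRq command character; 0 on success, -errno on failure. */
static int write_command(int fd, char c) {
  ssize_t n = write(fd, &c, 1);
  if (n < 0)
    return -errno;
  return n == 1 ? 0 : -EIO;
}

/* Hypothetical fallback: request an emergency sync, then an emergency
 * read-only remount. On a real system trigger_path would be
 * "/proc/sysrq-trigger" and the caller would need to be root.
 * Returns 0 on success, -errno on failure (for example -ENOENT when
 * magic-sysrq is not built into the kernel). */
static int sysrq_sync_then_remount_ro(const char *trigger_path) {
  int fd, r;

  assert(trigger_path);

  fd = open(trigger_path, O_WRONLY);
  if (fd < 0)
    return -errno;

  r = write_command(fd, 's');   /* emergency sync */
  if (r == 0)
    r = write_command(fd, 'u'); /* emergency remount read-only */

  close(fd);
  return r;
}
```

A negative return simply means the fallback was unavailable, which
matches the "no worse off than before" case described above.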

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
