William Hubbs posted on Tue, 16 Feb 2016 12:41:29 -0600 as excerpted:

> What I'm trying to figure out is, what to do about re-mounting file
> systems read-only.
>
> How does systemd do this? I didn't find an equivalent of the mount-ro
> service there.

For quite some time now, systemd has actually had a mechanism whereby the
main systemd process reexecs (with a pivot-root) the initr* systemd and
returns control to it during the shutdown process, thereby allowing a
more controlled shutdown than traditional init systems because the final
stages are actually running from the virtual filesystem of the initr*,
such that after everything running on the main root is shut down, the main
root itself can actually be unmounted, not just mounted read-only,
because there is literally nothing running on it any longer.

There's still a fallback to read-only mounting if an initr* isn't used or
if reinvoking the initr* version fails for some reason, but with an
initr*, when everything's working properly, while there are still some
bits of userspace running, they're no longer actually running off of the
main root, so main root can actually be unmounted much like any other
filesystem.

The process is explained a bit better in the copious blogposted systemd
documentation. Let's see if I can find a link...

OK, this isn't where I originally read about it, which IIRC was aimed
more at admins, while this is aimed at initr* devs, but that's probably a
good thing as it includes more specific detail...

https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/

And here's some more, this time in the storage daemon controlled root and
initr* context...

https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/

But... all that doesn't answer the original question directly, does it?
Where there's no return to initr*, how /does/ systemd handle read-only
mounting?

First, the nice ascii-diagram flow charts in the bootup(7) manpage may
be useful, in particular here, the shutdown diagram (tho IDK if you can
find such things useful or not??).

https://www.freedesktop.org/software/systemd/man/bootup.html

Here's the shutdown diagram described in words:

Initial shutdown is via two targets (as opposed to specific services),
shutdown.target, which conflicts with all (normal) system services
thereby shutting them down, and umount.target, which conflicts with file
mounts, swaps, cryptsetup devices, etc. Here, we're obviously interested
in umount.target. Then after those two targets are reached, various low
level services are run or stopped, in order to reach final.target.
After final.target, the appropriate systemd-(reboot|poweroff|halt|kexec)
service is run, to hit the ultimate (reboot|poweroff|halt|kexec).target,
which of course is never actually evaluated, since the service actually
does the intended action.

The primary takeaway is that you might not be finding a specific systemd
remount-ro service, because it might be a target, defined in terms of
conflicts with mount units, etc, rather than a specific service.

Neither shutdown.target nor umount.target has any wants or requires by
default, but the various normal services and mount units conflict with
them, either via default or specifically, so are shut down before the
target can be reached.

final.target has the After=shutdown.target umount.target setting, so
won't be reached until they are reached.

The respective (reboot|poweroff|halt|kexec).target units Requires= and
After= their respective systemd-*.service units, and reboot and poweroff
(but not halt and kexec) have 30-minute timeouts after which they run
reboot-force or poweroff-force, respectively.

The respective systemd-(reboot|poweroff|halt|kexec).service units
Requires= and After= shutdown.target, umount.target and final.target, all
three, so won't be run until those complete. They simply
ExecStart=/usr/bin/systemctl --force their respective actions.

And here's what the systemd.special(7) manpage says about umount.target:

  umount.target
    A special target unit that umounts all mount and automount points
    on system shutdown.

    Mounts that shall be unmounted on system shutdown shall add
    Conflicts dependencies to this unit for their mount unit,
    which is implicitly done when DefaultDependencies=yes is set
    (the default).

But that /still/ doesn't reveal what actually does the remount-ro, as
opposed to umount. I don't see that either, at the unit level, nor do I
see anything related to it in for instance my auto-generated from fstab
/run/systemd/generator/-.mount file or in the systemd-fstab-generator(8)
manpage.

Thus I must conclude that it's actually resolved in the mount-unit
conflicts handling in systemd's source code, itself.

And indeed... in systemd's tarball, we see in src/core/umount.c, in
mount_points_list_umount...

That the function actually remounts /everything/ (well, everything not in
a container) read-only, before actually trying to umount them. Indentation
restandardized on two-space here, to avoid unnecessary wrapping as
posted. This is from systemd-228:

static int mount_points_list_umount(MountPoint **head, bool *changed,
    bool log_error) {
  MountPoint *m, *n;
  int n_failed = 0;

  assert(head);

  LIST_FOREACH_SAFE(mount_point, m, n, *head) {

    /* If we are in a container, don't attempt to
       read-only mount anything as that brings no real
       benefits, but might confuse the host, as we remount
       the superblock here, not the bind mound. */
    if (detect_container() <= 0) {
      _cleanup_free_ char *options = NULL;
      /* MS_REMOUNT requires that the data parameter
       * should be the same from the original mount
       * except for the desired changes. Since we want
       * to remount read-only, we should filter out
       * rw (and ro too, because it confuses the kernel) */
      (void) fstab_filter_options(m->options, "rw\0ro\0", NULL, NULL,
                                  &options);

      /* We always try to remount directories read-only
       * first, before we go on and umount them.
       *
       * Mount points can be stacked. If a mount
       * point is stacked below / or /usr, we
       * cannot umount or remount it directly,
       * since there is no way to refer to the
       * underlying mount. There's nothing we can do
       * about it for the general case, but we can
       * do something about it if it is aliased
       * somehwere else via a bind mount. If we
       * explicitly remount the super block of that
       * alias read-only we hence should be
       * relatively safe regarding keeping the fs we
       * can otherwise not see dirty. */
      log_info("Remounting '%s' read-only with options '%s'.",
               m->path, options);
      (void) mount(NULL, m->path, NULL, MS_REMOUNT|MS_RDONLY, options);
    }

    /* Skip / and /usr since we cannot unmount that
     * anyway, since we are running from it. They have
     * already been remounted ro. */
    if (path_equal(m->path, "/")
#ifndef HAVE_SPLIT_USR
        || path_equal(m->path, "/usr")
#endif
    )
      continue;

    /* Trying to umount. We don't force here since we rely
     * on busy NFS and FUSE file systems to return EBUSY
     * until we closed everything on top of them. */
    log_info("Unmounting %s.", m->path);
    if (umount2(m->path, 0) == 0) {
      if (changed)
        *changed = true;

      mount_point_free(head, m);
    } else if (log_error) {
      log_warning_errno(errno, "Could not unmount %s: %m", m->path);
      n_failed++;
    }
  }

  return n_failed;
}

So the short answer ultimately is... Systemd has a single umount
function, which first does remount-ro, so it's actually remounting
(nearly) everything read-only, then tries umount.

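As a rough illustration of that remount-then-umount order, here's a
minimal standalone C sketch for a single mount point. It is not systemd's
code: strip_rw_ro() and remount_ro_then_umount() are hypothetical names of
mine, and the option filter only approximates what fstab_filter_options()
is used for above. The mount(2) and umount2(2) calls are the same ones the
systemd function makes, and they need root to succeed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical helper: return a malloc'd copy of a comma-separated
 * option string with the bare "rw" and "ro" options dropped.
 * The caller frees the result. Returns NULL if malloc fails. */
char *strip_rw_ro(const char *opts) {
  assert(opts);

  /* The result is never longer than the input. */
  char *out = malloc(strlen(opts) + 1);
  if (!out)
    return NULL;
  out[0] = '\0';

  const char *p = opts;
  while (*p) {
    size_t len = strcspn(p, ",");
    bool drop = len == 2 &&
                (strncmp(p, "rw", 2) == 0 || strncmp(p, "ro", 2) == 0);

    if (!drop && len > 0) {
      if (out[0])
        strcat(out, ",");
      strncat(out, p, len);
    }

    p += len;
    if (*p == ',')
      p++;
  }
  return out;
}

/* Hypothetical sketch: remount read-only first, then try a plain
 * (unforced) umount. Returns 0 if unmounted, 1 if still mounted but
 * remounted read-only, -1 if neither worked. */
int remount_ro_then_umount(const char *path, const char *opts) {
  assert(path);

  char *filtered = strip_rw_ro(opts ? opts : "");
  const char *data = (filtered && filtered[0]) ? filtered : NULL;
  bool ro_ok = mount(NULL, path, NULL, MS_REMOUNT | MS_RDONLY, data) == 0;
  free(filtered);

  /* "/" can't be unmounted while we're running from it. */
  if (strcmp(path, "/") != 0 && umount2(path, 0) == 0)
    return 0;

  return ro_ok ? 1 : -1;
}
```

Called as remount_ro_then_umount("/home", "rw,noatime"), that would pass
"noatime" as the remount data and then attempt umount2("/home", 0).
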
Meanwhile, (semi-)answering the elsewhere implied question of why only
Linux needs the mount-ro service... I'm no BSD expert, but in my
wanderings I came across a remark that they didn't need it, because their
kernel reboot/halt/poweroff routines have a built-in kernelspace
sync-and-remount-ro routine for anything that can't be unmounted, which
Linux lacks. They obviously consider this a Linux deficiency. I've not
come across the Linux reason for not doing it, but an educated guess is
that it's considered putting policy into the kernel, and that's a no-no:
policy belongs in userspace, and the kernel simply enforces it as
directed (which is why kernel 2.4's devfs was removed for 2.6, to be
replaced with the userspace-based udev). Additionally, not kernel-forcing
the remount-ro bit does give developers a way to test the results of an
uncontrolled shutdown, say on a specific testing filesystem only, without
exposing the rest of the system, which can still be shut down normally,
to it.

So on Linux userspace must do the final umounts and force-read-onlys,
because unlike the BSDs, the Linux kernel doesn't have builtin routines
that automatically force it, regardless of userspace.

But as others have said, on Linux the remount-ro is _definitely_
required, and "bad things _will_ happen" if it's not done. (Just how bad
depends on the filesystem and its mount options, and hardware, among
other things.)

Finally, one more thing to mention. On systems with magic-sysrq in the
kernel...

echo 0x30 > /proc/sys/kernel/sysrq

... enables the sync (0x10) and remount-readonly (0x20) functions. (Of
course only do this at shutdown/reboot, as you don't want to disturb the
user's configured sysrq defaults in normal runtime.)

You can then force emergency sync (s) and remount-read-only (u) with...

echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger

As that's kernel emergency priority, it should force-sync and force
everything readonly (and quiesce mid-layer block devices such as md
and dm), even if it would normally refuse to do so due to files open for
writing. You might consider something like that as a fallback, if normal
mount-readonly fails. Of course it won't work if magic-sysrq functionality
isn't built into the kernel, but then you're no worse off than before,
and are far better off on kernels where it's supported, so it's certainly
worth considering. =:^)

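If it helps, such a fallback could look something like the following C
sketch. The function names are hypothetical, and the two paths are
parameters purely so the code can be exercised against scratch files. In
real use they'd be /proc/sys/kernel/sysrq and /proc/sysrq-trigger, written
as root, and only after the normal remount-ro has failed:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open path for writing, write text, close.
 * Returns 0 on success, -1 on any failure. */
static int write_string(const char *path, const char *text) {
  assert(path);
  assert(text);

  FILE *f = fopen(path, "w");
  if (!f)
    return -1;

  size_t len = strlen(text);
  int ok = fwrite(text, 1, len, f) == len;
  if (fclose(f) != 0)
    ok = 0;

  return ok ? 0 : -1;
}

/* Hypothetical fallback mirroring the three echo commands above:
 * enable the sync and remount-readonly functions (0x30), then
 * trigger emergency sync (s) and emergency remount read-only (u).
 * Each trigger is a separate open/write/close, like each echo.
 * Returns 0 if all three writes succeeded, -1 otherwise. */
int sysrq_sync_and_remount_ro(const char *ctl_path,
                              const char *trigger_path) {
  assert(ctl_path);
  assert(trigger_path);

  if (write_string(ctl_path, "0x30\n") < 0)
    return -1;
  if (write_string(trigger_path, "s\n") < 0)
    return -1;
  return write_string(trigger_path, "u\n");
}
```

A -1 return on a kernel without magic-sysrq just means the files aren't
there to write to, which leaves you where you were before trying it.
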
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman