[gentoo-dev] Re: rfc: Does OpenRC really need mount-ro - gentoo-dev

From:	Duncan <1i5t5.duncan@×××.net>
To:	gentoo-dev@l.g.o
Subject:	[gentoo-dev] Re: rfc: Does OpenRC really need mount-ro
Date:	Wed, 17 Feb 2016 02:20:26
Message-Id:	`pan$5775c$d5db16fe$36a3366$a29b8c4d@cox.net`
In Reply to:	Re: [gentoo-dev] rfc: Does OpenRC really need mount-ro by William Hubbs

William Hubbs posted on Tue, 16 Feb 2016 12:41:29 -0600 as excerpted:

> What I'm trying to figure out is, what to do about re-mounting file
> systems read-only.
>
> How does systemd do this? I didn't find an equivalent of the mount-ro
> service there.

For quite some time now, systemd has had a mechanism whereby the main
systemd process reexecs (with a pivot-root) the initr* systemd and
returns control to it during the shutdown process.  That allows a more
controlled shutdown than traditional init systems, because the final
stages are actually running from the virtual filesystem of the initr*.
After everything running on the main root is shut down, the main root
itself can actually be unmounted, not just mounted read-only, because
there is literally nothing running on it any longer.

There's still a fallback to read-only mounting if an initr* isn't used or
if reinvoking the initr* version fails for some reason.  But with an
initr*, when everything's working properly, the bits of userspace that
are still running are no longer running off of the main root, so the main
root can be unmounted much like any other filesystem.

The process is explained a bit better in the copious blogposted systemd
documentation.  Let's see if I can find a link...

OK, this isn't where I originally read about it, which IIRC was aimed
more at admins, while this is aimed at initr* devs, but that's probably a
good thing as it includes more specific detail...

https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/

And here's some more, this time in the context of a
storage-daemon-controlled root and initr*...

https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/

But... all that doesn't answer the original question directly, does it?
Where there's no return to initr*, how /does/ systemd handle read-only
mounting?

First, the nice ASCII-diagram flow charts in the bootup(7) manpage may
be useful, in particular the shutdown diagram (though I don't know
whether you find such things useful or not).

https://www.freedesktop.org/software/systemd/man/bootup.html

Here's the shutdown diagram described in words:

Initial shutdown is via two targets (as opposed to specific services):
shutdown.target, which conflicts with all (normal) system services,
thereby shutting them down, and umount.target, which conflicts with file
mounts, swaps, cryptsetup devices, etc.  Here, we're obviously interested
in umount.target.  After those two targets are reached, various low-level
services are run or stopped in order to reach final.target.  After
final.target, the appropriate systemd-(reboot|poweroff|halt|kexec)
service is run, to hit the ultimate (reboot|poweroff|halt|kexec).target,
which of course is never actually evaluated, since the service itself
performs the intended action.

The primary takeaway is that you might not be finding a specific systemd
remount-ro service because the job might be done by a target, defined in
terms of conflicts with mount units etc., rather than by a specific
service.

Neither shutdown.target nor umount.target has any Wants= or Requires= by
default, but the various normal services and mount units conflict with
them, either by default or explicitly, so they are shut down before the
targets can be reached.

final.target has the After=shutdown.target umount.target setting, so it
won't be reached until they are.

The respective (reboot|poweroff|halt|kexec).target units Requires= and
After= their respective systemd-*.service units, and reboot and poweroff
(but not halt and kexec) have 30-minute timeouts after which they run
reboot-force or poweroff-force, respectively.

The respective systemd-(reboot|poweroff|halt|kexec).service units
Requires= and After= shutdown.target, umount.target and final.target, all
three, so they won't be run until those complete.  They simply
ExecStart=/usr/bin/systemctl --force their respective actions.

And here's what the systemd.special(7) manpage says about umount.target:

  umount.target
    A special target unit that umounts all mount and automount points
    on system shutdown.

    Mounts that shall be unmounted on system shutdown shall add
    Conflicts dependencies to this unit for their mount unit,
    which is implicitly done when DefaultDependencies=yes is set
    (the default).

But that /still/ doesn't reveal what actually does the remount-ro, as
opposed to the umount.  I don't see that at the unit level either, nor do
I see anything related to it in, for instance, my
/run/systemd/generator/-.mount file (auto-generated from fstab) or in the
systemd-fstab-generator(8) manpage.

Thus I must conclude that it's actually resolved in the mount-unit
conflicts handling in systemd's source code itself.

And indeed... in systemd's tarball, in src/core/umount.c, we see in
mount_points_list_umount that the function actually remounts
/everything/ (well, everything not in a container) read-only before
trying to umount it.  Indentation restandardized on two-space here, to
avoid unnecessary wrapping as posted.  This is from systemd-228:

static int mount_points_list_umount(MountPoint **head, bool *changed,
                                    bool log_error) {
  MountPoint *m, *n;
  int n_failed = 0;

  assert(head);

  LIST_FOREACH_SAFE(mount_point, m, n, *head) {

    /* If we are in a container, don't attempt to
       read-only mount anything as that brings no real
       benefits, but might confuse the host, as we remount
       the superblock here, not the bind mound. */
    if (detect_container() <= 0)  {
      _cleanup_free_ char *options = NULL;
      /* MS_REMOUNT requires that the data parameter
       * should be the same from the original mount
       * except for the desired changes. Since we want
       * to remount read-only, we should filter out
       * rw (and ro too, because it confuses the kernel) */
      (void) fstab_filter_options(m->options, "rw\0ro\0", NULL, NULL,
                                  &options);

      /* We always try to remount directories read-only
       * first, before we go on and umount them.
       *
       * Mount points can be stacked. If a mount
       * point is stacked below / or /usr, we
       * cannot umount or remount it directly,
       * since there is no way to refer to the
       * underlying mount. There's nothing we can do
       * about it for the general case, but we can
       * do something about it if it is aliased
       * somehwere else via a bind mount. If we
       * explicitly remount the super block of that
       * alias read-only we hence should be
       * relatively safe regarding keeping the fs we
       * can otherwise not see dirty. */
      log_info("Remounting '%s' read-only with options '%s'.", m->path,
               options);
      (void) mount(NULL, m->path, NULL, MS_REMOUNT|MS_RDONLY, options);
    }

    /* Skip / and /usr since we cannot unmount that
     * anyway, since we are running from it. They have
     * already been remounted ro. */
    if (path_equal(m->path, "/")
#ifndef HAVE_SPLIT_USR
        || path_equal(m->path, "/usr")
#endif
    )
      continue;

    /* Trying to umount. We don't force here since we rely
     * on busy NFS and FUSE file systems to return EBUSY
     * until we closed everything on top of them. */
    log_info("Unmounting %s.", m->path);
    if (umount2(m->path, 0) == 0) {
      if (changed)
        *changed = true;

      mount_point_free(head, m);
    } else if (log_error) {
      log_warning_errno(errno, "Could not unmount %s: %m", m->path);
      n_failed++;
    }
  }

  return n_failed;
}

So the short answer ultimately is... systemd has a single umount
function, which first does the remount-ro, so it actually remounts
(nearly) everything read-only and only then tries the umount.
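
For a shell-based init system, the same order (filter rw/ro out of the
existing options, remount read-only, then try a plain umount) might be
sketched as follows.  This is only an illustration under assumptions, not
systemd's or OpenRC's actual code: the function names filter_rw_ro and
remount_ro_then_umount are hypothetical, and the caller is assumed to pass
the mount point and its current option string.

```shell
# Sketch only; function names are hypothetical, not from systemd or OpenRC.

# Print the comma-separated option list in $1 with "rw" and "ro" removed,
# similar in spirit to the fstab_filter_options() call in the C excerpt.
filter_rw_ro() {
    rest=$1
    out=
    while [ -n "$rest" ]; do
        opt=${rest%%,*}                      # first option in the list
        case $rest in
            *,*) rest=${rest#*,} ;;          # drop it and the comma
            *)   rest= ;;                    # that was the last one
        esac
        case $opt in
            rw|ro|'') ;;                     # filtered out
            *) out=${out:+$out,}$opt ;;      # kept, comma-joined
        esac
    done
    printf '%s\n' "$out"
}

# Remount $1 read-only (keeping the other options from $2), then try a
# plain, non-forced umount.  Needs root; / and /usr are left mounted.
remount_ro_then_umount() {
    mnt=$1
    opts=$(filter_rw_ro "$2")
    # As in the excerpt, a failed remount is ignored and umount still tried.
    mount -o "remount,ro${opts:+,$opts}" "$mnt" 2>/dev/null
    case $mnt in
        /|/usr) return 0 ;;
    esac
    umount "$mnt"
}
```

Something like remount_ro_then_umount /home rw,noatime would then
remount /home with ro,noatime before trying to unmount it.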

Meanwhile, (semi-)answering the elsewhere-implied question of why only
Linux needs the mount-ro service...  I'm no BSD expert, but in my
wanderings I came across a remark that the BSDs don't need it, because
their kernel reboot/halt/poweroff routines have a built-in kernelspace
sync-and-remount-ro routine for anything that can't be unmounted, which
Linux lacks.  They obviously consider this a Linux deficiency.  While
I've not come across the Linux reason for not doing it, an educated
guess is that it's considered putting policy into the kernel, and that's
a no-no: policy belongs in userspace; the kernel simply enforces it as
directed (which is why kernel 2.4's devfs was removed during the 2.6
series, to be replaced with the userspace-based udev).  Additionally, not
kernel-forcing the remount-ro does give developers a way to test the
results of an uncontrolled shutdown, say on a specific testing filesystem
only, without exposing the rest of the system, which can still be shut
down normally, to it.

So on Linux, userspace must do the final umounts and forced read-only
remounts, because unlike the BSDs, the Linux kernel doesn't have built-in
routines that automatically force them regardless of userspace.
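
As a rough illustration of what "userspace must do it" involves, a
shutdown script first needs the list of mounted filesystems, most
recently mounted first, so that nested mounts are handled before their
parents.  The sketch below assumes the /proc/mounts format (device, mount
point, type, options, ...); the function name list_mounts_reversed and the
choice of which virtual filesystem types to skip are assumptions for the
example only.

```shell
# Sketch only; list_mounts_reversed is a hypothetical helper name.
# Print mount points from a /proc/mounts-style file in reverse mount
# order, skipping a few virtual filesystem types.  Mount points that
# contain spaces appear escaped (\040) in /proc/mounts and are printed
# as-is here.
list_mounts_reversed() {
    awk '$3 != "proc" && $3 != "sysfs" && $3 != "devtmpfs" && $3 != "tmpfs" {
             m[++n] = $2                     # remember mount point in order
         }
         END {
             for (i = n; i > 0; i--)         # newest mount first
                 print m[i]
         }' "${1:-/proc/mounts}"
}
```

Each printed mount point could then be remounted read-only or unmounted
in turn; with no argument the function reads /proc/mounts.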

But as others have said, on Linux the remount-ro is _definitely_
required, and "bad things _will_ happen" if it's not done.  (Just how bad
depends on the filesystem and its mount options, and hardware, among
other things.)

Finally, one more thing to mention.  On systems with magic-sysrq in the
kernel...

echo 0x30 > /proc/sys/kernel/sysrq

... enables the sync (0x10) and remount-readonly (0x20) functions.  (Of
course only do this at shutdown/reboot, as you don't want to disturb the
user's configured sysrq defaults in normal runtime.)
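
The 0x30 value is simply the two bits OR-ed together.  A small sketch of
that arithmetic, with hypothetical helper names (sysrq_shutdown_mask,
sysrq_allows), and assuming the usual kernel convention that a value of 1
enables all sysrq functions:

```shell
# Sketch only; helper names are hypothetical.
SYSRQ_SYNC=0x10          # sync bit, as given in the text
SYSRQ_REMOUNT_RO=0x20    # remount read-only bit, as given in the text

# Print the combined mask in hex.
sysrq_shutdown_mask() {
    printf '0x%x\n' $(( SYSRQ_SYNC | SYSRQ_REMOUNT_RO ))
}

# Succeed if sysrq value $1 permits every function bit in $2.
sysrq_allows() {
    [ $(( $1 )) -eq 1 ] && return 0          # 1 means "everything enabled"
    [ $(( $1 & $2 )) -eq $(( $2 )) ]
}
```

For example, sysrq_allows "$(cat /proc/sys/kernel/sysrq)" 0x20 would
report whether the currently configured value already permits the
remount-readonly function.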

You can then force emergency sync (s) and remount-read-only (u) with...

echo s > /proc/sysrq-trigger
echo u > /proc/sysrq-trigger

As that's kernel emergency priority, it should force-sync and force
everything readonly (and quiesce mid-layer block devices such as md and
dm), even if it would normally refuse to do so due to files open for
writing.  You might consider something like that as a fallback, if normal
mount-readonly fails.  Of course it won't work if magic-sysrq
functionality isn't built into the kernel, but then you're no worse off
than before, and are far better off on kernels where it's supported, so
it's certainly worth considering. =:^)
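
Such a fallback might look roughly like the sketch below.  It follows the
steps as posted (set the mask, then trigger s and u) and simply returns
failure when the sysrq files are not there.  The function name
emergency_remount_ro and the SYSRQ_CTL / SYSRQ_TRIGGER override variables
are assumptions made for the example, the latter so the logic can be
tried against ordinary files.  Note that the kernel's sysrq documentation
describes the mask as governing keyboard invocation, so the first write
may well be redundant for the trigger file; it is kept here to match the
text.

```shell
# Sketch only; emergency_remount_ro and the two override variables are
# hypothetical.  Intended as a last resort after a normal remount fails.
emergency_remount_ro() {
    ctl=${SYSRQ_CTL:-/proc/sys/kernel/sysrq}
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

    # No magic-sysrq support (or no permission): no worse off than before.
    [ -w "$ctl" ] && [ -w "$trigger" ] || return 1

    echo 0x30 > "$ctl" || return 1       # allow sync + remount read-only
    echo s > "$trigger" || return 1      # emergency sync
    echo u > "$trigger"                  # emergency remount read-only
}
```

A shutdown script could call it as, say,
mount_everything_ro || emergency_remount_ro, where mount_everything_ro
stands in for whatever the normal remount step is.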

--
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
