Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: /run not mounting on 3.7.10 kernel
Date: Tue, 30 Apr 2013 11:45:23
Message-Id: pan$89a92$184bfd53$ce36a5b8$963355cd@cox.net
In Reply to: [gentoo-amd64] /run not mounting on 3.7.10 kernel by Daiajo Tibdixious
Daiajo Tibdixious posted on Tue, 30 Apr 2013 18:30:36 +1000 as excerpted:

> During the startup on 3.7.10 /run fails to mount with this error:
> mount: wrong fs type, bad option, bad superblock on tmpfs, missing
> codepage or helper program, or other error
>
> Googling shows many people getting this error, and its something to do
> with openrc and moving from /var/run to /run.
>
> What I can't understand is why I can boot 3.7.9 without any problems,
> while 3.7.10 bombs. I have DEVTMPFS enabled in the kernel.
> Obviously there is something else wrong with my 3.7.10 kernel but I just
> can't figure what.
>
> Is /run supposed to be a physical directory? I thought it was supposed
> to be in ram. I got
> lrwxrwxrwx 1 root root 4 Dec 5 14:13 /var/run -> /run
> drwxr-xr-x 11 root root 360 Apr 30 18:17 /run
>
> Before the last upgrade of openrc (to 0.11.8) my 3.7.10 kernel was
> working fine.

Well, 3.7 is several-months-old history here, as I'm running
mainline-linus-git and just rebooted to 3.9.0 yesterday, and I'm running
live-git openrc-9999 as well, but 0.11.8 is the only release in-tree, so
I guess it can't be too outdated, tho it does date from early December
(07, Pearl Harbor day). But...

Looking at kernel.org, the first thing I note is that 3.7.10 is the final,
EOL release of 3.7, so you should be thinking about updating anyway... But
a definitely-non-kernel-coder look at its changelog...

There's one tmpfs commit listed for 3.7.10, and no block-layer or similar
commits that look like they might trigger it, so a first guess is that
it's that tmpfs commit (the mentioned mpol option is NUMA-related):

commit 95558dce307f5ac203cdd15192b8d9f028c0b6c4
Author: Greg Thelen <gthelen@××××××.com>
Date:   Fri Feb 22 16:36:01 2013 -0800

    tmpfs: fix use-after-free of mempolicy object

    commit 5f00110f7273f9ff04ac69a5f85bb535a4fd0987 upstream.

    The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
    option is not specified in the remount request. A new policy can be
    specified if mpol=M is given.

    Before this patch remounting an mpol bound tmpfs without specifying
    mpol= mount option in the remount request would set the filesystem's
    mempolicy object to a freed mempolicy object.

    [snip the reproducer and panic, you can look it up if curious]

    Non-debug kernels will not crash immediately because referencing the
    dangling mpol will not cause a fault. Instead the filesystem will
    reference a freed mempolicy object, which will cause unpredictable
    behavior.

    The problem boils down to a dropped mpol reference below if
    shmem_parse_options() does not allocate a new mpol:

        config = *sbinfo
        shmem_parse_options(data, &config, true)
        mpol_put(sbinfo->mpol)
        sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */

    This patch avoids the crash by not releasing the mempolicy if
    shmem_parse_options() doesn't create a new mpol.

    How far back does this issue go? I see it in both 2.6.36 and 3.3. I
    did not look back further.

    Signed-off-by: Greg Thelen <gthelen@××××××.com>
    Acked-by: Hugh Dickins <hughd@××××××.com>
    Signed-off-by: Andrew Morton <akpm@××××××××××××××××.org>
    Signed-off-by: Linus Torvalds <torvalds@××××××××××××××××.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@×××××××××××××××.org>
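
For anyone who finds the refcounting in that commit message hard to follow, here is a rough user-space model in Python. It is a sketch, not kernel code: Mempolicy, SbInfo, parse_options, remount_buggy and remount_fixed are hypothetical names that only mirror the pseudocode in the commit, and "freed" is just a flag standing in for real deallocation. The fixed version follows the behavior the commit describes (only release the old policy when a new one was parsed), not necessarily the literal patch.

```python
# Hypothetical user-space model of the tmpfs remount mempolicy refcount bug
# and of the fix described in the commit message. Not kernel code.


class Mempolicy:
    """Reference-counted stand-in for the kernel's struct mempolicy."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1  # the creator holds the initial reference
        self.freed = False


def mpol_put(mpol):
    """Drop one reference; mark the object freed when the count hits zero."""
    if mpol is None:
        return
    mpol.refcount -= 1
    if mpol.refcount == 0:
        mpol.freed = True


class SbInfo:
    """Stand-in for the per-superblock info that holds the mount's policy."""

    def __init__(self, mpol):
        self.mpol = mpol


def parse_options(options, config):
    """Like shmem_parse_options(): only sets config.mpol if mpol= is given."""
    if "mpol" in options:
        config.mpol = Mempolicy(options["mpol"])


def remount_buggy(sbinfo, options):
    config = SbInfo(sbinfo.mpol)   # config = *sbinfo copies the pointer
    parse_options(options, config)
    mpol_put(sbinfo.mpol)          # drops the only reference...
    sbinfo.mpol = config.mpol      # ...yet may store that same object


def remount_fixed(sbinfo, options):
    config = SbInfo(None)          # copy of sbinfo with no policy yet
    parse_options(options, config)
    if config.mpol is not None:    # release the old policy only if replaced
        mpol_put(sbinfo.mpol)
        sbinfo.mpol = config.mpol


if __name__ == "__main__":
    sb = SbInfo(Mempolicy("interleave"))
    remount_buggy(sb, {})
    print("buggy:", sb.mpol.freed)  # True: still points at a freed object

    sb = SbInfo(Mempolicy("interleave"))
    remount_fixed(sb, {})
    print("fixed:", sb.mpol.freed)  # False: policy still referenced
```

Remounting without mpol= is exactly the case that matters for a plain /run tmpfs, which is why this commit looks like a plausible suspect even on a non-NUMA box.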

FWIW, that "commit upstream" appears as v3.9-rc1~99^2~8 according to git
name-rev. ~N means the Nth-generation ancestor following first parents,
and ^P means parent P, so from 3.9-rc1 we go back 99 first-parent
generations, take the second parent of that merge (the merged-in branch),
then go back 8 more commits along that branch. Or in plainer language,
the first (mainline) tagged version it appeared in was 3.9-rc1, with the
commit obviously appearing in-tree before that but after 3.8.0.
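
To make the ~N and ^P walk concrete, here is a small Python sketch that resolves such a suffix against a toy commit graph. The graph and the resolve() helper are hypothetical illustrations of the rules git documents in gitrevisions(7), not git's own implementation, and it only handles plain ~ and ^ suffixes.

```python
import re


def resolve(graph, start, suffix):
    """Follow a revision suffix such as '~99^2~8' from commit `start`.

    `graph` maps each commit id to its parent list, first parent first.
    '~N' walks N generations back along first parents; '^P' selects the
    P-th parent once ('^0' is the commit itself). A bare '~' or '^' means 1.
    """
    if not re.fullmatch(r"(?:[~^]\d*)*", suffix):
        raise ValueError(f"unsupported suffix: {suffix!r}")
    commit = start
    for op, digits in re.findall(r"([~^])(\d*)", suffix):
        n = int(digits) if digits else 1
        if op == "~":
            for _ in range(n):
                commit = graph[commit][0]
        elif n > 0:
            commit = graph[commit][n - 1]
    return commit


# Hypothetical toy history: tag "v" sits two first-parent steps above the
# merge "m", whose second parent "b1" heads a side branch b1 -> b2 -> b3.
GRAPH = {
    "v": ["a1"],
    "a1": ["m"],
    "m": ["a2", "b1"],
    "a2": [],
    "b1": ["b2"],
    "b2": ["b3"],
    "b3": [],
}

if __name__ == "__main__":
    print(resolve(GRAPH, "v", "~2^2~2"))  # b3, same shape as v3.9-rc1~99^2~8
```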

Which is to say it's about 8 weeks old in mainline, having gone in during
the pre-3.9-rc1 merge window.

So... given that I'm running a git kernel anyway, my reaction here would
be to try reverting that patch and seeing if that fixed it. If not, given
that we know 3.7.9 was fine and 3.7.10 wasn't, that it's a 100%
reproducer for you, and that there's a very limited number of commits
between the two, a git bisect would be child's play. (Well, for a child
knowing git anyway...)
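
Why bisecting is "child's play" with a known-good and known-bad pair: each test halves the remaining range. The Python below is a simplified model assuming a linear history; git bisect itself also copes with merges, which this sketch does not. bisect_first_bad() and the is_bad callback (standing in for "build, boot, check /run") are hypothetical names.

```python
def bisect_first_bad(commits, is_bad):
    """Binary-search a linear, oldest-to-newest list of commits.

    commits[0] must be known good and commits[-1] known bad; is_bad(c)
    stands in for "build, boot and check commit c". Returns the first bad
    commit and how many commits had to be tested.
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # culprit is at mid or earlier
        else:
            good = mid  # culprit is after mid
    return commits[bad], tests


if __name__ == "__main__":
    # Hypothetical history of 101 commits where c37 introduced the bug.
    history = [f"c{i}" for i in range(101)]
    culprit, tests = bisect_first_bad(history, lambda c: int(c[1:]) >= 37)
    print(culprit, tests)  # c37 7
```

In a linux-stable checkout the real thing would start with `git bisect start v3.7.10 v3.7.9`, followed by `git bisect good` or `git bisect bad` after each test boot.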

Meanwhile, are you running a NUMA system? My first amd64 system was a
dual-socket Opteron, so NUMA, tho my current system isn't. Do you mount
/run using custom mount options or just take the default openrc options?
Here, I'm using the following (from fstab):

run /run tmpfs size=2m,nodev,nosuid,noexec,noauto,nr_inodes=4k 0 0
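
The size=2m and nr_inodes=4k values in that fstab line are the same limits the kernel later reports in /proc/mounts as size=2048k and nr_inodes=4096, since tmpfs treats k/m/g as binary multiples. A small Python sketch of that conversion follows; parse_size() and parse_fstab_line() are hypothetical helpers, and the "%" form of size= is deliberately not handled.

```python
# tmpfs accepts k/m/g suffixes (binary multiples) for size= and nr_inodes=;
# the "%" form of size= is not handled in this sketch.
SUFFIXES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(value):
    """Convert a tmpfs-style number with an optional k/m/g suffix to an int."""
    value = value.lower()
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


def parse_fstab_line(line):
    """Split a six-field fstab line and turn its options into a dict."""
    spec, mountpoint, fstype, opts, dump, passno = line.split()
    options = {}
    for opt in opts.split(","):
        key, _, val = opt.partition("=")
        options[key] = val if val else True  # bare flags such as nodev
    return {"spec": spec, "mountpoint": mountpoint, "fstype": fstype,
            "options": options, "dump": int(dump), "passno": int(passno)}


if __name__ == "__main__":
    entry = parse_fstab_line(
        "run /run tmpfs size=2m,nodev,nosuid,noexec,noauto,nr_inodes=4k 0 0")
    size_kib = parse_size(entry["options"]["size"]) // 1024
    inodes = parse_size(entry["options"]["nr_inodes"])
    print(f"size={size_kib}k nr_inodes={inodes}")  # size=2048k nr_inodes=4096
```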

Oh, do you build tmpfs as a module or build it into the kernel
(monolithic)? If it's a module, maybe the kernel either can't find the
tmpfs module for some reason, or is confused somehow about what to load?
If so, could you try building in tmpfs and see if that changes things?
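
One quick way to see whether the running kernel knows about tmpfs at all is /proc/filesystems, which lists the filesystem types currently registered, whether built in or provided by an already-loaded module (a module not yet loaded will not appear). A minimal Python sketch, with kernel_has_fs() and read_proc_filesystems() as hypothetical helper names:

```python
def kernel_has_fs(fstype, proc_filesystems_text):
    """True if `fstype` is listed in /proc/filesystems-style text.

    Each line is "nodev<TAB>name" or "<TAB>name"; the name is the last field.
    """
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == fstype:
            return True
    return False


def read_proc_filesystems(path="/proc/filesystems"):
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    try:
        text = read_proc_filesystems()
        print("tmpfs registered:", kernel_has_fs("tmpfs", text))
    except OSError:
        print("no /proc/filesystems here (not Linux, or /proc not mounted)")
```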

As to your question about what /run is supposed to be... what did you
/expect/ ls to show? It's a directory in the filesystem, so ls has to
show a directory, regardless of whether memory, local disk, network, or
something else backs the filesystem.

There does have to be a physical /run directory on / to serve as the
mountpoint, and it must exist there before /run is mounted.
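
The distinction between "a directory that exists on /" and "a directory with a filesystem mounted on it" can be checked with Python's standard os.path functions. mountpoint_state() below is a hypothetical helper for illustration; note that a symlink such as /var/run -> /run is reported separately, since it is not itself a mountpoint.

```python
import os


def mountpoint_state(path):
    """Classify `path` as it matters for mounting something on it."""
    if not os.path.lexists(path):
        return "missing"          # nothing to mount on
    if os.path.islink(path):
        return "symlink"          # e.g. /var/run -> /run
    if not os.path.isdir(path):
        return "not a directory"
    if os.path.ismount(path):
        return "mounted"          # a filesystem is mounted here now
    return "directory"            # plain directory, usable as a mountpoint


if __name__ == "__main__":
    for p in ("/run", "/var/run"):
        print(p, mountpoint_state(p))
```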

After the mount, since tmpfs is still a filesystem, just one that lives in
memory, /run will still /appear/ as a directory to ls. To see what it
actually is, use df /run, which tells you what filesystem it's on, then
grep <filesystem> /proc/mounts, which gives you what the kernel thinks
about that filesystem and its mount options. Here:

$ ls -dl /run
drwxrwxrwt 7 root root 360 Apr 29 16:05 /run/

$ df /run
Filesystem      Size  Used Avail Use% Mounted on
run             2.0M  724K  1.3M  36% /run

$ grep run /proc/mounts
run /run tmpfs rw,nosuid,nodev,noexec,relatime,size=2048k,nr_inodes=4096 0 0

(FWIW, there's another unrelated run entry that shows in my grep as well,
the bind mount for my chrooted named server. I didn't post that.)
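
A plain `grep run` also catches unrelated entries like that chroot bind mount. Matching the mountpoint field of /proc/mounts exactly avoids this; the Python sketch below does that. find_mount() is a hypothetical helper, and the second line of SAMPLE is an invented stand-in for a chroot bind mount, not Duncan's actual entry.

```python
def find_mount(mountpoint, proc_mounts_text):
    r"""Return (device, fstype, options) for an exact mountpoint match.

    Matching the second field exactly, instead of grepping for "run",
    skips unrelated entries such as a chroot's bind mount. If several
    filesystems are stacked on one mountpoint, the last (visible) one
    wins. Escapes such as \040 for spaces are not decoded here.
    """
    found = None
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mountpoint:
            found = (fields[0], fields[2], fields[3].split(","))
    return found


SAMPLE = (
    "run /run tmpfs rw,nosuid,nodev,noexec,relatime,size=2048k,nr_inodes=4096 0 0\n"
    # hypothetical second entry standing in for a chroot bind mount
    "/dev/sda5 /var/named/chroot/run ext4 rw,relatime 0 0\n"
)

if __name__ == "__main__":
    print(find_mount("/run", SAMPLE))
```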

As you can see, ls shows /run as a normal dir, as it should: it's a
filesystem, and the fact that the filesystem happens to live in memory
doesn't matter to ls. But df shows it as its own filesystem, and a grep
of /proc/mounts tells me the filesystem type (tmpfs, so it's in memory,
because that's where tmpfs creates its filesystem) as well as the options
the kernel used to mount it.

Meanwhile, another way to tackle the problem, since you also know the
openrc version where it shows up, would be to bisect openrc instead of
the kernel. That'd tell you exactly which openrc commit was the problem.
If necessary you could do both bisects, getting an even better picture of
what was triggering it, from both the kernel and openrc sides. Of course
that'd be easier if you knew which version of openrc you were running
previously, hopefully 0.11.7.x, as that'd give you the fewest commits to
bisect down.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
