On Thu, Sep 15, 2011 at 11:00:47PM +0200, Joost Roeleveld wrote:
> > See below on the existing udev retry queue that is hiding many of the
> > issues from you. These hidden issues are also negatively affecting boot
> > times (failures and retries take time).
> I don't actually mind too much about the boot time. If there are retries like
> this, I would expect this to be visible in the system logs.
To the best of my knowledge, udev does not log rule failures.

> > The problem is that there is a bit of a catch-22 in running some udev
> > rules:
> > 0. You're going to have to declare interdependencies between ALL udev
> > rules. This is because udev rules could be usable independently, or
> > they could be interrelated (first rule sets some state variable or
> > file, second one consumes it).
> Either udev does this already and the execution sequence is always the same.
> In which case my suggestion above would follow the same sequence as the queue
> would be on a First-in First-out basis.
> Or, if udev doesn't do this yet, udev will end up having the same problem.
It doesn't do it presently. The worst case I've seen is having an early
rule that succeeds, but gives different results w/ /var mounted vs. not
mounted. This rule didn't actually fail, so it does not get retried...
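
As a rough illustration of why that case slips through, here is a toy
model in Python. It is not udev's actual implementation: the Rule type,
the two rule bodies and the run_boot() helper are all made up for this
sketch. The only point is that a retry pass which replays failed events
never revisits a rule that "succeeded" with degraded results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy model only: none of these names exist in udev.
# A rule body receives the set of mounted filesystems and
# returns (succeeded, result).
RuleBody = Callable[[set], Tuple[bool, str]]


@dataclass
class Rule:
    name: str
    body: RuleBody


def needs_usr(mounted: set) -> Tuple[bool, str]:
    # Hypothetical rule whose helper lives on /usr: hard failure without it.
    if "/usr" not in mounted:
        return False, "helper binary missing"
    return True, "configured"


def prefers_var(mounted: set) -> Tuple[bool, str]:
    # Hypothetical rule that always reports success, but falls back to
    # defaults when its state file on /var cannot be read.
    if "/var" not in mounted:
        return True, "built-in defaults"
    return True, "saved state"


def run_boot(rules: List[Rule]) -> Dict[str, str]:
    results: Dict[str, str] = {}
    retry_queue: List[Rule] = []

    # First pass: hardware shows up before localmount, only / is mounted.
    early = {"/"}
    for rule in rules:
        ok, result = rule.body(early)
        results[rule.name] = result
        if not ok:
            retry_queue.append(rule)  # only failures are queued

    # Second pass after localmount: replay only the failed events.
    late = {"/", "/usr", "/var"}
    for rule in retry_queue:
        _, result = rule.body(late)
        results[rule.name] = result
    return results


if __name__ == "__main__":
    final = run_boot([Rule("needs-usr", needs_usr),
                      Rule("prefers-var", prefers_var)])
    for name, result in final.items():
        print(f"{name}: {result}")
```

In this model the needs-usr rule is fixed up by the retry, while
prefers-var keeps its built-in defaults for the rest of the boot.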

> > 1. While the binary invoked by a given rule might reside entirely on a
> > mount that is already available, how do you ensure that the other
> > mountpoints required by said binary are ALSO available (the bluetooth
> > and ALSA rules actually need /var, what if you have a bluetooth
> > keyboard? [see footnote]).
> This is why I would suggest the "actiond" process to be started after
> localmount.
That's too late. What about all the udev rules required to get the
device nodes for localmount to succeed?

Anyway, take your proposed split actiond/udev solution to the upstream
udev list. I don't believe that we have the manpower to develop it here.
If we did develop it here, I don't believe it would gain enough traction
amongst other distros, and we'd be stuck supporting it.

I personally don't think your split solution covers the use cases well
enough, but that's a decision best left to the upstream udev
developers. Please take the discussion there, and don't pursue it on
this list.

> > The upstream discussions I've been party to previously (both on lists
> > and in person), have been trying to avoid needing a full dependency
> > system in udev, because it's a huge degree of additional complexity.
> I don't see why it would not be possible to pause actioning of these scripts
> till the boot-process says all required mounts are available.
You still have to declare which scripts need to be paused, and/or which
rules inside the scripts need to be paused. Even worse are cases where
mounting some of the localmount entries requires those scripts to have
completed.
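
To make the circular case concrete, here is a small sketch using
Python's standard graphlib module. The step names are invented for
illustration, and this is not how udev or the init scripts model
anything. It only shows that "hold the rules until localmount" combined
with "localmount needs the symlinks those rules create" leaves no valid
ordering.

```python
from graphlib import CycleError, TopologicalSorter


def boot_order(deps):
    """Return a valid start order, or None if the dependencies are circular.

    deps maps each step to the set of steps that must finish first.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError:
        return None


# Hypothetical step names, for illustration only.
paused_until_localmount = {
    "udev-rules": {"localmount"},         # proposal: hold rules until mounts are up
    "by-uuid-symlinks": {"udev-rules"},   # the symlinks are created by the rules
    "localmount": {"by-uuid-symlinks"},   # fstab entries refer to UUID=...
}

rules_first = {
    "udev-rules": set(),
    "by-uuid-symlinks": {"udev-rules"},
    "localmount": {"by-uuid-symlinks"},
}

if __name__ == "__main__":
    print(boot_order(paused_until_localmount))
    print(boot_order(rules_first))
```

The first graph has no solution, so boot_order() returns None. The
second one orders cleanly, but only because the rules run before
localmount, which is exactly the situation the pause was meant to avoid.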

> I see this as a "solution" for the situation where someone decides to use
> less-common hardware and force this solution onto everyone else.
I'm trying to get as little forced on us as possible. Gentoo is one of
the few distros at this point that doesn't already require an initramfs.
While we're going to carry on supporting not requiring an initramfs as
long as possible (and documenting what is needed for that), we also
don't want this to become a stumbling block for users that just want
their system to work. With how upstream udev and other projects are
going, there is a real possibility of breakage caused by their code, far
more than the potential for breakage because /usr failed to mount.

> If I would want to put my /usr filesystem on a bluetooth harddrive (for
> instance my mobile phone), then I would not expect to have this work without a
> lot of extra effort.
While that is in the realms of extreme, having /usr or /var on NFS isn't
extreme at all.

> > udev has a retry queue already, see udev-postmount:
> > ===
> > # Run the events that failed at first udev trigger
> > udevadm trigger --type=failed -v
> > ===
> This is a retry-queue. That's a good start already, but why not redo this
> queue and put ALL the scripts into that queue until after localmount?
See above, about rules that are required for localmount to be able to
complete. The most prevalent ones would probably be devices by-uuid and
by-label. Those symlinks come from udev...

> > > With a smaller udev, the chances of it failing should also be less.
> > > (less code-lines to check) and as long as the /dev-entries are created,
> > > these can be used to manually mount the other partitions to get to the
> > > point where the system can be fixed to get it back to a workable state.
> >
> > The problem is NOT in the udev codebase. It's in udev rules. Even at the
> > rule level, it's mostly rules for packages other than udev itself.
>
> Yes, but as I already stated, the problem-rules do not exist on all systems.
> My systems for instance don't have any pointing to anything other than
> /etc/...
> These scripts also don't call anything that isn't mounted at the time.
Does your desktop use ALSA?
/lib/udev/rules.d/90-alsa-restore.rules
invokes
"/usr/sbin/alsactl restore ..."
Which in turn reads from /var/lib/alsa/asound.state.

We presently have the restore() function in /etc/init.d/alsasound that
repeats this, because that rule often fails to work during boot
(non-existence of the state file causes it to use built-in defaults
instead).

udev runs that rule as soon as the hardware turns up, which is often
before localmount.

> That system has been running without incident for several years. Why do I
> suddenly have to make that system more complex?
Just because there are no visible errors doesn't mean that they don't
exist. This move to encourage initramfs is to ensure that there aren't
any major breakage incidents soon. What if udev upstream suddenly
starts hard requiring /usr to be mounted, and stops doing retries at all?
How many systems are going to break, and users going to complain about
needing to use livecds to recover?

>
> > > If, in the currently planned form, udev fails, it will be necessary to
> > > use a rescue-cd/usb to boot the system, try to fix it inside a chroot
> > > where it's not easy to check what is actually going wrong during the
> > > boot-process as the different tools can then not be run with the
> > > error-messages printed to the console.
> >
> > No, you've gotten the failure case wrong. Ok, so take the minimal
> > initramfs as I proposed on this list as the "working" case. Let's say
> > for some reason the initramfs doesn't load at all, so you have only /
> > mounted when you go into the rootfs init.
> >
> > If you had a setup that was complex enough to require udev to come up
> > for mounting /usr, you're going to end up at a real shell on your rootfs
> > by one of the following means:
> > - Pressing I for interactive boot, selecting shell (if you have not
> >   locked it down)
> > - Passing init=/bin/sh to your boot loader.
> >
> > The problem case that does NOT exist here is anything more complicated;
> > because if you have something like root-on-LVM, or encrypted root, you
> > already have an initramfs.
> >
> > If the initramfs itself does exist, but fails to mount anything, you
> > also get a rescue shell from the initramfs.
>
> From my understanding, udev is needed to create the /dev-entries to be able to
> mount /usr.
> If the changes proposed are actually done (moving everything out of / and into
> /usr) then udev won't be available to create the /dev-entries.
> A pre-populated set would work for most, but /dev/mapper used to require an
> initramfs as this device would have different numbers upon boot.
> If this is still the case, how would I be able to get LVM and MDADM to run to
> get to my partitions?
DEVTMPFS creates the first batch, and udev creates the rest.

The deciding case then becomes:
- Is the device for your /usr entry in fstab created by udev or
  something else?

MD: done by devtmpfs
LVM: done by udev+lvm
by-uuid/by-label: done by udev
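
The check above can be sketched roughly as follows. This is only an
approximation under loud assumptions: classify_usr_device() and
find_usr_spec() are hypothetical helpers, the sample fstab and its UUID
are made up, every /dev/mapper/ path is treated as LVM (it could just as
well be something else), /dev/<vg>/<lv> style LVM paths are not
recognised, and plain kernel names are assumed to come from devtmpfs.

```python
from typing import Optional

# Made-up fstab contents, for illustration only.
SAMPLE = """\
# <fs>      <mountpoint> <type> <opts>   <dump/pass>
/dev/sda1   /            ext4   noatime  0 1
UUID=0a1b2c3d-0000-0000-0000-000000000000 /usr ext4 noatime 0 2
"""


def find_usr_spec(fstab_text: str) -> Optional[str]:
    """Return the device spec (first field) of the /usr entry, if any."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[1] == "/usr":
            return fields[0]
    return None


def classify_usr_device(fstab_text: str) -> str:
    """Guess who creates the device node that the /usr entry refers to."""
    spec = find_usr_spec(fstab_text)
    if spec is None:
        return "no separate /usr"
    by_links = ("UUID=", "LABEL=", "/dev/disk/by-uuid/", "/dev/disk/by-label/")
    if spec.startswith(by_links):
        return "udev"
    if not spec.startswith("/dev/"):
        # e.g. an NFS export such as server:/path
        return "not a local block device"
    if spec.startswith("/dev/mapper/"):
        return "udev+lvm"  # assumption: device-mapper name belongs to LVM
    # /dev/md* and plain kernel names such as /dev/sda3 (assumption)
    return "devtmpfs"


if __name__ == "__main__":
    print(classify_usr_device(SAMPLE))
```

Anything this reports as "udev" or "udev+lvm" is a /usr that cannot be
mounted until udev has run.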

by-uuid and by-label present a lot of annoyance to the minimal
initramfs. We have to ensure that we explicitly support them, which has
increased the complexity of the initramfs.

> I'm sorry, but I see bluetooth-keyboards still as a minority. If someone
> wants/has to use this, then an initramfs will be necessary.
...
> The vast majority doesn't use those.
Likewise, we're NOT going to force you to use an initramfs.
We're going to be providing it regardless. If the users choose not to
use the initramfs, they get to keep the broken pieces of their systems.

--
Robin Hugh Johnson
Gentoo Linux: Developer, Trustee & Infrastructure Lead
E-Mail : robbat2@g.o
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85