On 07/01/2013 03:23 PM, Greg KH wrote:
> On Mon, Jul 01, 2013 at 08:45:16PM +0200, Tom Wijsman wrote:
>>>> Q: What about my stable server? I really don't want to run this
>>>> stuff!
>>>>
>>>> A: These options would depend on !CONFIG_VANILLA or
>>>> CONFIG_EXPERIMENTAL
>>>
>>> What is CONFIG_VANILLA? I don't see that in the upstream kernel tree
>>> at all.
>>>
>>> CONFIG_EXPERIMENTAL is now gone from upstream, so you are going to
>>> have a problem with this.
>>
>> Earlier I mentioned "2) These feature should depend on a non-vanilla /
>> experimental option." which is an option we would introduce under the
>> Gentoo distribution menu section.
>
> Distro-specific config options, great :(
>
>>>> which would be disabled by default, therefore if you keep this
>>>> option the way it is on your stable server; it won't affect you.
>>>
>>> Not always true. Look at aufs as an example. It patches the core
>>> kernel code in ways that are _not_ accepted upstream yet. Now you all
>>> are running that modified code, even if you don't want aufs.
>>
>> Earlier I mentioned "3) The patch should not affect the build by
>> default."; if it does, we have to adjust it to not do that, this is
>> something that can be easily scripted. It's just a matter of embedding
>> each + block in the diff with a config check and updating the counts.
>
> Look at aufs as a specific example of why you can't do that, otherwise,
> don't you think that the aufs developer(s) wouldn't have done so?

I am acquainted with the developer of a stackable filesystem.
According to what he has told me in person offline, the developers on
the LKML cannot decide on how a stackable filesystem should be
implemented. I was told of three different variations on the design that
some people liked and others didn't, which ultimately kept the upstream
kernel from adopting anything. I specifically recall two variations:
doing it as part of the VFS and doing it as part of ext4. If you want
to criticize stackable filesystems, would you lay out a groundwork for
getting one implemented upon which people will agree?

> The goal of "don't touch any other kernel code" is a very good one, but
> not always true for these huge out-of-tree kernel patches. Usually that
> is the main reason why these patches aren't merged upstream, because
> those changes are not acceptable.

I was under the impression that there were several reasons for patches
not being merged upstream:

1. Lack of a Signed-off-by line
2. Code drop that no one will maintain
3. Subsystem maintainers saying no simply because they do not like
<insert non-technical reason here>.
4. Risk of patent trolls
5. Actual technical reasons

> So be very careful here, you are messing with things that are rejected
> by upstream.
>
> greg k-h
>

Only some of the patches were rejected. Others were never submitted. The
PaX/GrSecurity developers prefer their code to stay out-of-tree. As one
of the people hacking on ZFSOnLinux, I prefer that the code be
out-of-tree. That is because fixes for other filesystems are either held
back by a lack of system kernel updates or held hostage by regressions
in newer kernels on certain hardware.

With that said, being in Linus' tree does not make code fall under some
golden standard for quality. There are many significant issues in code
committed to Linus' tree, some of which have been problems for
years. Just to name a few:

1. Doing `rm -r /dir` on a directory tree containing millions of inodes
(e.g. ccache) on an ext4 filesystem mounted with discard with the CFQ IO
elevator will cause a system to hang for hours on pre-SATA 3.1 hardware.
This is because TRIM is a non-queued command and is being interleaved
with writes for "fairness". Incidentally, using noop turns a multiple
hour hang into a laggy experience of a few minutes.

2. aio_sync() is unimplemented, which means that there is no sane way
for userland software like QEMU and TGT to be both fast and guarantee
data integrity. A single crash and your guest is corrupted. It would
have been better had AIO never been implemented.

3. dm-crypt will reorder write requests across flushes. That is because
upon seeing a write, it sends it to a work queue to be processed
asynchronously and upon seeing a flush, it immediately processes it. A
single kernel panic or sudden power loss can damage filesystems stored
on it.

4. Under low memory conditions with hundreds of concurrent threads (e.g.
package builds), every thread will enter direct reclaim and there will
be a remarkable drop in system throughput, assuming that the system does
not lock up. A fairly substantial amount of time is wasted in the other
threads after one thread finishes direct reclaim, because they will
still be performing direct reclaim afterward.

5. The Linux 3.7 nouveau rewrite broke kexec support. The graphics
hardware will not reinitialize properly.

6. A throttle mechanism introduced for memory cgroups can cause the
system to deadlock whenever it is holding a lock needed for swap and
enters direct reclaim with a significant number of dirty pages.

7. Code has been accepted on multiple occasions that does not compile
and the build failures persist for weeks if not months after Linus' tag.
I sent a patch to fix one failure. It was rejected because I had fixed
the code to compile with -Werror, while people thought that -Werror
should be removed (and that there was therefore no reason to fix the
warnings), and we went two months until someone wrote a patch that
people liked to fix it. For a current example of accepted code failing
to build, look here:

https://bugzilla.kernel.org/show_bug.cgi?id=38052

Note that I have not checked Linus' tree to see if that bug is still
current, but the bug itself appears to be open as of this writing.

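To make item 3 above more concrete, here is a toy Python model of the
reordering. It is only a sketch and not dm-crypt code: the function
names and the three-request sequence are made up for illustration, and
it assumes the worst case in which the worker drains its queue only
after the flush has already been issued.

```python
from collections import deque


def submit_in_order(requests):
    """Reference behavior: requests reach the device in submission order."""
    return list(requests)


def submit_with_deferred_writes(requests):
    """Toy model: writes wait on a work queue, flushes are issued at once."""
    device_log = []
    work_queue = deque()
    for req in requests:
        if req.startswith("write"):
            work_queue.append(req)   # deferred for asynchronous processing
        else:
            device_log.append(req)   # flush bypasses the queue entirely
    # Assumed worst case: the worker runs only after the flush went out.
    while work_queue:
        device_log.append(work_queue.popleft())
    return device_log


# A filesystem expects "write A" to be durable once "flush" completes.
requests = ["write A", "flush", "write B"]
ordered = submit_in_order(requests)
reordered = submit_with_deferred_writes(requests)

print("in order: ", ordered)
print("reordered:", reordered)
```

In the second log the flush reaches the device before "write A", so a
crash right after the flush completes can leave "write A" missing even
though the filesystem believes it is on stable storage.
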
There are plenty more technical issues, but these are just my pet
peeves. If you want more examples, you could look at the patches people
send you each day and ask yourself how many are things that could have
been caught had people been more careful during review. For instance,
look at the barrier patches that were done around Linux 2.6.30. What
prevented those from being caught by review years earlier?

Being outside Linus' tree is not synonymous with being bad and being bad
is not synonymous with being rejected. It is perfectly reasonable to
think that there are examples of good code outside Linus' tree.
Furthermore, should the kernel team choose to engage that out-of-tree
code, my expectation is that its quality will improve as they do testing
and write patches.