Tom <uebershark@××××××××××.com> posted
20090326182608.5da93382@ViciousVincent, excerpted below, on Thu, 26 Mar
2009 18:26:08 +0100:

> I've upgraded to the 2.6.29-gentoo sources. I've built everything as
> usual, and so far, everything seems to be working.
> Except that my network device 'dies' (not permanently) after working
> flawlessly for maybe 10min.
>
> Booting a 2.6.28 kernel, I have no such issues. Restarting
> /etc/init.d/net.eth0 has no effect, and using ifconfig up/down eth0 just
> times out.
>
> The drivers are all there as they should be; could this be some kind of
> weird regression? I'm using the Uli M526x driver, found under the
> 'tulip-family'

This is in fact a mainline regression, caused by one of the last patches
merged before the release, which changed NAPI handling but apparently has
interrupt implications as well. The LKML 2.6.29 announcement drew a reply
mentioning the regression and several confirmations, followed by
discussion as they try to pin it down with various patches and repeated
tests. They intend a fix for 2.6.29.1, even if it's simply reverting the
late patch. However, that patch was itself a fix for a problem on other
NICs, and other code intended to revert its effects still ends up
tickling the interrupt problem, so it's a bit more complex than they
anticipated. But the normal rule is no breaking previously working
hardware, so had that patch gone in even a day earlier it would likely
have been reverted before release, and if they can't find a better
solution, it almost certainly /will/ be reverted for .29.1.

That was one of two subthreads generated by the announcement. The other
was about the ext4 data corruption bug that made big news during the -rc
period and got a temporary fix for .29. Now that .29 is out, they're
trying to come up with a more permanent solution, but there's a policy
debate in the process over whether POSIX's (lack of) data stability
guarantees in the event of an improper shutdown is acceptable. One side
says POSIX doesn't require more, and that the stability of ext3's default
data=ordered mode was an "accident"; the other says that may be, but now
that the stability expectation has been raised, weakening it in the
interest of "performance" isn't a good thing. The other part of the
debate is just how "ordered" data=ordered has to be. The performance side
says that if metadata is synced every five seconds (the default) while
data is only synced every 30 seconds (again the default) with delayed
allocation, and a crash causes loss of data, tough: it's POSIX compliant
and the performance benefits are great. The other side says data=ordered
means data=ordered: in data=ordered mode (the default), metadata MUST wait
to sync until after the data it covers is synced, REGARDLESS of delayed
allocation, even if the cost is losing some of ext4's vaunted performance
gains over ext3.

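For what it's worth, the "POSIX doesn't require more" position boils down
to this: an application that wants its data to survive a crash has to ask
for that explicitly. A minimal Python sketch of the write-temp-file, fsync,
rename pattern usually cited for that follows. It's not from the thread;
the function name durable_replace is purely illustrative, and it assumes a
POSIX system (on Linux, fsync on a directory descriptor is what makes the
rename itself durable).

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data`, intended so that after a crash readers see
    either the complete old contents or the complete new contents.
    Illustrative sketch; assumes a POSIX system."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # move Python's buffer into the kernel
            os.fsync(tmp.fileno())  # ask the kernel to put the data on disk
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Don't leave a stray temp file behind if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory too, so the rename itself is on disk (works on Linux).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Usage is just durable_replace("settings.conf", b"key = value\n"). The
trade-off is an fsync per file plus one per directory, a real cost on busy
systems, which is part of why the other side would rather data=ordered
simply keep its old meaning.
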
Basically, what the latter boils down to for me and many others is that
despite the rename of ext4dev to ext4, supposedly indicating it's stable
now, it's NOT, at least not stable enough for mission-critical data that
in real life may or may not have up-to-date backups! Ext3 (or for me,
reiserfs in the same data=ordered default mode) continues to work well,
and it's not time to go moving everything to ext4 just yet.

For more, find the announcement thread on any LKML mirror, or see its
coverage in various kernel news discussions.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman