Gentoo Archives: gentoo-amd64

From: Duncan <1i5t5.duncan@×××.net>
To: gentoo-amd64@l.g.o
Subject: [gentoo-amd64] Re: kernel-2.6.29-gentoo, network device failing after maybe 10min
Date: Thu, 26 Mar 2009 18:59:27
Message-Id: pan.2009.03.26.18.59.13@cox.net
In Reply to: [gentoo-amd64] kernel-2.6.29-gentoo, network device failing after maybe 10min by Tom
Tom <uebershark@××××××××××.com> posted
20090326182608.5da93382@ViciousVincent, excerpted below, on Thu, 26 Mar
2009 18:26:08 +0100:

> I've upgraded to the 2.6.29-gentoo sources. I've built everything as
> usual, and so far, everything seems to be working.
> Except that my network device 'dies' (not permanently) after working
> flawlessly for maybe 10min.
>
> Booting a 2.6.28 kernel, I have no such issues. Restarting
> /etc/init.d/net.eth0 has no effect, and using ifconfig up/down eth0 just
> times out.
>
> The drivers are all there as they should be, could this be some kind of
> weird regression? I'm using the Uli M526x driver, found under the
> 'tulip-family'

This is in fact a mainline regression, caused by one of the last patches
merged before the release. That patch changed NAPI handling but
apparently has interrupt implications as well. A reply to the 2.6.29
announcement on LKML reported the regression and drew several
confirmations, followed by discussion as they try to pin it down with
various patches and repeated tests. They intend a fix for 2.6.29.1,
even if it's simply reverting the late patch. However, that patch was
itself a fix for a problem on other NICs, and other code intended to
revert its effects still ends up tickling the interrupt problem, so
it's a bit more complex than they anticipated. But the normal rule is
not to break previously working hardware, so had that patch gone in
even a day earlier, it would likely have been reverted before release.
If they can't find a better solution, it almost certainly /will/ be
reverted for .29.1.
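
If you want to see whether the interrupt side is what stalls on your
box, one rough approach is to watch the NIC's counters in
/proc/interrupts while the link is up and again after it dies. The
sketch below is only an illustration, not something from the LKML
thread: the names irq_count and watch are made up here, and "eth0" is
assumed from the original report.

```python
import time


def irq_count(interrupts_text, device):
    """Sum the per-CPU interrupt counts on lines that name `device`."""
    total = 0
    for line in interrupts_text.splitlines():
        fields = line.split()
        # IRQ lines look like: "17:  1200  34  IO-APIC-fasteoi  eth0"
        if not fields or not fields[0].endswith(":"):
            continue
        # Shared IRQs list several names separated by ", ".
        if device not in [name.rstrip(",") for name in fields[1:]]:
            continue
        # The per-CPU counters are the run of integers after the label.
        for field in fields[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def watch(device="eth0", interval=10, samples=6):
    """Print a timestamp and the running total a few times in a row."""
    for _ in range(samples):
        with open("/proc/interrupts") as f:
            print(time.strftime("%H:%M:%S"), irq_count(f.read(), device))
        time.sleep(interval)
```

Calling watch() from a Python prompt prints one total every ten
seconds. A total that stops growing while you are still trying to push
traffic would at least be consistent with an interrupt problem, though
it proves nothing on its own.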

That was one of two subthreads generated by the announcement. The other
concerned the ext4 data corruption bug that made big news in the -rc
period and got a temporary fix for .29. Now that .29 is out, they're
trying to come up with a more permanent solution, but there's a policy
debate in the process, over whether the (lack of) data stability
guarantees in POSIX in the event of an improper shutdown is acceptable.
One side says POSIX doesn't require more and that the default
data=ordered stability of ext3 was an "accident". The other says that
may be, but now that the stability expectation has been raised,
lowering it in the interest of "performance" isn't a good thing. The
other bit of the debate is just how "ordered" data=ordered has to be.
The performance side says that if metadata is synced every five seconds
(the default) while, with delayed allocation, data is only synced every
30 seconds (again the default), and a crash causes loss of data, tough:
it's POSIX compliant and the performance benefits are great. The other
side says data=ordered means data=ordered: metadata MUST wait to sync
until after the data it covers is synced in data=ordered mode (the
default), REGARDLESS of delayed allocation, even if the cost is loss of
some of the vaunted performance gains of ext4 over ext3.
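
Whichever side wins, an application can avoid depending on the
filesystem's ordering by asking for the sync itself. What follows is a
minimal sketch of the usual write-temp, fsync, rename pattern. It is my
illustration rather than anything prescribed in the thread, and the
name durable_replace and the ".tmp" suffix are made up for the example.

```python
import os


def durable_replace(path, data):
    """Replace `path` with `data` (bytes) without relying on data=ordered."""
    tmp = path + ".tmp"  # hypothetical temp name, same directory as target
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        # Force the new data to disk BEFORE the rename makes it visible.
        os.fsync(f.fileno())
    # Atomically swap the new file into place.
    os.replace(tmp, path)
    # Sync the directory too, so the rename itself survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The cost is an explicit sync on every save, which is the very
performance hit the delayed allocation side wants to avoid, so this
suits config files and the like better than bulk data.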

Basically, what the latter one boils down to for me and many others is
that despite the rename of ext4dev to ext4, supposedly indicating it's
stable now, it's NOT, at least not enough for mission-critical data that
in real life may or may not have up-to-date backups! Ext3 (or for me,
reiserfs in the same data=ordered default mode) continues to work well,
and it's not time to go moving everything to ext4 just yet.

For more, see the announcement thread on any LKML mirror, or the
coverage in kernel news discussions.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman