On Thu, 2004-09-30 at 15:54 +0300, Alex Efros wrote:
> Hi!
>
>
> My server hangs every 3-14 days without storing kernel oops message in logs
> (this is dedicated server at hosting, so I've no physical access to console).
> I've set up netconsole, and catch kernel oops by network on second server
> (error message below).
>
> These hangs happens on different kernel versions (current is 2.6.8-gentoo-r3).
> "SpiderAuto" process is my perl script which running using usual user account
> and 24x7 downloading websites (there number (3-7) of such scripts running
> doing parallel download of different websites).
>
> I suppose this is sort of "race condition" error related to huge number of
> simultaneous download requests...
>
> Any ideas how to fix/workaround this error? Maybe try another kernel source
> (I'm usually using gentoo-dev-sources)?

[snip]

Ouch, that's a nasty one. I suspect dabbling in source variations will
not help a great deal, because the gentoo-dev-sources are so lightly
patched in the first place. If anything, try 2.6.9-rc3. I took a
cursory glance over the ChangeLog for "[NETFILTER]" entries and, while a
few patches have been applied, nothing immediately suggested that they
would alleviate your problem.

Have you compiled in the ipchains/ipfwadm compatibility support
(CONFIG_IP_NF_COMPAT_IPCHAINS / CONFIG_IP_NF_COMPAT_IPFWADM) by any
chance? Apparently, it's rather buggy. For instance, this post mentions
a bug in find_appropriate_src() that only occurs when the backward
compatibility options are compiled in:
http://www.gelato.unsw.edu.au/linux-ia64/0310/7353.html. See also:
http://lists.netfilter.org/pipermail/netfilter-devel/2003-October/012872.html.
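A quick way to answer that is to look at the kernel configuration itself. Here is a rough Python sketch. It assumes you have either the .config from your kernel source tree or /proc/config.gz, which only exists if the kernel was built with CONFIG_IKCONFIG_PROC. The helper names (parse_kconfig, compat_status, read_config) are made up for illustration.

```python
import gzip
import re
import sys

# The two netfilter compatibility options asked about above.
COMPAT_OPTIONS = ("CONFIG_IP_NF_COMPAT_IPCHAINS", "CONFIG_IP_NF_COMPAT_IPFWADM")

# Kconfig files record enabled options as "CONFIG_X=value" and disabled
# ones as the comment "# CONFIG_X is not set".
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_* symbol to its value ('y', 'm', ...) or None if unset."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            values[m.group(1)] = None
    return values


def compat_status(text):
    """Return 'y' (built in), 'm' (module) or None (not set/absent) per option."""
    values = parse_kconfig(text)
    return {opt: values.get(opt) for opt in COMPAT_OPTIONS}


def read_config(path):
    # Hypothetical helper: handles both a plain .config and /proc/config.gz.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return f.read()


if __name__ == "__main__":
    # Pass the path to your .config (or /proc/config.gz) on the command line.
    if len(sys.argv) > 1:
        print(compat_status(read_config(sys.argv[1])))
```

If either option comes back as 'y' or 'm', try rebuilding without it before anything else.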

Here's a description of the purpose of find_appropriate_src():
http://lists.netfilter.org/pipermail/netfilter-devel/2004-March/014418.html.

If the problem persists, try stripping your kernel down to a bare-bones
configuration. Enable CONFIG_DEBUG_KERNEL, CONFIG_FRAME_POINTER and
CONFIG_MAGIC_SYSRQ. Avoid esoteric options such as CONFIG_4KSTACKS,
CONFIG_REGPARM etc. if possible. For a further comparison, you could
also build in, rather than as modules, the things you know you need
(such as iptable_nat and e1000).
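To check a .config against that checklist before rebuilding, something like the following Python sketch could help. It is only an illustration. The WANTED table encodes the advice above. CONFIG_IP_NF_NAT and CONFIG_E1000 are my assumption for the Kconfig symbols that produce iptable_nat and e1000 on a 2.6.8 kernel, so double-check them in menuconfig. The function names are hypothetical.

```python
import re
import sys

# Desired state for a stripped-down debugging build: 'y' = built in,
# None = not set (an option missing from .config also counts as not set).
WANTED = {
    "CONFIG_DEBUG_KERNEL": "y",
    "CONFIG_FRAME_POINTER": "y",
    "CONFIG_MAGIC_SYSRQ": "y",
    "CONFIG_4KSTACKS": None,
    "CONFIG_REGPARM": None,
    "CONFIG_IP_NF_NAT": "y",  # assumed symbol behind iptable_nat
    "CONFIG_E1000": "y",      # assumed symbol behind e1000
}

_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_* symbol to its value ('y', 'm', ...) or None if unset."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            values[m.group(1)] = None
    return values


def config_mismatches(text, wanted=WANTED):
    """Return {option: actual_value} for every option not in the wanted state."""
    values = parse_kconfig(text)
    return {
        opt: values.get(opt)
        for opt, want in wanted.items()
        if values.get(opt) != want
    }


if __name__ == "__main__":
    # Usage: python check_config.py /usr/src/linux/.config
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            for opt, actual in config_mismatches(f.read()).items():
                print(f"{opt}: currently {actual!r}, wanted {WANTED[opt]!r}")
```

Anything it reports is a candidate to change in `make menuconfig` before the next build.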

You can emerge and use ksymoops to decode an oops message (such as the
one you provided in your post). I believe it works best if the system is
still operable after the oops; otherwise, you can reboot and then decode
the message. This is the sort of thing that any hardcore kernel hacker
would need to see!

If nothing seems to resolve the problem, it might be best to prepare a
slightly more detailed post for the netfilter mailing list. You might
also want to review the lists for related posts:
http://news.gmane.org/search.php?match=netfilter. I believe you can also
read them with a newsreader via news.gmane.org.

Good luck,

--Kerin Francis Millar