On Thursday, October 15, 2015 03:30:01 PM hw wrote:
> Hi,
>
> I have a xen host with some HV guests which becomes unreachable via
> the network after apparently random amounts of time. I have already
> switched the network card to see if that would make a difference,
> and with the card currently installed, it worked fine for over 20 days
> until it became unreachable again. Before switching the network card,
> it would run a week or two before becoming unreachable. The previous
> card was the on-board BCM5764M which uses the tg3 driver.
>
> There are messages like this in the log file:
>
> Oct 14 20:58:02 moonflo kernel: ------------[ cut here ]------------
> Oct 14 20:58:02 moonflo kernel: WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x259/0x270()
> Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out
> Oct 14 20:58:02 moonflo kernel: Modules linked in: arc4 ecb md4 hmac nls_utf8 cifs fscache xt_physdev br_netfilter iptable_filter ip_tables xen_pciback xen_gntalloc xen_gntdev bridge stp llc zfs(PO) nouveau snd_hda_codec_realtek snd_hda_codec_generic zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate video backlight drm_kms_helper ttm snd_hda_intel snd_hda_controller snd_hda_codec snd_pcm snd_timer snd soundcore r8169 mii xts aesni_intel glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 sha256_generic hid_generic usbhid uhci_hcd usb_storage ehci_pci ehci_hcd usbcore usb_common
> Oct 14 20:58:02 moonflo kernel: CPU: 10 PID: 0 Comm: swapper/10 Tainted: P O 4.0.5-gentoo #3
> Oct 14 20:58:02 moonflo kernel: Hardware name: Hewlett-Packard HP Z800 Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013
> Oct 14 20:58:02 moonflo kernel: ffffffff8175a77d ffff880124d43d98 ffffffff814da8d8 0000000000000001
> Oct 14 20:58:02 moonflo kernel: ffff880124d43de8 ffff880124d43dd8 ffffffff81088850 ffff880124d43dd8
> Oct 14 20:58:02 moonflo kernel: 0000000000000000 ffff8800d45f2000 0000000000000001 ffff8800d5294880
> Oct 14 20:58:02 moonflo kernel: Call Trace:
> Oct 14 20:58:02 moonflo kernel: <IRQ> [<ffffffff814da8d8>] dump_stack+0x45/0x57
> Oct 14 20:58:02 moonflo kernel: [<ffffffff81088850>] warn_slowpath_common+0x80/0xc0
> Oct 14 20:58:02 moonflo kernel: [<ffffffff810888d1>] warn_slowpath_fmt+0x41/0x50
> Oct 14 20:58:02 moonflo kernel: [<ffffffff812b31c5>] ? add_interrupt_randomness+0x35/0x1e0
> Oct 14 20:58:02 moonflo kernel: [<ffffffff8145b819>] dev_watchdog+0x259/0x270
> Oct 14 20:58:02 moonflo kernel: [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80
> Oct 14 20:58:02 moonflo kernel: [<ffffffff8145b5c0>] ? dev_graft_qdisc+0x80/0x80
> Oct 14 20:58:02 moonflo kernel: [<ffffffff810d4047>] call_timer_fn.isra.30+0x17/0x70
> Oct 14 20:58:02 moonflo kernel: [<ffffffff810d42a6>] run_timer_softirq+0x176/0x2b0
> Oct 14 20:58:02 moonflo kernel: [<ffffffff8108bd0a>] __do_softirq+0xda/0x1f0
> Oct 14 20:58:02 moonflo kernel: [<ffffffff8108c04e>] irq_exit+0x7e/0xa0
> Oct 14 20:58:02 moonflo kernel: [<ffffffff8130e075>] xen_evtchn_do_upcall+0x35/0x50
> Oct 14 20:58:02 moonflo kernel: [<ffffffff814e1e8e>] xen_do_hypervisor_callback+0x1e/0x40
> Oct 14 20:58:02 moonflo kernel: <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> Oct 14 20:58:02 moonflo kernel: [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> Oct 14 20:58:02 moonflo kernel: [<ffffffff810459e0>] ? xen_safe_halt+0x10/0x20
> Oct 14 20:58:02 moonflo kernel: [<ffffffff81053979>] ? default_idle+0x9/0x10
> Oct 14 20:58:02 moonflo kernel: [<ffffffff810542da>] ? arch_cpu_idle+0xa/0x10
> Oct 14 20:58:02 moonflo kernel: [<ffffffff810bd170>] ? cpu_startup_entry+0x190/0x2f0
> Oct 14 20:58:02 moonflo kernel: [<ffffffff81047cd5>] ? cpu_bringup_and_idle+0x25/0x40
> Oct 14 20:58:02 moonflo kernel: ---[ end trace 98d961bae351244d ]---
> Oct 14 20:58:02 moonflo kernel: r8169 0000:37:04.0 enp55s4: link up
>
> After that, there are lots of messages about the link being up, one message
> every 12 seconds. When you unplug the network cable, you get a message that
> the link is down, and no message when you plug it in again.
>
> I was hoping that switching the network card (to one that uses a different
> driver) might solve the problem, and it did not. Now I can only guess that
> the network card goes to sleep and sometimes cannot be woken up again.
>
> I tried to reduce the connection speed to 100Mbit and found that accessing
> the VMs (via RDP) becomes too slow to use them. So I disabled the power
> management of the network card (through sysfs) and will have to see if the
> problem persists.
>
> We'll be getting decent network cards in a couple days, but since the
> problem doesn't seem to be related to a particular card/model/manufacturer,
> that might not fix it, either.
>
> This problem seems to only occur on machines that operate as a xen server.
> Other machines, identical Z800s, not running xen, run just fine.
>
> What would you suggest?

More info required:

- Which version of Xen?
- Does this only occur with HVM guests?
- Which network driver are you using inside the guest?
- Can you connect to the "local" console of the guest?
- If yes, does it still have no connectivity?
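
For the network driver question, one way to check from inside a Linux guest is to follow the interface's driver link in sysfs. This is only a rough sketch: the function name `nic_driver` and the `sysfs_root` parameter are made up for illustration, and it assumes a Linux guest with sysfs mounted at /sys. Windows guests would need a different approach.

```python
import os


def nic_driver(iface, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to a network interface.

    Returns None when the interface has no backing device (for example
    loopback or a bridge) or does not exist.
    """
    # /sys/class/net/<iface>/device/driver is a symlink to the driver
    # directory; its last path component is the driver name.
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    try:
        target = os.readlink(link)
    except OSError:
        return None
    return os.path.basename(target)
```

Calling `nic_driver("eth0")` in the guest (the interface name is just an example) returns the driver name, which shows whether the guest is using a paravirtualised driver or an emulated card.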

I saw the same on my lab machine, where it was related to:
- Not using the correct drivers inside HVM guests
- Switch hardware not keeping the MAC/IP/port lists long enough
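
To tell whether a guest outage lines up with a host-side transmit timeout like the one in the quoted log, rather than with the guest driver or the switch, it may help to list when the watchdog fired. Below is a minimal sketch that pulls those events out of syslog-style lines. The name `watchdog_timeouts` is hypothetical, and the pattern assumes one log entry per line in the same format as the quoted message (not re-wrapped by a mail client).

```python
import re

# Matches lines such as:
# Oct 14 20:58:02 moonflo kernel: NETDEV WATCHDOG: enp55s4 (r8169): transmit queue 0 timed out
WATCHDOG = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s\S+\skernel:\s"
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def watchdog_timeouts(lines):
    """Return (timestamp, interface, driver, queue) for each watchdog timeout."""
    events = []
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            events.append(
                (
                    match.group("stamp"),
                    match.group("iface"),
                    match.group("driver"),
                    int(match.group("queue")),
                )
            )
    return events
```

Feeding it the host's kernel log, for instance `watchdog_timeouts(open(path))` with `path` pointing at wherever your syslog writes kernel messages, gives a list of timestamps that can be compared with the times the guests dropped off the network.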

-- 
Joost