Just a final word on this...

The problem is effectively resolved... I was able to rebuild system,
then world, with zero issues. I then ran revdep-rebuild: no issues and no
broken links found. I then recompiled the pkgs that depend on glibc and ran
revdep-rebuild again. The whole thing ran at full capacity and with zero
errors.
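
For anyone hitting the same thing, the whole sequence was roughly the
following (the --library step is just one way to redo everything linked
against glibc; it assumes a gentoolkit version whose revdep-rebuild
supports that option):

  emerge -e system                      # rebuild the system set from scratch
  emerge -e world                       # then everything else
  revdep-rebuild                        # scan for broken shared-library links
  revdep-rebuild --library libc.so.6    # rebuild consumers of glibc
  revdep-rebuild                        # final sanity check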

I don't know if I felt as good as this when I found the "root cause"... I
just know that having "root" again feels great! ;)

Okay... and now let's upgrade the kernel... ;P

Thanks again,
Simon


On Sat, Jan 8, 2011 at 3:16 PM, Mark Knecht <markknecht@×××××.com> wrote:
> Glad you have a root cause/solution.
>
> On Sat, Jan 8, 2011 at 10:49 AM, Simon <turner25@×××××.com> wrote:
> <SNIP>
> > The virtual HD is physically on a RAID (unknown config). Mark, the sector
> > size issue you mention, does it have to do with aligning real HD sectors
> > with filesystem sectors (so that stuff like read-ahead will get
> > no-more-no-less than what the kernel wants)? I've read about this kind of
> > setup when I was interested in RAID long ago... Now that I know my HD is
> > actually on a RAID, maybe I could gain some I/O performance improvements
> > by tuning this a bit!
> >
>
> As it's RAID underneath, it's likely set up correctly. The issue I had
> in mind was the disk being a 4K/sector disk but the person who built
> the partition not knowing to align the partition to a 4K boundary.
> That can cause a _huge_ slowdown.
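>
> A quick way to check (assuming 512-byte logical sectors; device name
> is just an example) is to list the partition table in sectors and make
> sure each partition's start sector is divisible by 8:
>
>   fdisk -lu /dev/sda
>
> or, with a recent enough parted:
>
>   parted /dev/sda align-check optimal 1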
>
> I doubt that's the case here. As this is a hosting service they likely
> know what they are doing in that area, and if it wasn't done correctly
> I think you would have noticed it before now.
>
> > Anyway, I was told by the support team that another user on the same
> > physical machine (remember, it's a Xen VPS) was doing I/O-intensive stuff
> > which could have "I/O starved" my system. I don't understand how starving,
> > or even doing some kind of DoS attack, could lead to a complete freeze on
> > the console, but eh...
>
> Makes sense, actually. The other guy took all the disk I/O, leaving you
> with none. If you can't get to the disk then you cannot read ebuilds
> or write compiled code, or at least not fast.
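>
> If it happens again, you can usually see it from inside the guest with
> iostat from the sysstat package, e.g.:
>
>   iostat -x 5
>
> Large await values paired with next to no throughput of your own is
> the classic sign that something else is monopolizing the disk.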
57 |
> |
58 |
> > They offered to migrate my system to another physical |
59 |
> > machine, and after that... I was able to perform a complete 'emerge -e |
60 |
> > system' in one shot without a scratch, I even did it with --jobs=2 and |
61 |
> > MAKEOPTS="-j4". After that, I started a complete "emerge --keep-going |
62 |
> > --jobs=2 world" with MAKEOPTS="-j8"... (i got 4 cores: dual xeon 2Ghz) |
63 |
> > |
64 |
> |
65 |
> So now you're in good shape...until some user on the new system starts |
66 |
> hogging all the disk I/O and holds you up again. |
67 |
> |
68 |
> > This last emerge is still going on as I write this and is emerging pkg |
69 |
> 522 |
70 |
> > of 620 !! And there were no build errors so far... |
71 |
> > |
72 |
> > It's emerging glibc at the moment, so once the big emerge is finished, |
73 |
> I'll |
74 |
> > probably recompile all pkgs that depend on glibc. I believe glibc was |
75 |
> > actually updated during my very initial update on monday and I haven't |
76 |
> come |
77 |
> > to do that... but I guess everything will go smoothly from here. |
78 |
> > |
79 |
> > Thanks again for all your help guys! |
80 |
> > Simon |
81 |
> |
82 |
> Good that you got to the root of the problem. |
83 |
> |
84 |
> Good luck, |
85 |
> Mark |
86 |
> |
87 |
> |