On Tuesday 21 Mar 2017 22:50:04 Kai Krakow wrote:
> On Tue, 21 Mar 2017 23:22:48 +0200
> Alan McKinnon <alan.mckinnon@×××××.com> wrote:
> > On 21/03/2017 22:16, Kai Krakow wrote:
> > > Test one by one... Either disable all, then enable one by one, or
> > > vice-versa.
> > >
> > > Chances are that your FS may be blocking on sync. Do you maybe have
> > > a very high value in /proc/sys/vm/dirty_background_{ratio,bytes}?
> > >
> > > If ratio is 0, then bytes is used. Ratio is a percent of your
> > > physical RAM. With the default kernel value in modern systems, this
> > > is ridiculously high for desktop systems. Maybe put a fixed value,
> > > like 128MB. The dirty background value is the amount of outstanding
> > > writes before a foreground process blocks on further writes. If
> > > this value is high, a sync may cause processes to freeze for a long
> > > time. Setting this to a lower value forces single processes to
> > > block early and give the kernel a chance to write back dirty data.
> > >
> > > The next value to check is dirty_{ratio,bytes}. That is the combined
> > > maximum of outstanding data before the cache must be flushed. If
> > > this is hit, all writing processes freeze. So, having the
> > > background value high gives a greater chance of hitting this early.
> > >
> > > The default values are 10% and 20% (ratio). I've made the 20% ratio
> > > into 10% and put 128MB for background which works quite well:
> > > Foreground processes are blocked for shorter times (because writing
> > > 128MB can take a few seconds or less, where 1.6GB can take minutes in
> > > the worst case, so if the overall limit is hit, I'm screwed). The
> > > overall dirty buffer is still big enough to let the system buffer
> > > writes of multiple processes. My system has 16GB RAM, you may want to
> > > adjust it or try different values.
> > >
> > > $ cat /etc/sysctl.d/98-caching.conf
> > > vm.dirty_background_bytes = 134217728
> > > vm.dirty_ratio = 10
> > >
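To put numbers on the ratio and bytes settings discussed above, here is a rough shell sketch. It assumes, as the text does, that a ratio is taken against total physical RAM; the kernel actually applies it to available memory, so real thresholds come out somewhat lower. The function names are made up for illustration.

```shell
# Convert a dirty-ratio percentage into an approximate byte threshold.
# Assumption: the base is total RAM in bytes, passed in by the caller.
ratio_to_bytes() {
    ram_bytes=$1
    percent=$2
    echo $(( ram_bytes * percent / 100 ))
}

# Convert a size in MiB into the byte count expected by the *_bytes sysctls.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}
```

For a 16GB machine, `ratio_to_bytes 17179869184 10` comes out at roughly 1.6GB, and `mib_to_bytes 128` gives the 134217728 used in the sysctl file above.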
> > > Maybe point your Firefox cache to a tmpfs. If you're using tmpfs,
> > > don't put swappiness too low, otherwise data sitting in tmpfs cannot
> > > be swapped out and will cause filesystem caches to be discarded too
> > > early. I'm working with a 32GB tmpfs and standard swappiness for
> > > emerge, and I see no problems although multiple gigabytes of emerge
> > > build data may be swapped out. Still, emerge is so much faster now.
> > > But then, my swaps are on different disks (and I have multiple for
> > > getting some RAID-like striping of swap space).
> > >
> > > Also, depending on which FS you're using, trying deadline instead of
> > > CFQ may greatly improve your desktop experience (browsers should
> > > benefit most from this).
> >
> > You may be onto something here:
> >
> > This is an 8-core i7 laptop, 16G RAM
> >
> > $ sudo cat /proc/sys/vm/dirty_background_bytes
> > 0
> > $ sudo cat /proc/sys/vm/dirty_background_ratio
> > 5
> > $ sudo cat /proc/sys/vm/dirty_bytes
> > 0
> > $ sudo cat /proc/sys/vm/dirty_ratio
> > 10
>
> With a 16 GB machine, I recommend not working with the ratio values and
> sticking to the bytes values. 1% steps are just too coarse. Put some
> reasonable values there.
>
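As a quick way to see which of the two forms is currently in effect, something like the following might help. It is only a sketch: the function name is hypothetical, and the optional second argument exists purely so it can be tried against a scratch directory instead of /proc/sys/vm.

```shell
# Print which dirty limit is in effect for a given prefix
# ("dirty_background" or "dirty"). A non-zero *_bytes value means the
# bytes form is in use; otherwise the ratio applies.
effective_dirty_limit() {
    prefix=$1
    vm_dir=${2:-/proc/sys/vm}   # default location of the vm sysctls
    bytes=$(cat "$vm_dir/${prefix}_bytes")
    ratio=$(cat "$vm_dir/${prefix}_ratio")
    if [ "$bytes" -ne 0 ]; then
        echo "${prefix}: ${bytes} bytes"
    else
        echo "${prefix}: ${ratio}% of memory"
    fi
}
```

Call it as `effective_dirty_limit dirty_background` and `effective_dirty_limit dirty`; reading these files does not need root.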
> > browser cache is on a regular laptop spinning-rust 500G disk
>
> Try moving the cache to tmpfs just for the sake of eliminating that...
> Nowadays, /tmp is usually mounted with tmpfs (at least it should be),
> otherwise mount tmpfs somewhere below /mnt and make it chmod 1777. Then
> create a cache directory there, rename your browser cache (while the
> browser is not running) and put a symlink to the newly created
> directory in its place. Now do some tests without rebooting. If it
> works, create an fstab entry to mount a tmpfs directly to your firefox
> cache directory with correct permissions. Of course, the cache contents
> would be lost on reboots.
>
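The rename-and-symlink step described above could look roughly like this. It is a sketch under assumptions: the function name and both paths are placeholders supplied by the caller, the tmpfs is taken to be mounted already, and the browser must not be running.

```shell
# Move an existing cache directory aside and replace it with a symlink
# into a tmpfs-backed location.
relocate_cache() {
    cache_dir=$1    # the browser's current cache directory (placeholder)
    tmpfs_dir=$2    # a directory on a mounted tmpfs, e.g. below /tmp
    mkdir -p "$tmpfs_dir" || return 1
    # Keep the old cache under a .bak name so the change is easy to revert.
    mv "$cache_dir" "$cache_dir.bak" || return 1
    ln -s "$tmpfs_dir" "$cache_dir"
}
```

To undo it, remove the symlink and rename the `.bak` directory back.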
> An alternative could be to put the cache on an FS with better write
> performance, like NILFS2 (it does linear writes only, so reading will
> suffer, but reading is not that sensitive to blocking). Reiserfs can
> also perform well when fsyncs are involved. But it doesn't scale well
> to parallel accesses (which is not so relevant for desktop usage, and
> especially not for a browser cache). Also, XFS always performed very
> well for me (better than Ext4), for desktop and server usage. But that
> only makes sense if you convert your whole system to that. And it cannot
> show its benefits if used on single-disk systems.
>
> > IO scheduler is BFQ, I've used it for ages now.
>
> Yes, good choice. I'd use it, too. But it causes trouble with btrfs
> (results in system freezes with fs corruption when I run VirtualBox).
>
> > I did tests some years back and found it overall the best for an
> > interactive desktop with a DE. I haven't repeated those tests since,
> > have there been significant changes in this area in the last year or
> > three?
>
> It still performs very well. The next best option for me was using
> deadline. CFQ is an interactivity killer.
>
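To check which scheduler a disk is actually using, the kernel lists the available ones in /sys/block/<device>/queue/scheduler and marks the active one with square brackets. A small sketch for pulling that name out (the function name is made up, and `sda` in the usage line is only an assumed device name):

```shell
# Extract the active scheduler from a line such as "noop deadline [cfq]".
# The name inside the square brackets is the one currently in use.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}
```

Usage would be along the lines of `active_scheduler "$(cat /sys/block/sda/queue/scheduler)"`.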
> I'm combining this with bcache. That's a cache between kernel and
> filesystem that you put on SSD. Apparently, it requires repartitioning
> to map your filesystem through bcache (it has to add a protective
> superblock in front of your FS). So, a small SSD + bcache can make your
> complete 500GB spinning rust act mostly like an SSD performance-wise.
>
> I think there's a script that can move your FS 8 kB forward on HDD to
> add that bcache superblock. But I wouldn't try that without a backup
> and some spare time. Still, it is a performance wonder.
>
> Using 3x 1TB btrfs RAID + 500GB bcache here. The system feels like an
> SSD system but I don't have to decide what to put on a small SSD and
> what to put on big slow storage. It's just automagic. ;-)
>
> BTW: Laptop disks are usually really slow because most manufacturers
> only build them with 5400 RPM disks. Maybe get a hybrid disk instead if
> you only have one slot. I think Seagate still makes those. It should
> have benefits similar to bcache.

A desktop started having problems similar to Alan's since the last upgrade:

Installed versions: 45.8.0^d(18:42:51 03/14/17)(dbus ffmpeg gmp-autoupdate
gstreamer jemalloc3 jit pulseaudio startup-notification system-harfbuzz
system-icu system-jpeg system-libevent system-libvpx system-sqlite -bindist
-custom-cflags -custom-optimization -debug -hardened -hwaccel -neon -pgo
-selinux -system-cairo -test -wifi L10N="en-GB -ach -af -an -ar -as -ast -az
-be -bg -bn-BD -bn-IN -br -bs -ca -cs -cy -da -de -el -en-ZA -eo -es-AR -es-CL
-es-ES -es-MX -et -eu -fa -fi -fr -fy -ga -gd -gl -gu -he -hi -hr -hsb -hu -hy
-id -is -it -ja -kk -km -kn -ko -lt -lv -mai -mk -ml -mr -ms -nb -nl -nn -or
-pa -pl -pt-BR -pt-PT -rm -ro -ru -si -sk -sl -son -sq -sr -sv -ta -te -th -tr
-uk -uz -vi -xh -zh-CN -zh-TW")
Homepage: http://www.mozilla.com/firefox
Description: Firefox Web Browser

I thought it might be related to profile-sync-daemon (psd) mapping the browser
cache to /tmp, but have not found the cause of this problem. I noticed it has
been happening when the user is creating new bookmarks, but I am not 100%
sure. Unlike Alan's case, here the whole PC may lock up, or in any case the
keyboard is lost and I have to ssh in. Typically one core is pegged at 100%.
Killing firefox recovers the OS. Sometimes the crash is too far gone by the
time I am called and a three-finger salute is necessary.

I'll ask the user to start it from a terminal next time in case something more
meaningful shows up.
--
Regards,
Mick