Kerin Millar <kerframil <at> fastmail.co.uk> writes:

> The need for the OOM killer stems from the fact that memory can be
> overcommitted. These articles may prove informative:

> http://lwn.net/Articles/317814/

Yeah, I saw this article. It's dated February 4, 2009. How much has
changed in the kernel, configs, and userspace mechanisms since then?
Nothing, everything?

> http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html

Nice to know.

> In my case, the most likely trigger - as rare as it is - would be a
> runaway process that consumes more than its fair share of RAM.
> Therefore, I make a point of adjusting the score of production-critical
> applications to ensure that they are less likely to be culled.

OK, I see the manual tools for the OOM killer. Are there any graphical
tools for monitoring, configuring, and controlling the OOM-related files
and target processes? Or is all of this performed by hand?

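The by-hand route is at least easy to script. Below is a minimal sketch,
assuming the Linux /proc/<pid>/oom_score and /proc/<pid>/oom_score_adj
files (plus /proc/<pid>/comm for the process name). The helper names
read_first_line and oom_report are made up for illustration, and this is
only one way to list the likeliest OOM victims, not a replacement for a
proper monitoring tool:

```python
from pathlib import Path


def read_first_line(path):
    """Return the first line of a file, stripped, or None if unreadable."""
    try:
        with open(path) as fh:
            return fh.readline().strip()
    except OSError:
        return None  # process exited, or we lack permission


def oom_report(proc_root="/proc"):
    """Return (pid, name, oom_score, oom_score_adj) tuples, highest score first.

    proc_root is a parameter only so the function can be pointed at a
    test directory; on a real system the default is what you want.
    """
    rows = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # only numeric directories are processes
        score = read_first_line(entry / "oom_score")
        adj = read_first_line(entry / "oom_score_adj")
        name = read_first_line(entry / "comm") or "?"
        try:
            rows.append((int(entry.name), name, int(score), int(adj)))
        except (TypeError, ValueError):
            continue  # file missing or empty; skip this process
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

Calling oom_report()[:10] gives the ten processes the kernel currently
rates as the most attractive victims. Lowering a critical process's
score is then a matter of writing a negative value (down to -1000) to
its oom_score_adj file, which needs sufficient privileges.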

> If your cases are not pathological, you could increase the amount of
> memory, be it by additional RAM or additional swap [1]. Alternatively,
> if you are able to precisely control the way in which memory is
> allocated and can guarantee that it will not be exhausted, you may elect
> to disable overcommit, though I would not recommend it.

I do not have a problem as such. The OOM killer keeps popping up in my
clustering research, frequently. Many clustering environments have heavy
memory requirements, so memory will eventually be monitored, diagnosed,
and managed in real time by the cluster software itself, for example as
part of load balancing. These are very new technologies, hence my need
to understand both the legacy and current issues and their solutions.
You cannot always just add resources. Once set up, you have to
dynamically manage resource consumption, or at least that is what my
current reading suggests.

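As a rough illustration of what such monitoring could look at, here is a
small sketch that compares the CommitLimit and Committed_AS fields of
/proc/meminfo. The function names parse_meminfo, commit_headroom_kb, and
read_meminfo are hypothetical. Note that the kernel only enforces
CommitLimit when vm.overcommit_memory is set to 2 (strict accounting);
in the other modes the number is merely informative:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def commit_headroom_kb(info):
    """CommitLimit minus Committed_AS, in kB.

    A negative result means more memory has been promised to processes
    than the commit limit would allow under strict accounting.
    """
    return info["CommitLimit"] - info["Committed_AS"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

A cluster agent could call commit_headroom_kb(read_meminfo()) on each
node at intervals and report the value to whatever does the scheduling.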

> With NUMA, things may be more complicated because there is the potential
> for a particular memory node to be exhausted, unless memory interleaving
> is employed. Indeed, I make a point of using interleaving for MySQL,
> having gotten the idea from the Twitter fork.

Well, my first cluster is just three AMD FX-8350 systems with 32G of RAM
each. Once that is working reasonably well, I'm sure I'll be adding
different (multi) processors to the mix, with different RAM
characteristics. There is a *huge interest* in heterogeneous clusters,
including but not limited to GPU/APU hardware. So dynamic, real-time
memory management is essential for successful clustering.

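For the NUMA point, per-node exhaustion can be spotted from the per-node
meminfo files under /sys/devices/system/node/. The sketch below assumes
the usual "Node N Field: value kB" line format. The names
parse_node_meminfo and node_free_fractions are invented for this
example, and fields with punctuation in their names are simply skipped:

```python
import re
from pathlib import Path

# Matches lines such as "Node 0 MemFree:        1234567 kB"
NODE_LINE = re.compile(r"^Node\s+(\d+)\s+(\w+):\s+(\d+)")


def parse_node_meminfo(text):
    """Parse one node's meminfo text into {field: kB}."""
    info = {}
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            info[match.group(2)] = int(match.group(3))
    return info


def node_free_fractions(sys_root="/sys/devices/system/node"):
    """Return {node_name: MemFree / MemTotal} for each NUMA node found."""
    result = {}
    for node_dir in sorted(Path(sys_root).glob("node[0-9]*")):
        try:
            info = parse_node_meminfo((node_dir / "meminfo").read_text())
        except OSError:
            continue  # node directory without a readable meminfo file
        total = info.get("MemTotal", 0)
        if total:
            result[node_dir.name] = info.get("MemFree", 0) / total
    return result
```

A node whose fraction is close to zero while the others still have
plenty free is the situation described above that interleaving is meant
to avoid.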

> Finally, make sure you are using at least Linux 3.12, because some
> improvements have been made there [2].

Yep. As for [1], I always set up gigs of swap and rarely use it, because
critical computations must be fast. Many cluster folks are building
systems with both SSD and traditional (RAID) HD setups. The SSD could be
partitioned for the cluster and swap. Lots of experimentation is ongoing
on how best to deploy SSDs alongside maxed-out RAM in cluster systems.

Memory management is a primary focus of Apache Spark (in-memory)
computations. Spark can be used with Python, Java, and Scala, so it is
very cool.

> --Kerin
> [1] At a pinch, additional swap may be allocated as a file
> [2] https://lwn.net/Articles/562211/#oom

[2] is also good to know.

thx,
James