I just discovered the conversation about collectl in a list archive and thought I'd jump in.
When I first wrote collectl over 10 years ago, we felt we needed a more powerful and flexible tool than sar to work with our High Performance Computing customers at HP. For example, we needed to record many more types of information than sar does, such as InfiniBand and Lustre file system statistics. How about IPMI data such as temperatures or fan speeds? Power consumption? Anybody remember the Quadrics interconnect? Collectl does that too, but there's a whole lot more to collectl than just the types of data it collects.
|
Rather than repeat what's on the website (http://collectl.sourceforge.net/), I'll let you read about the features yourselves. Suffice it to say it runs on some of the world's largest clusters, sampling hundreds of data points every 10 seconds while using less than 0.1% of a CPU.
|
Even better, there are 2 utilities that make it more useful still: http://collectl-utils.sourceforge.net/. colplot lets you produce high-resolution plots for dozens (or more) of nodes via a browser. colmux lets you monitor hundreds of nodes in real time from a single window, much like top. But unlike top, which only shows top processes, colmux can do that as well as show top-anything! At least anything collectl can report. For example, if you had dozens of servers, each with dozens of disks, you could use colmux to find the disks with the longest wait times. Or how about the systems with the highest temperatures?
|
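To illustrate the "top-anything" idea, here's a minimal sketch. It is not how colmux is actually implemented; it simply assumes you've already gathered per-disk wait times from each node (the node names, disk names and numbers are hypothetical, and top_disks_by_wait is a made-up helper), merges them into one list, and ranks them so the slowest disks across all servers come first.

```python
from typing import Dict, List, Tuple

# Hypothetical input: node name -> {disk name -> average wait time in ms}.
# In practice these values would come from collectl running on each node.
Samples = Dict[str, Dict[str, float]]


def top_disks_by_wait(samples: Samples, n: int) -> List[Tuple[str, str, float]]:
    """Return the n (node, disk, wait_ms) entries with the longest wait times."""
    # Flatten every node's disks into a single list of rows.
    rows = [
        (node, disk, wait)
        for node, disks in samples.items()
        for disk, wait in disks.items()
    ]
    # Longest wait first; ties broken by node, then disk, for stable output.
    rows.sort(key=lambda r: (-r[2], r[0], r[1]))
    return rows[:n]


if __name__ == "__main__":
    samples = {
        "node01": {"sda": 2.1, "sdb": 14.7},
        "node02": {"sda": 0.9, "sdb": 3.3},
        "node03": {"sda": 22.5, "sdb": 1.0},
    }
    for node, disk, wait in top_disks_by_wait(samples, 3):
        print(f"{node:8} {disk:5} {wait:6.1f} ms")
```

Swap the wait times for any other per-node metric, such as temperatures, and the same ranking gives you the hottest systems instead.
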
Anyhow, see for yourself and check it out.
|
-mark |