*Lltop*

Lltop[0] is a command line utility which gathers I/O statistics from
Lustre[1] filesystem servers, along with job assignment data from
cluster batch schedulers, to give a job-by-job accounting of
filesystem load. Under typical usage, lltop is invoked with the name
of a filesystem, runs for a configurable interval (10 seconds say),
and outputs a table summarizing I/O and RPC loads indexed by job
identifier; for example:

$ lltop work
JOB    WR_MB  RD_MB    REQS  OWNER     WORKDIR
12101  15925  67630  133694  jfourier  /work/jfourier/fftw_run
10322   2254   1027    2504  claude    /work/claude/viscous-flow-08
13007    756  21024   10007  ludwig    /work/ludwig/boltzeq.mvapich2
...

Normally, lltop is run in response to observations of excessive load
on file servers or degraded filesystem performance, and is used to
assist system administrators in identifying jobs (and users) with
problematic I/O patterns. A potential secondary use is to determine
the I/O profiles of applications running at scale. lltop is designed
to run as a point and shoot diagnostic utility, and is not a
replacement for continuous monitoring tools such as LMT[2] or
Collectl[3].

*Overview*

Lltop has two executable components: lltop itself and lltop-serv.
lltop is usually run directly and given the name of a filesystem to
query. From the filesystem name, it derives a list of servers (MDSs
and OSSs), and for each it forks and execs ssh to run a copy of
lltop-serv on the server.
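
As a rough illustration of the fan-out described above, the Python sketch below builds one remote command per server and starts them all before collecting any output. It is not lltop's actual code (lltop is written in C). The function names are hypothetical, and passing extra lltop-serv arguments through serv_args is an assumption, since this document does not say how the interval reaches lltop-serv. The defaults mirror the --remote-shell and --lltop-serv options.

```python
import subprocess


def build_remote_argv(server, remote_shell="ssh", serv_path="lltop-serv",
                      serv_args=()):
    """Return the argv that starts lltop-serv on one server.

    serv_args is a placeholder: how lltop passes options to lltop-serv
    is an assumption of this sketch.
    """
    return [remote_shell, server, serv_path, *serv_args]


def run_on_servers(servers, **kwargs):
    """Start one remote process per server, then gather each one's stdout."""
    # Launch everything first so the servers sample over the same interval.
    procs = [
        subprocess.Popen(build_remote_argv(server, **kwargs),
                         stdout=subprocess.PIPE, text=True)
        for server in servers
    ]
    outputs = {}
    for server, proc in zip(servers, procs):
        out, _ = proc.communicate()  # waits for this child to exit
        outputs[server] = out
    return outputs
```

Starting every child before reading from any of them keeps the total run time close to one interval rather than one interval per server.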

On the server, lltop-serv scrapes the per-client stats files

/proc/fs/lustre/{mds,obdfilter}/<target>/exports/<client>/stats

to determine each client's load in terms of bytes written, bytes read,
and requests processed. It actually makes two passes through the
stats files[4], sleeping for a configurable interval between, and
returns the differences. The output of lltop-serv consists of lines
[5] of the form

<ipv4-addr>@<lnet-net-name> <wr_B> <rd_B> <reqs>

where

<ipv4-addr>@<lnet-net-name> is the client address according to Lustre,
for example 192.0.32.10@tcp,
<wr_B> and <rd_B> are the number of bytes written and read,
<reqs> is the number of requests other than pings[6].
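
The two-pass measurement and the output format can be sketched in Python as follows. This is only an illustration under stated assumptions, not lltop-serv's implementation: the snapshot dictionaries stand in for whatever lltop-serv reads from the stats files, the function names are hypothetical, and the handling of a counter that goes backwards is one simple guard chosen for the example rather than the heuristic lltop-serv actually uses. Clients with no load are dropped, as the footnotes describe.

```python
def diff_stats(first, second):
    """Return per-client deltas between two snapshots.

    Each snapshot maps a client address such as "192.0.32.10@tcp" to a
    (wr_bytes, rd_bytes, reqs) tuple of cumulative counters.
    """
    deltas = {}
    for client, after in second.items():
        before = first.get(client, (0, 0, 0))
        # Assumption: a counter smaller than before was reset (stats
        # cleared, client evicted), so count from zero instead.
        delta = tuple(a - b if a >= b else a for a, b in zip(after, before))
        if any(delta):  # omit clients that generated no load
            deltas[client] = delta
    return deltas


def format_lines(deltas):
    """Render deltas as '<addr>@<net> <wr_B> <rd_B> <reqs>' lines."""
    return [
        f"{client} {wr} {rd} {reqs}"
        for client, (wr, rd, reqs) in sorted(deltas.items())
    ]
```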

Lltop reads this output and translates client addresses to hostnames,
and hostnames to jobids[7, 8], to account for each client's load against
its current job. If lltop cannot find a job assignment for a given
client, then it considers the client to be the sole member of a job whose
jobid is the client's hostname. Similarly, if lltop cannot find a
hostname for a given client IP address, it uses the address as the
client's name and current jobid. This allows us to handle load
generated by login or admin nodes in the same band.
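
The accounting rules above, including the two fallbacks and the per-address cache mentioned in the footnotes, might look like this in Python. It is a minimal sketch, not lltop's C code: parse_line, account_by_job, and the convention that get_host and get_job return None on failure are all assumptions made for the example.

```python
from collections import defaultdict


def parse_line(line):
    """Split '<addr>@<net> <wr_B> <rd_B> <reqs>' into (addr, wr, rd, reqs)."""
    nid, wr, rd, reqs = line.split()
    addr = nid.split("@", 1)[0]  # strip the @<lnet-net-name> suffix
    return addr, int(wr), int(rd), int(reqs)


def account_by_job(lines, get_host, get_job):
    """Sum written bytes, read bytes, and requests per job.

    get_host(addr) and get_job(host) are caller-supplied lookups that
    return None when they cannot resolve their argument.
    """
    cache = {}  # addr -> jobid, so each client is looked up at most once
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        addr, wr, rd, reqs = parse_line(line)
        if addr not in cache:
            host = get_host(addr)
            if host is None:
                cache[addr] = addr  # no hostname: the address is the jobid
            else:
                cache[addr] = get_job(host) or host  # no job: use hostname
        job = cache[addr]
        totals[job][0] += wr
        totals[job][1] += rd
        totals[job][2] += reqs
    return {job: tuple(t) for job, t in totals.items()}
```

The cache matters because the same client typically appears in the output of several servers, so an uncached version would repeat the lookups once per server.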

*Configuring lltop*

To get lltop to work on your site you probably need to override some
of the default configuration. Most of this can be accomplished
through command line options, but the source is organized so that the
same effects (and more) can be achieved by modifying the functions in
hooks.c. Here are the main things you may need to do, along with some
suggestions.

1. Tell lltop on which servers it should run lltop-serv. You have
three options:

a. Modify the function get_serv_list() in hooks.c, so that lltop may
be invoked with the filesystem name as an argument.

b. Use the -l (--server-list) option to specify a list of servers
directly:

lltop -l mds1.example.com oss{01..27}.example.com

c. Provided that FILESYSTEM is mounted on the current host, use some
crazy pipeline, like:

sed 's/@.*$//' /proc/fs/lustre/{mdc,osc}/FILESYSTEM-*/*_conn_uuid | sort | uniq | xargs lltop -l

2. Tell lltop how to translate Lustre client addresses (usually dotted
quads with the @<lnet-net-name> stripped off) to hostnames. How well
does reverse DNS work at your site? If the answer is "Uhhh, not real
well.", or if you have some weird LNET with a weird address format
like qswlnd, whatever that is, then keep reading, otherwise skip to 3.
The default address to host lookup uses getnameinfo(), which should
work fine given a correct site config. If not, here are three
possibilities:

a. Using getnameinfo_get_host() as a template, add the function
my_site_get_host() to hooks.c and tell lltop to use it.

b. Use the -g (--get-host) option to specify an external command
which should take the address as its only argument and print a
hostname. If it succeeds, your external command should return 0,
otherwise lltop will treat the dotted quad as if it is the client's
hostname.

c. Fix /etc/hosts, /etc/nsswitch.conf, /etc/resolv.conf,..., so
that getnameinfo() works on the host where you run lltop.

3. Tell lltop how to look up the current job for a host. Lltop was
originally written for TACC Ranger which uses SGE for batch
scheduling. Under that setup the JOBID of the current job on HOST is
determined from the existence of a file

/share/sge6.2/execd_spool/HOST/active_jobs/JOBID.*

This is the default method in lltop. Otherwise:

a. If you run SGE but you need to override the execd_spool path then
do so by modifying hooks.c or passing --execd-spool=PATH.

b. Using execd_spool_get_job() as a template, add the function
my_site_get_job() to hooks.c and tell lltop to use it.

c. Use the -j (--get-job) option to specify an external command to
do job lookup. It should function like the external host lookup
command described above.

d. Use the -m (--job-map) option to specify an external command
which produces a "job map." This is useful if you use something
like qhost for job lookup, since using 'qhost -j -h <host>' to get
the current job of a single host takes about the same time as calling
'qhost -j' to get the current job of all nodes at once. See the
attached script qhost_job_map.
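
For the external lookup commands in 2b and 3c, the contract is small: take one argument, print one name, and exit 0 on success. The Python sketch below shows the shape of such a command for -g (--get-host). The SITE_HOSTS table and its single entry are purely hypothetical placeholders; a real site would consult its own inventory, and nothing here comes from the lltop distribution.

```python
import sys

# Hypothetical address-to-hostname table; replace with your site's data.
SITE_HOSTS = {
    "192.0.32.10": "c101-101",
}


def lookup(addr, table=SITE_HOSTS):
    """Return the hostname for addr, or None if it is unknown."""
    # Tolerate an address that still carries its @<lnet-net-name> suffix.
    return table.get(addr.split("@", 1)[0])


def main(argv):
    """Print the hostname for argv[1] and return the exit status."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} ADDRESS", file=sys.stderr)
        return 2
    host = lookup(argv[1])
    if host is None:
        return 1  # nonzero: lltop falls back to the address itself
    print(host)
    return 0
```

To use this as a standalone command, end the file with sys.exit(main(sys.argv)), make it executable, and pass its path to -g. A job lookup command for -j could follow the same pattern with a hostname as the argument and a jobid as the output.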

*Installing lltop*

Run make, put lltop somewhere in your path on an admin node, and put
lltop-serv somewhere in your path on the Lustre servers. Also see the
included script tacc_lltop which we use to add job owner and workdir
to the output of lltop.

*Getting Help*

$ lltop --help
Usage: lltop [OPTION]... FILESYSTEM
  or:  lltop [OPTION]... -l SERVER...
Report load by job for Lustre FILESYSTEM or SERVER(s).

Mandatory arguments to long options are mandatory for short options too.
  -f, --fqdn                use fully qualified domain names for clients
  -g, --get-host=COMMAND    use COMMAND for reverse DNS lookups
  -h, --help                display this help and exit
  -i, --interval=NUMBER     report load over NUMBER seconds
  -j, --get-job=COMMAND     use COMMAND for job lookup
  -l, --server-list         report load on servers given as arguments
  -m, --job-map=COMMAND     use COMMAND to get job map
  -n, --limit=NUMBER        limit output to NUMBER jobs
      --no-header           do not display header
      --lltop-serv=PATH     use lltop-serv at PATH on servers
      --remote-shell=PATH   use remote shell at PATH to execute lltop-serv
      --execd-spool=PATH    use execd_spool directory PATH for job lookup

lltop GitHub repository: <https://github.com/jhammond/lltop>

Otherwise, please send me any comments, questions, improvements. I am
especially interested in receiving/including any code/scripts to do
job lookup for batch schedulers other than SGE. Please, put lltop in
the subject line.

John L. Hammond
TACC, The University of Texas at Austin
<[email protected]>

0. lltop is a recursive anagram of lltop.

1. According to the headers, Lustre is a trademark of Sun
Microsystems.

2. Lustre Monitoring Tool: http://code.google.com/p/lmt/

3. Collectl: http://collectl.sourceforge.net/

4. Note that lltop-serv does not clear the stats files. In fact
clearing stats files while lltop-serv is running may cause it to
misreport or underreport usage. Client evictions can also affect the
accuracy of the data returned, but lltop-serv does use some simple
heuristics to mitigate their effects. However it should be remembered
that lltop is not an exact tool and should be used with judgement.

5. As an optimization, if a client fails to generate any load during
the interval, then lltop-serv omits that client from its output.

6. Lltop-serv does not count pings because doing so tends to distort
the statistics for large jobs.

7. Lltop keeps a cache of address to jobid mappings so that the
hostname and jobid lookups are done at most once per client.

8. If your site runs multiple concurrent jobs on single hosts then it
may be hard to adapt lltop. I welcome suggestions on how to handle
this case.
