jhammond / lltop
Lustre load monitor with batch scheduler integration
License: GNU General Public License v2.0
*Lltop*

Lltop[0] is a command line utility which gathers I/O statistics from
Lustre[1] filesystem servers, along with job assignment data from
cluster batch schedulers, to give a job-by-job accounting of filesystem
load. Under typical usage, lltop is invoked with the name of a
filesystem, runs for a configurable interval (say, 10 seconds), and
outputs a table summarizing I/O and RPC loads indexed by job
identifier; for example:

  $ lltop work
  JOB    WR_MB  RD_MB    REQS  OWNER     WORKDIR
  12101  15925  67630  133694  jfourier  /work/jfourier/fftw_run
  10322   2254   1027    2504  claude    /work/claude/viscous-flow-08
  13007    756  21024   10007  ludwig    /work/ludwig/boltzeq.mvapich2
  ...

Normally, lltop is run in response to observations of excessive load on
file servers or degraded filesystem performance, and is used to assist
system administrators in identifying jobs (and users) with problematic
I/O patterns. A potential secondary use is to determine the I/O
profiles of applications running at scale. lltop is designed to run as
a point-and-shoot diagnostic utility, and is not a replacement for
continuous monitoring tools such as LMT[2] or Collectl[3].

*Overview*

Lltop has two executable components, lltop itself, and lltop-serv.
lltop is usually run directly and given the name of a filesystem to
query. From the filesystem name, it derives a list of servers (MDSs and
OSSs), and for each it forks and execs ssh to run a copy of lltop-serv
on the server. On the server, lltop-serv scrapes the per-client stats
files

  /proc/fs/lustre/{mds,obdfilter}/<target>/exports/<client>/stats

to determine each client's load in terms of bytes written, bytes read,
and requests processed. It actually makes two passes through the stats
files[4], sleeping for a configurable interval between, and returns the
differences.
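The two-pass measurement described above can be sketched in a few lines
of Python. This is only an illustration, not lltop-serv's actual code:
the function name load_delta, the snapshot layout, and the rules for
missing baselines and backwards-running counters are assumptions made
for the example. The real heuristics live in the lltop-serv source.

```python
from typing import Dict, Tuple

# Cumulative counters per client: (bytes written, bytes read, requests).
Counters = Tuple[int, int, int]


def load_delta(first: Dict[str, Counters],
               second: Dict[str, Counters]) -> Dict[str, Counters]:
    """Return per-client load between two snapshots of cumulative counters."""
    delta = {}
    for client, after in second.items():
        before = first.get(client)
        if before is None:
            # Assumption: with no baseline from the first pass, skip the client.
            continue
        if any(a < b for a, b in zip(after, before)):
            # Assumption: a counter that went backwards was reset (for example
            # by clearing stats or an eviction), so the difference is unusable.
            continue
        diff = tuple(a - b for a, b in zip(after, before))
        if any(diff):
            # Clients that generated no load in the interval are omitted.
            delta[client] = diff
    return delta


# Hypothetical snapshots taken one interval apart.
first_pass = {
    "192.0.32.10@tcp": (1000, 500, 10),
    "192.0.32.11@tcp": (0, 0, 3),
    "192.0.32.12@tcp": (9000, 0, 40),
}
second_pass = {
    "192.0.32.10@tcp": (5000, 500, 25),
    "192.0.32.11@tcp": (0, 0, 3),
    "192.0.32.12@tcp": (100, 0, 2),   # counters were reset between passes
}

print(load_delta(first_pass, second_pass))
```

Only the first client is reported here: the second was idle and the
third had its counters reset between the passes.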

The output of lltop-serv consists of lines[5] of the form

  <ipv4-addr>@<lnet-net-name> <wr_B> <rd_B> <reqs>

where <ipv4-addr>@<lnet-net-name> is the client address according to
Lustre, for example 192.0.32.10@tcp, <wr_B> and <rd_B> are the number
of bytes written and read, and <reqs> is the number of requests other
than pings[6]. Lltop reads this output and translates client addresses
to hostnames, and hostnames to jobids[7, 8], to account for each
client's load against its current job. If lltop cannot find a job
assignment for a given client, then it considers the client to be the
sole member of a job whose jobid is the client's hostname. Similarly,
if lltop cannot find a hostname for a given client IP address, it uses
the address as the client's name and current jobid. This allows us to
handle load generated by login or admin nodes in the same way.

*Configuring lltop*

To get lltop to work on your site you probably need to override some of
the default configuration. Most of this can be accomplished through
command line options, but the source is organized so that the same
effects (and more) can be achieved by modifying the functions in
hooks.c. Here are the main things you may need to do, along with some
suggestions.

1. Tell lltop on which servers it should run lltop-serv. You have three
   options:

   a. Modify the function get_serv_list() in hooks.c, so that lltop may
      be invoked with the filesystem name as an argument.

   b. Use the -l (--server-list) option to specify a list of servers
      directly:

        lltop -l mds1.example.com oss{01..27}.example.com

   c. Provided that FILESYSTEM is mounted on the current host, use some
      crazy pipeline, like:

        sed 's/@.*$//' /proc/fs/lustre/{mdc,osc}/FILESYSTEM-*/*_conn_uuid | sort | uniq | xargs lltop -l

2. Tell lltop how to translate Lustre client addresses (usually dotted
   quads with the @<lnet-net-name> stripped off) to hostnames.

   How well does reverse DNS work at your site?
   If the answer is "Uhhh, not real well.", or if you have some weird
   LNET with a weird address format like qswlnd, whatever that is, then
   keep reading, otherwise skip to 3.

   The default address to host lookup uses getnameinfo(), which should
   work fine given a correct site config. If not, here are three
   possibilities:

   a. Using getnameinfo_get_host() as a template, add the function
      my_site_get_host() to hooks.c and tell lltop to use it.

   b. Use the -g (--get-host) option to specify an external command
      which should take the address as its only argument and print a
      hostname. If it succeeds, your external command should return 0,
      otherwise lltop will treat the dotted quad as if it is the
      client's hostname.

   c. Fix /etc/hosts, /etc/nsswitch.conf, /etc/resolv.conf,..., so that
      getnameinfo() works on the host where you run lltop.

3. Tell lltop how to look up the current job for a host.

   Lltop was originally written for TACC Ranger, which uses SGE for
   batch scheduling. Under that setup the JOBID of the current job on
   HOST is determined from the existence of a file

     /share/sge6.2/execd_spool/HOST/active_jobs/JOBID.*

   This is the default method in lltop. Otherwise:

   a. If you run SGE but you need to override the execd_spool path then
      do so by modifying hooks.c or passing --execd-spool=PATH.

   b. Using execd_spool_get_job() as a template, add the function
      my_site_get_job() to hooks.c and tell lltop to use it.

   c. Use the -j (--get-job) option to specify an external command to
      do job lookup. It should function like the external host lookup
      command described above.

   d. Use the -m (--job-map) option to specify an external command
      which produces a "job map." This is useful if you use something
      like qhost for job lookup, since using 'qhost -j -h <host>' to
      get the current job of a single host takes about the same time as
      calling 'qhost -j' to get the current job of all nodes at once.
      See the attached script qhost_job_map.
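
To show how the host and job lookups configured in steps 2 and 3 feed
into the accounting described in the overview, here is a rough Python
sketch. It is not lltop's implementation (lltop is written in C and its
hooks live in hooks.c). The function account_by_job, the lookup tables,
and the sample lines are all hypothetical, and the lookups are passed
in as plain callables standing in for whatever method your site uses.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Optional, Tuple

# A lookup returns a name, or None when nothing is known for the key.
Lookup = Callable[[str], Optional[str]]


def account_by_job(lines: Iterable[str],
                   get_host: Lookup,
                   get_job: Lookup) -> Dict[str, Tuple[int, int, int]]:
    """Sum (wr_B, rd_B, reqs) per job from '<addr>@<net> <wr_B> <rd_B> <reqs>' lines."""
    totals = defaultdict(lambda: [0, 0, 0])
    cache: Dict[str, str] = {}          # client address -> jobid, looked up once
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue                    # ignore anything that is not a data line
        client = fields[0]
        wr, rd, reqs = (int(f) for f in fields[1:])
        if client not in cache:
            addr = client.split("@", 1)[0]       # strip @<lnet-net-name>
            host = get_host(addr) or addr        # no hostname: use the address
            cache[client] = get_job(host) or host  # no job: use the hostname
        job = cache[client]
        totals[job][0] += wr
        totals[job][1] += rd
        totals[job][2] += reqs
    return {job: tuple(t) for job, t in totals.items()}


# Hypothetical site data: one compute node with a job, one login node.
hosts = {"192.0.32.10": "c101", "192.0.32.11": "login1"}
jobs = {"c101": "12101"}

# Hypothetical lines, as if collected from several servers.
sample = [
    "192.0.32.10@tcp 1048576 0 12",
    "192.0.32.11@tcp 0 2048 3",
    "192.0.32.12@tcp 10 20 1",
    "192.0.32.10@tcp 1048576 4096 8",
]

for job, load in account_by_job(sample, hosts.get, jobs.get).items():
    print(job, *load)
```

The login node ends up under its hostname and the unresolvable client
under its address, which is the fallback behaviour the overview
describes.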

*Installing lltop*

Run make, put lltop somewhere in your path on an admin node, put
lltop-serv somewhere in your path on the Lustre servers. Also see the
included script tacc_lltop which we use to add job owner and workdir to
the output of lltop.

*Getting Help*

  $ lltop --help
  Usage: lltop [OPTION]... FILESYSTEM
    or:  lltop [OPTION]... -l SERVER...
  Report load by job for Lustre FILESYSTEM or SERVER(s).

  Mandatory arguments to long options are mandatory for short options too.
    -f, --fqdn                 use fully qualified domain names for clients
    -g, --get-host=COMMAND     use COMMAND for reverse DNS lookups
    -h, --help                 display this help and exit
    -i, --interval=NUMBER      report load over NUMBER seconds
    -j, --get-job=COMMAND      use COMMAND for job lookup
    -l, --server-list          report load on servers given as arguments
    -m, --job-map=COMMAND      use COMMAND to get job map
    -n, --limit=NUMBER         limit output to NUMBER jobs
        --no-header            do not display header
        --lltop-serv=PATH      use lltop-serv at PATH on servers
        --remote-shell=PATH    use remote shell at PATH to execute lltop-serv
        --execd-spool=PATH     use execd_spool directory PATH for job lookup

  lltop GitHub repository: <https://github.com/jhammond/lltop>

Otherwise, please send me any comments, questions, improvements. I am
especially interested in receiving/including any code/scripts to do job
lookup for batch schedulers other than SGE. Please put lltop in the
subject line.

John L. Hammond
TACC, The University of Texas at Austin
<[email protected]>

--

0. lltop is a recursive anagram of lltop.

1. According to the headers, Lustre is a trademark of Sun Microsystems.

2. Lustre Monitoring Tool: http://code.google.com/p/lmt/

3. Collectl: http://collectl.sourceforge.net/

4. Note that lltop-serv does not clear the stats files. In fact,
   clearing stats files while lltop-serv is running may cause it to
   misreport or underreport usage. Client evictions can also affect the
   accuracy of the data returned, but lltop-serv does use some simple
   heuristics to mitigate their effects. However, it should be
   remembered that lltop is not an exact tool and should be used with
   judgement.

5. As an optimization, if a client fails to generate any load during
   the interval, then lltop-serv omits that client from its output.

6. Lltop-serv does not count pings because doing so tends to distort
   the statistics for large jobs.

7. Lltop keeps a cache of address to jobid mappings so that the
   hostname and jobid lookups are done at most once per client.

8. If your site runs multiple concurrent jobs on single hosts then it
   may be hard to adapt lltop. I welcome suggestions on how to handle
   this case.

I've been running LLTOP on a production Lustre system, and I noticed that every once in a while extremely high values appear. I wonder if some counter is overflowing, or something similar. Here is an example of the output; we run LLTOP with an interval of 50 seconds:
CLIENT-or-JOBID WR_MB RD_MB CLOSE XRENAME GETA GETXA LINK MKDIR MKNOD OPEN RENAME RMDIR SRENAME SETA SETXA STAT SYNC UNLINK OWNER NODES
1495209 138110882 54 256 0 1538 80928 0 0 192 283331 0 0 0 0 0 0 657 1009 someuser 62
1495165 111456 2 310 0 1732 0 0 0 0 310 0 0 0 0 0 0 1939 0 someuser 62
1495208 93836 3 260 0 1254 0 0 0 0 8540 0 0 0 0 0 0 722 0 someuser 52
This would mean that the job in the first row was writing at 138110882 MB / 50 s ≈ 2.76 million MB/s, i.e. well over 2 TB/s, which is not possible on our system.
Do you have any idea of why this is happening?
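
For reference, the implied write rates for the three rows above can be checked with a short Python snippet. The helper name write_rate_mb_s and the 100000 MB/s ceiling are assumptions made for this example (the ceiling is a placeholder, not a measured limit of any system); the WR_MB values and the 50 second interval are the ones reported above.

```python
def write_rate_mb_s(wr_mb: int, interval_s: float) -> float:
    """Average write rate in MB/s for WR_MB megabytes written over interval_s seconds."""
    return wr_mb / interval_s


# WR_MB per job, copied from the output above; the interval was 50 seconds.
INTERVAL_S = 50
wr_mb_by_job = {"1495209": 138110882, "1495165": 111456, "1495208": 93836}

# Hypothetical ceiling: replace with the peak write bandwidth of your filesystem.
PLAUSIBLE_MAX_MB_S = 100_000

for job, wr_mb in wr_mb_by_job.items():
    rate = write_rate_mb_s(wr_mb, INTERVAL_S)
    flag = "  <-- implausible" if rate > PLAUSIBLE_MAX_MB_S else ""
    print(f"{job}: {rate:.0f} MB/s{flag}")
```

With these numbers only the first job is flagged; the other two come out at roughly 2 GB/s each.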