mtt's Introduction

Copyright (c) 2004-2007 The Trustees of Indiana University and Indiana
                        University Research and Technology
                        Corporation.  All rights reserved.
Copyright (c) 2004-2005 The University of Tennessee and The University
                        of Tennessee Research Foundation.  All rights
                        reserved.
Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, 
                        University of Stuttgart.  All rights reserved.
Copyright (c) 2004-2005 The Regents of the University of California.
                        All rights reserved.
Copyright (c) 2006-2007 Cisco Systems, Inc.  All rights reserved.
Copyright (c) 2006-2007 Sun Microsystems, Inc.  All rights reserved.
Copyright (c) 2018      IBM Corporation.  All rights reserved.
Copyright (c) 2018      Intel, Inc.  All rights reserved.
$COPYRIGHT$

Additional copyrights may follow

This software includes code derived from software that is copyright
(c) 1996 Randal L. Schwartz, distributed under the Artistic License.
See the copyright and license notice in "mtt-relay" for details.

$HEADER$


What is this software?
----------------------

This is the MPI Testing Tool (MTT) software package.  It is a
standalone tool for testing the correctness and performance of
arbitrary MPI implementations.

The MTT is an attempt to create a single tool to download and build a
variety of different MPI implementations, and then compile and run any
number of test suites against each of the MPI installations, storing
the results in a back-end database that then becomes available for
historical data mining.  The test suites can be for both correctness
and performance analysis (e.g., tests such as nightly snapshot compile
results as well as the latency of MPI_SEND can be historically
archived with this tool).

The MTT provides the glue to obtain and install MPI installations
(e.g., download and compile/build source distributions such as nightly
snapshots, or copy/install binary distributions, or utilize an
already-existing MPI installation), and then obtain, compile, and run
the tests.  Results of each phase are submitted to a centralized
PostgreSQL database via HTTP/HTTPS.  Simply put, MTT is a common
infrastructure that can be distributed to many different sites in
order to run a common set of tests against a group of MPI
implementations that all feed into a common PostgreSQL database of
results.

The MTT client is written in Python; the MTT server side
is written almost entirely in PHP and relies on a back-end PostgreSQL
database.

The main (loose) requirements that we had for the MTT are:

- Use a back-end database / archival system.
- Ability to obtain arbitrary MPI implementations from a variety of
  sources (web/FTP download, filesystem copy, Subversion export,
  etc.).
- Ability to install the obtained MPI implementations, regardless of
  whether they are source or binary distributions.  For source
  distributions, include the ability to compile each MPI
  implementation in a variety of different ways (e.g., with different
  compilers and/or compile flags).
- Ability to obtain arbitrary test suites from a variety of sources
  (web/FTP download, filesystem copy, Subversion export, etc.).
- Ability to build each of the obtained test suites against each of
  the MPI implementation installations (e.g., for source MPI
  distributions, there may be more than one installation).
- Ability to run each of the built test suites in a variety of
  different ways (e.g., with a set of different run-time options).
- Ability to record the output from each of the steps above and
  securely submit them to a centralized database.
- Ability to run the entire test process in a completely automated
  fashion (e.g., via cron).
- Ability to run each of the steps above on physically different
  machines.  For example, some sites may require running the
  obtain/download steps on machines that have general internet access,
  running the compile/install steps on dedicated compile servers,
  running the MPI tests on dedicated parallel resources, and then
  running the final submit steps on machines that have general
  internet access.
- Use a component-based system (i.e., plugins) for the above steps so
  that extending the system to download (for example) a new MPI
  implementation is simply a matter of writing a new module with a
  well-defined interface.


How to cite this software
-------------------------
Hursey J., Mallove E., Squyres J.M., Lumsdaine A. (2007) An Extensible
Framework for Distributed Testing of MPI Implementations. In Recent
Advances in Parallel Virtual Machine and Message Passing Interface.
EuroPVM/MPI 2007. Lecture Notes in Computer Science, vol 4757. Springer,
Berlin, Heidelberg.
https://doi.org/10.1007/978-3-540-75416-9_15


Overview
--------

The MTT divides its execution into six phases:

1. MPI get: obtain MPI software package(s) (e.g., download, copy)
2. MPI install: install the MPI software package(s) obtained in phase 1.
   This may involve a binary installation or a build from source.
3. Test get: obtain MPI test(s)
4. Test build: build the test(s) against all MPI installations
   installed in phase 2.
5. Test run: run all the tests built in phase 4.
6. Report: report the results of phases 2, 4, and 5.

The phases are divided in order to allow a multiplicative effect.  For
example, each MPI package obtained in phase 1 may be installed in
multiple different ways in phase 2.  Tests that are built in phase 4
may be run multiple different ways in phase 5.  And so on.

This multiplicative effect allows testing many different code paths
through MPI even with a small number of actual tests.  For example,
the Open MPI Project uses the MTT for nightly regression testing.
Even with only several hundred MPI test source codes, Open MPI is
tested against a variety of different compilers, networks, numbers of
processes, and other run-time tunable options.  A typical night of
testing yields around 150,000 Open MPI tests.


Quick start
-----------

Testers run the MTT client on their systems to do all the work.  A
configuration file is used to specify which MPI implementations to use
and which tests to run.  
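
For illustration only, the rough skeleton of such a configuration
file is sketched below.  The section and field names here are
abbreviated and partly hypothetical; the real names and required
fields are in the template described below.  Note how a single "MPI
get" section can feed multiple "MPI install" sections, which is where
the multiplicative effect described above comes from.

    # Hypothetical, abbreviated sketch -- see the template below for
    # the real section and field names.
    [MPI get: ompi-nightly]
    module = ...             # how to obtain the MPI (download, copy, ...)

    [MPI install: gcc]
    mpi_get = ompi-nightly   # one obtained MPI...
    module = ...

    [MPI install: intel]
    mpi_get = ompi-nightly   # ...installed in more than one way
    module = ...

    [Test get: trivial]
    module = ...

    [Test build: trivial]
    test_get = trivial
    module = ...

    [Test run: trivial]
    test_build = trivial
    module = Simple

    [Reporter: database]
    module = MTTDatabase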

The Open MPI Project uses MTT for nightly regression testing.  A
sample Perl client configuration file is included in
samples/perl/ompi-core-template.ini.  This template will require
customization for each site's specific requirements.  It is also
suitable as an example for organizations outside of the Open MPI
Project.

Open MPI members should visit the MTT wiki for instructions on how to
set up for nightly regression testing:

    https://github.com/open-mpi/mtt/wiki/OMPITesting


Note that the INI file can be used to specify web proxies if
necessary.  See comments in the ompi-core-template.ini file for
details.


Running the MTT Perl client
---------------------------

Having run the MTT client across several organizations within the Open
MPI Project for quite a while, we have learned that even with common
goals (such as Open MPI nightly regression testing), MTT tends to get
used quite differently at each site where it is used.  The
command-line client was designed to allow a high degree of flexibility
for site-specific requirements.

The MTT client has many command line options; see the following for a
full list:

$ client/mtt --help

Some sites add an upper layer of logic/scripting above the invocation
of the MTT client.  For example, some sites run the MTT on
SLURM-maintained clusters.  A variety of compilers are tested,
yielding multiple unique (MPI get, MPI install, Test get, Test build)
tuples.  Each tuple is run in its own 1-node SLURM allocation,
allowing the many installations/builds to run in parallel.  When the
install/build tuple has completed, more SLURM jobs are queued for each
desired number of nodes/processes to test.  These jobs all execute in
parallel (pending resource availability) in order to achieve maximum
utilization of the testing cluster.

Other scenarios are also possible; the above is simply one way to use
the MTT.


Current status
--------------

This tool was initially developed by the Open MPI team for nightly and
periodic compile and regression testing.  However, enough other
parties have expressed [significant] interest that we have open-sourced
the tool and are eagerly accepting input from others.  Indeed, having
a common tool to help objectively evaluate MPI implementations may be
an enormous help to the High Performance Computing (HPC) community at
large.

We have no illusions of MTT becoming the be-all/end-all tool for
testing software -- we do want to keep it somewhat focused on the
needs and requirements of testing MPI implementations.  As such, the usage
flow is somewhat structured towards that bias.

It should be noted that the software has been mostly developed internally
to the Open MPI project and will likely experience some growing pains
while adjusting to a larger community.


License
-------

Because we want MTT to be a valuable resource to the entire HPC
community, the MTT uses the new BSD license -- see the LICENSE file in
the MTT distribution for details.


Get involved
------------

We *want* your feedback.  We *want* you to get involved.

The main web site for the MTT is:

    http://www.open-mpi.org/projects/mtt/

User-level questions and comments should generally be sent to the
user's mailing list ([email protected]).  Because of spam, only
subscribers are allowed to post to this list (ensure that you
subscribe with and post from *exactly* the same e-mail address --
[email protected] is considered different than
[email protected]!).  Visit this page to subscribe to the
user's list:

     https://lists.open-mpi.org/mailman/listinfo/mtt-users

Developer-level bug reports, questions, and comments should generally
be sent to the developer's mailing list ([email protected]).
Please do not post the same question to both lists.  As with the
user's list, only subscribers are allowed to post to the developer's
list.  Visit the following web page to subscribe:

     https://lists.open-mpi.org/mailman/listinfo/mtt-devel
     http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel

When submitting bug reports to either list, be sure to include as much
extra information as possible.

Thanks for your time.


mtt's Issues

Users should be able to delete results from database

It would be good for HTTP users to be able to delete some or all of their results from the database (not from the MTT client, but probably from a web page). For example, if a user screws up and submits a bad batch of results (e.g., a compiler license expired, so it falsely reported compile failures), it would be good if the user had a relatively simple method of being able to delete those results from the database rather than skew the results and reports in the database.

fix LD_LIBRARY_PATH for "make check"

In lib/MTT/MPI/Install/OMPI.pm, MTT deletes LD_LIBRARY_PATH before running the check to avoid any problems with other libraries. I ran into the problem mentioned in the comment: the compiler itself needs libraries from LD_LIBRARY_PATH.

I think it should be possible to avoid both the deletion of LD_LIBRARY_PATH and the problems with other libraries. If we simply prepend the MTT paths to LD_LIBRARY_PATH, it should work, because the MTT libs are then always in front of all the other libs in LD_LIBRARY_PATH.

HTTP auth is not working properly

When using .htaccess to protect the submit directory, the MTT client fails to submit properly, even though it seems to have the correct HTTP username/password in the ini file. The MTTDatabase reporter outputs messages similar to the following:

{{{

Failed to report to MTTDatabase: 401 Authorization Required

<title>401 Authorization Required</title>

Authorization Required

This server could not verify that you are authorized to access the document requested. Either you supplied the wrong credentials (e.g., bad password), or your browser doesn't understand how to supply the credentials required.


Apache/2.0.52 (Red Hat) Server at www.open-mpi.org Port 443 }}}

DB Cleanup "iu-odin"

As a point of cleanup, can you remove all entries for "iu-odin"? These were a bunch of runs from getting the environment set up correctly for MTT, most of which failed.

Keep the entries for "Odin at IU - Testing" for the moment, as that is what the current version of the script will now report.

This is nothing major, just a bit of cleanup I wanted to note.

Track which resource manager is used for runs

It would be good for MTT to track which resource manager is used for test runs.

This is a little complicated, however, because it is possible for the MPI Details section to override which RM is used (e.g., to explicitly test, say, the native RM and rsh). For example:

{{{
[MPI Details: foo]
exec = mpirun --mca pls fork,&enumerate("rsh", "slurm") ....
}}}

So we'd somehow need to track which RM is used ''for each test run result''.

ompi-core-template.ini: MPI cleanup not run on all nodes

The killall in the after_each_exec of the MPI Details section only runs on the node where mpirun was invoked (duh). It does not spread to all the other nodes where MPI was running.

Need to figure out how to make that go across all nodes.

Cut down on MTT perl module requirements

Rainer mentioned that we're requiring a bunch of Perl modules that aren't necessarily installed by default on some older machines (e.g., his). He installed them to make it work, but it might be nice if we can cut down on the number of requirements -- particularly when running MTT on parallel compute nodes, where Perl installs are likely to be minimal (i.e., all we need to do is run the tests and dump output to files there; no need for fancy downloading Perl modules, etc.). From a mail from Rainer:

It seems that quite a few packages are required to build the ParallelUserAgent-2.56:

  • libwww-perl-5.803 /* which is considered to be too new */
    • depends on Compress-Zlib-2.000_05
  • URI-1.35
  • HTML-Parser
    • depends on HTML-Tagset-3.10

configure cannot find compiler libraries

When using the Intel compiler, configure breaks because it cannot execute the compiled executable. It looks like something bad happens to LD_LIBRARY_PATH, because the compiler's lib directory is not in the default path. Configure called "by hand" works fine.

{{{
configure:4154: $? = 0
configure:4177: checking for C compiler default output file name
configure:4180: icc conftest.c >&5
configure:4183: $? = 0
configure:4229: result: a.out
configure:4234: checking whether the C compiler works
configure:4240: ./a.out
./a.out: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
configure:4243: $? = 127
configure:4252: error: cannot run C compiled programs.
If you meant to cross compile, use `--host'. See `config.log' for more details.
}}}

summary.php: test run failure rollup values don't match

The rollup values for how many test runs failed don't seem to match in the output from summary.php. I have attached an html snapshot of summary.php from right now. There are 5 rows in the executive summary table; they show 11 / 1 / 1 / 1 / 1 test run failures, respectively.

Similarly, the Cluster Summary table has 6 rows, showing 11 / 0 / 1 / 1 / 1 / 1 test run failures, respectively.

However, the Test Suites summary shows numbers much larger than 11 and 1 (e.g., 14, 3, 468, etc.).

Am I reading these numbers wrong? Are some of these "normalized"? If so, it would be good to note that in the column header, and describe what "normalized" means.

HLRS machines not recognized by whatami

The output reported by "whatami" on the cacau cluster is "linux-unknown_please_send_us_a_patch-x86_64".

We need to fix this (and send a patch to the whatami guys).

No submit to database

after updating to r245 I have the problem that MTT doesn't even try to
submit the results to the database. In the older version (r231) MTT at least
tried to send the results but failed with an error.
{{{
*** Reporter initializing
Got hostname: noco084.nec
Found whatami: /home/HLRS/hlrs/hpcstork/mtt/client/whatami/whatami
Evaluating: MTTDatabase

Initializing reporter module: MTTDatabase
Evaluating: require MTT::Reporter::MTTDatabase
Evaluating: $ret = &MTT::Reporter::MTTDatabase::Init(@Args)
Evaluating: hlrs
Evaluating: hlrsompi
Evaluating: https://localhost:4323/mtt/submit/
Evaluating: OMPI
Evaluating: Cacau at HLRS
Evaluating: TextFile
Initializing reporter module: TextFile
Evaluating: require MTT::Reporter::TextFile
Evaluating: $ret = &MTT::Reporter::TextFile::Init(@Args)
Evaluating: cacau-$phase-$section-$mpi_name-$mpi_version.txt
Evaluating:

----------------------------------------------------------<<<<
File reporter initialized
(/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-$
phase-$section-$mpi_name-$mpi_version.txt)
*** Reporter initialized

...

Command complete, exit status: 0
Evaluating: require MTT::Reporter::TextFile
Evaluating: $ret = &MTT::Reporter::TextFile::Submit(@Args)
File reporter
Writing to text file:
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Te
st_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt

Reported to text file
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Te
st_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Writing to text file:
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Te
st_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Writing to text file:
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Te
st_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Writing to text file:
/mscratch/ws/hpcstork-mtt-run-2006-08-28--10-40-07---hlrs-gcc-0/cacau-Te
st_Run-trivial-ompi-nightly-trunk-1.3a1r11451.txt
Test run [test run: intel]
Evaluating: intel
Found a match! intel [intelEvaluating: Simple

}}}

Extra "," in mpirun

HLRS is getting an extra "," in their mpirun command lines, preventing tests from being run. From a mail from Sven:


I configure ompi with TM. I'm using r229 and the tests are not executed. The
output of MTT is shown below. Do you have an idea where the additional comma
after the "-np 4" comes from?

{{{
String now: mpirun -np &test_np() --prefix &test_prefix()
&test_executable() &test_argv()
Got name: test_np
Got args:
_do: $ret = MTT::Values::Functions::test_np()
&test_np returning: 4,
String now: mpirun -np 4, --prefix &test_prefix() &test_executable()
&test_argv()
Got name: test_prefix
Got args:
_do: $ret = MTT::Values::Functions::test_prefix()
&test_prefix returning:
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs
/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
String now: mpirun -np 4, --prefix
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs
/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
&test_executable() &test_argv()
Got name: test_executable
Got args:
_do: $ret = MTT::Values::Functions::test_executable()
&test_executable returning: src/MPI_Allreduce_loc_f
String now: mpirun -np 4, --prefix
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs
/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
src/MPI_Allreduce_loc_f &test_argv()
Got name: test_argv
Got args:
_do: $ret = MTT::Values::Functions::test_argv()
&test_params returning
String now: mpirun -np 4, --prefix
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs
/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
src/MPI_Allreduce_loc_f
Evaluating: &max(30, &multiply(10, &test_np()))
Got name: test_np
Got args:
_do: $ret = MTT::Values::Functions::test_np()
&test_np returning: 4,
String now: &max(30, &multiply(10, 4,))
Got name: multiply
Got args: 10, 4,
_do: $ret = MTT::Values::Functions::multiply(10, 4,)
&multiply got: 10 4
&multiply returning: 40
String now: &max(30, 40)
Got name: max
Got args: 30, 40
_do: $ret = MTT::Values::Functions::max(30, 40)
&max got: 30 40
&max returning: 40
String now: 40
Evaluating:
Running command: mpirun -np 4, --prefix
/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/installs
/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install
src/MPI_Allreduce_loc_f
Timeout: 1 - 1156505332 (vs. now: 1156505292)
OUT:-----------------------------------------------------------------------


OUT:Could not execute the executable
"/mscratch/ws/hpcstork-mtt-run-2006-08-25--10-44-20---hlrs-gcc-0/install
s/ompi-nightly-v1.2/cacau_gcc_warnings/1.2a1r11420/install/bin/":
Permission denied
OUT:
OUT:This could mean that your PATH or executable name is wrong, or that you
do not
OUT:have the necessary permissions. Please ensure that the executable is
able to be
OUT:found and executed.
OUT:-----------------------------------------------------------------------


}}}

Many PHP warnings

I am seeing many PHP warnings in the web server logs, indicating problems with summary.php. Here's a snippet from the web logs (I am trying to get the IU admins to make these available to us in real time; right now, you have to ask for them because the files are not readable by our logins) -- I'll attach the entire log that I have that shows all the problems:

{{{
[client 64.102.254.33] PHP Notice: Undefined index: debug in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 90, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: db in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 338, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined variable: argv in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 339, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: level in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 351, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: verbose in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 368, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: go in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 375, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: go in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 381, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 522, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 541, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 562, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 583, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 583, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 583, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 0 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 658, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 1 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 541, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined offset: 1 in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 583, referer: http://www.open-mpi.org/mtt/
[client 64.102.254.33] PHP Notice: Undefined index: verbose in /nfs/magrathea/home/user2/osl/www/www.open-mpi.org/mtt/summary.php on line 585, referer: http://www.open-mpi.org/mtt/
}}}

Add funclets that give access to stdout/stderr from test run

Sun may have some tests that require checking stdout / stderr to see if a test passed. So we need to provide funclets that give access to the stdout / stderr of a test run, and probably some simple string checking funclets (e.g., &grep(), &regexp(), ...).

This ticket is conditional; talk to Sun to see if it's worthwhile before implementing.

Make MTTDatabase submit more efficient

The MTTDatabase submit method can be made more efficient.

For example, a single run of the IBM test suite for 2 values of np (each with 1 variant), generates 362 results. This currently requires '''362 separate HTTP connections''', each of which averages around 2k of data transfer (combined send and receive). This is approximately 3/4 MB total transfer. It also takes '''several minutes''' to complete (submitting from a test cluster at Cisco).

I'm not so concerned about the total number of bytes transferred, but it could be significantly reduced. The MTT client currently sends a lot of repeated data for ''each result.'' The most obvious changes that I'm thinking of are:

  • Pack all the data into a single message that can be sent in a single connection
  • All the data that is currently repeated would therefore only need to be sent once (e.g., at the beginning of the message)
  • The server can marshal this data and generate 1 SQL INSERT for all the results

Both the client and the server would need to be modified to make this happen. It would probably make the whole process significantly more efficient in the following ways:

  • Only 1 HTTP connection
  • Significantly reduce the amount of data sent from the client to the server
  • Potentially make the database insert more efficient (1 INSERT vs. 362)

The cleanup command can't be executed

By looking at the mtt output I saw that the cleanup script cannot be executed (see below). I assume that a "real" command is required and not a shell script.

{{{
Timeout: 1 - 1156432342 (vs. now: 1156432332)
OUT:Can't execute command:
OUT:# This scriptlet ensures that all remnants of the prior mpirun are
OUT:# gone. It kills all orteds running under this user and whacks any
OUT:# session directories that it finds. Hence, do not expect to be able
OUT:# to run on the same machine/user as a user who is running MTT tests.
OUT:
OUT:# This scriptlet is not fully tested yet. Needs testing on: Linux,
OUT:# OSX, Solaris.
OUT:
OUT:who=`whoami`
OUT:which killall > /dev/null 2> /dev/null
OUT:if test "$?" = "0"; then
OUT: # If we have killall, it's easy.
OUT: killall -9 orted
OUT:else
OUT: # We're on an OS without killall. Which variant of ps do we have?
OUT: ps auxw > /dev/null 2> /dev/null
OUT: if test "$?" = "0"; then
OUT: ps_args="auxww"
OUT: else
OUT: ps_args="-eadf"
OUT: fi
OUT: pids=`ps $ps_args | grep $who | grep -v grep | grep orted | awk '{ print $2 }'`
OUT: if test "$pids" != ""; then
OUT: kill -9 $pids
OUT: fi
OUT:fi
OUT:
OUT:# Whack any remaining session directories. This is a workaround for
OUT:# current bugs in OMPI.
OUT:rm -rf /tmp/openmpi-sessions-${who}*
OUT:
Command complete, exit status: 512
}}}

If LD_LIBRARY_PATH not set, MTT seems to fail

Per Josh's comments on the MTT users list, if LD_LIBRARY_PATH is not initially set to ''something'' (even if it's blank), MTT runs of MPI tests will hang. Josh confirmed this by not having LD_LIBRARY_PATH set and seeing the hanging behavior. Then he set it to "" and the hanging behavior went away.

The relevant code in MTT is in lib/MTT/Test/Run.pm:
{{{
if ($mpi_install->{libdir}) {
    if (exists($ENV{LD_LIBRARY_PATH})) {
        $ENV{LD_LIBRARY_PATH} = "$mpi_install->{libdir}:" .
            $ENV{LD_LIBRARY_PATH};
    } else {
        $ENV{LD_LIBRARY_PATH} = $mpi_install->{libdir};
    }
}
}}}

So it ''looks'' like this should be handled correctly (but apparently is not). Will try to replicate this myself and dig into what is going on...

Add support for performance testing and historical data

Allow MTT to run performance tests and save the results in a historical database. For example, run NetPIPE and save the data over time. Be able to report the NetPIPE data in graphical form where relevant (e.g., look at the NetPIPE data for a given BTL from a given cluster over arbitrary time periods).

Should have support for at least the following test suites:

  • Intel benchmarks (used to be Pallas benchmarks)
  • NetPIPE

Probably want to add support for more over time, such as:

  • Presta
  • ...?

submit/index.php should INSERT data despite finding unknown field(s)

Josh Hursey noticed that there are ''no'' test run results being shown on summary.php.

We know that there are valid test run data in the db (e.g., he submitted some last night), but they aren't showing up on summary.php.

Could this be due to some mucking around that I did in summary.php?

Test submit connections during MTT client initialization

Multiple users have been burned by running through all the tests but then failing to submit properly because of some kind of issue (e.g., not having SSL Perl support, typing the URL wrong, etc.).

We should have a test submit URL that the mtt client can try connecting to during its init phase. If it fails to connect properly, we can abort right away at the beginning and not waste potentially hours of compute time before realizing that there's an error.

This is simple to implement in the MTTDatabase reporter; we just need to be sure that submit.php can safely handle HTTP GET connections with no data (which I think it already can, but want to be sure).

So this ticket represents two things:

  • ensure submit.php can safely handle GET connections
  • add functionality to MTTDatabase reporter to test a connection during its init phase and abort if it fails to connect

summary.php should use reporter.php as a backend

summary.php is basically a one-size-fits-all version of reporter.php. reporter.php should be used as a backend for summary.php such that patches applied to reporter.php will effectively be applied to both scripts.

Add new result types to Test Run

I added a new field in the Test Run report named "timed_out". This field is now sent to the MTT database via the MTTDatabase reporter. It's a logical value and will always be either 0 or 1.

This field indicates whether a test timed out or not (different than failing). The timeout in some OS's is somewhat fuzzy, so it's possible for a test to actually go [slightly] over its timeout value and still pass. Hence, this flag specifically indicates whether a test was killed because it had timed out.

More specifically:

  • (pass=1, timed_out=0): the test passed
  • (pass=1, timed_out=1): will never happen
  • (pass=0, timed_out=0): the test failed (i.e., it failed its "pass" criteria)
  • (pass=0, timed_out=1): the test timed out and was killed

The server side needs to now accept this flag and enter it into the data, and the reports need to be adjusted accordingly.

Need N1GE RM support

Need to make the appropriate extension to mtt to be able to use the N1GE RM to run tests.

Optimize mtt database for disk usage

The mtt database repeats many character strings thousands of times. For columns that contain such strings, a separate table should be created to index into from the main table. E.g., an entry that currently looks like:

||'''hostname''' ||'''test_name''' ||'''result''' ||
||somehost.com ||hello ||1 ||

Will instead look like:

||'''hostname''' ||'''test_name''' ||'''result''' ||
||index1 ||index2 ||1 ||

Where {{{hostname}}} and {{{test_name}}} tables exist that contain the following entries:

||'''index''' ||'''hostname''' ||
||index1 ||somehost.com ||
{{{}}}
||'''index''' ||'''test_name''' ||
||index2 ||hello ||

Will this significantly degrade performance?

Implement the Trim phase

The "trim" phase needs to be completed so that scratch directories do not grow out of control after running for a while.

Implement "Test Specify" phase

From the MTT developer's conference notes:

Implement 'test specify' phase - replaces current test run INI stuff

  • Should generate a list of tuples, which is:
    • exec - binary to execute
    • argv
    • np
    • pass - success return code
    • timeout
    • before_any
    • before_each
    • after_each
    • after_all
  • Test run phase should then accept this list
  • INI file looks like this:
    {{{
    [test_build: intel]
    test_get = intel
    module = intel
    intel_buildfile = coll

    [test specify: intel]
    test_build = intel
    module = intel
    }}}
The Test Run phase then becomes an engine that simply takes the output of the Test Specify phase (which is kinda how the code is currently organized anyway, but the name "Test Run" implies that the modules for this phase have more control than they really do).

Test Get phase needs versioning

The Test Get phase needs some kind of versioning, just like the MPI Get phase.

Without versioning, there is no way to know if there are new versions of tests that need to be downloaded/run (even if the MPI version has not changed).

Add capability to count "skipped" tests

Some tests are deliberately skipped (e.g., not enough/not the right number of processes to run the test) and should not be counted as "passed" or "failed" -- instead, there should be a new category called "skipped".

For the moment, the ompi-core-template.ini file -- at least in the IBM test run section -- checks for status 77 from a test and marks that as a "pass" (tests return status 77 when they want to be skipped; a precedent established by the GNU coding standards).
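
For illustration, that kind of pass criterion might look something like the following in an INI test run section. The &-funclet names below are hypothetical placeholders (modeled on the &-funclet style used elsewhere in this document), not necessarily what the real template uses:

{{{
[Test run: ibm]
# Hypothetical funclets: treat exit status 0 as a pass, and (for now)
# also treat the GNU-style "skipped" status of 77 as a pass, until a
# real "skipped" category exists.
pass = &or(&eq(&test_exit_status(), 0), &eq(&test_exit_status(), 77))
}}}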

Send e-mail every morning of previous 24 hours' compile/test failures

Send a mail around providing an executive summary of the previous day/night/24 hours/whatever failures (failed compiles, failed test runs, etc.). This mail should have some simple requirements:

  • Subject line indicates the failures (compiles, tests). Mail should either not be sent or indicate in the subject line if everything passed (!).
  • Mail should not be sent if there is no new information to share (e.g., no tests have been run in the past day/night/24 hours/whatever).
  • Contain HTTP links for more information. The links should be fixed such that if I bring up a report mail from 3 days ago and click on its links, I'll see the web reports from 3 days ago (not the most current reports).
  • Be "one page" or less of information (everyone's screen size is different, hence "one page" is in quotes -- the idea is to have just enough information in the mail to get a developer interested to click through to the real data)

This is a first cut at the requirements. Feel free to add/delete/edit.

More fine-grained MPI Details control

The current MPI details scheme might not be flexible enough for all scenarios. Here's one scenario that it does not handle well. It's not an urgent problem, but it might be good to make MPI details flexible enough to handle this kind of scenario:

  • cluster of 32 4-way SMPs
  • want to test several BTLs, including "sm"
  • but "sm" cannot be tested by itself except when we are running on one node

For example, the following MPI details definition, when spanning multiple nodes, will not work because multi-node jobs will be launched with "--mca btl self,sm":

{{{
[MPI Details: Open MPI]
exec = mpirun -np &test_np() --prefix &test_prefix() --mca btl self,@btl@ &test_executable() &test_argv()
btl = &enumerate("tcp", "sm")
}}}

Instead, it seems like we want to make the value of @btl@ be a bit more conditional -- in this case, we want it to be dependent upon how many nodes (''not'' the value of np!) the job will run across.

Print timing information at end of an mtt client run

It would be useful to print some basic timing information at the end of an mtt client run (e.g., start/stop/elapsed time of each phase) upon demand (e.g., --print-times, or somesuch). This will be helpful in determining how long a particular ini file takes to run, and can help with planning purposes for how much to test, how frequently, etc.

Need to properly handle tests that are supposed to fail

Although MTT allows the arbitrary definition of "pass" criteria, we have some large test suites where a small number of the tests are supposed to fail (e.g., IBM and Intel). I.e., most of them "pass" by having an exit status of 0, but some of them pass by having a non-zero exit status (e.g., testing MPI_ABORT).

Particularly when we find the test executables via &find_executables() (which finds ''all'' test executables -- both the ones that are supposed to pass and the ones that are supposed to fail), it's hard to have a global set of pass criteria for all of them. So a better scheme needs to be implemented to allow this kind of flexibility. Some ideas:

  • Add a funclet that takes the output of &find_executables() and removes a list of names from it. Perhaps something like the following, which finds all the IBM test executables and then excludes those with a base filename of "abort" or "already_finalized" (this allows multiple INI sections with different pass criteria):
    {{{
    simple_tests = &exclude_filename(&find_executables("collective", "communicator", "datatype",
    "dynamic", "environment", "group", "info",
    "io", "onesided", "pt2pt", "topology"), "abort", "already_finalized")
    }}}
  • Add more fields to the Simple module that allow excluding executables, similar to the &exclude_filename() funclet, above. This allows multiple INI sections with different pass criteria.
  • Allow Simple to accept the specification of multiple (np, pass, executables, ...) tuples. This may require more extensive changes to the TestRun infrastructure to be more in-line with what was discussed at the IU/LANL MTT meeting long long ago (i.e., allowing each test to specify its own np, pass criteria, etc.).
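
As a sketch of the first idea above (multiple INI sections with different pass criteria), the split might look something like the following. &exclude_filename() is the proposed funclet from above, and the pass-criterion funclet names are hypothetical placeholders:

{{{
[Test run: ibm normal]
# Everything except the tests that are expected to fail:
simple_tests = &exclude_filename(&find_executables("collective", "pt2pt", "environment"), "abort", "already_finalized")
# Hypothetical pass-criterion funclets: pass means exit status 0.
pass = &eq(&test_exit_status(), 0)

[Test run: ibm expected failures]
# Only the tests that are supposed to fail (e.g., testing MPI_ABORT):
simple_tests = abort already_finalized
# Hypothetical: here, "pass" means a non-zero exit status.
pass = &ne(&test_exit_status(), 0)
}}}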

Save HTTP username with results

Now that we're using proper HTTP/Basic authentication to protect submitting MTT results, the HTTP username (and IP address?) should be stored with an incoming set of data in the database.

Have server-side INI files with opt-out controls

Add the ability to have a centralized INI file with a global set of configurations to test that apply to a set of users (e.g., the OMPI core testers). This allows standardization of the set of tests that are run, etc.

Need to provide "opt-out" capabilities from the centralized INI file -- for example, the centralized INI file may list the trunk and all the release branches for OMPI (e.g., trunk, 1.0, 1.1, 1.2). But Sun only cares about the trunk and 1.2, so they should be able to opt out of the 1.1 and 1.0 tests.

Additionally, each MTT site will need to be able to customize some fields, such as which compilers to use, etc.

If interrupted, some results don't get reported

MTT was designed to be able to be interrupted; if you re-start MTT with the same command line arguments and ini file and nothing has changed on the server side (e.g., no new version of MPI or version of tests), MTT should resume where it left off.

However, in some cases, results for all the tests won't be reported. For example, if you interrupt MTT in the middle of a long intel test run, although MTT has all the meta data for the tests that have already been run (and will properly resume where it left off if you restart MTT), it will only report the results of the tests that it executed during the current run. That is, the results of the tests of the previous run are not reported back to the database.

Make Intel tests run with the "wrong" number of processes return exit status 77

This was already done for the IBM test suite.

The idea is to have tests that require a specific number of processes to be tolerant of when they are not run with the right number. Hence, if the test needs 6 processes and it is run with 4 (or 8 or 3 or ...), it should shut down in an orderly fashion (MPI_FINALIZE), and exit with a status of 77 indicating that the test was skipped.

The value of 77 was taken from the GNU coding standards.

Add support for "disconnected" scenarios

Add support for MTT users who are behind firewalls or otherwise not directly connected to the internet. Specifically, allow scenarios like:

  • MPI/Test get phases need to be run on a machine connected to the internet
  • The results of these gets need to be proxied to back-end machines
  • Builds and installs occur on one set of back-end machines (e.g., compile nodes)
  • Test runs occur on a different set of back-end machines (e.g., cluster/compute nodes)
  • Result data need to be proxied back to the internet-connected machine
  • Results are then submitted via the internet

SVN "get" back-end functionality does not correctly detect "no new sources"

The back-end SVN "get" functionality currently always thinks that it has found new sources, even when it has not, in fact, obtained anything new.

This is repeatable by specifying a Test Get with an SVN checkout -- it will get a new version every time even if the SVN repository with the test has not changed at all.

Need to separate hostname from platform ID in results

The Cluster Summary table in summary.php currently combines the hostname where the results were submitted with the platform ID from the ini file. This is misleading in cases where MTT users are running on clusters with schedulers, meaning that they don't always run (and therefore submit) from the same host.

Case in point is HLRS who runs on some flavor of a PBS cluster (cacau). Right now, summary.php is showing a different entry in the Cluster Summary table for every run that they've done, when, in fact, they're all really from the same cluster (cacau@HLRS).

Hence, the Cluster Summary table should roll up all results from the same cluster, regardless of what node they were run on.

Sometimes the MPI version number is blank

This is something that Ethan reported last week and I thought I had fixed it. Blah!

Sometimes the MPI version number comes up either blank or with a bogus string in it. For example, in summary.php, I'm currently seeing some bad version numbers for the tests that I just ran on the 1.1.1rc2 tarball:

{{{
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 gnu 1 0 0 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 ibm 0 0 1 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 imb 0 0 1 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 intel 0 0 1 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 trivial 0 0 1 0 0 0
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 mtt_version_major: 0 intel 0 0 0 0 0 88
x86_64 linux-rhel4_AS-x86_64 Cisco MPI development cluster
svbu-mpi1.cisco.com ompi-rc-v1.1 2006-08-25 15:59:45 mtt_version_major: 0 trivial 0 0 0 0 0 4
}}}

I also see the following in Test Build output:

{{{

Test build [test build: trivial]
Already have a build for [ompi-rc-v1.1] / [] / [gnu] / [trivial]
}}}

So I think there's another place in the code that isn't doing the MPI version number properly.
