zopkio's Introduction

Zopkio - A Functional and Performance Test Framework for Distributed Systems

https://travis-ci.org/linkedin/Zopkio.svg?branch=master https://coveralls.io/repos/linkedin/Zopkio/badge.svg?branch=master&service=github

Zopkio is a test framework built to support performance and functional testing of distributed systems at scale.

Installation

Zopkio is distributed via pip

To install::
(sudo) pip install zopkio

If you want to work with the latest code:

git clone [email protected]:linkedin/zopkio.git
cd zopkio

Once you have downloaded the code you can run the zopkio unit tests:

python setup.py test

Or you can install zopkio and run the sample test:

(sudo) python setup.py install
zopkio examples/server_client/server_client.py

N.B. the example code assumes you can ssh into your own box using your ssh keys, so if you are having issues with the tests failing, check your authorized_keys.

In the past there have been issues installing one of our dependencies (Naarad). If you encounter errors installing Naarad, see https://github.com/linkedin/naarad/wiki/Installation

Basic usage

Use the zopkio main script:

zopkio testfile

Zopkio takes several optional arguments:

--test-only [TEST_LIST [TEST_LIST ...]]
                      run only the named tests to help debug broken tests
--machine-list [MACHINE_LIST [MACHINE_LIST ...]]
                      mapping of logical host names to physical names
                      allowing the same test suite to run on different
                      hardware, each argument is a pair of logical name and
                      physical name separated by a =
--config-overrides [CONFIG_OVERRIDES [CONFIG_OVERRIDES ...]]
                      config overrides at execution time, each argument is a
                      config with its value separated by a =. This has the
                      highest priority of all configs
-d OUTPUT_DIR, --output-dir OUTPUT_DIR
                      Specify the output directory for logs and test results.
                      By default, Zopkio will write to the current directory.
--log-level LOG_LEVEL
                    Log level (default INFO)
--console-log-level CONSOLE_LEVEL
                      Console Log level (default ERROR)
--nopassword          Disable password prompt
--user USER           user to run the test as (defaults to current user)
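
For example, a hypothetical run that maps two logical hosts to physical hosts and overrides a test config at execution time might look like this (the test file and host names are illustrative):

zopkio mytest.py --machine-list server1=host1.example.com client1=host2.example.com --config-overrides loop_all_tests=2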

Testing with Zopkio

Zopkio provides the ability to write tests that combine performance and functional testing across a distributed service or services. Writing tests using Zopkio should be nearly as simple as writing tests in xUnit, Nose, etc. A test suite will consist of a single file specifying four required pieces:

  1. A deployment file
  2. One or more test files
  3. A dynamic configuration file
  4. A config directory

For simplicity, in the first iteration this is assumed to be JSON or a Python file with a dictionary called test.
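
For illustration, a minimal test suite file might look like the following sketch. The file paths are hypothetical, and the key name used here for the config directory (configs_directory) is an assumption; the other keys are the ones described in the sections below.

test = {
    "deployment_code": "deployment.py",                   # deployment file (see Deployment)
    "test_code": ["tests/test_basic.py"],                 # one or more test files (see Test Files)
    "dynamic_configuration_code": "dynamic_configs.py",   # Naarad/log configuration (see Dynamic Configuration File)
    "configs_directory": "configs/",                      # config directory (key name assumed; see Configs)
}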

Deployment

The deployment file should be pointed to by an entry in the dictionary called deployment_code. Deployment is one of the key features of Zopkio. Developers can write tests in which they bring up arbitrary sets of services on multiple machines and then, within the tests, exercise a considerable degree of control over these machines. The deployment section of code will be similar to deployment in other test frameworks, but because of the increased complexity and the expectation of reuse across multiple test suites, it can be broken into its own file.

A deployment file can contain four functions:

  1. setup_suite
  2. setup
  3. teardown
  4. teardown_suite

As in other test frameworks, setup_suite will run before any of the tests, setup will run before each test, teardown will run if setup ran successfully regardless of the test status, and teardown_suite will run if setup_suite ran successfully regardless of any other conditions. The main distinction in the case of this framework is in the extended libraries to support deployment.

In many cases the main task of the deployment code is creating a Deployer. This can be done using the SSHDeployer provided by the framework or through custom code. For more information about deployers see the APIs. The runtime module provides a helpful set_deployer(service_name) and get_deployer(service_name). In addition to allowing the deployers to be easily shared across functions and modules, using these functions will allow the framework to automatically handle certain tasks such as copying logs from the remote hosts. Once the deployer is created it can be used in both the setup and teardown functions to start and stop the services.
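
For example, a deployment file that brings up a single service over SSH might look roughly like the sketch below. The service name, paths, and SSHDeployer configuration keys are illustrative assumptions rather than the exact API; see the deployer API docs and the bundled examples for the real constructor arguments.

import zopkio.adhoc_deployer as adhoc_deployer   # provides SSHDeployer
import zopkio.runtime as runtime

SERVICE = "my_service"                  # hypothetical service name
INSTANCE = "my_service_instance_0"      # hypothetical unique_id for one deployed process

def setup_suite():
    # The config keys below (executable, install_path, start_command) are assumptions for illustration.
    deployer = adhoc_deployer.SSHDeployer(SERVICE, {
        "executable": "build/my_service.tar.gz",
        "install_path": "/tmp/my_service",
        "start_command": "/tmp/my_service/bin/start.sh",
    })
    # Register the deployer with the runtime so it can be shared across modules
    # and so the framework can collect logs from its hosts.
    runtime.set_deployer(SERVICE, deployer)
    # deploy() installs and starts the process on the target host.
    deployer.deploy(INSTANCE, {"hostname": "localhost"})

def teardown_suite():
    # Stop the service when the suite finishes (method name assumed).
    runtime.get_deployer(SERVICE).stop(INSTANCE)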

Since the setup and teardown functions run before and after each test, a typical use is to restore the state of the system between tests to prevent tests from leaking bugs into other tests. If the setup or teardown fails, we will skip the test and mark it as a failure. In an effort to avoid wasting time with a corrupted stack, there is a configuration max_failures_per_suite_before_abort which can be set to determine how many times the framework will skip tests before automatically skipping the remaining tests in that suite.

In addition, since the entire suite is rerun parameterized by the configurations (see Configs), there is a second config, max_suite_failures_before_abort, which behaves similarly.

Test Files

Test files are specified by an entry in the test dictionary called test_code, which should point to a list of test files. For each test file, the framework will execute any function with test in the name (regardless of case) and track whether the function executes successfully. In addition, if there is a function test_foo and a function validate_foo, then after all cleanup and log collection is done, and provided test_foo executed successfully, validate_foo will be executed and tested for successful execution; if it fails, the original test will fail and the logs from the post execution will be displayed. Tests can be run in either a parallel mode or a serial mode. By default tests are run serially without any specified order. However, each test file may specify an attribute test_phase. A test_phase of -1 is equivalent to serial testing. Otherwise, all tests with the same test_phase will be run in parallel together. Phases proceed in ascending order.
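
For example, a test file following these conventions might look like the sketch below. The service and instance names are hypothetical and assume the deployer was registered in the deployment code as shown earlier; the phase attribute is written TEST_PHASE here to match the bundled examples.

import zopkio.runtime as runtime

TEST_PHASE = 1    # optional; tests sharing a phase run in parallel, -1 (the default) means serial

def test_service_is_running():
    # Any function with "test" in its name is executed by the framework.
    deployer = runtime.get_deployer("my_service")                 # hypothetical service name
    assert deployer.get_pid("my_service_instance_0") is not None  # get_pid is assumed from the deployer API

def validate_service_is_running():
    # Runs after cleanup and log collection, only if test_service_is_running succeeded.
    # Typically this inspects collected logs or the final system state.
    pass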

Dynamic Configuration File

The dynamic configuration component may be specified as either dynamic_configuration_code or perf_code. This module contains a number of configurations that can be used during the running of the tests to provide inputs for the test runner. The required elements are a function to return Naarad configs, and functions to return the locations of the logs to fetch from the remote hosts. There are also several configs which can be placed either in this module as attributes or in the Master config file. The main focus of this module is support for Naarad. The output of the load generation can be any format supported by Naarad, including JMeter and CSV. The performance file can also contain rules for Naarad to use to pass/fail the general performance of a run (beyond rules specific to individual tests). To get the most from Naarad, a Naarad config file can be provided (see the Usage section of https://github.com/linkedin/naarad/blob/master/README.md). In order to have Naarad support, the module should provide a function naarad_config(). There are also two functions, machine_logs() and naarad_logs(), that should return dictionaries from unique_ids to the list of logs to collect. Machine logs are the set of logs that should not be processed by Naarad.
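
A minimal dynamic configuration module might therefore look like the sketch below; the file paths and the unique_id are hypothetical, and the exact function signatures may differ from what the framework expects.

import os

def naarad_config():
    # Path to a Naarad config describing which metrics to extract and any pass/fail rules.
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), "naarad.cfg")

def machine_logs():
    # Logs to fetch from each remote process that should NOT be processed by Naarad.
    return {"my_service_instance_0": ["/tmp/my_service/logs/service.log"]}

def naarad_logs():
    # Logs (e.g. CSV or JMeter output) that Naarad should analyze.
    return {"my_service_instance_0": ["/tmp/my_service/logs/metrics.csv"]}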

Configs

Being able to test with different configurations is extremely important. The framework distinguishes between three types of configs:

  1. master config
  2. test configs
  3. application configs

Master configs are properties which affect the way zopkio operates. Current properties that are supported include:

  • max_suite_failures_before_abort
  • max_failures_per_suite_before_abort
  • LOGS_DIRECTORY
  • OUTPUT_DIRECTORY

Test configs are properties which affect how the tests are run. They are specific to the test writer and accessible from runtime.get_config(config_name), which will return the stored value or the empty string if no property with that name is present. These are the properties that can be overridden by the --config-overrides command line flag. Some of the test configs that zopkio recognizes are:

  • loop_all_tests
  • show_all_iterations
  • verify_after_each_test

'loop_all_tests' repeats the entire test suite for that config for the specified number of times. 'show_all_iterations' shows the result on the test page for each iteration of the test. 'verify_after_each_test' forces the validation before moving on to the next test.
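
For example, a test can read one of these configs as follows; the config name and fallback value are illustrative:

import zopkio.runtime as runtime

# get_config returns the stored value, or the empty string if the config is not set,
# so supply a fallback before converting.
raw = runtime.get_config("loop_all_tests")
iterations = int(raw) if raw != "" else 1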

Application configs are properties which affect how the remote services are configured. There is not currently an official way to copy these configs to remote hosts separately from the code, although there are several utilities to support it.

In order to allow the same tests to run over multiple configurations, the framework interprets configs according to the following rules. All configs are grouped under a single folder. If this folder contains at least one subfolder, then the config files at the top level are considered defaults, and for each subfolder of the top folder, the entire test suite will be run using the configs within that folder (plus the defaults and config overrides). This is the case in which max_suite_failures_before_abort will be considered. Otherwise the suite will be run once with the top level config files and overrides.
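
For instance, a hypothetical config directory with two subfolders would cause the suite to run twice, once per subfolder, with the top-level defaults applied to both runs:

configs/
  defaults.json          # top-level defaults shared by every run
  small-cluster/
    cluster.json         # configs for the small-cluster run
  large-cluster/
    cluster.json         # configs for the large-cluster run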

Example Tests

  1. command: zopkio examples/server_client/server_client.py
  • Runs a bunch of tests with multiple clients and servers deployed
  2. command: zopkio examples/server_client/single_server_multipleiter_inorder.py --nopassword
  • The individual tests have TEST_PHASE set to 1, 2, and 3 respectively. This enforces order.
  • To run multiple iterations, set loop_all_tests to <value> in the config.json file
  • To validate each run of the test before moving to the next one, set verify_after_each_test in the configs
  • To show the pass/fail for each iteration, set show_all_iterations to true in the configs
  • Sample settings to get multiple runs for this test:
      "show_all_iterations": true,
      "verify_after_each_test": true,
      "loop_all_tests": 2,
  3. command: zopkio examples/server_client/server_client_multiple_iteration.py
  • The base_tests_multiple_iteration.py module has the TEST_ITER parameter set to 2.
  • This repeats all the tests twice but does not enforce any ordering
  4. command: zopkio examples/server_client/client_resilience.py
  • This is an example of the test recipe feature of zopkio. See test_recipes.py for the recipe and test_resilience.py for the example used here
  • This tests the kill_recovery recipe, to which you pass the deployer, process list, optional restart func, recovery func, and timeout (a hedged sketch follows this list)
  • Zopkio will kill a random process of the deployer and verify that the system can recover correctly, based on the recovery function, before the timeout
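
A hedged sketch of what using the recipe might look like is below. The import path, argument order, and helper values are assumptions based purely on the description above; see test_recipes.py and test_resilience.py in the repository for the real usage.

import zopkio.runtime as runtime
import zopkio.test_recipes as test_recipes   # module path assumed; see test_recipes.py

def test_server_recovers_from_kill():
    deployer = runtime.get_deployer("server")                   # hypothetical registered service
    processes = ["server_instance_0", "server_instance_1"]      # hypothetical unique_ids
    # Described above as: deployer, process list, optional restart func, recovery func, timeout.
    test_recipes.kill_recovery(
        deployer,
        processes,
        None,              # optional restart function
        lambda: True,      # recovery check; replace with a real health check
        60)                # timeout in seconds (unit assumed)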

zopkio's People

Contributors

alshopov, arpras, davidzchen, jdehrlich, joshuaewepay, nak

zopkio's Issues

Programmatically set environment variables

When deploying things and running tests, it'd be nice if we could specify environment variables to be used on all remote machines. For example, we might wish to set JAVA_HOME to some specific path (e.g. if we want to run integration tests against Java 7). We might want to set YARN_HOME to some deployment directory, as well. Zopkio doesn't seem to provide a mechanism for me to do this.

Shutdown triggers exception

Occasionally, when Zopkio is shutting down after a test run, I see a stack trace at the very end of the execution.

Exception in thread Thread-33 (most likely raised during interpreter shutdown):
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 808, in __bootstrap_inner
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 761, in run
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/kafka/util.py", line 108, in _timer
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 618, in wait
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 354, in wait
<type 'exceptions.TypeError'>: 'NoneType' object is not callable

Everything appears to work. The report shows up in the browser with expected results. There's just an odd stack trace at the end.

Invalid JSON config results in silent failure

If you have invalid JSON config like:

{
  "foo": "bar"
  "baz": "meh"
}

Then, when you call runtime.get_active_config('foo'), you get a KeyError. This is pretty unintuitive. What's actually happening is that Zopkio is unable to parse the configs, and swallows the config parse exception.

Support fetching directories in log aggregation

My YARN jobs have dynamically created folder names for their log directories. I attempted to configure the machine logs to point to a directory instead of a file.

This led to:

Traceback (most recent call last):
  File "/tmp/samza-test/samza-integration-tests/bin/zopkio", line 8, in <module>
    load_entry_point('zopkio==0.1.4', 'console_scripts', 'zopkio')()
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/__main__.py", line 114, in main
    test_runner.run()
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 117, in run
    self._copy_logs()
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 181, in _copy_logs
    deployer.get_logs(process.unique_id, logs, logs_dir)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/deployer.py", line 216, in get_logs
    ftp.get(f, new_file)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/paramiko/sftp_client.py", line 720, in get
    size = self.getfo(remotepath, fl, callback)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/paramiko/sftp_client.py", line 693, in getfo
    data = fr.read(32768)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/paramiko/file.py", line 169, in read
    new_data = self._read(read_size)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/paramiko/sftp_file.py", line 165, in _read
    data = self._read_prefetch(size)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/paramiko/sftp_file.py", line 146, in _read_prefetch
    self._check_exception()
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/paramiko/sftp_file.py", line 490, in _check_exception
    raise x
IOError: Failure

It appears that the directory stat passes, but the ftp.get call fails because the path is a directory. How am I supposed to handle this? It'd be nice if Zopkio would copy the whole folder for me.

Mark configuration test results as green in UI

Currently, the "configuration name" in the UI doesn't show green even when all tests in the config pass. If you drill down, you can see that individual tests are marked as green in the UI, but the overall configuration on the landing page isn't.

I want the landing page to show green if all tests in the config pass. This will make it easier to determine if there's an error.

Providing sufficient process isolation for running tests in a shared environment

If a user-defined deployer does not terminate the test suite properly, it might leave behind some orphaned processes. This can cause issues when the same or a new test is deployed on the same host. Imagine a world where the test machines are shared.
It would be good if the framework provided a way to check for orphaned processes and clear the machine before setting up the test environment.
I can think about this in two ways:

  • Maintain a global state with all the pids that were generated by the test suite and manually kill them after you tear down the suite.
  • Run each test suite in its own VM, although I am not sure how difficult it is to enforce this kind of isolation.

Support for running tests continuously

We often want to run the same test or set of tests over and over again. The scenario looks something like:

LOOP (upto N times) {
  Run Test
  Validate Test
  [Optional] Reset System State
  Goto LOOP 
}

We can also extend this continuous behavior across multiple tests, although I think this will bring up the question of maintaining ordering across the tests.

LOOP (upto N times) {
  Run test1
  Validate test1
  [Optional] Reset System State
  Run test2
  Validate test2
  ...
  Goto LOOP
}

By adding continuous testing support, we can, in the future, extend support to run the Zopkio tests via Hudson or another CI tool.

Support test recipes

Many different systems have similar tests, and adding extensible recipes to capture this idea would make it easier to get started.

Show zopkio driver logs in report.html

It'd be nice if the logs from the driver host were also visible in the report.html (e.g. ./logs/zopkio_log_20141216_125445/zopkio_log_20141216_125445.log)

CSS is broken

Some CSS files seem to be missing. Chrome debugger shows:

GET file://private/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/web_resources/style.css net::ERR_FILE_NOT_FOUND report.html:25
GET file://private/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/web_resources/script.js net::ERR_FILE_NOT_FOUND report.html:28
GET file://code.jquery.com/jquery-1.11.0.min.js net::ERR_FILE_NOT_FOUND report.html:29
GET file://code.jquery.com/jquery-migrate-1.2.1.min.js net::ERR_FILE_NOT_FOUND report.html:30
Uncaught Error: Bootstrap's JavaScript requires jQuery bootstrap.min.js:6

When the front page is loaded.

make a gradle plugin

make a gradle plugin to integrate zopkio testing into the build system (it should set up a new virtualenv, install zopkio, and run a list of tests)

Print logging to CLI

When I write a test, I'm faced with the choice to either write:

print "Doing some stuff"

Or:

logger.info("Doing some stuff")

If I do the former, I get to see the line in the CLI while the integration tests are running, but I don't get to see it in the logs after the execution finishes.

If I do the latter, I can't see the log line while the tests are running, but can see it after.

Recommend unifying this to just print all log lines to CLI, and then I can just use logger.info for everything, and see the output both during and after the execution.

Incorrect directory structure gives odd error.

The top error, I get. The weird %s seems like a bug, though.

2015-02-25 16:26:20,643 zopkio [ERROR] incorrect dir structure testfile:/tmp/samza-tests/scripts/tests.py exist in same level as dir:/tmp/samza-tests/scripts/tests
Error in processing command line arguments:
 %s

Explore ClusterShell

Should look at using Clush instead of (or in addition to) Paramiko. @fintler claims it's quite fast for SSH execution. I don't know much about it, but it seems worth investigating a bit. Primary goal would be to speed up deployments.

Config table is too wide

The config table for suite reports (e.g. zopkio_20150105_093334/reports/tests_20150105_093334/resources/smoke-tests/smoke-tests_report.html) is wider than the other tables on the page.

parse_config_file should give better errors

I had a malformed .json file (an extra , in it), and I was getting this:

  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/utils.py", line 120, in parse_config_file
    raise SyntaxError(e)
SyntaxError: Expecting property name: line 5 column 1 (char 195)

Would prefer getting "Unable to parse JSON file config/smoke-tests/smoke-tests.json due to malformed JSON. Aborting."

Support duplicate filenames when aggregating log directories

After #17, I am now able to aggregate logs by directory, but my files are stomping on each other. If I have:

./deploy/yarn_nm/hadoop-2.4.0/logs
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001/container_1419298145209_0001_01_000001
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001/container_1419298145209_0001_01_000001/gc.log.0.current
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001/container_1419298145209_0001_01_000001/stderr
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001/container_1419298145209_0001_01_000001/stdout
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001/container_1419298145209_0001_01_000002
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001/container_1419298145209_0001_01_000002/gc.log.0.current
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001/container_1419298145209_0001_01_000002/stderr
./deploy/yarn_nm/hadoop-2.4.0/logs/userlogs/application_1419298145209_0001/container_1419298145209_0001_01_000002/stdout
./deploy/yarn_nm/hadoop-2.4.0/logs/yarn-criccomi-nodemanager-criccomi-mn.log
./deploy/yarn_nm/hadoop-2.4.0/logs/yarn-criccomi-nodemanager-criccomi-mn.out

Then I get only one gc.log.0.current file if I define ./deploy/yarn_nm/hadoop-2.4.0/logs as my log directory. This is because the two container dirs are stepping on each other.

I would like to maintain the directory structure for these logs. Two ideas:

  1. Rather than having the log dir be flat, have it be nested, and maintain exact nesting when recursively copying directories locally.
  2. Prefix the entire dir path to the file (e.g. deploy_yarn_nm_hadoop-2.4.0_logs_userlogs_application_1419298145209_0001_container_1419298145209_0001_01_000001_stdout).

I'm OK with either of these

Give a better error when perf.py is misconfigured

I accidentally set 'samza_job_0' instead of 'samza_instance_0' as a key in my perf.py machine logs list. This led to this exception:

Traceback (most recent call last):
  File "/tmp/samza-test/samza-integration-tests/bin/zopkio", line 8, in <module>
    load_entry_point('zopkio==0.1.4', 'console_scripts', 'zopkio')()
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/__main__.py", line 114, in main
    test_runner.run()
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 117, in run
    self._copy_logs()
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 179, in _copy_logs
    logs = self.dynamic_config_module.machine_logs()[process.unique_id] + self.dynamic_config_module.naarad_logs()[process.unique_id]
KeyError: 'samza_job_0'

It'd be nice if a better error were thrown (e.g. Unknown process id 'samza_job_0' found in machine log list).

Report test results via CLI

It'd be nice if the CLI dumped the test results in summary form as an INFO-level log to the CLI. I'm running the tests remotely on my Linux box via SSH, and opening the browser doesn't do anything for me. Having a simple "# tests passed, # tests failed" report would be handy.

Zopkio doesn't support __init__.py in test directories

I tried to add an __init__.py file in my test directory (./tests/__init__.py), and I get:

Error setting up testrunner:
Traceback (most recent call last):
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/__main__.py", line 133, in main
    test_runner = TestRunner(args.testfile, args.test_list, config_overrides)
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 76, in __init__
    test_runner_helper.get_modules(testfile, tests_to_run, config_overrides)
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner_helper.py", line 108, in get_modules
    test_dic = _parse_input(testfile)
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner_helper.py", line 195, in _parse_input
    test_dic = utils.load_module(testfile).test
AttributeError: 'module' object has no attribute 'test'

I'd like to be able to have my Zopkio tests be properly structured Python packages, but I can't really do this without __init__.py files.

SSH passwords fail

Executing Zopkio with my SSH password, and no ~/.ssh/authorized_keys leads to:

2015-01-14 10:03:02,437 zopkio.test_runner [ERROR] Aborting smoke-tests due to setup_suite failure:
Traceback (most recent call last):
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 107, in run
    self.deployment_module.setup_suite()
  File "/tmp/samza-tests/scripts/deployment.py", line 88, in setup_suite
    'executable': c('samza_executable'),
  File "/tmp/samza-tests/scripts/samza_job_yarn_deployer.py", line 77, in install
    with get_ssh_client(host) as ssh:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/remote_host_helper.py", line 200, in get_ssh_client
    ssh.connect(hostname, username=username, password=password)
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/paramiko/client.py", line 307, in connect
    look_for_keys, gss_auth, gss_kex, gss_deleg_creds, gss_host)
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/paramiko/client.py", line 519, in _auth
    raise saved_exception
PasswordRequiredException: Private key file is encrypted

Running the same tests with --nopassword, and authorized_keys containing my public key, works.

NOTE: some of my deployment HAS already passed at the point that this failure occurs, so it appears that SFTP and SSH do work, but somehow this specific invocation is causing a problem.

Make zopkio's SSH work better out of the box

Out of the box, when running Zopkio, we get:

setup_suite() failed. See below for the trace.
Traceback (most recent call last):
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 106, in run
    self.deployment_module.setup_suite()
  File "/tmp/samza-test/scripts/deployment.py", line 76, in setup_suite
    'hostname': host
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/deployer.py", line 76, in deploy
    self.install(unique_id, configs)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/adhoc_deployer.py", line 99, in install
    with get_ssh_client(hostname) as ssh:
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/zopkio/remote_host_helper.py", line 180, in get_ssh_client
    ssh.connect(hostname)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/paramiko/client.py", line 307, in connect
    look_for_keys, gss_auth, gss_kex, gss_deleg_creds, gss_host)
  File "/tmp/samza-test/samza-integration-tests/lib/python2.7/site-packages/paramiko/client.py", line 519, in _auth
    raise saved_exception
PasswordRequiredException: Private key file is encrypted

To fix this, the user has to manually add their public key to .ssh/authorized_keys on all hosts, and SSH to the boxes to ensure that the keyring password is cached (at least on OSX).

It'd be nice if:

  1. The password that Zopkio prompts for gets passed into the ssh.connect in remote_host_helper.py. Might also be nice if username were accepted as a param as well.
  2. Some docs were written to describe how to handle this issue (either with authorized_keys, or via the prompt).
  3. Maybe there's some better way of doing this? In the case where Zopkio is actually deploying to remote hosts, I can see where it'd be necessary to add your public key to authorized_keys, but for localhost deployments, it seems a bit excessive.

Related: http://stackoverflow.com/questions/15579117/paramiko-using-encrypted-private-key-file-on-os-x

Auto-download logs

It'd be nice if zopkio would automatically download logs from the install path based on a pattern. Ideally, I just want to set [*.log, *.out, *.err] as the extensions to download. It should find all files in all subdirs that match that pattern, and download them to the driver machine for display in the UI.

Right now, I'm having to manually hard-code each individual file. It's really tedious.

Disable password prompt

My tests don't use passwords, but I'm prompted to enter it every time I execute my tests. It'd be nice if I could shut this off.

zopkio.com

I'm squatting on zopkio.com. Please let me know if I should transfer it to someone (30 day waiting period).

Give a better error message when Zopkio can't load a module

It's impossible for me to debug this without going into Zopkio, and adding log lines.

Error setting up testrunner:
Traceback (most recent call last):
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/__main__.py", line 133, in main
    test_runner = TestRunner(args.testfile, args.test_list, config_overrides)
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 76, in __init__
    test_runner_helper.get_modules(testfile, tests_to_run, config_overrides)
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner_helper.py", line 108, in get_modules
    test_dic = _parse_input(testfile)
  File "/tmp/samza-tests/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner_helper.py", line 195, in _parse_input
    test_dic = utils.load_module(testfile).test
AttributeError: 'module' object has no attribute 'test'

install_path created with 777 permissions in SSHDeployer

It seems that the SSHDeployer will create the install_path directory with 777 permissions:

$ ls -l /tmp/samza-tests/deploy/
total 16
drwxrwxrwx 4 criccomi criccomi 4096 Jan  5 22:43 kafka
drwxrwxrwx 3 criccomi criccomi 4096 Jan  5 22:43 yarn_nm
drwxrwxrwx 3 criccomi criccomi 4096 Jan  5 22:42 yarn_rm
drwxrwxrwx 4 criccomi criccomi 4096 Jan  5 22:42 zookeeper

It seems like it'd be a bit safer to have it be 755, or something.

Show unified timeline view of logs

The current log view for Zopkio shows a list of files that get aggregated to the driver machine. I can pick and choose which file to look at from there. This view is suboptimal. What I really want is a single view that merges all logs into a single timeline, sorted by machine timestamp. I think this would make it much easier to debug issues.

For example, instead of having:

test1.log
test2.log
zookeeper.log
kafka.log

What I'd rather have is:

[12:45] test1 log line
[12:46] test1 log line
[12:47] zookeeper log line
[12:48] zookeeper log line
[12:48] test1 log line
[12:50] kafka log line
[12:54] test2 log line
[13:37] kafka log line

This would allow me to figure out when failures on one machine/log file relate to failures on another.

Log better_exec_command STDOUT/STDERR

When I use the better_exec_command, I can't seem to find where the out/err logs go. I suspect they're swallowed. It'd be nice if, by default, their output were logged to logger.

Support failure injection

A lot of distributed systems want to test for resilience/recovery. This means Zopkio should have the ability to bring down/kill system components and wait for recovery before validating the state of the system. Scenario would look something like:

Setup environment & start all processes
components <- list of components to fail
LOOP (upto n times) {
  Kill a random component, say component[i]
  [Optional] Restart component[i]            // -> This can be for cases where the component does not auto-recover, unless it is specifically restarted. For example, Samza depends on Kafka brokers. If we kill a broker, they don't recover automatically. It has to be restarted. 
  Wait for recovery
  Verify system state [ can also validate recovery time, etc ]
  [Optional] Reset system
  Goto LOOP
}

User must be able to specify how to fail each component. A component can be failed by:

  • using kill cmd with a pid
  • using an application specific kill operation / function definition

Be verbose on failures in adhoc deployer

It'd be nice if Zopkio gave more detail when adhoc deployer fails.

We saw this recently:

2015-01-14 17:51:55,266 deployment [INFO] Deploying zookeeper_instance_0 on host: localhost
2015-01-14 17:51:58,150 deployment [INFO] Deploying yarn_rm_instance_0 on host: localhost
2015-01-14 17:52:03,546 zopkio.remote_host_helper [ERROR] 
2015-01-14 17:52:03,548 zopkio.test_runner [ERROR] Aborting smoke-tests due to setup_suite failure:
Traceback (most recent call last):
  File "/tmp/samza-test2/samza-integration-tests/lib/python2.7/site-packages/zopkio/test_runner.py", line 107, in run
    self.deployment_module.setup_suite()
  File "/tmp/samza-test2/scripts/deployment.py", line 76, in setup_suite
    'hostname': host
  File "/tmp/samza-test2/samza-integration-tests/lib/python2.7/site-packages/zopkio/deployer.py", line 78, in deploy
    self.start(unique_id, configs)
  File "/tmp/samza-test2/samza-integration-tests/lib/python2.7/site-packages/zopkio/adhoc_deployer.py", line 212, in start
    chan = exec_with_env(ssh, command, msg="Failed to start", env=env, sync=configs.get('sync', False))
  File "/tmp/samza-test2/samza-integration-tests/lib/python2.7/site-packages/zopkio/remote_host_helper.py", line 86, in exec_with_env
    return better_exec_command(ssh, new_command, msg)
  File "/tmp/samza-test2/samza-integration-tests/lib/python2.7/site-packages/zopkio/remote_host_helper.py", line 115, in better_exec_command
    raise ParamikoError(msg, err_msg)
ParamikoError: Failed to start

Since adhoc deployer has the full channel when the failure occurs, it'd be nice if it dumped stderr and stdout so we knew what was going on. Right now it basically just says, "I got a non-zero exit code."

remote_host_helper is double-spacing log lines

It looks like the log helper is double-spacing log lines:

2014-12-22 16:35:06,072 zopkio.remote_host_helper [INFO] stopping resourcemanager

2014-12-22 16:35:06,995 zopkio.remote_host_helper [INFO] Stopping zookeeper ... STOPPED

2014-12-22 16:35:12,743 zopkio.remote_host_helper [INFO] stopping nodemanager
nodemanager did not stop gracefully after 5 seconds: killing with kill -9

2014-12-22 16:35:14,192 zopkio.test_runner [INFO] Execution of configuration: smoke-tests complete

Likely just need a trim().

Write a lot of examples for SSHDeployer

It'd be nice if there were a lot of examples of how to use SSHDeployer. It's a little tough to figure out what everything does, and how to deploy things. The docs/index.rst is good for an intro, but I want to know all the params, what they all do, and see examples of how to use them. The pydocs are OK, but not great for this.
