
accumulo-testing's Introduction

Apache Accumulo Testing Suite


The Apache Accumulo testing suite contains applications that test and verify the correctness of Accumulo.

Installation

In order to run the Apache Accumulo testing suite, you will need Java 8 and Maven installed on your machine, as well as an Accumulo instance to use for testing.

  1. First, clone this repository:
git clone [email protected]:apache/accumulo-testing.git
cd accumulo-testing
  2. All configuration files for the test suite are in conf/. Only the accumulo-testing.properties configuration file needs to be edited, as all other configuration files are optional. In accumulo-testing.properties, review the properties with the test.common.* prefix, as these are used by all tests.
cd conf/
vim accumulo-testing.properties

Run tests locally

Tests are run using the following scripts in bin/:

  • cingest - Runs continuous ingest tests
  • rwalk - Runs random walk tests
  • performance - Runs performance tests
  • agitator - Runs the agitator
  • gcs - Runs the garbage collection simulation
  • monitor - Runs the availability monitor probe

Run the scripts without arguments to view usage.

Run tests in Docker

While the test scripts can be run from a single machine, they put more stress on Accumulo when run from multiple machines. The easiest way to do this is with Docker. However, only the tests below can be run in Docker:

  • cingest - All applications can be run except verify and moru, which launch MapReduce jobs.
  • rwalk - All modules can be run.
  • monitor - All modules can be run.
  1. To create the accumulo-testing docker image, make sure the following files exist in your clone:

    • conf/accumulo-client.properties - Configure this file from your Accumulo install
    • conf/accumulo-testing.properties - Configure this file for testing
    • target/accumulo-testing-2.1.0-SNAPSHOT-shaded.jar - Can be created using ./bin/build

    Run the following command to create the image. HADOOP_HOME should be where Hadoop is installed on your cluster. HADOOP_USER_NAME should match the user running Hadoop on your cluster.

    docker build --build-arg HADOOP_HOME=$HADOOP_HOME --build-arg HADOOP_USER_NAME=`whoami` -t accumulo-testing .
  2. The accumulo-testing image can run a single command:

    docker run --network="host" accumulo-testing cingest createtable
  3. Multiple containers can also be run (if you have Docker Swarm enabled):

    # the following can be used to get the image on all nodes if you do not have a registry.
    for HOST in node1 node2 node3; do
      docker save accumulo-testing | ssh -C $HOST docker load &
    done
    
    docker service create --network="host" --replicas 2 --name ci accumulo-testing cingest ingest

Random walk test

The random walk test generates client behavior on an Apache Accumulo instance by randomly walking a graph of client operations.

Before running random walk, review the test.common.* properties in the accumulo-testing.properties file. A test module must also be specified. See the modules directory for a list of available ones.

The command below will start a single random walker in a local process using the Image.xml module.

./bin/rwalk Image.xml

Continuous Ingest & Query

The Continuous Ingest test runs many ingest clients that continually create linked lists of data in Accumulo. During ingest, query applications can be run to continuously walk and verify the linked lists and put a query load on Accumulo. At some point, the ingest clients are stopped and a MapReduce job is run to ensure that there are no holes in any linked list.

The nodes in the linked list are random. This causes each linked list to spread across the table. Therefore, if one part of the table loses data, it will be detected by references in another part of the table.
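
As an illustration of that design, each ingested entry can be pictured as a node whose value points back at the row of an earlier node. The sketch below is a simplified, hypothetical rendition, not the actual ingest client (which, among other things, periodically flushes and only references flushed entries); the table name ci and the column names are assumptions.

import java.util.Random;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;

public class LinkedListIngestSketch {
  public static void main(String[] args) throws Exception {
    Random rand = new Random();
    try (AccumuloClient client = Accumulo.newClient()
        .from("conf/accumulo-client.properties").build();
        BatchWriter bw = client.createBatchWriter("ci")) {
      String prevRow = null;
      for (int i = 0; i < 1000; i++) {
        // random rows spread each linked list across the whole table
        String row = String.format("%016x", rand.nextLong() & 0x7fffffffffffffffL);
        Mutation m = new Mutation(row);
        // the value references the previously written node (empty for the first)
        m.put("meta", "prev", prevRow == null ? "" : prevRow);
        bw.addMutation(m);
        prevRow = row;
      }
    }
  }
}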

Before running any of the Continuous Ingest applications, make sure that the accumulo-testing.properties file exists in conf/ and review all properties with the test.ci.* prefix.

First, run the command below to create an Accumulo table for the continuous ingest tests. The name of the table is set by the property test.ci.common.accumulo.table (its value defaults to ci) in the file accumulo-testing.properties:

./bin/cingest createtable {-o test.<prop>=<value>}

The continuous ingest tests have several applications that start a local process which runs continuously until you stop it with ctrl-c:

./bin/cingest <application> {-o test.<prop>=<value>}

Below is a list of available continuous ingest applications. You should run the ingest application first to add data to your table.

  • ingest - Inserts data into Accumulo that will form a random graph.
  • walk - Randomly walks the graph created by the ingest application using a scanner. Each walker produces detailed statistics on query/scan times.
  • batchwalk - Randomly walks the graph created by ingest using a batch scanner.
  • scan - Scans the graph
  • verify - Runs a MapReduce job that verifies all data created by continuous ingest. Before running, review all test.ci.verify.* properties. Do not run ingest while running this command, as it will cause erroneous reporting of UNDEFINED nodes. Each entry inserted by continuous ingest, except for the first batch of entries, references a previously flushed entry. Since we are referencing flushed entries, they should always exist. The MapReduce job checks that all referenced entries exist. If it finds any that do not, it increments the UNDEFINED counter and emits the referenced but undefined node. The MapReduce job produces two other counts: REFERENCED and UNREFERENCED. Both are expected to be non-zero. REFERENCED counts nodes that are defined and referenced. UNREFERENCED counts nodes that are defined and unreferenced; these are the latest nodes inserted.
  • bulk - Runs a MapReduce job that generates data for bulk import. See bulk-test.md.
  • moru - Runs a MapReduce job that stresses Accumulo by reading and writing the continuous ingest table. This MapReduce job will write out an entry for every entry in the table (except for ones created by the MapReduce job itself). Stop ingest before running this MapReduce job. Do not run more than one instance of this MapReduce job concurrently against a table.

Check out ingest-test.md for pointers on running a long-running ingest and verification test.

Garbage Collection Simulator

See gcs.md.

Agitator

The agitator will periodically kill the Accumulo manager, tablet server, and Hadoop data node processes on random nodes. Before running the agitator, you should create accumulo-testing-env.sh in conf/ and review all the agitator settings. The command below will start the agitator:

./bin/agitator start

Running this script as root will properly start processes as the user you configured in env.sh (AGTR_HDFS_USER for the data node and AGTR_ACCUMULO_USER for Accumulo processes). If you run it as yourself and the AGTR_HDFS_USER and AGTR_ACCUMULO_USER values are the same as your user, the agitator will not change users. In the case where you run the agitator as a non-privileged user which isn't the same as AGTR_HDFS_USER or AGTR_ACCUMULO_USER, the agitator will attempt to sudo to these users, which relies on correct configuration of sudo. Also, be sure that your AGTR_HDFS_USER has password-less ssh configured.

Run the command below to stop the agitator:

./bin/agitator stop

Performance Test

To run the performance tests, a cluster-control.sh script is needed to assist with starting, stopping, wiping, and configuring an Accumulo instance. This script should define the following functions:

function get_hadoop_client {
  # TODO return hadoop client libs in a form suitable for appending to a classpath
}

function get_version {
  case $1 in
    ACCUMULO)
      # TODO echo accumulo version
      ;;
    HADOOP)
      # TODO echo hadoop version
      ;;
    ZOOKEEPER)
      # TODO echo zookeeper version
      ;;
    *)
      return 1
  esac
}

function start_cluster {
  # TODO start Hadoop and Zookeeper if needed
}

function setup_accumulo {
  # TODO kill any running Accumulo instance
  # TODO setup a fresh install of Accumulo w/o starting it
}

function get_config_file {
  local file_to_get=$1
  local dest_dir=$2
  # TODO copy $file_to_get from Accumulo conf dir to $dest_dir
}

function put_config_file {
  local config_file=$1
  # TODO copy $config_file to Accumulo conf dir
}

function put_server_code {
  local jar_file=$1
  # TODO add $jar_file to Accumulo's server side classpath. Could put it in $ACCUMULO_HOME/lib/ext
}

function start_accumulo {
  # TODO start accumulo
}

function stop_cluster {
  # TODO kill Accumulo, Hadoop, and Zookeeper
}

An example script for Uno is provided. To use it, copy it as shown below and then set UNO_HOME in the copy.

cp conf/cluster-control.sh.uno conf/cluster-control.sh

After the cluster control script is set up, the following will run the performance tests and produce JSON result files in the provided output directory.

./bin/performance run <output dir>

The example above will run all performance tests in order. To run a single test, a filter can be applied. The example below will run just DurabilityWriteSpeedPT.

./bin/performance run <output dir> DurabilityWriteSpeedPT

Some performance tests alter the system properties of the cluster they run against. These may require fine-tuning in order to work on some hardware.

There are some utilities for working with the JSON result files; run the performance script with no options to see them.

Availability Monitor

The Monitor class verifies the availability of the overall Accumulo cluster by continually scanning random values across various tablet servers and capturing timing information about how long those scans take.
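
A minimal sketch of that probing pattern, assuming an Accumulo 2.x client and a table named ci (the actual Monitor implementation may differ):

import java.util.Map.Entry;
import java.util.Random;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class AvailabilityProbeSketch {
  public static void main(String[] args) throws Exception {
    Random rand = new Random();
    try (AccumuloClient client = Accumulo.newClient()
        .from("conf/accumulo-client.properties").build();
        Scanner scanner = client.createScanner("ci", Authorizations.EMPTY)) {
      while (true) {
        // a random row makes successive probes land on different tablets
        String row = String.format("%016x", rand.nextLong() & 0x7fffffffffffffffL);
        scanner.setRange(Range.exact(row));
        long start = System.nanoTime();
        for (Entry<Key,Value> unused : scanner) {
          // drain the scan; the timing is what matters, not the data
        }
        System.out.printf("scan of %s took %d ms%n", row,
            (System.nanoTime() - start) / 1_000_000);
        Thread.sleep(1000);
      }
    }
  }
}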

Automated Cluster Testing

See the readme.md.


accumulo-testing's Issues

YieldingScanExecutorPT fails to run. Throws AccumuloServerException

YieldingScanExecutorPT fails to complete with the following output:

Shell output
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.accumulo.core.clientImpl.AccumuloServerException: Error on server thor:9997
	at org.apache.accumulo.testing.performance.util.TestExecutor.lambda$stream$0(TestExecutor.java:52)
	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.base/java.util.stream.LongPipeline.collect(LongPipeline.java:491)
	at java.base/java.util.stream.LongPipeline.summaryStatistics(LongPipeline.java:468)
	at org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT.runShortScans(YieldingScanExecutorPT.java:215)
	at org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT.runTest(YieldingScanExecutorPT.java:114)
	at org.apache.accumulo.testing.performance.impl.PerfTestRunner.main(PerfTestRunner.java:51)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: org.apache.accumulo.core.clientImpl.AccumuloServerException: Error on server thor:9997
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at org.apache.accumulo.testing.performance.util.TestExecutor.lambda$stream$0(TestExecutor.java:50)
	... 11 more
Caused by: java.lang.RuntimeException: org.apache.accumulo.core.clientImpl.AccumuloServerException: Error on server thor:9997
	at org.apache.accumulo.core.clientImpl.ScannerIterator.getNextBatch(ScannerIterator.java:185)
	at org.apache.accumulo.core.clientImpl.ScannerIterator.hasNext(ScannerIterator.java:110)
	at com.google.common.collect.Iterators.size(Iterators.java:163)
	at com.google.common.collect.Iterables.size(Iterables.java:126)
	at org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT.scan(YieldingScanExecutorPT.java:170)
	at org.apache.accumulo.testing.performance.tests.YieldingScanExecutorPT.lambda$runShortScans$1(YieldingScanExecutorPT.java:212)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.accumulo.core.clientImpl.AccumuloServerException: Error on server thor:9997
	at org.apache.accumulo.core.clientImpl.ThriftScanner.scan(ThriftScanner.java:324)
	at org.apache.accumulo.core.clientImpl.ScannerIterator.readBatch(ScannerIterator.java:156)
	at org.apache.accumulo.core.clientImpl.ScannerIterator.getNextBatch(ScannerIterator.java:174)
	... 9 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing startScan
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_startScan(TabletClientService.java:249)
	at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.startScan(TabletClientService.java:221)
	at org.apache.accumulo.core.clientImpl.ThriftScanner.scan(ThriftScanner.java:453)
	at org.apache.accumulo.core.clientImpl.ThriftScanner.scan(ThriftScanner.java:317)
	... 11 more
Server logs
java.lang.NullPointerException at 
org.apache.accumulo.server.conf.TableConfiguration.createScanDispatcher(TableConfiguration.java:215) at 
org.apache.accumulo.server.conf.TableConfiguration.lambda$new$1(TableConfiguration.java:82) at 
org.apache.accumulo.core.conf.AccumuloConfiguration$DeriverImpl.derive(AccumuloConfiguration.java:482) at 
org.apache.accumulo.server.conf.TableConfiguration.getScanDispatcher(TableConfiguration.java:270) at 
org.apache.accumulo.tserver.ThriftClientHandler.getScanDispatcher(ThriftClientHandler.java:272) at 
org.apache.accumulo.tserver.ThriftClientHandler.continueScan(ThriftClientHandler.java:378) at 
org.apache.accumulo.tserver.ThriftClientHandler.startScan(ThriftClientHandler.java:342) at 
jdk.internal.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at 

java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$6(TraceUtil.java:235) at 
com.sun.proxy.$Proxy38.startScan(Unknown Source) at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:2944) at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:2923) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) at 
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) at 
org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63) at 
org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) at 
org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:114) at org.apache.thrift.server.Invocation.run(Invocation.java:18) at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at 
java.base/java.lang.Thread.run(Thread.java:829) 

java.lang.ClassNotFoundException: org.apache.accumulo.testing.performance.tests.TimedScanDispatcher at 
java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471) at 
java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589) at 
org.apache.accumulo.start.classloader.AccumuloClassLoader$1.loadClass(AccumuloClassLoader.java:213) at 
java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522) at 
org.apache.accumulo.core.classloader.ClassLoaderUtil.loadClass(ClassLoaderUtil.java:85) at 
org.apache.accumulo.core.conf.ConfigurationTypeHelper.getClassInstance(ConfigurationTypeHelper.java:203) at 
org.apache.accumulo.core.conf.ConfigurationTypeHelper.getClassInstance(ConfigurationTypeHelper.java:176) at 
org.apache.accumulo.core.conf.Property.createTableInstanceFromPropertyName(Property.java:1747) at 
org.apache.accumulo.server.conf.TableConfiguration.createScanDispatcher(TableConfiguration.java:209) at 
org.apache.accumulo.server.conf.TableConfiguration.lambda$new$1(TableConfiguration.java:82) at 
org.apache.accumulo.core.conf.AccumuloConfiguration$DeriverImpl.derive(AccumuloConfiguration.java:482) at 
org.apache.accumulo.server.conf.TableConfiguration.getScanDispatcher(TableConfiguration.java:270) at 
org.apache.accumulo.tserver.ThriftClientHandler.getScanDispatcher(ThriftClientHandler.java:272) at 
org.apache.accumulo.tserver.ThriftClientHandler.continueScan(ThriftClientHandler.java:378) at 
org.apache.accumulo.tserver.ThriftClientHandler.startScan(ThriftClientHandler.java:342) at 
jdk.internal.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$6(TraceUtil.java:235) at 
com.sun.proxy.$Proxy38.startScan(Unknown Source) at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:2944) at 
org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Processor$startScan.getResult(TabletClientService.java:2923) at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38) at 
org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38) at 
org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63) at 
org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524) at 
org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:114) at org.apache.thrift.server.Invocation.run(Invocation.java:18) at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at 
java.base/java.lang.Thread.run(Thread.java:829) 

Create upgrade testing framework

See https://issues.apache.org/jira/browse/ACCUMULO-2145 for a description of work and prior work done.

This could use Uno. For testing apache/accumulo#1111 I wrote the following script that uses Uno.

#! /usr/bin/env bash

ACCUMULO_DIR=~/git/accumulo
UNO_DIR=~/git/uno
BULK=/tmp/upt

cd $ACCUMULO_DIR
git checkout 1.9
git clean -xfd
cd $UNO_DIR
./bin/uno fetch accumulo
./bin/uno setup accumulo
(
  eval "$(./bin/uno env)"

  hadoop fs -ls /accumulo/version


  hadoop fs -rmr "$BULK"
  hadoop fs -mkdir -p "$BULK/fail"
  accumulo org.apache.accumulo.test.TestIngest -i uno -u root -p secret --rfile $BULK/bulk/test --timestamp 1 --size 50 --random 56 --rows 200000 --start 200000 --cols 1

  accumulo org.apache.accumulo.test.TestIngest -i uno -u root -p secret --timestamp 1 --size 50 --random 56 --rows 200000 --start 0 --cols 1  --createTable --splits 10

  accumulo shell -u root -p secret <<EOF
   table test_ingest
   importdirectory $BULK/bulk $BULK/fail false
   createtable foo
   config -t foo -s table.compaction.major.ratio=2
   insert r1 f1 q1 v1
   flush -t foo -w
   scan -t accumulo.metadata -c file
   insert r1 f1 q2 v2
   insert r2 f1 q1 v3
EOF
)
pkill -9 -f accumulo\\.start
cd $ACCUMULO_DIR
git checkout accumulo-1111
git clean -xfd
cd $UNO_DIR
./bin/uno fetch accumulo
./bin/uno install accumulo --no-deps
./install/accumulo*/bin/accumulo-cluster start
(
  eval "$(./bin/uno env)"
  hadoop fs -ls /accumulo/version
  accumulo shell -u root -p secret <<EOF
    config -t foo -f table.compaction.major.ratio
    scan -t foo -np
    scan -t accumulo.metadata -c file
    compact -t foo -w
    scan -t foo -np
    scan -t accumulo.metadata -c file
EOF

  accumulo org.apache.accumulo.test.VerifyIngest --size 50 --timestamp 1 --random 56 --rows 400000 --start 0 --cols 1
)

Migrate to log4j2

Follow on from issue #130 / PR #140

This testing repository should migrate to log4j2, and any configured console logging should be configured to use STDERR instead of STDOUT in the log4j2 configuration files, so that console output won't interfere with output from executed commands used for scripts (see #130).

Revisit MapReduce configuration changes for Hadoop 3

Accumulo-testing has MapReduce jobs that needed the following configuration added to work with Hadoop 3. This ticket is to revisit the configuration added in #49 so clients don't have to specify the location of the Hadoop home directory on servers.

hadoopConfig.set("yarn.app.mapreduce.am.env", "HADOOP_MAPRED_HOME=" + hadoopHome);
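
For context, a hedged sketch of where such a setting is applied when building a job; reading HADOOP_HOME from the environment and the extra mapreduce.map.env line are assumptions, not necessarily what #49 did:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapReduceEnvSketch {
  public static void main(String[] args) throws Exception {
    Configuration hadoopConfig = new Configuration();
    // assumption: HADOOP_HOME is exported on the machine submitting the job
    String hadoopHome = System.getenv("HADOOP_HOME");
    hadoopConfig.set("yarn.app.mapreduce.am.env", "HADOOP_MAPRED_HOME=" + hadoopHome);
    hadoopConfig.set("mapreduce.map.env", "HADOOP_MAPRED_HOME=" + hadoopHome);
    Job job = Job.getInstance(hadoopConfig, "example");
    // ... configure input/output formats, then job.waitForCompletion(true)
  }
}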

AssertionError thrown where it never should happen

I saw this while running a rwalk on a 5 node cluster.

2019-09-12 21:26:51,842 [testing.randomwalk.Module] ERROR: Caught error executing BulkImport
java.util.concurrent.ExecutionException: java.lang.AssertionError: org.apache.accumulo.core.client.TableNotFoundException: Table (Id=a0) does not exist (Table (Id=a0) does not exist)
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:206)
	at org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:318)
	at org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:48)
	at org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:92)
Caused by: java.lang.AssertionError: org.apache.accumulo.core.client.TableNotFoundException: Table (Id=a0) does not exist (Table (Id=a0) does not exist)
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doBulkFateOperation(TableOperationsImpl.java:334)
	at org.apache.accumulo.core.clientImpl.bulk.BulkImport.load(BulkImport.java:142)
	at org.apache.accumulo.testing.randomwalk.concurrent.BulkImport.visit(BulkImport.java:130)
	at org.apache.accumulo.testing.randomwalk.Module$1.call(Module.java:303)
	at org.apache.accumulo.testing.randomwalk.Module$1.call(Module.java:298)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.accumulo.core.client.TableNotFoundException: Table (Id=a0) does not exist (Table (Id=a0) does not exist)
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:376)
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:342)
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doBulkFateOperation(TableOperationsImpl.java:329)
	... 11 more
Caused by: ThriftTableOperationException(tableId:a0, tableName:null, op:BULK_IMPORT, type:NOTFOUND, description:Table (Id=a0) does not exist)
	at org.apache.accumulo.core.master.thrift.FateService$executeFateOperation_result$executeFateOperation_resultStandardScheme.read(FateService.java:3474)
	at org.apache.accumulo.core.master.thrift.FateService$executeFateOperation_result$executeFateOperation_resultStandardScheme.read(FateService.java:3451)
	at org.apache.accumulo.core.master.thrift.FateService$executeFateOperation_result.read(FateService.java:3385)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
	at org.apache.accumulo.core.master.thrift.FateService$Client.recv_executeFateOperation(FateService.java:124)
	at org.apache.accumulo.core.master.thrift.FateService$Client.executeFateOperation(FateService.java:105)
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.executeFateOperation(TableOperationsImpl.java:270)
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.doFateOperation(TableOperationsImpl.java:353)
	... 13 more

Then looking at where the error is thrown, it seems like this could be bad:

public String doBulkFateOperation(List<ByteBuffer> args, String tableName)
      throws AccumuloSecurityException, AccumuloException {
    try {
      return doFateOperation(FateOperation.TABLE_BULK_IMPORT2, args, Collections.emptyMap(),
          tableName);
    } catch (TableExistsException | TableNotFoundException | NamespaceNotFoundException
        | NamespaceExistsException e) {
      // should not happen
      throw new AssertionError(e);
    }
  }

Drop the 1.9 testing branch

It was raised on apache/accumulo#1312 that the testing repo's 1.9 branch was not in a good state, and it was argued that we shouldn't be updating it, instead keeping the 1.9 testing code with the main 1.9 code base, like it has been.

We could merge the 1.9 branch into the master branch with -s ours to preserve the 1.9 history before deleting it.

I'll leave this issue open for at least a few days, for comment, before taking any action.

RandomCachedLookupsPT will error and then hang if not enough memory is available

Describe the bug
RandomCachedLookupsPT alters some of the tserver configs. If the cluster running the test lacks sufficient memory, it will throw the exception below but will then hang until manually canceled. The main issue here is that the exception is hidden in the logs and isn't handled by the PT.

2021-09-27T12:21:11,595 [start.Main] ERROR: Thread 'tserver' died.
java.lang.IllegalArgumentException: Maximum tablet server map
 memory 265,751,101 block cache sizes 3,301,756,108 and mutation
 queue size 40,265,318 is too large for this JVM configuration 805,306,368

To Reproduce

  1. Use the default (smaller) performance profile for fluo-uno
  2. Run the RandomCachedLookupsPT
  3. It will appear that the test is hanging. Checking the tserver logs will show the error above.

Expected behavior
Ideally, the PT should check beforehand to make sure that the system it is testing on can handle the system config changes it makes or at least exit nicely when an exception is thrown.
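
For illustration, the numbers from the report above fail a simple sum check that the PT could perform up front. This is a hedged sketch of such a guard, not existing code:

public class MemoryGuardSketch {
  public static void main(String[] args) {
    // sizes taken from the error message above
    long mapMemory = 265_751_101L;
    long blockCaches = 3_301_756_108L;
    long mutationQueue = 40_265_318L;
    long tserverJvmMax = 805_306_368L;

    if (mapMemory + blockCaches + mutationQueue > tserverJvmMax) {
      // fail fast instead of leaving the tserver to die and the test to hang
      throw new IllegalStateException(
          "configured sizes exceed the tserver JVM memory; "
              + "increase the tserver heap or shrink the test's cache settings");
    }
  }
}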

Additional context
This PT does pass as expected when using the larger performance profile for fluo-uno. It is possible that the simplest solution of documenting the need for users to configure these settings before running the PT might also be good enough.

Add validation that 'sudo' works for a given user inside of the agitator scripts

Original Jira ticket: https://issues.apache.org/jira/browse/ACCUMULO-1982

Main snippet:

It would be nice if we tested that sudoing to the desired user worked 
for the current user and gave better error messages in the case of failure.

The code here has changed significantly since the original ticket was created, so it is possible the need for this has lessened, but from my quick investigation I did not find any validation checks.

ConditionalMutationsPT fails with NumberFormatException

When running the performance tests (via ./bin/performance run), ConditionalMutationsPT errors out with the following:

Exception in thread "main" java.lang.NumberFormatException: For input string: "∞"
	at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)
	at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.base/java.lang.Double.parseDouble(Double.java:543)
	at org.apache.accumulo.testing.performance.tests.ConditionalMutationsPT.runConditionalMutationsTest(ConditionalMutationsPT.java:104)
	at org.apache.accumulo.testing.performance.tests.ConditionalMutationsPT.runTest(ConditionalMutationsPT.java:76)
	at org.apache.accumulo.testing.performance.impl.PerfTestRunner.main(PerfTestRunner.java:51)

I think the issue is occurring here:


where a nanosecond value is being converted to a second value. The error happens when the two nanosecond values, t1 and t2, have a difference of less than 1000000000 (1 second). TimeUnit.NANOSECONDS.toSeconds converts the nanos to seconds, rounding down. So anything less than 1000000000 (1 second) results in 0 seconds which, in this case, is the denominator, which yields "∞" and throws an error when parsed.

I think this can be corrected by manually converting from nanos to seconds.
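
A standalone sketch of the suspected rounding bug and the manual conversion (variable names are illustrative, not the actual test code):

import java.util.concurrent.TimeUnit;

public class RateSketch {
  public static void main(String[] args) {
    long count = 100_000;
    long t1 = System.nanoTime();
    long t2 = t1 + 500_000_000L; // half a second of "work"

    // buggy: toSeconds truncates, so any elapsed time under one second
    // becomes 0, the division yields Infinity, and the test's formatted
    // rate string ("∞") later fails Double.parseDouble
    long secs = TimeUnit.NANOSECONDS.toSeconds(t2 - t1);
    System.out.println(count / (double) secs); // Infinity

    // manual conversion keeps the fractional seconds
    double seconds = (t2 - t1) / 1_000_000_000.0;
    System.out.println(count / seconds); // 200000.0
  }
}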

RW Error running Sequential

2019-06-26 16:43:06,014 [testing.randomwalk.Framework] ERROR: Error during random walk
java.lang.Exception: Error running node seq.MapRedVerify
        at org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:370)
        at org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:48)
        at org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:92)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
        at org.apache.accumulo.testing.randomwalk.sequential.MapRedVerify.visit(MapRedVerify.java:48)
        at org.apache.accumulo.testing.randomwalk.Module$1.call(Module.java:303)
        at org.apache.accumulo.testing.randomwalk.Module$1.call(Module.java:298)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
        at java.lang.Thread.run(Thread.java:748)

Incorrect authorizations selected by continuous scanners and walkers

Authorizations configured for continuous scanners and walkers via "test.ci.common.auths" in accumulo-testing.properties are incorrectly interpreted as single characters instead of |-delimited groups. For example, if
test.ci.common.auths=SYS,HR,ADM|SYS,ADM is defined, the authorizations chosen are S, Y, S, H, R, and so on instead of "SYS,HR,ADM", "SYS,ADM", and so on.

The issue looks to be the authValue.split("|") call in ContinuousEnv.java. The code does not account for | being a regex metacharacter; an escape is required to split on a literal |, i.e. authValue.split("\\|").
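
A quick standalone demonstration of the difference:

import java.util.Arrays;

public class SplitDemo {
  public static void main(String[] args) {
    String authValue = "SYS,HR,ADM|SYS,ADM";

    // "|" is regex alternation between two empty patterns, so the string
    // is split between every character: one element per character
    System.out.println(Arrays.toString(authValue.split("|")));

    // escaping the metacharacter splits on the literal pipe:
    // [SYS,HR,ADM, SYS,ADM]
    System.out.println(Arrays.toString(authValue.split("\\|")));
  }
}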

Create script that automates setting up EC2 test cluster

When testing Accumulo I often go through the following task manually.

  • Build a snapshot version of Accumulo locally and copy it to the Muchos dir
  • Use Muchos to set up a cluster with that snapshot tarball
  • After Muchos sets up the cluster:
    • git clone and mvn install the same snapshot version of Accumulo
    • git clone the accumulo-testing repo and build it.

It may be nice to have a script that does this. The arguments for the script would be the following:

  • URL for the Accumulo git repo and a branch in that repo
  • URL for the Accumulo testing repo and a branch in that repo
  • Local dir where Muchos is set up.

This script would set up the EC2 cluster with the version of Accumulo from the git repo. It would also set up accumulo-testing on the cluster from the git repo.

Performance cluster control script no longer working with Uno

Something has changed such that the cluster control script for Uno no longer seems to work. This script can be used by the performance tests.

The following commands are being run by the scripts. These used to work and no longer work with the latest Uno. Not sure what changed. The goal of all of these commands is to avoid setting up Hadoop and ZooKeeper from scratch between performance tests.

  uno install accumulo
  uno run zookeeper
  uno run hadoop
  uno setup accumulo --no-deps
  uno stop accumulo --no-deps 
  uno start accumulo --no-deps 

Performance tests not found with list command

Running ./bin/performance list gives the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/accumulo/testing/performance/tests/SplitBalancingPT (wrong name: target/classes/org/apache/accumulo/testing/performance/tests/SplitBalancingPT)
	at java.base/java.lang.ClassLoader.defineClass1(Native Method)
	at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
	at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
	at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:802)
	at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:700)
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:623)
	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
	at com.google.common.reflect.ClassPath$ClassInfo.load(ClassPath.java:328)
	at org.apache.accumulo.testing.performance.impl.ListTests.main(ListTests.java:34)

Create smoke test suite

It would be nice if there was a test suite that developers and users could run on an Accumulo instance to sanity check an upgrade/install and verify the basic functionality of Accumulo.

Add stop here to agitation

To exercise more code paths during testing, it would be nice to make the agitation scripts call stop-here.sh sometimes instead of killing processes.

Create herding performance test

I wrote the following test while working on apache/accumulo#990. This could be turned into a performance test.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map.Entry;
import java.util.Random;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.admin.NewTableConfiguration;
import org.apache.accumulo.core.conf.Property;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.accumulo.core.util.FastFormat;

// CmdUtil is a local helper (not shown) that builds a Connector
public class HerdTest {

  private static final byte[] E = new byte[] {};
  private static final byte[] FAM = "pinky".getBytes();

  private static final int NUM_ROWS = 1_000_000;
  private static final int NUM_COLS = 10;

  public static void main(String[] args) throws Exception {

    Connector conn = CmdUtil.getConnector();

    if (!conn.tableOperations().exists("herd")) {
      conn.tableOperations().create("herd", new NewTableConfiguration().setProperties(
          Collections.singletonMap(Property.TABLE_BLOCKCACHE_ENABLED.getKey(), "true")));
      write(conn);
      conn.tableOperations().flush("herd", null, null, true);
    }

    testHerd(conn, 32);
  }

  private static void testHerd(Connector conn, int nt)
      throws InterruptedException, ExecutionException {
    ExecutorService tp = Executors.newFixedThreadPool(nt);
    final CyclicBarrier cb = new CyclicBarrier(nt);

    long t1 = System.currentTimeMillis();
    List<Future<?>> futures = new ArrayList<>();
    for (int i = 0; i < nt; i++) {
      Future<?> f = tp.submit(new Runnable() {
        @Override
        public void run() {
          try {
            Scanner scanner = conn.createScanner("herd", Authorizations.EMPTY);

            for (int i = 0; i < 1000; i++) {

              // System.out.println(Thread.currentThread().getId()+" "+i);

              cb.await();

              byte[] row = FastFormat.toZeroPaddedString(i * 1000, 8, 16, E);

              scanner.setRange(Range.exact(new String(row)));
              for (Entry<Key,Value> entry : scanner) {
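                // drain the scan; the test only measures timing, not the data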

              }
            }
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
      futures.add(f);
    }

    for (Future<?> future : futures) {
      future.get();
    }

    long t2 = System.currentTimeMillis();

    System.out.println(t2 - t1);

    // scanner.close();
    tp.shutdown();
  }

  private static void write(Connector conn) throws Exception {

    try (BatchWriter bw = conn.createBatchWriter("herd", new BatchWriterConfig())) {
      Random rand = new Random();

      for (int r = 0; r < NUM_ROWS; r++) {
        byte[] row = FastFormat.toZeroPaddedString(r, 8, 16, E);
        Mutation m = new Mutation(row);
        for (int c = 0; c < NUM_COLS; c++) {
          byte[] qual = FastFormat.toZeroPaddedString(c, 4, 16, E);

          byte[] val = new byte[32];
          rand.nextBytes(val);

          m.put(FAM, qual, val);
        }

        bw.addMutation(m);
      }
    }
  }
}

Support a non-default HDFS path as a parameter for continuous bulk ingest

Support a non-default HDFS path (in the case of multiple volumes) as a parameter for continuous bulk ingest, e.g.:
bin/cingest bulk abfs://[email protected]/azbulk-multi

In this case the abfs://[email protected]/ filesystem is not the default HDFS and has been added as an additional volume to Accumulo.
The default HDFS filesystem configured is hdfs://accucluster.

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: abfs://[email protected]/azbulk-multi, expected: hdfs://accucluster:8020

Permission Denied when running Rwalk scripts

With the recent changes in accumulo #1828, some of the modules for RWalk now throw permission exceptions.

The specific one I ran into while running ./bin/rwalk All.xml, and specifically ./bin/rwalk Security.xml, is below:

ThriftSecurityException(user:system_flash_superheroes_local, code:PERMISSION_DENIED)
	at org.apache.accumulo.server.security.SecurityOperation.authenticateUser(SecurityOperation.java:238)
	at org.apache.accumulo.server.client.ClientServiceHandler.authenticateUser(ClientServiceHandler.java:150)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.accumulo.core.trace.TraceUtil.lambda$wrapService$6(TraceUtil.java:235)
	at com.sun.proxy.$Proxy38.authenticateUser(Unknown Source)
	at org.apache.accumulo.core.clientImpl.thrift.ClientService$Processor$authenticateUser.getResult(ClientService.java:2608)
	at org.apache.accumulo.core.clientImpl.thrift.ClientService$Processor$authenticateUser.getResult(ClientService.java:2587)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.accumulo.server.rpc.TimedProcessor.process(TimedProcessor.java:63)
	at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
	at org.apache.accumulo.server.rpc.CustomNonBlockingServer$CustomFrameBuffer.invoke(CustomNonBlockingServer.java:114)
	at org.apache.thrift.server.Invocation.run(Invocation.java:18)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

There are also several warnings thrown now for each permission in our auditing. Some examples below:

operation:  failed; user: system_flash_superheroes_local; 
action:  changeAuthorizations; targetUser: system_flash_superheroes_local;  
authorizations:  Towels,Paper,Brush,Asparagus,Fishsticks,PotatoSkins,Ribs,Celery;  
exception: ThriftSecurityException(user:system_flash_superheroes_local,  
code:PERMISSION
operation: failed; user: root; checking permission DROP_USER on table_flash_superheroes_local denied; 
exception: ThriftSecurityException(user:table_flash_superheroes_local, 
code:USER_DOESNT_EXIST)
operation:  failed; user: table_flash_superheroes_local; 
action:  revokeTablePermission;
permission: BULK_IMPORT; targetTable:  security_flash_superheroes_local; targetUser:  system_flash_superheroes_local;; 
exception:  ThriftSecurityException(user:table_flash_superheroes_local,  
code:PERMISSION_DENIED)

This seems to happen for each permission type, with either PERMISSION_DENIED or USER_DOESNT_EXIST.
New one below:

ERROR Framework Error during random walk
 java.lang.Exception: Error running node Security.xml
	at org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:370)
	at org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:48)
	at org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:92)
Caused by: org.apache.accumulo.core.client.AccumuloSecurityException: Error BAD_CREDENTIALS for user system_flash_superheroes_local - Username or Password is Invalid

Add performance test for long running scans

In #24 a PT was added for lots of short random scans. It would be nice to have another PT for long running scans. For example, measure the performance of reading X million entries from an Accumulo table with multiple tablets. It could also measure running 1, 2, 4, 8, and 16 concurrent long running scans, as in the sketch below.
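
A rough sketch of what such a test might time, assuming a pre-populated table named ci; the thread counts mirror the ones suggested above:

import java.util.ArrayList;
import java.util.List;
import java.util.Map.Entry;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;

public class LongScanSketch {
  public static void main(String[] args) throws Exception {
    try (AccumuloClient client = Accumulo.newClient()
        .from("conf/accumulo-client.properties").build()) {
      for (int threads : new int[] {1, 2, 4, 8, 16}) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        List<Future<Long>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
          futures.add(pool.submit(() -> {
            long entries = 0;
            // a full-table scan: one long running read over every tablet
            try (Scanner scanner = client.createScanner("ci", Authorizations.EMPTY)) {
              for (Entry<Key,Value> unused : scanner) {
                entries++;
              }
            }
            return entries;
          }));
        }
        long total = 0;
        for (Future<Long> f : futures) {
          total += f.get();
        }
        System.out.printf("%d concurrent scans read %d entries in %d ms%n",
            threads, total, (System.nanoTime() - start) / 1_000_000);
        pool.shutdown();
      }
    }
  }
}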

bin/cingest verify fails with 'Output directory hdfs://localhost:8020/tmp/ci-verify already exists'

If I've already run a continuous ingest verify and I want to run it again, I get the following error:

Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:8020/tmp/ci-verify already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:164)
	at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:277)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1588)
	at org.apache.accumulo.testing.continuous.ContinuousVerify.run(ContinuousVerify.java:193)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.accumulo.testing.continuous.ContinuousVerify.main(ContinuousVerify.java:204)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:236)

It'd be better if the script noticed that a verify job had already run, and asked to clean up this directory for me, so I didn't have to manually execute: hdfs dfs -rm -r /tmp/ci-verify
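
As a hedged sketch, the verify tool could check for and remove stale output with the Hadoop FileSystem API before submitting the job (the hard-coded path and the choice to delete rather than prompt are assumptions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VerifyOutputCleanupSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path output = new Path("/tmp/ci-verify"); // assumed default output dir
    FileSystem fs = output.getFileSystem(conf);
    if (fs.exists(output)) {
      // a previous verify run left its output behind; remove it recursively
      // rather than failing with FileAlreadyExistsException
      System.out.println("Removing stale verify output at " + output);
      fs.delete(output, true);
    }
  }
}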

cannot run ci-createtable

Building /home/charbel/accumulo-testing/core/target/accumulo-testing-core-2.0.0-SNAPSHOT-shaded.jar
[INFO] Scanning for projects...
[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.apache.accumulo:accumulo-testing-core:jar:2.0.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-shade-plugin is missing. @ org.apache.accumulo:accumulo-testing-core:[unknown-version], /home/charbel/accumulo-testing/core/pom.xml, line 93, column 19
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO] 
[INFO] Apache Accumulo Testing Parent
[INFO] Apache Accumulo Testing Core
[INFO] Apache Accumulo Testing YARN
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Accumulo Testing Parent 2.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] --- maven-clean-plugin:3.0.0:clean (default-clean) @ accumulo-testing ---
[INFO] Deleting /home/charbel/accumulo-testing/target
[INFO] 
[INFO] --- formatter-maven-plugin:0.5.2:format (default) @ accumulo-testing ---
[INFO] Using 'UTF-8' encoding to format source files.
[INFO] Number of files to be formatted: 0
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (process-resource-bundles) @ accumulo-testing ---
[INFO] 
[INFO] --- maven-site-plugin:3.5.1:attach-descriptor (attach-descriptor) @ accumulo-testing ---
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Accumulo Testing Core 2.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[WARNING] The POM for org.apache.accumulo:accumulo-client-mapreduce:jar:1.8.0 is missing, no dependency information available
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Accumulo Testing Parent ..................... SUCCESS [  1.505 s]
[INFO] Apache Accumulo Testing Core ....................... FAILURE [  1.160 s]
[INFO] Apache Accumulo Testing YARN ....................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.688 s
[INFO] Finished at: 2018-05-24T17:23:08-07:00
[INFO] Final Memory: 26M/278M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project accumulo-testing-core: Could not resolve dependencies for project org.apache.accumulo:accumulo-testing-core:jar:2.0.0-SNAPSHOT: Failure to find org.apache.accumulo:accumulo-client-mapreduce:jar:1.8.0 in https://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of central has elapsed or updates are forced -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :accumulo-testing-core
Error: Could not find or load main class org.apache.accumulo.testing.core.continuous.CreateTable

accumulo version : 1.8.0

Clarify how to run individual performance tests

While working on #156 I realized it would be helpful to be able to run individual performance tests. As it stands, the only option is to run all the tests in order, which takes quite a while. If any of the tests hang or error out, the run of the remaining tests stops. It would be nice to be able to pass a parameter, the name of an individual test to run, similar to how cingest works with its multiple components.

There doesn't seem to be any place that explains how to run a single performance test. It would be nice if that were explained somewhere.

Could not run continuous ingest

I was unable to run continuous ingest for 1.9.2RC1 because of the following problem. I was using Hadoop 2.8.4.

java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.HAUtil.isLogicalUri(Lorg/apache/hadoop/conf/Configuration;Ljava/net/URI;)Z
	at org.apache.twill.filesystem.FileContextLocation.toURI(FileContextLocation.java:149)
	at org.apache.twill.yarn.YarnTwillPreparer.createLocalFile(YarnTwillPreparer.java:446)
	at org.apache.twill.yarn.YarnTwillPreparer.createLocalFile(YarnTwillPreparer.java:442)
	at org.apache.twill.yarn.YarnTwillPreparer.createAppMasterJar(YarnTwillPreparer.java:468)
	at org.apache.twill.yarn.YarnTwillPreparer.access$100(YarnTwillPreparer.java:111)
	at org.apache.twill.yarn.YarnTwillPreparer$1.call(YarnTwillPreparer.java:338)
	at org.apache.twill.yarn.YarnTwillPreparer$1.call(YarnTwillPreparer.java:329)
	at org.apache.twill.yarn.YarnTwillController.doStartUp(YarnTwillController.java:97)
	at org.apache.twill.internal.AbstractZKServiceController.startUp(AbstractZKServiceController.java:75)
	at org.apache.twill.internal.AbstractExecutionServiceController$ServiceDelegate.startUp(AbstractExecutionServiceController.java:175)
	at com.google.common.util.concurrent.AbstractIdleService$1$1.run(AbstractIdleService.java:43)
	at java.lang.Thread.run(Thread.java:748)

New debug logging broke scripts

When running the ./bin/cingest script, Maven fails to build the shaded jar. This is due to the accumulo version command now logging debug information. The conf/env.sh script gets the version from that command; after printing it to a file, I saw it was setting accumulo.version to this:

2020-12-03T10:53:35,524 [classloader.AccumuloClassLoader] DEBUG: Using Accumulo configuration at /home/mike/workspace/uno/install/accumulo-2.1.0-SNAPSHOT/conf/accumulo.properties
2020-12-03T10:53:35,593 [classloader.AccumuloClassLoader] DEBUG: Create 2nd tier ClassLoader using URLs: []
2.1.0-SNAPSHOT

Hadoop dependencies not properly converged in shaded jar

When building the testing shaded jar using Accumulo 2.0.0-SNAPSHOT and Hadoop 2.8.4, the Hadoop dependencies are not properly converged in the shaded jar. Warnings like the following appear:

[WARNING] hadoop-client-api-3.0.2.jar, hadoop-hdfs-client-2.8.4.jar define 1642 overlapping classes: 
[WARNING]   - org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo$Expiration
[WARNING]   - org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$SetOwnerResponseProto$Builder
[WARNING]   - org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileLinkInfoRequestProto$Builder
[WARNING]   - org.apache.hadoop.hdfs.web.URLConnectionFactory$1
[WARNING]   - org.apache.hadoop.hdfs.protocol.proto.XAttrProtos$SetXAttrRequestProto$1
[WARNING]   - org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ModifyCacheDirectiveResponseProto$1
[WARNING]   - org.apache.hadoop.fs.XAttr$1
[WARNING]   - org.apache.hadoop.hdfs.protocol.CachePoolStats$Builder
[WARNING]   - org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ReportBadBlocksResponseProtoOrBuilder
[WARNING]   - org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetBlockLocationsResponseProto
[WARNING]   - 1632 more...

Add performance test for group commit

When many clients write data to the same tablet server at around the same time, their data should be synced to the write-ahead log as a group. If group commit is not working properly, it can cause performance problems for many clients that will not be seen for a single client. It would be nice to have a performance test that specifically checks for this.

I created a project to do this in the past. Not sure what state it is in.

https://github.com/keith-turner/mutslam

Would be nice to test group commit performance for mutations and conditional mutations.

GC does not restart during agitation

I ran CI locally with agitation and everything seemed to start back up except the Garbage Collector. It looks like the issue is in <testing-home>/libexec/master-agitator.pl

Error running RW in Docker

This error occurs when running any rwalk module within Docker:

java.lang.RuntimeException: Failed to connect to zookeeper (localhost:2181) within 2x zookeeper timeout period 30000
	at org.apache.accumulo.fate.zookeeper.ZooSession.connect(ZooSession.java:157)
	at org.apache.accumulo.fate.zookeeper.ZooSession.getSession(ZooSession.java:201)
	at org.apache.accumulo.fate.zookeeper.ZooReader.getSession(ZooReader.java:42)
	at org.apache.accumulo.fate.zookeeper.ZooReader.getZooKeeper(ZooReader.java:46)
	at org.apache.accumulo.fate.zookeeper.ZooCache.getZooKeeper(ZooCache.java:148)
	at org.apache.accumulo.fate.zookeeper.ZooCache.access$900(ZooCache.java:48)
	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:406)
	at org.apache.accumulo.fate.zookeeper.ZooCache$2.run(ZooCache.java:379)
	at org.apache.accumulo.fate.zookeeper.ZooCache$ZooRunnable.retry(ZooCache.java:271)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:434)
	at org.apache.accumulo.fate.zookeeper.ZooCache.get(ZooCache.java:364)
	at org.apache.accumulo.core.clientImpl.ClientContext.getInstanceID(ClientContext.java:398)
	at org.apache.accumulo.core.clientImpl.Tables.getTableMap(Tables.java:179)
	at org.apache.accumulo.core.clientImpl.Tables.getTableMap(Tables.java:167)
	at org.apache.accumulo.core.clientImpl.Tables.getNameToIdMap(Tables.java:151)
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.exists(TableOperationsImpl.java:192)
	at org.apache.accumulo.testing.randomwalk.bulk.Setup.visit(Setup.java:49)
	at org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:237)
	at org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:48)
	at org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:92)

Scripts print error in Docker

I believe the Dockerfile for creating a Docker image was broken by changes to the scripts in edbc7cd. Trying to run the cingest script in the Docker image gives this error:

/opt/at/bin/cingest: line 19: /opt/at/bin/build: No such file or directory

RW Concurrent test errors on BulkImport for 2.0

I am testing the 2.0 branch at commit 9e32263e7 using Uno and keep seeing this same error while trying to run the RW Concurrent module:

2019-06-20 14:29:52,354 [testing.randomwalk.Framework] INFO : Running random walk test with module: Concurrent.xml
2019-06-20 14:30:19,161 [testing.randomwalk.Framework] ERROR: Error during random walk
java.lang.Exception: Error running node ct.BulkImport
	at org.apache.accumulo.testing.randomwalk.Module.visit(Module.java:370)
	at org.apache.accumulo.testing.randomwalk.Framework.run(Framework.java:48)
	at org.apache.accumulo.testing.randomwalk.Framework.main(Framework.java:92)
Caused by: org.apache.accumulo.core.client.AccumuloException: Bulk import  directory /tmp/concurrent_bulk/b_4640b9365aa878f3 does not exist!
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.checkPath(TableOperationsImpl.java:1173)
	at org.apache.accumulo.core.clientImpl.TableOperationsImpl.importDirectory(TableOperationsImpl.java:1197)
	at org.apache.accumulo.testing.randomwalk.concurrent.BulkImport.visit(BulkImport.java:134)
	at org.apache.accumulo.testing.randomwalk.Module$1.call(Module.java:303)
	at org.apache.accumulo.testing.randomwalk.Module$1.call(Module.java:298)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
	at java.lang.Thread.run(Thread.java:748)
2019-06-20 14:30:19,163 [testing.randomwalk.Framework] INFO : Test finished

The BulkImport module is currently using the old, deprecated importDirectory, but it should still work.
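
For reference, a hedged sketch of the replacement fluent bulk import API in 2.0; the directory and table names here are placeholders:

import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

public class NewBulkImportSketch {
  public static void main(String[] args) throws Exception {
    try (AccumuloClient client = Accumulo.newClient()
        .from("conf/accumulo-client.properties").build()) {
      // the 2.0 fluent API replaces the deprecated importDirectory call;
      // unlike the old call, it takes no failure directory
      client.tableOperations().importDirectory("/tmp/bulkdir").to("mytable").load();
    }
  }
}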

Update CI to use Compaction Planner

Continuous Ingest is still using BasicCompactionStrategy, which has been deprecated in favor of the new compaction code. This needs to be replaced with a CompactionPlanner; most likely DefaultCompactionPlanner will work fine. This may just be a configuration change.

This would be great to do for the 2.1 release testing.

Support generating bulk import data that covers a subset of the table.

Would be nice to be able to generate bulk import data for the CI test that covers a subset of the table instead of the entire table. This may be possible with something like -o min=y -o max=z command line options; not sure. If it is possible, the example test scripts could be updated to suggest using it.

Update Scalability test

It was suggested that the scalability test be run and updated. After a bit of investigation, it seems a lot has changed since the test was last in working order, so it may take a fair amount of work to get it into a usable state again.

Before I put too much work into it, I wanted to open a ticket to see if anyone has an opinion on whether this test is still useful and worth reviving.

Explore storing continuous ingest bulk import files in S3

When running the bulk import continuous ingest test, it can take a while to generate enough data to start testing. Not sure, but it may be faster to generate a data set once and store it in S3. Future tests could then reuse that data set.

I think it would be interesting to experiment with this, and if it works well, add documentation to the bulk import test docs explaining how to do it. One gotcha with this approach is that anyone running a test needs to be consistent with split points. A simple way to address this problem would be to store a file of split points in S3 alongside the data.
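
As a minimal sketch, assuming an s3a filesystem is already configured on the cluster; the bucket name, paths, and splits file below are illustrative placeholders:

# copy the generated bulk files from HDFS to S3 for reuse by later tests
hadoop distcp /tmp/ci-bulk s3a://my-ci-bucket/ci-bulk

# keep the split points used to generate the data next to the data itself
hdfs dfs -cp /tmp/ci-splits.txt s3a://my-ci-bucket/ci-bulk/splits.txt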

Build errors and warnings due to recent property name changes

With the recent property and class name changes from Master to Manager (here, I think), builds of accumulo-testing now fail. There are also a few warnings that relate to other recent changes.

[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/Module.java:[46,37] cannot find symbol
  symbol:   class SimpleThreadPool
  location: package org.apache.accumulo.core.util
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Replication.java:[20,1] cannot find symbol
  symbol:   static MASTER_REPLICATION_SCAN_INTERVAL
  location: enum org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/bulk/Setup.java:[29,37] cannot find symbol
  symbol:   class SimpleThreadPool
  location: package org.apache.accumulo.core.util
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/Module.java:[227,35] cannot find symbol
  symbol:   class SimpleThreadPool
  location: class org.apache.accumulo.testing.randomwalk.Module
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Replication.java:[77,22] cannot find symbol
  symbol:   variable MASTER_REPLICATION_SCAN_INTERVAL
  location: class org.apache.accumulo.testing.randomwalk.concurrent.Replication
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Config.java:[87,35] cannot find symbol
  symbol:   variable MASTER_BULK_THREADPOOL_SIZE
  location: class org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Config.java:[88,35] cannot find symbol
  symbol:   variable MASTER_BULK_RETRIES
  location: class org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Config.java:[89,35] cannot find symbol
  symbol:   variable MASTER_BULK_TIMEOUT
  location: class org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Config.java:[90,35] cannot find symbol
  symbol:   variable MASTER_FATE_THREADPOOL_SIZE
  location: class org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Config.java:[91,35] cannot find symbol
  symbol:   variable MASTER_RECOVERY_DELAY
  location: class org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Config.java:[92,35] cannot find symbol
  symbol:   variable MASTER_LEASE_RECOVERY_WAITING_PERIOD
  location: class org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Config.java:[93,35] cannot find symbol
  symbol:   variable MASTER_THREADCHECK
  location: class org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/concurrent/Config.java:[94,35] cannot find symbol
  symbol:   variable MASTER_MINTHREADS
  location: class org.apache.accumulo.core.conf.Property
[ERROR] /home/jeffreymanno/git/accumulo-testing/src/main/java/org/apache/accumulo/testing/randomwalk/bulk/Setup.java:[64,32] cannot find symbol
  symbol:   class SimpleThreadPool
  location: class org.apache.accumulo.testing.randomwalk.bulk.Setup
[INFO] 14 errors
[WARNING] COMPILATION WARNING :
[INFO] -------------------------------------------------------------
[WARNING] Cannot find annotation method 'since()' in type 'java.lang.Deprecated'
[WARNING] Cannot find annotation method 'since()' in type 'java.lang.Deprecated'
[WARNING] Cannot find annotation method 'since()' in type 'java.lang.Deprecated'
[WARNING] Cannot find annotation method 'since()' in type 'java.lang.Deprecated'
[WARNING] Cannot find annotation method 'since()' in type 'java.lang.Deprecated'
[WARNING] Invalid project model for artifact [commons-vfs2:org.apache.commons:2.6.0]. It will be ignored by the remote resources Mojo.
[WARNING] Invalid project model for artifact [accumulo-hadoop-mapreduce:org.apache.accumulo:2.1.0-SNAPSHOT]. It will be ignored by the remote resources Mojo.
[WARNING] Invalid project model for artifact [accumulo-core:org.apache.accumulo:2.1.0-SNAPSHOT]. It will be ignored by the remote resources Mojo.
[WARNING] Invalid project model for artifact [accumulo-start:org.apache.accumulo:2.1.0-SNAPSHOT]. It will be ignored by the remote resources Mojo.

Versions:

  • Affected version(s) of this project: 2.1.0-SNAPSHOT (recent changes)

To Reproduce

  1. Use fluo-uno to start up a cluster with the most recent changes in 2.1.0-SNAPSHOT
  2. Configure accumulo-testing to use that cluster
  3. Rebuild with Maven if necessary, then run the build script in accumulo-testing/bin
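
If the fix is just a mechanical rename, a minimal sketch, assuming the removed MASTER_* enum constants map one-for-one to 2.1 MANAGER_* equivalents (the exact replacements should be verified against the current Property enum, as should the new home of SimpleThreadPool's replacement):

import org.apache.accumulo.core.conf.Property;

public class ManagerPropertyRename {
  public static void main(String[] args) {
    // 2.0-era constants removed by the rename:
    //   MASTER_BULK_TIMEOUT, MASTER_RECOVERY_DELAY, MASTER_FATE_THREADPOOL_SIZE, ...
    // Assumed 2.1 Manager-prefixed equivalents:
    System.out.println(Property.MANAGER_BULK_TIMEOUT.getKey());   // manager.bulk.timeout
    System.out.println(Property.MANAGER_RECOVERY_DELAY.getKey()); // manager.recovery.delay
  }
}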

Make continuous ingest delete data

apache/accumulo#537 was a WAL recovery bug where deleted data could come back. This flaw was not found through testing. It would be nice to delete data in the continuous ingest test to cover this case. This could be done by periodically deleting a previously written set of linked lists. The lists would need to be deleted in reverse order to avoid false positives in the test. Could do something like the following.

while (true) {
   // write 1,000,000 linked lists of 25 nodes each
   if (random.nextInt(10) == 0) {
       // delete a previously written batch of 1,000,000 linked lists,
       // removing each list's nodes in reverse order to avoid false positives
   }
}

Explore writing a specialized summarizer for bulk ingest

To debug a recent bulk ingest test, I wrote the following summarizer, which counts the number of times each UUID is seen. I used it to count the number of entries each MapReduce job had created.

package test.ci;

import org.apache.accumulo.core.client.summary.CountingSummarizer;

public class CiUuidSummarizer extends CountingSummarizer<String> {

  @Override
  protected Converter<String> converter() {
    // Count occurrences of each UUID by extracting the UUID prefix
    // (the portion of the value before the first ':')
    return (k, v, c) -> c.accept(v.toString().split(":")[0]);
  }
}
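
For context, a hedged sketch of how such a summarizer might be attached and read back via the 2.x summarizer API; the client and tableName here are placeholders, and note that summaries are computed per file, so pre-existing files may need a compaction before their counts appear:

import org.apache.accumulo.core.client.AccumuloClient;
import org.apache.accumulo.core.client.summary.Summary;
import org.apache.accumulo.core.client.summary.SummarizerConfiguration;

public class AttachCiUuidSummarizer {
  public static void attach(AccumuloClient client, String tableName) throws Exception {
    // register the summarizer so it runs as new files are written or compacted
    client.tableOperations().addSummarizers(tableName,
        SummarizerConfiguration.builder(CiUuidSummarizer.class).build());

    // later, read back the per-UUID counts
    for (Summary s : client.tableOperations().summaries(tableName).retrieve()) {
      System.out.println(s.getStatistics());
    }
  }
}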

Simplify accumulo-testing command and create Docker image

It would be great if accumulo-testing could be simplified and run in Docker. Below is a possible usage.

$ ./bin/accumulo-testing 

Usage: accumulo-testing <test> (<argument>)

Available tests:

   ci <application>    Runs continuous ingest <application>.
                                Possible applications: createtable, ingest, walk, batchwalk, scan, verify, moru

   rw <module>         Runs random walk <module>
                                Modules located in core/src/main/resources/randomwalk/modules
