
scattersphere's People

Contributors

kensuenobu


scattersphere's Issues

Implement auto-incrementing Job ID

Implement a Job ID using an AtomicInteger or AtomicLong. This feature will not immediately be used, but it will be available for debugging and testing, as well as distribution, later down the line.
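A minimal sketch of the idea, assuming a hypothetical JobIdGenerator object (the name and placement are not from the codebase):

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch: a process-wide generator handing out unique Job IDs.
object JobIdGenerator {
  private val counter = new AtomicLong(0)

  // incrementAndGet is atomic, so concurrent callers never receive the same ID.
  def nextId(): Long = counter.incrementAndGet()
}
```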

Cleanup-3

Cleanup code, move println statements to Scala logger.

Handle Job Cancellation

Add the ability to cancel a job, recording the state in the Tasks and the Job root.

Tasks:

  • Obey job.queue().cancel()
  • Add hooks to each RunnableTask to handle onCancel for Task cleanup
  • Add TaskCanceled TaskStatus object
  • Add CancelationJobTest to test the cancelation code

Add Task Statistics

Add the ability to see how long each Task takes to run: elapsed time, plus start and end times.

Add builder to Task

Add the builder pattern to Task:

Task.builder()
    .addTask(mytask)
    .dependsOn(task)
    .dependsOn(task)
    .build()

Add Spark-based Integration Test

Add a Spark integration test to the codebase. A Workflow needs to be created in CircleCI that starts up an instance of Spark (via Docker) in a separate workflow; the integration tests then submit jobs to the local Spark instance and wait for the jobs to finish.

Publish to Maven Central

Publish Scattersphere to the Maven Central repository at the end of the development cycle for this milestone.

Change Enumeration Classes to Sealed Trait

Instead of using an object TaskStatus extends Enumeration, use the following pattern:

sealed trait TaskStatus {
  def reason: String
  def cause: Option[Throwable]
}
case object Queued extends TaskStatus {
  val reason = "Queued"
  val cause: Option[Throwable] = None
}
final case class Canceled(reason: String) extends TaskStatus {
  val cause: Option[Throwable] = None
}
final case class Failed ...
final case class Completed ...

... et al ...

Investigate better methods to build task tree

As indicated in a Reddit post, an algebraic way of building the task tree would be preferable. This does not mean strictly a = b + c; it means using functional programming practices to combine tasks into the tree.

Create Status Classes for Tasks

Create a status class for a task when running in the Scattersphere space. This way, one can tell whether the task is queued, running, or in a finished state. The status should be queryable at any point.

Statuses should include, but not be limited to:

QUEUED,
RUNNING,
FINISHED

The following behavioral changes to JobExecutor need to be implemented:

  • Change all tasks that have been added to the QUEUED state
  • Before runAsync, thenRunAsync, or thenRun executes, change the task to the RUNNING state
  • After the onFinished function is called, change the state to FINISHED
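The transitions above can be sketched as follows; TaskSketch and the Status values here are placeholders, not the actual Scattersphere API:

```scala
import java.util.concurrent.CompletableFuture

// Hypothetical sketch of the QUEUED -> RUNNING -> FINISHED flow.
sealed trait Status
case object QUEUED extends Status
case object RUNNING extends Status
case object FINISHED extends Status

class TaskSketch(body: () => Unit) {
  @volatile var status: Status = QUEUED   // set when the task is added

  def run(): CompletableFuture[Void] = {
    status = RUNNING                      // flipped just before runAsync
    CompletableFuture
      .runAsync(() => body())
      .thenRun(() => status = FINISHED)   // after the task body finishes
  }
}
```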

Scattersphere Documentation Website

Create a website for the Scattersphere documentation that is slightly more complete than the Wiki, and better formatted. Adopt the same style as Orchestra and others?

Make RunnableTask easier to use

Instead of using the notation:

    JobBuilder()
        .withTask(new MyTask() with RunnableTask)

I would like to use:

    JobBuilder()
        .withTask(RunnableTask(new MyTask()))

or something à la Monix Task:

    JobBuilder()
        .withTask(RunnableTask { ... code here })
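A minimal sketch of a companion object supporting the block form, assuming a simplified RunnableTask trait:

```scala
// Hypothetical sketch; the real RunnableTask in Scattersphere may differ.
trait RunnableTask {
  def run(): Unit
}

object RunnableTask {
  // RunnableTask { ...code... } wraps a by-name block, à la Monix Task.
  def apply(body: => Unit): RunnableTask = new RunnableTask {
    def run(): Unit = body
  }
}
```

The RunnableTask(new MyTask()) form could be a second apply overload that delegates to the wrapped task's run().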

Create Pause-controllable Runnable in JobExecutor

Instead of using a single CompletableFuture per JobExecutor instance, make it so that multiple Jobs can be created and executed. Every time queue() is called, it will queue up the Job by name, returning the controller for that Job.

JobExecutor should introduce an additional function called registerJob which registers the Job by the name specified in the Job.name field. queue(jobName: String) will change to take a new parameter, which queues up the Job specified by the name.

queue(jobName: String) on a Job that does not exist by name will throw a JobNotFoundException with the Job's name.
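A sketch of that registry behaviour; only the names registerJob, queue, and JobNotFoundException come from this issue, the rest is assumed:

```scala
import scala.collection.mutable

// Hypothetical exception carrying the missing Job's name, per the issue text.
final case class JobNotFoundException(jobName: String)
  extends Exception(s"No job registered under name: $jobName")

class JobRegistrySketch[Job] {
  private val jobs = mutable.Map.empty[String, Job]

  // registerJob: record the Job under its Job.name for later queueing.
  def registerJob(name: String, job: Job): Unit = jobs(name) = job

  // queue(jobName): look up the registered Job, or fail loudly.
  def queue(jobName: String): Job =
    jobs.getOrElse(jobName, throw JobNotFoundException(jobName))
}
```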

Implement a better Execution Service

Testing on servers with fewer cores makes jobs hang, because the executor cannot fire off jobs/threads appropriately. Create an Execution Service (or decide on a flexible one that does not drop threads) that is more accommodating.

Create SparkTask for Spark Server Testing

Create a SparkTask that takes a SparkConf or SparkContext, and allows submission of the Task to it. Anything in the body of the Task gets submitted to Spark and run on that server. Once the server finishes running the Task, the Task terminates as per usual.

Add RetryableTask

Add the ability for a RunnableTask to be retried if a failure occurs. Failures on some RunnableTasks can be transient errors (network down, file not found, etc), so this option needs to be available.
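One possible shape for the retry logic, as a sketch (runWithRetry and maxRetries are assumed names, not the eventual RetryableTask API):

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical retry helper: rerun the body up to maxRetries extra times.
object RetrySketch {
  @annotation.tailrec
  def runWithRetry[A](maxRetries: Int)(body: => A): A =
    Try(body) match {
      case Success(result) => result
      // Transient failure (network down, file not found, ...): try again.
      case Failure(_) if maxRetries > 0 => runWithRetry(maxRetries - 1)(body)
      case Failure(e) => throw e
    }
}
```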

Create MesosTask for Mesos Server Testing

Similar to SparkTask, will take a Mesos configuration object, submit the contents of the job in the Task run() method, then wait for it to complete. Needs more discussion.

Create a RepeatingTask

RepeatingTask should provide the ability to run a Task multiple times. There should be a counter for the number of times the Task has repeated, and a limit.
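A minimal sketch of that behaviour, with all names assumed:

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical RepeatingTask: run the body up to `limit` times,
// keeping a counter of completed repeats.
class RepeatingTaskSketch(limit: Int)(body: => Unit) {
  private val repeats = new AtomicInteger(0)

  def repeatCount: Int = repeats.get()

  def run(): Unit =
    while (repeats.get() < limit) {
      body
      repeats.incrementAndGet()
    }
}
```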

Sanity Check on JobExecutor when adding tasks

Add a sanity check on the JobExecutor to make sure:

  • Task names are not duplicated
  • Tasks referenced by dependsOn have actually been created

Create new Exception classes:

DuplicateTaskNameException
InvalidTaskDependencyException
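The checks could look roughly like this; the exception names come from the issue, while the (name, dependency names) task representation is an assumption:

```scala
// Hypothetical sanity checks run when tasks are added to the JobExecutor.
final case class DuplicateTaskNameException(name: String)
  extends Exception(s"Duplicate task name: $name")
final case class InvalidTaskDependencyException(name: String)
  extends Exception(s"dependsOn refers to an unknown task: $name")

object SanityCheckSketch {
  // Each task is represented here as (name, names it depends on).
  def check(tasks: Seq[(String, Seq[String])]): Unit = {
    val names = tasks.map(_._1)
    // Task names are not duplicated.
    names.diff(names.distinct).headOption
      .foreach(dup => throw DuplicateTaskNameException(dup))
    // Every dependsOn target must name a task that exists.
    val known = names.toSet
    for ((_, deps) <- tasks; dep <- deps if !known.contains(dep))
      throw InvalidTaskDependencyException(dep)
  }
}
```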

Add builder to Job

Add the builder pattern to Job:

Job.builder()
    .withName("myName")
    .withTask( Task.builder()
         .addTask() ... dependsOn() ... build() )
    .withTask ( Task.builder()
         .addTask() ... build() )
    .build()

Implement auto-incrementing Task ID

Task IDs can be auto-incremented using an AtomicInteger or AtomicLong for the time being. This will come into use later when debugging/persistence of data is implemented.

Cleanup 7

Cleanup code, make some methods a little easier to use when building tasks and jobs. Use defaults when building if a name is not supplied.

Create SparkConf/SparkContext/SparkSession for Jobs

Allow the ability for a SparkConf and/or SparkContext to be created once, and shared amongst Jobs (not Tasks). This will allow for a Job to be submitted to a running server, allow for the job to wait, then kick off multiple Tasks based on the DAG. Apply to SparkSession, as that is where the context is created.

This will require a lot of testing, as this can get very complex very quickly.

Add run() method to JobExecutor

Instead of having the JobExecutor return a CompletableFuture[Void] that the programmer can change, return a JobExecutor that controls the CompletableFuture.

This way, the programmer calls the run() method in order for the Job to start and run. Right now, creating a CompletableFuture with runAsync as the first entry point automatically triggers the job to run.

There needs to be a mechanism in place that causes the Task to wait indefinitely until the run() method is triggered. This will be one step ahead of the first task, but programmatically it should not make a difference, as it is coded into the JobExecutor.

Example:

val jExec = new JobExecutor("job", Seq(job)).queue()

This will not do anything else until you call

jExec.run()

This will run the job.

Add ability to pause/resume execution

Add pause() and resume() calls to the JobExecutor class. This will call the appropriate pause() and resume() calls in the PausableThreadPoolExecutor class.
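The pause/resume mechanics can be sketched by adapting the well-known PausableThreadPoolExecutor example from the ThreadPoolExecutor Javadoc; the class name here is an assumption:

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}
import java.util.concurrent.locks.ReentrantLock

// Sketch: a pool whose workers block before each task while paused.
class PausableExecutorSketch(threads: Int)
    extends ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
      new LinkedBlockingQueue[Runnable]) {

  private var paused   = false
  private val lock     = new ReentrantLock()
  private val unpaused = lock.newCondition()

  // Each worker waits here before running a task while the pool is paused.
  override protected def beforeExecute(t: Thread, r: Runnable): Unit = {
    super.beforeExecute(t, r)
    lock.lock()
    try while (paused) unpaused.await()
    finally lock.unlock()
  }

  def pause(): Unit = {
    lock.lock()
    try paused = true
    finally lock.unlock()
  }

  def resume(): Unit = {
    lock.lock()
    try { paused = false; unpaused.signalAll() }
    finally lock.unlock()
  }
}
```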

Create several Spark jobs for testing SparkConf/SparkContext

This goes along with testing: CircleCI needs to start an instance of Spark as part of the workflow step. It will need to kick off a Spark server instance as a Docker image, then allow the tests to run. Once the tests have completed, it will need to signal to the Spark instance that the jobs are done. (Maybe a write-up on a blog somewhere to explain how to do this well?)

Add status to Job

Add status to job, indicating whether or not an entire job is completed.

  • Queued, Running, Failed, Finished
  • Runs as a last task on CompletableFuture

Complete ScalaDoc

Make sure the following documentation is in place:

  • More complete, more comprehensive documentation in each class, complete with working examples
  • Include Job diagrams as part of the site
  • Use mvn site to build the entire site's documentation
  • Add class package documentation

Cleanup 8

General clean-up of 0.0.3 codebase.

Add Job Statistics

Add statistics to see how long a Job took to run. Include elapsed time, and start/end times.

Create Job Payload that can be shared amongst Tasks

Create a Job Payload object that can be shared amongst Tasks. This will eventually be the payload that is serializable (either as the entire payload, or in chunks as Spark handles it.)

This is not distributed; that is, the data format will not be one that is serializable over a network. At least, not yet: that will come in a later release.

Add ability to cancel a job

Add cancel(reason: String) to JobExecutor. This will:

  • Call setStatus(TaskCanceled) with the cancelation reason
  • Call setJobStatus(JobCanceled) with the cancelation reason
  • Call resume() on the PausableThreadPoolExecutor
  • Either call cancel() or interrupt the currently running thread (research)
  • Modify runTask() to check for canceled status and set the cancel reason appropriately

Wrap Runnable in custom Job Runner class

Instead of submitting a job via a Runnable interface to JobExecutor inside the Job class, it should be more intelligently wrapped in a JobRunner class. This way, the job can have some direct control over how it's run.

Potentially, we will have the ability to stop/cancel a job, and run an "onFinished" method that gets called depending on the status that occurs after a job finishes.

Just implement the "onFinished" method for this code. Other methods will come as they are developed.

Provide some real-world test cases

Instead of just checking for code completeness, add some real-world example tests so that task coordination can really be demonstrated well.

Handle Exceptions

Need to add code to handle .exceptionally() on the CompletableFuture code when creating the DAG. Exceptions must:

  • Be handled cleanly
  • Record the exception as being handled
  • Create a new status called FAILED
  • Set the task that caused the exception to FAILED
  • Set all remaining QUEUED status tasks to FAILED as well
  • Store the reason for failure in the Job, not the Task, if possible
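A sketch of hooking .exceptionally() into the chain; the onFailure callback is a stand-in for whatever records the FAILED status:

```scala
import java.util.concurrent.CompletableFuture

// Hypothetical sketch: a failing task is recorded as handled rather than
// letting the exception propagate unrecorded through the DAG.
object FailureHandlingSketch {
  def runTask(body: () => Unit,
              onFailure: Throwable => Unit): CompletableFuture[Void] =
    CompletableFuture
      .runAsync(() => body())
      .exceptionally((t: Throwable) => {
        onFailure(t) // record the exception as handled; mark tasks FAILED
        null         // the Void-typed stage expects null after recovery
      })
}
```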

Cleanup #2

Clean up tasks, reduce repeated code (in JobExecutor), add setBlocking and isBlocking to JobExecutor, change setXXX and getXXX to setXXX and XXX, respectively.

Simplify creation of Task/Job/JobExecutor

Right now, it's a three-step process. Get it down to one or two steps if possible, if it makes sense. We may want to keep the process separate due to options and settings at each step of the way.

Maybe make a simple process, then a more controlled process: simple being "create a job with this task, without creating a job builder or task builder." The controlled process being the way it is now.

Simple process could be done via the JobExecutor, possibly. Create a factory object that will do it for us.

Tasks require a name

Make the Task code require a name and throw an Exception if the name is not supplied. Modify Task.apply and Task.async methods to accept a name in the object constructors.
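A sketch of the requirement using require(), which throws IllegalArgumentException; NamedTask stands in for the real Task class:

```scala
// Hypothetical sketch: construction fails unless a non-empty name is given.
final class NamedTask private (val name: String)

object NamedTask {
  def apply(name: String): NamedTask = {
    require(name != null && name.nonEmpty, "Task name must be supplied")
    new NamedTask(name)
  }
}
```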
