twosigma / cook
Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
License: Apache License 2.0
In mesos/monitor.clj, we currently report per-user metrics (waiting/running jobs, CPUs, and memory) by sending Riemann events directly. Instead, we should:
(1) Have a chime process query the database and store the results in atoms/async channels.
(2) Have a go-loop that watches the atoms/async channels and registers/deregisters gauges.
Per discussion with @dgrnbrg
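A minimal sketch of the two-piece design, in Python with assumed names (the real implementation would use chime, atoms, and a core.async go-loop in Clojure): a periodic poller snapshots per-user stats into shared state, and a separate sync step registers/deregisters gauges against that snapshot.

```python
import threading

class UserMetricsPoller:
    """Sketch of the proposed split (names assumed, not from the codebase):
    a periodic 'chime' job snapshots per-user stats from the database into
    shared state, and a watcher loop syncs gauges against that snapshot."""

    def __init__(self, query_fn):
        self.query_fn = query_fn  # e.g. queries the database for user stats
        self.latest = {}          # user -> {"running-cpus": ..., "waiting-jobs": ...}
        self.gauges = {}          # user -> gauge callback currently registered
        self._lock = threading.Lock()

    def poll_once(self):
        """Called periodically by the chime process."""
        snapshot = self.query_fn()
        with self._lock:
            self.latest = snapshot

    def sync_gauges(self):
        """Called by the watcher loop: register gauges for users that
        appeared, deregister gauges for users that vanished."""
        with self._lock:
            current = set(self.latest)
        for user in current - set(self.gauges):
            # Each gauge reads the most recent snapshot when sampled.
            self.gauges[user] = lambda u=user: self.latest.get(u, {})
        for user in set(self.gauges) - current:
            del self.gauges[user]
```

The key property is that gauge registration tracks the snapshot rather than being driven by event emission, so no events are sent directly to Riemann.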
We need to test against Kerberized Hadoop to ensure that this isn't needed -- it was added as a Kerberos support hack, but its time may have expired.
When running lein run dev-config.edn, I get:
2015-09-21 19:21:17,472:22178(0x116b06000):ZOO_ERROR@handle_socket_error_msg@1697: Socket [::1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
2015-09-21 19:21:17,472:22178(0x116b06000):ZOO_ERROR@handle_socket_error_msg@1697: Socket [127.0.0.1:2181] zk retcode=-4, errno=61(Connection refused): server refused to accept the client
which resolves once I start running ZooKeeper locally:
2015-09-21 19:21:20,806:22178(0x116b06000):ZOO_INFO@check_events@1703: initiated connection to server [fe80::1:2181]
2015-09-21 19:21:21,191:22178(0x116b06000):ZOO_INFO@check_events@1750: session establishment complete on server [fe80::1:2181], sessionId=0x14ff236287a0000, negotiated timeout=10000
I0921 19:21:21.191696 327958528 group.cpp:313] Group process (group(1)@127.0.0.1:56667) connected to ZooKeeper
I0921 19:21:21.191776 327958528 group.cpp:787] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0921 19:21:21.191836 327958528 group.cpp:385] Trying to create path '/mesos' in ZooKeeper
Is this expected? From the documentation:
"Cook is written in Clojure. To develop Cook, all you need is a JVM and Mesos installed and configured. Cook will automatically start embedded copies of the rest of its dependencies."
I thought I would not need any other dependencies when running in dev mode.
When Cook kills a task, the instance status should reflect that fact. However, we currently change the instance status to failed in order to kill it. I think this is not ideal: when a user sees instance status = failed in Cook, it should mean the task actually failed, i.e., Cook received task-failed/task-error (or maybe task-lost, in some cases) from Mesos.
The places we currently change instance status to failed are:
(1) To preempt a task
(2) To kill a task due to heartbeat timeout
@dgrnbrg let me know what you think
Cook should have a logo!
This can include Datomic settings, G1 GC settings, etc.
The first step here is determining the types of queries we do. This issue should be updated with the current list:
The status-related queries require secondary indices, but we could change instance IDs to be [jobid instanceid] pairs, so that we only need to implement lookup by job ID; we'd then just store a "document" with the full job & instance state.
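The [jobid instanceid] pairing can be sketched as a store keyed only by job ID (Python sketch, names assumed), where one lookup returns every instance document with no secondary status index:

```python
from collections import defaultdict

# job_id -> {instance_id -> full job & instance "document"}
store = defaultdict(dict)

def put_instance(job_id, instance_id, doc):
    """Write an instance document under its [jobid instanceid] pair."""
    store[job_id][instance_id] = doc

def instances_for_job(job_id):
    """A single lookup by job ID yields all instance documents, so
    status-related queries can scan within a job instead of needing
    a secondary index on instance status."""
    return store[job_id]
```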
Still to be analyzed:
This is necessary so that normal, non-Kerberos environments can use Cook.
Currently, you need to transact quota changes directly. These should instead be exposed via a config file or a REST API.
This is being worked on in the uris_and_ports branch. URI support is done; ports support is ongoing.
Usually libmesos.so / libmesos.dylib (Linux/Mac) are in /usr/lib or /usr/local/lib, but in my case they were in $MESOS_BUILD_DIR/src/.libs/ -- we need to understand why this was and document each edge case.
I checked out a8e1c67 and tried to run lein run dev-config.edn, but it failed because of missing dependencies. It seems the tags referenced on line 318 of components are not defined anywhere. I commented out that expression, but then I got another error about cook.reporter on line 324 and commented that expression out as well. After those changes I was able to run.
I was able to get it all running in less than 30 minutes, including pulling dependencies and debugging this. Thanks for making it easy.
This will be a showcase for the heartbeat feature in the Cook scheduler.
This will give us an idea of how long it should take to start some number of jobs, of various sizes.
The motivation is to understand how long it should take to launch a Spark cluster, so that we can figure out how multitenancy affects this, and if something special is needed.
#4 will make this 1000x easier.
After discussion with @jcoveney, we've decided to release this to Maven central.
Currently the Cook scheduler always retries a job when it fails. However, sometimes the executor can determine that a job has failed permanently, so there is no point in retrying; in that case, we should allow the executor to tell the Cook scheduler not to retry the job.
To implement this, we can leverage the data field of TaskStatus. We can start including metadata (a JSON map, maybe) along with TaskStatus; for now this would just be a "terminal-failure": "true" entry.
The Cook scheduler can then simply set the job state to completed when it sees "terminal-failure": "true" in a task status.
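The retry decision on the scheduler side could look like this sketch (Python, function name assumed), which falls back to the current always-retry behavior whenever the metadata is absent or unparseable:

```python
import json

def should_retry(task_status_data: bytes) -> bool:
    """Decide whether to retry a failed task, based on optional JSON
    metadata the executor attached to the TaskStatus data field."""
    if not task_status_data:
        return True  # no metadata: keep the current always-retry behavior
    try:
        meta = json.loads(task_status_data)
    except ValueError:
        return True  # unparseable data: fall back to retrying
    # Only an explicit terminal-failure marker suppresses the retry.
    return meta.get("terminal-failure") != "true"
```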
This is meant to test that we can submit some JSON through the REST API, see it hit Datomic, convert that to a protobuf, and then follow the whole round trip back. This could catch potentially unknown serialization/format-munging bugs, since we represent job data as Clojure data structures, Mesos protobufs, Datomic datoms, and JSON objects.
On the earliest commits, metrics v3 and v2 seem to coexist. I don't understand why that would work, but it should be possible to build this correctly using compatible versions.
For building multiple projects, this article seems useful: https://lord.io/blog/2014/travis-multiple-subdirs/
This includes using the new Cook environment variables and URI APIs, integrating the latest code into Spark 1.5, and documenting the instructions for building off Spark 1.5.
This should also add support so that the URI uses either Basic auth or Kerberos, depending on whether the URI is of the form cook://user:pass@host:port or simply cook://host:port.
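A sketch of that dispatch (Python; the convention that embedded credentials imply Basic auth and a bare host implies Kerberos is taken from the issue text above, the function name is assumed):

```python
from urllib.parse import urlparse

def auth_mode(cook_uri: str) -> str:
    """Pick the auth scheme from the URI shape: cook://user:pass@host:port
    means Basic auth, cook://host:port means Kerberos."""
    parsed = urlparse(cook_uri)
    if parsed.username is not None and parsed.password is not None:
        return "basic"
    return "kerberos"
```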
I believe these are currently always the same number. Some users might get a bigger unkillable quota, but have an equal weighting in the sharing system.
They should be able to be resolved now.
It should be really easy to patch Spark for this!
Here's an example of what the config file could look like:
:federation {:remotes ["http://localhost:12322"]
:priviledged-principal "admin"
:threads 4
:circuit-breaker {:failure-threshold 0
:lifetime-ms 60000
:response-timeout-ms 60000
:reset-timeout-ms 60000
:failure-logger-size 10000}}
The issue is that dev/prod Datomic needs the metatransaction jar, but currently metatransaction lives inside the scheduler project, so I cannot compile a standalone metatransaction jar.
Currently, the API hardcodes 32 CPUs and 200GB RAM for blocking HTTP job submission. The API should instead use the same scheduler constraints that are provided as :task-constraints.
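A sketch of what validating a submission against configured limits (rather than hardcoded ones) might look like; the constraint keys and function names here are assumptions, not the real :task-constraints schema:

```python
# Assumed shape of the configured limits; the real keys live in the
# :task-constraints section of the Cook config (names here are guesses).
TASK_CONSTRAINTS = {"max-cpus": 32, "max-mem-gb": 200}

def validate_job(cpus: float, mem_gb: float, constraints=TASK_CONSTRAINTS):
    """Reject a submission that exceeds the configured limits, instead of
    comparing against a hardcoded 32 CPUs / 200 GB."""
    if cpus > constraints["max-cpus"]:
        raise ValueError(f"requested {cpus} cpus > limit {constraints['max-cpus']}")
    if mem_gb > constraints["max-mem-gb"]:
        raise ValueError(f"requested {mem_gb} GB > limit {constraints['max-mem-gb']}")
```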
ljin@hsljin:~/ws/github/Cook/scheduler$ lein uberjar
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 5555; nested exception is:
java.net.BindException: Address already in use
Compilation failed: Subprocess failed
Make sure that we have a sample dev config (should work out of the box) & prod config (should have comments to explain some choices).
This also should have details on the recommended production JVM options, and why to use them (Datomic using extra heap as cache, debugging GC pauses, etc).
All options should be documented in the asciidoc.
I submitted a job via Cook to a Mesos 0.23 cluster. Everything seems to have worked fine, but instances[0].status and framework_id are not getting set. On the Mesos page, I do see the job as running and the Cook scheduler as a registered framework.
[
  {
    "mem": 16,
    "max_retries": 3,
    "max_runtime": 86400000,
    "name": "cookjob",
    "command": "while [ true ]; do echo hello cook I am \"$(whoami)\" and MY_VAR=\"${MY_VAR}\"; sleep 10; done",
    "env": {
      "MY_VAR": "foo1"
    },
    "framework_id": null,
    "instances": [
      {
        "start_time": 1444169356373,
        "task_id": "cd66e79b-9272-4d54-bbd3-e89cff8c78c0",
        "hostname": "some.host.domain.com",
        "slave_id": "20151006-201511-738201772-5050-93146-S8",
        "executor_id": "cd66e79b-9272-4d54-bbd3-e89cff8c78c0",
        "status": "unknown"
      }
    ],
    "priority": 50,
    "status": "waiting",
    "uuid": "f76aa5bd-e4bb-4ef3-9ad4-5b2938efc0fd",
    "uris": null,
    "cpus": 0.5
  }
]
Besides setting the CPUs and memory for each executor, we should be able to specify additional URIs or environment variables to retrieve for the executor, and the minimum threshold of running executors to wait for before we start computing.
The reason we show in JSON will be the enum value, but all lowercase, with underscores changed to spaces and the leading reason prefix dropped.
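That transformation is straightforward; here is a sketch (Python, function name assumed; REASON_EXECUTOR_TERMINATED is a real Mesos TaskStatus reason used as the example):

```python
def humanize_reason(reason_enum: str) -> str:
    """Render a Mesos reason enum for JSON: lowercase the value, turn
    underscores into spaces, and drop the leading 'reason' word.
    e.g. REASON_EXECUTOR_TERMINATED -> "executor terminated"."""
    words = reason_enum.lower().split("_")
    if words and words[0] == "reason":
        words = words[1:]
    return " ".join(words)
```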
Discussed with @icexelloss.
Some of the scheduler tests currently seem to have a deadlock in them.
Cook's transaction functions expect to be able to call functions in Cook proper; the transactor will blow up if you don't do this, so we should document how to do so.
This will require submitting a request to here: https://github.com/travis-ci/apt-source-whitelist
Or we can download and install/unpack/build (or grab binaries) Mesos ourselves
But this also has the downside that, to get packages added to the whitelist, they seem to need source packages. And to use the cache (necessary for building the package), we'd need to be a paying Travis customer.
This is trickier than I initially thought.
Either this should be configurable, or they're no longer in use. Either way, we should figure them out!
Users would like to be able to query Cook for the list of their running and waiting jobs. We've discussed this at length internally but I'd like to bring this to the open source for design, review and implementation.
Users would like the programmatic ability to sort the queue. More generally, can we provide the ability to update any mutable property of a job via an UPDATE method?
This is a wrapper that just slightly changes the HttpClient API. HttpClient already supports every feature in this class.
This should be for things like "only on hosts w/ a specific attribute". This will enable things like GPU or machine class aware scheduling.
This will need to be added to the client-facing API, as well as to the scheduler & db.
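The core matching step could be as simple as this sketch (Python; the attribute names "gpu" and "machine-class" are illustrative, and how Mesos offers expose attributes is simplified here to a flat dict):

```python
def host_matches(offer_attributes: dict, required: dict) -> bool:
    """Sketch: accept an offer only if the host advertises every required
    attribute with the required value, enabling constraints such as
    {"gpu": "true"} or {"machine-class": "highmem"}."""
    return all(offer_attributes.get(k) == v for k, v in required.items())
```

Per-job required attributes would then travel through the client-facing API into the DB, and the scheduler would filter offers with a check like this before matching.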
@tnachen pointed https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionServer.scala out to me. We could implement this for Cook to simplify submitting prod jobs to Spark via Cook.
We should have a table documenting all the properties that you can configure for the Cook spark binding.
Currently, the Datomic Free edition jars are available in public Maven repos, so lein is happy building against them. But to use Datomic Pro, one has to mvn install the licensed jars into the local Maven repo before building. The documentation on that whole process is a little sparse; I found http://aan.io/datomic-pro-and-leiningen/ useful after googling around.
The current documentation suggests that switching to Datomic Pro is as simple as s/datomic-free/datomic-pro/g in project.clj.
Things like max-runtime and priority are being set and written to the DB, rather than picking up defaults in the scheduler. This makes the schema and DB use ugly and inefficient; these should be cleaned up.