elodina / exhibitor-mesos-framework
This project forked from ciscocloud/exhibitor-mesos-framework
Exhibitor on Apache Mesos for reliably running Zookeeper on Mesos
License: Apache License 2.0
2015-08-04 18:34:16,104 [Exhibitor] INFO org.mortbay.log - jetty-1.5.5
2015-08-04 18:34:16,537 [Exhibitor] INFO org.mortbay.log - Started [email protected]:31000
2015-08-04 18:35:15,786 [ActivityQueue-0] INFO com.netflix.exhibitor.core.activity.ActivityLog - State: down
2015-08-04 18:35:15,792 [ActivityQueue-0] INFO com.netflix.exhibitor.core.activity.ActivityLog - Attempting to stop instance
2015-08-04 18:35:15,792 [ActivityQueue-0] INFO com.netflix.exhibitor.core.activity.ActivityLog - Attempting to start/restart ZooKeeper
2015-08-04 18:35:15,997 [ActivityQueue-0] INFO com.netflix.exhibitor.core.activity.ActivityLog - jps didn't find instance - assuming ZK is not running
2015-08-04 18:35:15,999 [ActivityQueue-0] ERROR com.netflix.exhibitor.core.activity.ActivityLog - Monitoring instance
java.io.IOException: Could not find (.log4j.)|(.slf4j.) jar
at com.netflix.exhibitor.core.processes.Details.findJar(Details.java:145)
at com.netflix.exhibitor.core.processes.Details.<init>(Details.java:57)
at com.netflix.exhibitor.core.processes.StandardProcessOperations.startInstance(StandardProcessOperations.java:105)
at com.netflix.exhibitor.core.state.KillRunningInstance.completed(KillRunningInstance.java:41)
at com.netflix.exhibitor.core.activity.ActivityQueue$1.run(ActivityQueue.java:127)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2015-08-04 18:34:16,637 [Thread-1284] INFO ly.stealth.mesos.exhibitor.Scheduler$ - Adding server 2 to ensemble
2015-08-04 18:36:23,929 [Thread-1284] INFO ly.stealth.mesos.exhibitor.Scheduler$ - Exhibitor API not available.
2015-08-04 18:38:32,184 [Thread-1284] INFO ly.stealth.mesos.exhibitor.Scheduler$ - Exhibitor API not available.
2015-08-04 18:40:40,440 [Thread-1284] INFO ly.stealth.mesos.exhibitor.Scheduler$ - Exhibitor API not available.
It makes sense to autofill the zookeeper-install-directory, zookeeper-data-directory and zookeeper-log-directory properties and disable setting them via the CLI.
These values can be arbitrary because we create symbolic links to the sandbox anyway, so we could probably just hardcode them to something like /tmp/zookeeper-$frameworkid, /tmp/zookeeper-data-$frameworkid and /tmp/zookeeper-log-$frameworkid.
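A minimal sketch of the autofill idea, assuming the /tmp naming proposed above. The function name default_zk_dirs and the base parameter are hypothetical and not part of the framework:

```python
def default_zk_dirs(framework_id: str, base: str = "/tmp") -> dict:
    """Derive the three ZooKeeper directory properties from a framework id.

    The naming scheme follows the suggestion in the issue; the helper
    itself is illustrative only.
    """
    return {
        "zookeeper-install-directory": f"{base}/zookeeper-{framework_id}",
        "zookeeper-data-directory": f"{base}/zookeeper-data-{framework_id}",
        "zookeeper-log-directory": f"{base}/zookeeper-log-{framework_id}",
    }
```

Because the values are derived rather than supplied, the CLI would no longer need to accept them.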
E.g. add and configure 5 servers, then run ./exhibitor-mesos.sh start 0..4: the command will time out even though the servers start within the timeout.
This should work the same way as for zookeeper-install-directory: listen for config changes and create symlinks for these dirs.
For now only the framework id is persisted to storage, which does not allow recovering from failures in the desired way. The cluster state should also be saved so that recovery is possible.
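One way to picture persisting the cluster state next to the framework id is a JSON round trip. This is only a sketch: the ClusterState class and its servers field are hypothetical, and the framework's real storage format may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ClusterState:
    # Hypothetical shape: the framework id plus per-server records.
    framework_id: str
    servers: list = field(default_factory=list)


def dump_state(state: ClusterState) -> str:
    """Serialize the whole state, not just the framework id."""
    return json.dumps(asdict(state), sort_keys=True)


def load_state(text: str) -> ClusterState:
    """Rebuild the state after a scheduler restart."""
    return ClusterState(**json.loads(text))
```

The same serialized string could be written to a file or to a ZooKeeper node; the storage backend is independent of the format.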
I'm running the scheduler inside of a docker container. It runs fine on an openstack vm but doesn't respond to any commands when run on bare metal. Both machines are running ubuntu 14.04. The docker container is built from ubuntu 15.04 with this version of java:
java version "1.8.0_74"
Java(TM) SE Runtime Environment (build 1.8.0_74-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.74-b02, mixed mode)
I'm able to connect to the port via netcat and other tools. When I run any CLI command like status it just hangs. Any input or ideas would be appreciated. I should also note that I've tried running the container with --net=host and with bridged networking, with the same results.
Currently the Running status update is sent as soon as Exhibitor starts, but that does not mean ZooKeeper itself has started by then. We should investigate whether the Exhibitor API can be used to check that ZooKeeper is up.
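As a point of comparison, ZooKeeper itself can be probed directly with its "ruok" four-letter command, which answers "imok" when the server is running. The sketch below uses that probe instead of the Exhibitor API, so it is an alternative technique rather than an answer to the question above. The function name zk_is_ok is hypothetical, and newer ZooKeeper versions may require four-letter commands to be explicitly enabled.

```python
import socket


def zk_is_ok(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a ZooKeeper server at host:port answers 'ruok' with 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            # Signal that the request is complete, then read the short reply.
            sock.shutdown(socket.SHUT_WR)
            return sock.recv(16) == b"imok"
    except OSError:
        # Connection refused, timed out, or reset: treat as not ready.
        return False
```

A scheduler could delay reporting a server as running until a probe like this succeeds on the client port.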
This happened after #5. The API should be on another thread.
seeing the offers being declined is very handy
Currently Exhibitor uses a Mesos-offered port only for the Exhibitor UI. The client, connect and election ports should also be taken from offers, and these ports must be the same across all running instances.
As a defensive mechanism, we could also track changes to these ports (they can be changed via the Exhibitor UI) and set them back to the Mesos-offered values.
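The two ideas above can be sketched as follows: take the needed ports from the offered ranges, then compare the assigned values with what is currently configured to find ports that drifted. Both helper names and the (begin, end) range representation are assumptions for illustration, not the framework's API.

```python
def pick_ports(ranges, count):
    """Take the first `count` ports from inclusive (begin, end) offer ranges."""
    ports = [p for begin, end in sorted(ranges) for p in range(begin, end + 1)]
    if len(ports) < count:
        raise ValueError(f"offer has {len(ports)} ports, need {count}")
    return ports[:count]


def ports_to_reset(assigned, current):
    """Return the assigned ports whose configured value no longer matches."""
    return {
        name: port
        for name, port in assigned.items()
        if current.get(name) != port
    }
```

For example, assigning four ports to the names ui, client, connect and election and later finding a different client port in the configuration would yield a one-entry dict telling the scheduler which value to restore. Keeping the ports identical across instances would additionally require choosing them from ports present in every accepted offer, which this sketch does not attempt.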
After the error in #5 the scheduler never released any of the resources that it had. The resources should be handled on another thread so that the scheduler never gets into a situation where it can hold on to them.
We need the scheduler to be able to survive a failure; to do this, the storage state needs to be in ZK.
We should block until started
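Blocking until started usually comes down to polling a readiness check against a deadline. A minimal sketch, assuming the caller supplies the check; wait_until and its parameters are hypothetical names:

```python
import time


def wait_until(predicate, timeout, interval=0.5):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The predicate is checked before the deadline test, so a server that is already up is reported as started even with a zero timeout.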
This is a follow-up to #11 and must provide a way to restore the cluster from information stored in a file or in ZK.
When Exhibitor comes up, the config has the FQDN in it. Exhibitor pulls the hostname here: https://github.com/Netflix/exhibitor/blob/master/exhibitor-core/src/main/java/com/netflix/exhibitor/core/Exhibitor.java#L107
That call returns the short hostname, which causes Exhibitor to not find itself in the cluster's config.
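The mismatch can be illustrated with a tolerant hostname comparison that treats a bare short name as equal to an FQDN with the same first label. This is a hypothetical helper to show the problem, not what Exhibitor or the framework does, and it is meant for hostnames only, not IP addresses.

```python
def same_host(a: str, b: str) -> bool:
    """Compare two hostnames, letting a short name match its FQDN."""
    a = a.lower().rstrip(".")
    b = b.lower().rstrip(".")
    if a == b:
        return True
    # Only fall back to the first label when one side has no domain part,
    # so two different FQDNs never match each other.
    one_side_is_short = "." not in a or "." not in b
    return one_side_is_short and a.split(".")[0] == b.split(".")[0]
```

With an exact string comparison, a short name such as zk1 never equals zk1.example.com, which is the situation described above.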
I'm not sure if this is something done by the framework. Do you know what sets up the config in the parent zookeeper?
Thanks in advance, and let me know if you need any more information on this.