So this is a project using Storm. I want to collect data via a REST API, do some computation and return a result to some web UI, ideally via sockets.
There are going to be two main components: a web server that collects REST responses and manages the socket to the web UI, and Storm, which performs the computation. The two components can communicate via distributed RPC (DRPC).
I'm going to need state, eventually, and DRPC is best done via Trident, so we have an extra player. As it happens, Trident looks fairly easy to use, and to talk to it via Clojure (because why not), we're going to use a library called Marceline.
Note that although these are cluster instructions, I'm actually only running on one machine currently.
- Set up one or more machines ready to become a Storm cluster.
- Make sure the necessary inbound ports are open. In particular you'll want ports open for at least SSH (22), HTTP (80), Storm UI (8080) and DRPC (3772). On AWS, a connection to a closed port gets no response at all; it just hangs...
- If you're using AWS, remember to connect using the public DNS name. Connecting by IP address didn't seem to work.
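Since a closed port hangs rather than failing, a quick check with an explicit timeout can save some head-scratching. The following is a minimal sketch using only the Python standard library. The function names and the three-second timeout are assumptions for illustration, not part of this project.

```python
import socket

# Ports named in the list above; the labels are only used for display.
PORTS = {22: "SSH", 80: "HTTP", 8080: "Storm UI", 3772: "DRPC"}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def check_ports(host, ports=PORTS, timeout=3.0):
    """Map each port number to True (reachable) or False (closed or filtered)."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling check_ports with the machine's public DNS name returns a dict keyed by port number. Note that the Storm UI and DRPC ports will only show as open once those daemons are actually running.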
- Install JDK and Python
- Set up a Storm cluster ready to run our code. Instructions from here
a) Set up Zookeeper according to this
b) Download a Storm release from here. It should probably match the Storm version the code was built against? Not sure, but I'd go for 0.10.0 to be safe.
c) Extract somewhere, and create a file conf/storm.yaml containing the following:
storm.zookeeper.servers:
- "localhost"
storm.local.dir: "~/"
nimbus.host: "localhost"
## Locations of the drpc servers
drpc.servers:
- "localhost"
Obviously if you're running multiple machines in your cluster things might look a little different, but the principle should be the same.
d) Run the various Storm components. I did this using screen so that I could see the output of each command separately and restart them in-place if necessary. For production you'd want a script and a supervisor to revive them when they die. The four components I ran (I don't know if all are necessary) are:
$ ./bin/storm nimbus
$ ./bin/storm supervisor
$ ./bin/storm ui
$ ./bin/storm drpc
At this point the cluster should be ready.
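As a rough idea of what the "script and a supervisor" could look like, here is a minimal sketch that starts the four daemons and restarts any that exit. It assumes it is run from the Storm install directory. The names storm_command and keep_alive are hypothetical, and the max_restarts cap exists mainly so the loop can be tried out safely.

```python
import subprocess
import time

# The four daemons started by hand above.
DAEMONS = ["nimbus", "supervisor", "ui", "drpc"]


def storm_command(daemon, storm_bin="./bin/storm"):
    """Build the argument list for one Storm daemon (path is an assumption)."""
    return [storm_bin, daemon]


def keep_alive(commands, poll_seconds=5.0, max_restarts=None):
    """Start every command and restart any that exits.

    commands maps a label to an argument list. max_restarts caps the total
    number of restarts (None means run forever). Returns the restart count.
    """
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    restarts = 0
    try:
        while max_restarts is None or restarts < max_restarts:
            time.sleep(poll_seconds)
            for name in list(procs):
                if procs[name].poll() is None:
                    continue  # still running
                if max_restarts is not None and restarts >= max_restarts:
                    break  # cap reached, stop restarting
                print(f"{name} exited with code {procs[name].returncode}; restarting")
                procs[name] = subprocess.Popen(commands[name])
                restarts += 1
    finally:
        # Stop whatever is still running when the loop ends or is interrupted.
        for proc in procs.values():
            if proc.poll() is None:
                proc.terminate()
            proc.wait()
    return restarts
```

Something like keep_alive({d: storm_command(d) for d in DAEMONS}) would then replace the four screen sessions. This is only a sketch: it does no logging or backoff, so a daemon that crashes on startup will be restarted every poll interval.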
- Now we need to deploy the code we wrote so that we can have fun output! To do that, we need to package it up into a JAR. Leiningen does that with lein uberjar. We need to specify a main class though, which is currently main.java.testclj.marceline-test. Note that there is some confusion because our Clojure namespace contains a hyphen, which turns into an underscore in the Java class. Also I manually deleted the automatically-bundled Storm dependencies, but I don't know if that was critical. The self-contained JAR (with SNAPSHOT in the name) can now be scp'd over to the main cluster.
- Finally do the deployment:
$ bin/storm jar ../testclj-0.1.0-SNAPSHOT-standalone.jar main.java.testclj.marceline_test myfoo
Note the underscore! Took me a while to spot that one... Also myfoo is the name for the topology but I don't know what it achieves beyond showing up in the web UI.
- Now you can run the server in ./server (currently with node index.js, though I might switch to Python for fun). Joy!
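The hyphen/underscore trap in the deployment step is easy to fall into again, so here is a small sketch that derives the Java class name from the Clojure namespace and builds the storm jar argument list from it. Both helper names are hypothetical, and the conversion only handles hyphens, which is the one case that came up here.

```python
def java_class_for_namespace(namespace):
    """Return the Java class name for a Clojure namespace (hyphens only)."""
    return namespace.replace("-", "_")


def storm_jar_command(jar_path, namespace, topology_name, storm_bin="bin/storm"):
    """Build the `storm jar` argument list used in the deployment step."""
    return [
        storm_bin,
        "jar",
        jar_path,
        java_class_for_namespace(namespace),  # underscore form, not the namespace
        topology_name,
    ]
```

For example, storm_jar_command("../testclj-0.1.0-SNAPSHOT-standalone.jar", "main.java.testclj.marceline-test", "myfoo") yields the same arguments as the command in the deployment step, and the list can be passed straight to subprocess.run.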
Download from http://example.com/FIXME.
FIXME: explanation
$ java -jar testclj-0.1.0-standalone.jar [args]
FIXME: listing of options this app accepts.
...
...
Copyright © 2016 FIXME
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.