
streamflow's Introduction

StreamFlow™


Join the chat at https://gitter.im/lmco/streamflow

Overview

StreamFlow™ is a stream processing tool designed to help build and monitor processing workflows. The ultimate goal of StreamFlow is to make working with stream processing frameworks such as Apache Storm easier and faster, with "enterprise"-like management functionality.
StreamFlow also provides a mechanism for non-developers such as data scientists, analysts, or operational users to rapidly build scalable data flows and analytics.

Sample topology

StreamFlow provides the following capabilities:

  1. A responsive web interface for building and monitoring Storm topologies.
  2. An interactive drag and drop topology builder for authoring new topologies.
  3. A dashboard for monitoring the status and performance of topologies as well as viewing aggregated topology logs.
  4. A specialized topology engine which solves some Storm complexities such as ClassLoader isolation and serialization and provides a mechanism for dependency injection.
  5. A modular framework for publishing and organizing new capabilities in the form of Spouts and Bolts.

How it works

The following is a simple depiction of the StreamFlow stack. The web interface is built using open source web frameworks and is backed by a series of reusable web services. StreamFlow is capable of authoring and managing topologies dynamically using a series of reusable Frameworks. These Frameworks are simply JAR files containing standard Storm Spouts and Bolts along with a metadata configuration file which describes the components they expose. StreamFlow uses a custom topology driver to bootstrap and execute a topology along with StreamFlow-specific configuration logic.
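
As a rough illustration of the idea, a framework metadata file might look something like the sketch below; the field names are assumptions for illustration only, not the exact StreamFlow schema (see the wiki for the real format).

# Hypothetical framework.yml sketch -- field names are illustrative only,
# not the exact StreamFlow schema.
name: example-framework
label: Example Framework
version: 0.1.0
components:
    - name: word-spout
      label: Word Spout
      type: storm-spout
      mainClass: com.example.WordSpout
    - name: count-bolt
      label: Count Bolt
      type: storm-bolt
      mainClass: com.example.CountBolt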

Concepts

The following is a description of some core StreamFlow concepts and terminology.

Component

Components represent business logic modules which are draggable in the StreamFlow UI. Examples of Components include Storm Spouts and Storm Bolts.

Framework

A grouping of related Components and their associated metadata. Ideally, the elements of a framework should all be compatible when wired together in a topology, as they share the same protocol. Frameworks might be organized around a set of technologies or domains; an analogy would be a Java library or an Objective-C framework. Topologies have frameworks as dependencies.

Resource

A resource is an object used by spouts/bolts to externalize common state; for example, an object which represents a technical asset in the environment/cluster such as a database or Kafka queue. Alternatively, a resource might provide an uploaded file or a container of global state. Resources should be used to encapsulate functionality outside of a bolt/spout when that information is used in several places in a topology or within multiple topologies. Resources also provide a useful mechanism for injecting parameters, connections, or state into a bolt/spout, making the spout or bolt simpler, easier to write, and more testable.
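
As a minimal sketch of the idea (not the actual StreamFlow resource API), the bolt below receives its external state through its constructor instead of constructing it inline, which keeps the bolt small and easy to test:

import java.io.Serializable;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// KeyValueStore stands in for a StreamFlow-managed resource such as a
// database or Kafka connection; it is a hypothetical placeholder, not real API.
interface KeyValueStore extends Serializable {
    String get(String key);
}

public class LookupBolt extends BaseRichBolt {
    private final KeyValueStore store;   // externalized state injected into the bolt
    private OutputCollector collector;

    public LookupBolt(KeyValueStore store) {
        this.store = store;
    }

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String key = input.getStringByField("key");
        collector.emit(input, new Values(key, store.get(key)));
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("key", "value"));
    }
}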

Serialization

Serializations allow for the definition of custom serializers/deserializers. Specifically, these serializations should be implemented as Kryo serializers to integrate properly with Storm.
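
For example, a custom Kryo serializer for a topology-specific type can be registered through the standard Storm configuration; the GeoPoint type below is a hypothetical example.

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.Serializer;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

import backtype.storm.Config;

// Hypothetical tuple value type used only to illustrate Kryo registration.
class GeoPoint {
    double lat;
    double lon;
}

class GeoPointSerializer extends Serializer<GeoPoint> {
    @Override
    public void write(Kryo kryo, Output output, GeoPoint point) {
        output.writeDouble(point.lat);
        output.writeDouble(point.lon);
    }

    @Override
    public GeoPoint read(Kryo kryo, Input input, Class<GeoPoint> type) {
        GeoPoint point = new GeoPoint();
        point.lat = input.readDouble();
        point.lon = input.readDouble();
        return point;
    }
}

public class SerializationExample {
    // Register the serializer in the Storm configuration used to submit the topology.
    public static Config buildConfig() {
        Config conf = new Config();
        conf.registerSerialization(GeoPoint.class, GeoPointSerializer.class);
        return conf;
    }
}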

Topology

Topologies in Storm define the processing logic and the links between nodes that describe the data flow. StreamFlow uses registered components to allow users to dynamically build topologies in a drag and drop interface. This allows topologies to be built from existing components without requiring additional code.
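
For reference, the hand-written Storm equivalent of a topology the editor produces looks roughly like the following; WordSpout and CountBolt are hypothetical components.

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class WordCountTopology {
    public static void main(String[] args) throws Exception {
        // Wire a spout to a bolt -- the same linkage the drag and drop editor builds visually.
        // WordSpout and CountBolt are hypothetical spout/bolt implementations.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new WordSpout(), 1);
        builder.setBolt("counts", new CountBolt(), 2)
               .fieldsGrouping("words", new Fields("word"));

        Config conf = new Config();
        new LocalCluster().submitTopology("word-count", conf, builder.createTopology());
    }
}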

Find out more

The StreamFlow Wiki is the best place to go to learn more about the StreamFlow architecture and how to install and configure a StreamFlow server in your environment.

https://github.com/lmco/streamflow/wiki

Here are some quick links to help get you started with StreamFlow:

Questions or need help?

If you have any questions or issues please feel free to contact the development team using one of the following methods.

License

StreamFlow is copyright 2014 Lockheed Martin Corporation.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product incorporates open source software components covered by the terms of third party license agreements contained in the /Licenses folder of this project.

Documentation Version

Last Updated: 1/7/2015

streamflow's People

Contributors

christopherlakey, gitter-badger, juliencruz


streamflow's Issues

Flink

Apache Flink is quickly gaining momentum as an alternative to Spark Streaming, Storm, etc.

Apache Flink is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams.

What are your thoughts on developing a plugin for Flink Streaming in StreamFlow? The rationale is that Flink provides a Storm compatible API:

Flink provides a Storm compatible API (org.apache.flink.storm.api) that offers replacements for the following classes:

TopologyBuilder replaced by FlinkTopologyBuilder
StormSubmitter replaced by FlinkSubmitter
NimbusClient and Client replaced by FlinkClient
LocalCluster replaced by FlinkLocalCluster

In order to submit a Storm topology to Flink, it is sufficient to replace the used Storm classes with their Flink replacements in the Storm client code that assembles the topology. The actual runtime code, i.e., Spouts and Bolts, can be used unmodified. If a topology is executed in a remote cluster, the parameters nimbus.host and nimbus.thrift.port are used as jobmanager.rpc.address and jobmanager.rpc.port, respectively. If a parameter is not specified, the value is taken from flink-conf.yaml.
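
A rough sketch of that substitution is shown below; exact signatures varied across Flink releases, and WordSpout/CountBolt are hypothetical components, so treat this as illustrative only.

import org.apache.flink.storm.api.FlinkLocalCluster;
import org.apache.flink.storm.api.FlinkTopologyBuilder;

import backtype.storm.Config;

public class FlinkWordCount {
    public static void main(String[] args) throws Exception {
        // Same client code as a plain Storm topology, with the builder and cluster
        // classes swapped for their Flink replacements; the spout and bolt
        // implementations (hypothetical here) are reused unmodified.
        FlinkTopologyBuilder builder = new FlinkTopologyBuilder();
        builder.setSpout("words", new WordSpout());
        builder.setBolt("counts", new CountBolt()).shuffleGrouping("words");

        FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster();
        cluster.submitTopology("word-count", new Config(), builder.createTopology());
    }
}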

Tuple Generator Problem

How can I run a tuple generator spout? There is no source code for streamflow.spout.core.TupleGenerator, but the core framework defines it in framework.yml.
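
The original class does not appear to be published, but a minimal stand-in spout (a sketch only, not the real streamflow.spout.core.TupleGenerator) could look like this:

import java.util.Map;
import java.util.UUID;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Hypothetical generator spout: emits a random value on every nextTuple() call.
public class GeneratedTupleSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        collector.emit(new Values(UUID.randomUUID().toString()));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("value"));
    }
}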

Notification system enhancements

Currently, error notifications for framework uploads and topology submissions do not provide contextual information as to the source of the error. This enhancement will improve the error notification system to provide detailed information about the problem so it can be resolved more quickly.

During framework upload, the framework.yml file will be analyzed for correctness to ensure any required properties are specified. In addition, any class names referenced in the configuration will be dynamically loaded to verify that all classes exist as defined and to catch class loading errors that would otherwise only manifest during topology submission.

During topology submission, the streamflow engine will verify that all specified dependencies (e.g. frameworks, resources, files) are available and can be built into the final topology jar. In addition, all topology properties will be validated to ensure all required properties have been specified and configured properly.

In each of the above cases when an error is found, the notification will provide detailed information to correct the problem.

Wiki contribution: suite of links for How-It-Works#key-technologies

In rektide/streamflow-wiki@e3951ee6 I took the liberty of going through the POM for Streamflow and extracted project names and links that should serve as a viable start to the presently blank section of the wiki, https://github.com/lmco/streamflow/wiki/How-It-Works#key-technologies . Please consider merging the references I'm providing into your wiki.

Your wiki is very well skeletoned out, and I look forward to seeing it fleshed out from there; I'm expecting a ton of great, interesting content.

I've also filed isaacs/github#333 in hopes that maybe someday there'll be a better process for contributing to wikis than this weird filing an issue & linking you the commit to merge yourself.

support storm 1.0

Let StreamFlow support Storm 1.0 and use the Storm UI's API to get the cluster info and topology metrics.

Security enhancements

The existing Shiro authentication mechanism should be improved with the following features to enhance security:

  • Lock user accounts after an incorrect password is provided a specific number of times in a given time window.
  • Allow custom realm implementations to provide a custom DAO implementation to create/edit/retrieve user accounts. This is necessary to support custom implementations which store user data externally, such as LDAP and Active Directory.
  • Add support for user account pictures in the user service.

Cleanup user created entities after delete

When user accounts are deleted, user created entities such as topologies and resource entries are not deleted. To prevent the creation of zombie entities, these entities should be deleted when user accounts are deleted.

Topology structure export and import

After we create and build a topology, we want to move it to another computer that has StreamFlow installed. We want to export the structure of the topology and import it into another StreamFlow instance. I suggest adding these export/import functions.

Topology kill dialog

The current implementation of the topology kill feature in the UI simply kills the topology using a background request. Due to the time it may take for a topology to be fully killed, a kill dialog should be displayed allowing the user to specify the kill wait time in seconds. Once the topology kill command has completed, the dialog should update with the status.
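
Under the hood the dialog would presumably drive the standard Storm kill call, where the user-selected wait time maps to KillOptions; a minimal sketch:

import java.util.Map;

import backtype.storm.generated.KillOptions;
import backtype.storm.utils.NimbusClient;
import backtype.storm.utils.Utils;

public class KillExample {
    // Kill a topology, waiting the given number of seconds before workers are torn down.
    public static void kill(String topologyName, int waitSeconds) throws Exception {
        Map conf = Utils.readStormConfig();
        KillOptions options = new KillOptions();
        options.set_wait_secs(waitSeconds);
        NimbusClient.getConfiguredClient(conf).getClient()
                .killTopologyWithOpts(topologyName, options);
    }
}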

Add results paging and search to all listing pages

Add support for paging and text search for all pages in the user interface which provide a list of results. This change will better support long lists and allow for search when the lists get too long to manage. The paged results should be sortable by name or last modified date.

Change topology connector styling to use theme

The topology connectors implemented in jsPlumb currently do not use LESS styling. This results in changes to the theme not being reflected in the connectors of the topology editor. This is the only section of code not changed when updating the Bootswatch theme and should be changed to allow for easier theme modifications and updates.

Document public REST API

Javadoc comments need to be developed for all API-worthy classes. REST documentation should be included in the Javadoc to produce REST API documentation.

File upload manager

The file upload feature does not currently have a method to delete previously uploaded files. Rather than support inline file upload, the user should have access to a file manager user interface which allows for the creation of folders and files. The file manager should be available as a separate tab in the user interface and files should be accessible from the properties dialog.

Spark Streaming and Flink Integration

This is a very nice project. Thank you guys !

I am thinking of integrating Spark Streaming and Flink into StreamFlow, so that the end user can choose the streaming framework of their choice from StreamFlow.

It would be of great help if you could give some pointers on which classes/packages we need to extend or consider to accomplish the above.

Thank You

Persistent topology monitor to ensure uptime of submitted topologies.

Currently, when a topology is submitted by a user from StreamFlow, there is no protection ensuring it will continue to run if there is a cluster outage.

This feature should implement a persistent service within the StreamFlow server that resubmits a submitted topology if it goes down for any reason. This will improve topology uptime in the event that a Storm cluster goes down and comes back up.

Once a user triggers the kill operation in StreamFlow, the topology will be flagged as killed and should no longer be auto-deployed.

The streamflow configuration should allow for enabling/disabling of this feature and controlling the polling interval of the service. A suggested config format is as follows:

monitor:
    enabled: true
    pollingInterval: 60
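
A minimal sketch of how such a monitor might be structured (TopologyStore and TopologySubmitter are hypothetical placeholders for the StreamFlow services involved):

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TopologyMonitor {
    // Hypothetical collaborators standing in for the real StreamFlow services.
    interface TopologyStore {
        List<String> submittedTopologies();
        boolean isKilled(String topology);
    }

    interface TopologySubmitter {
        boolean isRunning(String topology);
        void resubmit(String topology);
    }

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final TopologyStore store;
    private final TopologySubmitter submitter;

    public TopologyMonitor(TopologyStore store, TopologySubmitter submitter) {
        this.store = store;
        this.submitter = submitter;
    }

    public void start(long pollingIntervalSeconds) {
        // Poll on the configured interval and resubmit any topology that should
        // be running but is not, unless the user explicitly killed it.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                for (String topology : store.submittedTopologies()) {
                    if (!store.isKilled(topology) && !submitter.isRunning(topology)) {
                        submitter.resubmit(topology);
                    }
                }
            }
        }, pollingIntervalSeconds, pollingIntervalSeconds, TimeUnit.SECONDS);
    }
}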

Custom property types in frameworks

Allow users to provide JavaScript and HTML implementations of custom property types. This will help users inject new property types when the provided set of properties does not cover their needs. The custom types should be registered in the framework.yml file, which will allow the UI to reference the provided implementations.

This enhancement will require refactoring of the existing property system to be more modular and utilize this system to reimplement the existing property types. Changes will be required in the framework service to support parsing and validation of this new capability so it can be registered in the UI.

Framework annotation enhancement

Currently the only way to register new frameworks is through the creation of a framework.yml file. It would be best if the user could also define frameworks through the use of custom annotations in the Spout or Bolt implementations. During framework upload, the framework service could scan for new Spouts and Bolts using these annotations and register them in the UI. This would simplify the process to register new Spouts and Bolts and reduce the chances of misconfiguring the framework.yml file.

This would require developers to import a custom streamflow library containing all available annotations. Changes would also be required in the framework service to properly parse the framework annotations.

JDK 1.7

Can we upgrade StreamFlow to JDK 1.7?

FlinkSQL Integration

This is feasible.

We'd need to create a Kafka table using DDL. To try this run:

docker-compose exec kafka bash -c 'kafka-console-consumer.sh --topic user_behavior --bootstrap-server kafka:9094 --from-beginning --max-messages 10'

From there you'll see a boot screen, then finally the dashboard:


Font Awesome Icons don't work

See screenshot. Maybe a version issue.

P.S. All my respect for a really good and professionally made project. The wiki is one of the best I have ever seen. You should be paid twice.

Upgrade Storm to 0.9.5

Upgrade the storm-core references to utilize version 0.9.5. Associated Kafka dependencies should be updated as well to match Storm 0.9.5.

Topology download and upload

Support download of compiled topologies which combines framework jars, resources, and topology configuration data. This topology download should produce a single jar and should be executable outside of StreamFlow using the Storm binary executables.

Conversely, topology upload should allow for upload of non-StreamFlow topologies which do not allow modification in the topology editor. These native Storm topologies can only be submitted or killed in the StreamFlow dashboard. StreamFlow topology jars which are uploaded will have the option of installing bundled dependencies such as framework jars and resources. Users should be prompted before each dependency is replaced so that more up-to-date entities are not overwritten.

Refactor topology metrics feature

The existing topology metrics capabilities have been refactored out of the user interface due to changes in the Storm engine. This feature should be modified and retested to ensure all topology metrics are reported in the user interface. Due to the increased Storm cluster integration, native Storm metrics should also be retrieved using the new JSON API.

The web interface should be updated along with the server code to display useful metrics to the user.
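
For reference, the Storm UI JSON API can be queried over plain HTTP; the host and port below are assumptions for a local cluster.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class StormUiMetrics {
    // Fetch the topology summary from the Storm UI REST API as a raw JSON string.
    public static String topologySummary(String uiHost, int uiPort) throws Exception {
        URL url = new URL("http://" + uiHost + ":" + uiPort + "/api/v1/topology/summary");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");

        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();   // JSON listing each topology and its status
    }
}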

need help : facing problem while launching in windows

Hi

Thanks a lot for this beautiful app for learning.

I don't think this is a bug. Please find below the problem I am facing.

I am able to build on a Windows machine but am unable to launch the bat file (a cmd prompt opens and closes immediately).

as per the instructions

For Windows Systems

copy streamflow-{VERSION}.zip /opt
cd /opt
// Unzip using Windows explorer or using your installed compression library

But on Windows we don't have an /opt directory like Linux.

Thanks in advance.

Improve code coverage of server and web code

Additional unit and integration tests are required for the streamflow-engine module to increase code coverage of the server code. This task should also ensure that all existing test cases are up to date and properly test each critical piece of code.

AngularJS test cases should be implemented to verify controller logic in the web code.

Toolbar size not recalculated during resize

When the browser window is too small, it is possible for the toolbar in the StreamFlow UI to span multiple lines. When this occurs, the toolbar offset is not recalculated causing some sections of the page to be hidden by the toolbar. The toolbar should take into account its actual height and adjust the offset of pages accordingly.

Support annotation based configuration to replace framework.yml

Currently, StreamFlow requires the definition of a framework.yml file to register new components in the framework.

It would allow for much quicker development to also include the ability to define the component settings such as the name/label/description at the class level using custom annotations. In addition, properties could be annotated using field annotations on the fields directly.

If possible, it would be good to support both the annotation approach AND the framework.yml approach, in case users have a specific preference or are working with legacy code.
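
As a purely illustrative sketch (these annotation types do not exist in StreamFlow today), the class-level registration could look something like this:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation types sketching the proposed alternative to framework.yml.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Component {
    String name();
    String label();
    String description() default "";
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ComponentProperty {
    String name();
    String label();
    String defaultValue() default "";
}

// A bolt describing itself through the hypothetical annotations; the framework
// service would scan uploaded jars for these instead of parsing framework.yml.
@Component(name = "count-bolt", label = "Count Bolt", description = "Counts tuples by key")
class CountBolt {

    @ComponentProperty(name = "window-seconds", label = "Window (seconds)", defaultValue = "60")
    private int windowSeconds;
}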

Bind UI actions based on browser type

The topology editor currently requires clicking on each component in the canvas in order to delete or edit it. The click event rather than the hover event was used to display the component icons simply to support mobile devices which do not trigger hover events on elements.

It would be preferable to allow non-mobile browsers to bind to both hover and click events to make selection of the options more intuitive. Mobile browsers will only support the click event for selection of components. These changes will require binding the event listeners based on the browser type in the AngularJS code.

Internet Explorer issues

Fix the following Internet Explorer specific issues in the web interface.

  • IE8 Dragging elements highlights text as it drags (fixed this is streams)
  • IE8 Parallelism box is taking style from "input" instead of btn class
  • IE8 Component accordion showing properly. Group button does not change accordion
  • IE8 Ignore changes is very slow when leaving topology editor
  • IE8/IE9 Framework upload is not completing or showing progress
  • IE8/IE9 Topology properties are not saving properly.
  • IE8/IE9 Notification does not display at all
  • IE8/IE9 Caching service responses which causes invalid list states

Use moment.js for all date output

Dates are currently displayed in a typical date format (e.g. July 4, 2014 08:00 EST). It would be preferable to display dates in a more natural relative format using moment.js (e.g. 5 minutes ago, a moment ago).

This will require use of an existing open source AngularJS directive or development of a new directive to support use of moment.js date formats.
