
HopsUtil

HopsUtil is a library that facilitates the development of Java/Scala programs for Hopsworks. It hides the complexity of discovering services and setting up security for programs that interact with Hopsworks services, such as the Hopsworks REST API, Apache Kafka and Apache Spark. For detailed documentation, see here. For the Python version of this library, see here.

HopsUtil is automatically deployed when users run jobs/notebooks in Hopsworks. If users need to make changes to the library itself, they can build it and provide it as an additional resource to their job/notebook (see doc).

Build

To build HopsUtil, you need to have Maven installed. Then run:

mvn clean package 

This generates two archives under the target directory: a thin jar, which is deployed to the Hops Maven repository, and a fat jar containing all the required dependencies, to be used from within Hopsworks.

Usage

The latest version of HopsUtil is available in Hopsworks. When creating and submitting a job in Hopsworks, HopsUtil is automatically distributed on all the nodes managed by YARN on which the job will run.

If you want to make changes to the library or add functionality to it, you can use the new version with a submitted job by providing HopsUtil as a library when creating the job via the job service in Hopsworks. This overrides the default HopsUtil available in the platform.

To include HopsUtil in your Maven project, add the following dependency to your application's POM file, together with a version element for the HopsUtil release you want to use:

<dependency>
  <groupId>io.hops</groupId>
  <artifactId>hops-util</artifactId>
</dependency>

and add the following repository to your repositories list:

<repository>
  <id>Hops</id>
  <name>Hops Repo</name>
  <url>https://bbc1.sics.se/archiva/repository/Hops/</url>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>

API

HopsUtil provides an API that automatically sets up Apache Kafka producers and consumers for both Apache Spark and Apache Flink. It also provides methods for discovering the endpoints of various Hopsworks services, such as InfluxDB.

Javadoc for HopsUtil is available here.

Job Workflows

It is also possible to build simple Hopsworks job workflows using HopsUtil. The two methods provided are:

  • startJobs: Takes one or more job IDs as input and starts the corresponding jobs of the project, provided that the user invoking the method is also the creator of those jobs. It can be used like Hops.startJobs(1);
  • waitJobs: Waits for jobs (supplied as comma-separated job IDs) to reach either the running or the not_running state, depending on an optional boolean parameter. For example, waitJobs(1,5,11); returns once none of the three jobs with IDs 1, 5 and 11 is running, whereas waitJobs(false, 1,5,11); returns once all of them have entered the running state.
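How waitJobs is implemented is not described here. To illustrate the general idea of waiting for a set of jobs to reach a target state, the following is a minimal polling sketch. Everything in it is hypothetical and not part of the HopsUtil API: the class JobWaitSketch, the simplified JobState enum, the status lookup function and the maxAttempts/pollMillis parameters. In this sketch, true means "wait for running", which may not match the convention of HopsUtil's own boolean parameter. In a real job, the status would come from Hopsworks rather than from a simulated map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch: JobWaitSketch, JobState and the status lookup are
// illustrative only and are NOT part of the HopsUtil API.
public class JobWaitSketch {

  /** Simplified job states (assumption: real Hopsworks jobs have more states). */
  public enum JobState { RUNNING, NOT_RUNNING }

  /**
   * Polls the given jobs until all of them are in the target state.
   *
   * @param status         hypothetical lookup from job ID to its current state
   * @param waitForRunning true to wait for RUNNING, false to wait for NOT_RUNNING
   * @param maxAttempts    maximum number of polling rounds
   * @param pollMillis     pause between rounds, in milliseconds
   * @param jobIds         IDs of the jobs to watch
   * @return true if every job reached the target state within maxAttempts rounds
   */
  public static boolean waitJobs(IntFunction<JobState> status, boolean waitForRunning,
                                 int maxAttempts, long pollMillis, int... jobIds) {
    JobState target = waitForRunning ? JobState.RUNNING : JobState.NOT_RUNNING;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      boolean allReached = true;
      for (int id : jobIds) {
        if (status.apply(id) != target) {
          allReached = false;
          break; // no need to poll the remaining jobs in this round
        }
      }
      if (allReached) {
        return true;
      }
      if (attempt < maxAttempts - 1) {
        try {
          Thread.sleep(pollMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt flag
          return false;
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Simulated jobs: each reports NOT_RUNNING for a given number of polls, then RUNNING.
    Map<Integer, Integer> pollsUntilRunning = new HashMap<>();
    pollsUntilRunning.put(1, 0);
    pollsUntilRunning.put(5, 2);
    pollsUntilRunning.put(11, 1);
    IntFunction<JobState> status = id -> {
      int left = pollsUntilRunning.get(id);
      if (left > 0) {
        pollsUntilRunning.put(id, left - 1);
        return JobState.NOT_RUNNING;
      }
      return JobState.RUNNING;
    };
    boolean allRunning = waitJobs(status, true, 10, 100, 1, 5, 11);
    System.out.println("All jobs running: " + allRunning); // prints "All jobs running: true"
  }
}
```

The sketch caps the number of polling rounds so that a job that never changes state cannot block the caller forever. Whether HopsUtil's waitJobs uses a timeout is not stated in this README.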

The ID of a job is displayed on its Job Details page in Hopsworks.

Example

Creating a Kafka consumer for Spark Structured Streaming with HopsUtil is as simple as:

DataStreamReader dsr = Hops.getSparkConsumer().getKafkaDataStreamReader();

and to shut it down gracefully, you can call:

Hops.shutdownGracefully(queryFile);

where queryFile is the Spark StreamingQuery object.

Management of topics and consumer groups, as well as distribution of SSL/TLS certificates, is performed automatically by the utility. The developer only needs to implement the application's business logic. A complete example of how to use HopsUtil to implement a Kafka Spark Streaming app is available here.
