data-acquisition's Introduction

Data Acquisition Service - DAS

DAS initiates and manages the downloading and parsing of data sets. It is a Spring Boot application built with Maven.

Required libraries

The following libraries are required to build DAS:

  • metadata-parser

Required services

DAS requires the following services to function properly:

  • Downloader
  • Metadata parser
  • User management

Running DAS locally - demo

To run and test DAS locally, you need to install and run the User management, Downloader and Metadata parser services. You also need to publish Metadata parser, in the version defined in the DAS pom.xml, to your local Maven repository.

User Management

  • pull user-management from a git repository
  • run service from command line with the following parameters:

mvn spring-boot:run -Dspring.cloud.propertiesFile=spring-cloud.properties

Downloader

  • pull downloader from a git repository
  • run service from command line with the following parameters:

DOWNLOADS_DIR=/tmp SERVER_PORT=8090 mvn clean spring-boot:run

Metadata parser

  • pull metadata-parser from a git repository
  • publish the artifact to your local Maven repository

mvn install

  • run service from command line with the following parameters:

DOWNLOADS_DIR=/tmp mvn spring-boot:run

DAS

  • pull data-acquisition from a git repository
  • run service from command line with the following parameters:

DOWNLOADER_URL="http://localhost:8090" DOWNLOADS_DIR=/tmp mvn clean test spring-boot:run

where:

  • DOWNLOADER_URL - the URL of the Downloader service (http://localhost:8090 matches the SERVER_PORT used to start Downloader above). Downloader still needs to be turned into a valid CF service, which should make it easier to connect
  • DOWNLOADS_DIR - the folder where the object store puts downloaded content. It must be shared between DAS and Downloader
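
Since DOWNLOADS_DIR has to point at the same folder for every service that uses it, it can help to check the folder once before starting anything. The following is only a sketch: check_downloads_dir is a hypothetical helper, not something shipped in the DAS repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the DAS repository): verify that the
# folder intended for DOWNLOADS_DIR exists and is writable.
check_downloads_dir() {
    dir="$1"
    if [ -z "$dir" ]; then
        echo "DOWNLOADS_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "DOWNLOADS_DIR '$dir' is missing or not writable" >&2
        return 1
    fi
    return 0
}

# Usage: run the check, then export the same value in the shell of each service.
#   DOWNLOADS_DIR=/tmp
#   check_downloads_dir "$DOWNLOADS_DIR" && export DOWNLOADS_DIR
```

Exporting one value and reusing it for each service avoids the case where DAS and Downloader silently point at different folders.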

To run a simple demo, use the script ./tools/curl.sh: ./curl.sh <das_app_url> <data_set_uri> <oauth_token> (for data_set_uri, only http and https are implemented). The example below also passes the organisation's UUID as a fourth argument.

EXAMPLE:

./curl.sh localhost:8080 https://www.quandl.com/api/v1/datasets/BCHARTS/BITSTAMPUSD.csv "`cf oauth-token | grep bearer`" <organisation's UUID>

This downloads the requested CSV file and saves it to the /tmp directory.

  • You might see the following exception in the Metadata parser's logs: ResourceAccessException: I/O error on PUT request for "http://localhost:5000/rest/datasets/(id)": Connection refused. To get rid of it, run the Data catalog service along with Elasticsearch
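
Because only http and https are implemented for data_set_uri, the URI can be checked before the demo script is called. This is a minimal sketch: is_supported_uri is a hypothetical helper and is not part of ./tools/curl.sh.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the URI schemes the README says
# are implemented for data_set_uri (http and https).
is_supported_uri() {
    case "$1" in
        http://*|https://*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   uri="https://www.quandl.com/api/v1/datasets/BCHARTS/BITSTAMPUSD.csv"
#   is_supported_uri "$uri" || { echo "unsupported URI: $uri" >&2; exit 1; }
```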

Deployment

There are two manifest files.

  • manifest.yml uses in-memory queues
  • manifest-kafka.yml uses Kafka queues
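
Choosing between the two manifests could be scripted roughly as follows. This is a sketch under assumptions: manifest_for and the queue type names "memory" and "kafka" are hypothetical labels introduced here, and the usage line assumes the cf CLI is installed and logged in.

```shell
#!/bin/sh
# Hypothetical helper: map a queue type to one of the two manifest files
# named above. The labels "memory" and "kafka" are illustrative only.
manifest_for() {
    case "$1" in
        memory) echo "manifest.yml" ;;
        kafka)  echo "manifest-kafka.yml" ;;
        *)
            echo "unknown queue type: $1" >&2
            return 1
            ;;
    esac
}

# Usage (assumes the cf CLI is installed and you are logged in):
#   cf push -f "$(manifest_for kafka)"
```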

data-acquisition's People

Contributors

akdajnowski, jakubzembik, kbalka, mbultrow, tciunel
