
backup-to-swift

HDFS/Swift integration

Initial configuration

  • Download the correct version of the hadoop-openstack integration library from here, or use the attached jar
  • Place it in the folder containing the Hadoop libraries on every node (if CDH is used, this folder is /opt/cloudera/parcels/CDH/lib/hadoop)
  • Add the lines from configuration.xml to the core-site.xml configuration file (if Cloudera Manager is used, go to the YARN configuration, search for "YARN Service Advanced Configuration Snippet (Safety Valve) for core-site.xml", add the lines to that section, then restart the cluster). Note that you need to specify the parameters specific to your environment: PROVIDER, AUTH-URL, REGION-NAME, TENANT-NAME
  • Test the integration using commands like the ones below:

Copy data from HDFS to Swift:

hadoop distcp -D fs.swift.service.<PROVIDER>.username=<username> -D fs.swift.service.<PROVIDER>.password=<api-key> -update -p <PATH-ON-HDFS> swift://<CONTAINER-NAME>.<PROVIDER>/<OBJECT-NAME>

Copy data from Swift to HDFS:

hadoop distcp -D fs.swift.service.<PROVIDER>.username=<username> -D fs.swift.service.<PROVIDER>.password=<api-key> -update -p swift://<CONTAINER-NAME>.<PROVIDER>/<OBJECT-NAME> <PATH-ON-HDFS> 
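
For example, with a hypothetical provider named myprovider, a container named hdfs-backups, and the HDFS source /data/events (all placeholder values, not part of this repo), the HDFS-to-Swift copy might look like:

hadoop distcp -D fs.swift.service.myprovider.username=backup-user -D fs.swift.service.myprovider.password=0123456789abcdef -update -p /data/events swift://hdfs-backups.myprovider/events

The reverse direction simply swaps the source and destination arguments.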

Note: The parameters fs.swift.service.<PROVIDER>.username=<username> and fs.swift.service.<PROVIDER>.password=<api-key> can be added to the core-site.xml config file; once added, there is no need to specify them via -D options in the hadoop distcp command.
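
For example, once those two properties are present in core-site.xml, the HDFS-to-Swift copy above reduces to the following (a sketch using the same placeholders):

hadoop distcp -update -p <PATH-ON-HDFS> swift://<CONTAINER-NAME>.<PROVIDER>/<OBJECT-NAME>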

Benchmarking

  • Copying 100 GB of data from HDFS to Swift: ~43 minutes

Oozie workflow

You can use an Oozie workflow to perform regular backups from HDFS to Swift.

Please follow the steps below to enable the workflow in your environment.

  • Clone the repo to your environment and cd to workflows/hdfs-to-swift
  • Edit the hdfs-to-swift.properties file and specify the parameters specific to your environment
  • Edit the sourcelist.txt file and specify the HDFS directories to back up
  • Edit the distcp.sh file and specify the parameters specific to your environment (edit lines 4, 5, 7 and 12)
  • Copy the workflows directory to HDFS:
hdfs dfs -put -f workflows /user/$USER
  • Run the coordinator job as below:
oozie job -oozie http://<oozie-server-hostname>:11000/oozie -run -config workflows/hdfs-to-swift/hdfs-to-swift.properties
  • Monitor the coordinator and workflow jobs via Hue or the Oozie Web UI, or from the Oozie CLI (see the sketch below).
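
As an alternative to the web UIs, the Oozie CLI can also be used for monitoring (a sketch; <coordinator-job-id> is the ID printed by the -run command above):

oozie job -oozie http://<oozie-server-hostname>:11000/oozie -info <coordinator-job-id>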
