HDFS for Go

This is a native Go client for HDFS. It connects directly to the namenode using the protocol buffers API.

It tries to be idiomatic by aping the stdlib os package, where possible, and implements the interfaces from it, including os.FileInfo and os.PathError.

Here's what it looks like in action:

client, _ := hdfs.New("namenode:8020")

file, _ := client.Open("/mobydick.txt")

buf := make([]byte, 59)
file.ReadAt(buf, 48847)

fmt.Println(string(buf))
// => Abominable are the tumblers into which he pours his poison.

For complete documentation, check out the Godoc.

The hdfs Binary

Along with the library, this repo contains a commandline client for HDFS. Like the library, its primary aim is to be idiomatic, by enabling your favorite unix verbs:

$ hdfs --help
Usage: hdfs COMMAND
The flags available are a subset of the POSIX ones, but should behave similarly.

Valid commands:
  ls [-lah] [FILE]...
  rm [-rf] FILE...
  mv [-fT] SOURCE... DEST
  mkdir [-p] FILE...
  touch [-amc] FILE...
  chmod [-R] OCTAL-MODE FILE...
  chown [-R] OWNER[:GROUP] FILE...
  cat SOURCE...
  head [-n LINES | -c BYTES] SOURCE...
  tail [-n LINES | -c BYTES] SOURCE...
  du [-sh] FILE...
  checksum FILE...
  get SOURCE [DEST]
  getmerge SOURCE DEST
  put SOURCE DEST
  
  To alter the default locations from which configurations are loaded, 
  the following environment variables may be used:

    - HADOOP_CONF_DIR     hadoop configuration directory. Default: %s
    - HADOOP_KRB_CONF     kerberos configuration file. Default: %s
    - HADOOP_CCACHE       credential cache to use. Defaults: to "/tmp/krb5cc_{user_uid}"
    - HADOOP_KEYTAB       if set, the specified keytab is used and the credential cache is ignored.

Since it doesn't have to wait for the JVM to start up, it's also a lot faster than hadoop fs:

$ time hadoop fs -ls / > /dev/null

real  0m2.218s
user  0m2.500s
sys 0m0.376s

$ time hdfs ls / > /dev/null

real  0m0.015s
user  0m0.004s
sys 0m0.004s

Best of all, it comes with bash tab completion for paths!

Installing the library

To install the library, once you have Go all set up:

$ go get -u github.com/colinmarc/hdfs

Installing the commandline client

Grab a tarball from the releases page and unzip it wherever you like.

You'll want to add the following line to your .bashrc or .profile:

export HADOOP_NAMENODE="namenode:8020"

To install tab completion globally on linux, copy or link the bash_completion file which comes with the tarball into the right place:

ln -sT bash_completion /etc/bash_completion.d/gohdfs

By default, the HDFS user is set to the currently-logged-in user. You can override this in your .bashrc or .profile:

export HADOOP_USER_NAME=username
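The lookup order described above can be sketched as follows. This is an illustration of the rule, not the client's actual code: `resolveUser` is a hypothetical name, and the environment lookup is passed in as a function so the logic can be tried without changing the real environment.

```go
package main

import (
	"fmt"
	"os"
	"os/user"
)

// resolveUser is a hypothetical helper mirroring the rule in the text:
// HADOOP_USER_NAME wins when set, otherwise the current OS user is used.
func resolveUser(getenv func(string) string) (string, error) {
	if name := getenv("HADOOP_USER_NAME"); name != "" {
		return name, nil
	}
	current, err := user.Current()
	if err != nil {
		return "", fmt.Errorf("look up current user: %w", err)
	}
	return current.Username, nil
}

func main() {
	name, err := resolveUser(os.Getenv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("HDFS user:", name)
}
```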

Kerberos support

Authentication via Kerberos (and authentication only) is supported.

The binary will check the default locations for your kerberos and hadoop configurations. These can be overridden via the environment variables HADOOP_KRB_CONF and HADOOP_CONF_DIR.

You will need either a kinit'ed credential cache, which is expected to live at /tmp/krb5cc_$(id -u $(whoami)) unless overridden via HADOOP_CCACHE, or a keytab specified through HADOOP_KEYTAB.
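The precedence between the keytab and the credential cache can be sketched like this. It is a minimal illustration of the documented rule (a keytab, when set, takes priority and the cache is ignored), not the binary's implementation; `credentialSource` and its return values are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// credentialSource is a hypothetical helper that applies the documented
// precedence: HADOOP_KEYTAB first, then HADOOP_CCACHE, then the default
// cache path /tmp/krb5cc_<uid>.
func credentialSource(getenv func(string) string, uid int) (kind, path string) {
	if keytab := getenv("HADOOP_KEYTAB"); keytab != "" {
		return "keytab", keytab
	}
	if ccache := getenv("HADOOP_CCACHE"); ccache != "" {
		return "ccache", ccache
	}
	return "ccache", fmt.Sprintf("/tmp/krb5cc_%d", uid)
}

func main() {
	kind, path := credentialSource(os.Getenv, os.Getuid())
	fmt.Printf("%s: %s\n", kind, path)
}
```

Passing the lookup function and uid as parameters keeps the rule easy to exercise with fake values.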

This has only been tested on one or two different kerberized clusters: if you have trouble using it, feedback is more than welcome.

Compatibility

This library uses "Version 9" of the HDFS protocol, which means it should work with hadoop distributions based on 2.2.x and above. The tests run against CDH 5.x and HDP 2.x.

Acknowledgements

This library is heavily indebted to snakebite.

