Code Monkey home page Code Monkey logo

hfind's Introduction

hfind

hfind is a find(1) implementation for Hadoop.

Getting started

Download the latest version of the hfind tarball here.

The tarball contains two files, a shell script and a jar.

You first need to configure hfind to find your Hadoop installation, via the HFIND_OPTS variable:

export HFIND_OPTS="-Dhfind.hadoop.ugi='pierre\,pierre' -Dhfind.hadoop.namenode.url=hdfs://namenode.company.com:9000"

You can then simply run ./hfind from the exploded tarball directory:

[pierre@mouraf ~/downloads]$ ls
metrics.hfind-1.0.0-SNAPSHOT.tar.gz
[pierre@mouraf ~/downloads]$ tar zxvf metrics.hfind-1.0.0-SNAPSHOT.tar.gz
hfind
target/metrics.hfind-1.0.0-SNAPSHOT-jar-with-dependencies.jar
[pierre@mouraf ~/downloads]$ export HFIND_OPTS="-Dhfind.hadoop.ugi='pierre\,pierre' -Dhfind.hadoop.namenode.url=hdfs://namenode.ning.com:9
000"
[pierre@mouraf ~/downloads]$ ./hfind /user/pierre -type f -name "file*.dat" -o -type d -name a
/user/pierre/testing/a
/user/pierre/testing/a/b/fileb.dat
/user/pierre/testing/a/c/filec.dat
/user/pierre/testing/a/filea.dat
/user/pierre/testing/a/fileaa.dat

Examples

[pierre@mouraf ~]$ hfind /user/pierre -type f -name "file*.dat" -o -type d -name a
/pierre/testing/a
/pierre/testing/a/b/fileb.dat

[pierre@mouraf ~]$ hfind /user/pierre -mtime +3 -type d
/user/pierre/2010/09
/user/pierre/2010/09/20
/user/pierre/2010/09/21

Usage

hfind follows as close as possible the POSIX specification documented here.

As of 2010, some options cannot be implemented (e.g. symbolic links).

-maxdepth/-d follows the BSD implementation:

 -d      Cause find to perform a depth-first traversal, i.e., directories are visited in post-order and all entries in a
         directory will be acted on before the directory itself.  By default, find visits directories in pre-order, i.e.,
         before their contents.  Note, the default is not a breadth-first traversal.
         This option is equivalent to the -depth primary of IEEE Std 1003.1-2001 (``POSIX.1'').

-maxdepth n
         Descend at most n directory levels below the command line arguments.  If any -maxdepth
         primary is specified, it applies to the entire expression even if it would not normally be evaluated.
         -maxdepth 0 limits the whole search to the command line arguments.

See TODO below for work in progress.

Things to remember

A directory size is always reported as zero. Use -empty to find empty directories.

A directory mtime is the date of the latest modification of any file in the directory, going one level only. You can have more recent files in subdirectories:

[pierre@mouraf ~]$ find /tmp/a -ls
45404725        0 drwxr-xr-x    4 pierre   wheel         136 Sep 23 17:10 /tmp/a
45404727        0 drwxr-xr-x    3 pierre   wheel         102 Sep 23 17:11 /tmp/a/b
45404730        0 -rw-r--r--    1 pierre   wheel           0 Sep 23 17:11 /tmp/a/b/hello
45404726        0 -rw-r--r--    1 pierre   wheel           0 Sep 23 17:10 /tmp/a/hello

In the previous example, mtime of /tmp/a is 17:10 although, it contains /tmp/a/b/hello, modified later.

TODO

  • Respect parentheses

  • Implement -prune

  • Implement -exec and -ok

  • Implement -perm

  • Implement -depth and -mindepth

Build

mvn install

By default, hfind bundles Hadoop version 0.20.2. One can override it via the hadoop.version parameter, e.g.

mvn clean install
mvn -Dhadoop.version=0.20.2-cdh3u1 clean install
mvn -Dhadoop.version=0.20.2-cdh3u2 clean install
mvn -Dhadoop.version=0.20.2-cdh3u3 clean install
mvn -Dhadoop.version=1.0.0 clean install
mvn -Dhadoop.version=1.0.1 clean install
mvn -Dhadoop.version=1.0.2 clean install

License (see COPYING file for full license)

Copyright 2010-2012 Ning

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

hfind's People

Contributors

pierre avatar

Watchers

James Cloos avatar Narava avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.