hive-udtf

A UDTF (user-defined table-generating function) for Hive on Hadoop. The implementation follows the blog post "RECURSION IN HIVE – PART 1".

Build & Package

$ sbt compile
$ sbt assembly

Known Issues

  • This user-defined function may not work with Hive versions earlier than 1.3.0 because of the following bug:
    • HIVE-11892 - UDTF run in local fetch task does not return rows forwarded during GenericUDTF.close()

Please use Hive 1.3.0 or later.

Testing

  • Upload the jar to your Hadoop environment and add it to your Hive session

  • Prepare a Hive table, like the following:

CREATE TABLE t_state(
  state      STRING,
  next_state STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

  • Prepare test data as a CSV file, like the following:

$ cat t_state.csv
S1,
S2,S1
S3,S1
S4,S2

$ hadoop fs -put t_state.csv

hive> LOAD DATA INPATH 't_state.csv' OVERWRITE INTO TABLE t_state;

hive> SELECT * FROM t_state;
OK
S1
S2      S1
S3      S1
S4      S2

hive> add jar /path/to/hive-udtf.jar;
Added [/home/hadoop/hive-udtf.jar] to class path
Added resources: [/home/hadoop/hive-udtf.jar]

-- Drop the function if it already exists
hive> DROP FUNCTION IF EXISTS expand_tree;

-- Create the function from the UDTF class in the jar
hive> CREATE FUNCTION expand_tree AS 'jp.gr.java_conf.hangedman.ExpandTree2UDTF';

-- Hive returns the following records
hive> SELECT expand_tree(state, next_state) FROM t_state;
OK
S4      S4      0
S4      S2      1
S4      S1      2
S1      S1      0
S3      S3      0
S3      S1      1
S2      S2      0
S2      S1      1

-- To INSERT the result INTO another table
hive> INSERT INTO TABLE xxx SELECT expand_tree(state, next_state) AS (x,y,z) FROM t_state;

-- You can limit the returned records to suit your use case
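
The result rows shown above can be read as (state, ancestor, depth): each state is paired with itself at depth 0 and then with every state reachable by following next_state. The sketch below models that behaviour in plain Python. It is inferred from the sample output only and is not the code of ExpandTree2UDTF; the function name, the cycle guard, and the treatment of an empty next_state as a root are assumptions. Row order is not meant to match Hive's output.

```python
def expand_tree(rows):
    """Yield (state, ancestor, depth) by following next_state links.

    Illustrative model only; not the UDTF's actual implementation.
    """
    # Map each state to its next_state; an empty or missing value marks a root.
    parent = {state: (nxt or None) for state, nxt in rows}
    for state, _ in rows:
        current, depth = state, 0
        seen = set()
        while current is not None and current not in seen:
            seen.add(current)  # assumed guard against cycles in the input
            yield (state, current, depth)
            current = parent.get(current)
            depth += 1


# Same rows as t_state.csv
rows = [("S1", ""), ("S2", "S1"), ("S3", "S1"), ("S4", "S2")]
result = list(expand_tree(rows))
for row in result:
    print(*row, sep="\t")
```

For the sample data this yields the same eight rows as the Hive query, for example S4 paired with S4, S2 and S1 at depths 0, 1 and 2.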

Contributors

  • hangingman

hive-udtf's Issues

Key not found

Hi,

I'm not sure this will work in a distributed environment. It works on the data set you provided, but when I use it on a real data set, it fails with "key not found" errors. My guess is that, depending on the number of executors, not every executor has access to all the data stored in the tree hashmap.
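
The guess in this issue can be illustrated with a small Python model. It assumes, hypothetically, that each task builds its own map from only the rows it receives and then looks up parents with a strict key lookup. Neither assumption has been checked against the UDTF source, and the names `walk`, `split_a` and `split_b` exist only for this sketch.

```python
def walk(state, tree):
    """Follow next_state links from state; strict lookup, as the issue suspects."""
    chain = []
    current = state
    while current:  # an empty next_state marks a root
        chain.append(current)
        current = tree[current]  # KeyError if this task never saw that row
    return chain


rows = [("S1", ""), ("S2", "S1"), ("S3", "S1"), ("S4", "S2")]

# One task sees every row: the full chain resolves.
complete_chain = walk("S4", dict(rows))

# Two tasks each see half of the rows (hypothetical split).
split_a = dict(rows[:2])  # S1, S2
split_b = dict(rows[2:])  # S3, S4
try:
    walk("S4", split_b)
    missing_key = None
except KeyError as err:
    missing_key = err.args[0]

print(complete_chain)
print("key not found:", missing_key)
```

If this is the actual cause, the lookup fails whenever a state and its next_state land in different tasks, which would explain why a small single-split data set works while a larger one does not.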
