hadoop-framework-examples's Introduction

Realistic Hadoop Data Processing Examples

This code accompanies my blog post on MapReduce frameworks.

The point of the code in this repository is to provide an implementation for a business question (listed below) in each of the major Map Reduce frameworks.

Each implementation will get its own subdirectory with its own build and run instructions. Each framework will also get an accompanying test and an in-depth walkthrough of the implementation details.

The following implementations are complete:

The problem

The Data

We have two datasets: customers and transactions.

Customer Fields:

Transaction Fields:

  • transaction-id (1)
  • product-id (1)
  • user-id (1)
  • purchase-amount (19.99)
  • product-description (a rubber chicken)

These two datasets are stored in tab-delimited files somewhere on HDFS.

The Question

For each product, we want to know the number of locations in which that product was purchased.

That's it!

In the real world, we might have other questions, like the number of purchases per location for each product.
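
To make the question concrete, here is a minimal in-memory sketch of the join-then-count-distinct logic, written in plain Python rather than any of the frameworks in this repository. The customer field list is not shown above, so the sketch assumes each customer row starts with an id followed by a location; that layout, the function name locations_per_product, and the sample rows are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def locations_per_product(customer_lines, transaction_lines):
    """Count the distinct customer locations in which each product was bought."""
    # ASSUMPTION: customer rows are "id<TAB>location"; the real customer
    # layout is not listed in this README.
    user_location = {}
    for line in customer_lines:
        fields = line.rstrip("\n").split("\t")
        user_location[fields[0]] = fields[1]

    # Transaction rows follow the README's field order:
    # transaction-id, product-id, user-id, purchase-amount, product-description
    product_locations = defaultdict(set)
    for line in transaction_lines:
        fields = line.rstrip("\n").split("\t")
        product_id, user_id = fields[1], fields[2]
        location = user_location.get(user_id)
        if location is not None:  # skip transactions with no matching customer
            product_locations[product_id].add(location)

    return {product: len(locs) for product, locs in product_locations.items()}


# Made-up sample rows, for illustration only.
customers = ["1\tUS", "2\tGB", "3\tUS"]
transactions = [
    "1\t1\t1\t19.99\ta rubber chicken",
    "2\t1\t2\t19.99\ta rubber chicken",
    "3\t1\t3\t19.99\ta rubber chicken",
    "4\t2\t1\t5.00\ta whoopee cushion",
]

if __name__ == "__main__":
    print(locations_per_product(customers, transactions))  # {'1': 2, '2': 1}
```

Product 1 is bought by three users in only two distinct locations, so its count is 2. The framework implementations have to do the same thing at scale: join transactions to customers on the user id, then count distinct locations per product.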

hadoop-framework-examples's People

Contributors

helenahm, rathboma

hadoop-framework-examples's Issues

Run the Apache Spark Pi job in yarn-client mode, using code from org.apache.spark:
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    --num-executors 1 \
    --driver-memory 512m \
    --executor-memory 512m \

Issue with Building Jar using Maven and Executing Application

Hey Matthew,
I'm having some difficulty running the "Spark-Scala" application while following your blog post. I noticed that the guide skips the jar packaging step, so I followed the GitHub repository README and ran "mvn compile". However, this doesn't create a jar file in the "target" directory; instead, "target" contains a "classes" directory and a classes.timestamp file, and "classes" contains the "ExampleJob.class" bytecode. Could you assist with this issue?

Error: java.io.IOException: wrong key class: hadoop_join.TextTuple is not class org.apache.hadoop.io.Text

I copied your Hadoop join example and ran it, but it raises an error that I cannot figure out. Some of the logs follow; my environment is Hadoop 2.7.5.

18/01/19 22:54:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.1.13:8032
18/01/19 22:54:29 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/01/19 22:54:29 INFO input.FileInputFormat: Total input paths to process : 1
18/01/19 22:54:29 INFO input.FileInputFormat: Total input paths to process : 1
18/01/19 22:54:29 INFO mapreduce.JobSubmitter: number of splits:2
18/01/19 22:54:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1516287880873_0006
18/01/19 22:54:30 INFO impl.YarnClientImpl: Submitted application application_1516287880873_0006
18/01/19 22:54:30 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1516287880873_0006/
18/01/19 22:54:30 INFO mapreduce.Job: Running job: job_1516287880873_0006
18/01/19 22:54:38 INFO mapreduce.Job: Job job_1516287880873_0006 running in uber mode : false
18/01/19 22:54:38 INFO mapreduce.Job:  map 0% reduce 0%
18/01/19 22:54:47 INFO mapreduce.Job:  map 100% reduce 0%
18/01/19 22:54:52 INFO mapreduce.Job: Task Id : attempt_1516287880873_0006_r_000000_0, Status : FAILED
Error: java.io.IOException: wrong key class: hadoop_join.TextTuple is not class org.apache.hadoop.io.Text
	at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1375)
	at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:83)
	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
	at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
	at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1754)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
