vsingh58 / ceteri-mapred
This project forked from ceteri/ceteri-mapred
MapReduce examples
Home Page: http://www.slideshare.net/pacoid/getting-started-on-hadoop
## Getting Started on Hadoop
## Paco Nathan <[email protected]>
##
## Silicon Valley Cloud Computing Meetup
## http://www.meetup.com/cloudcomputing/calendar/13911740/
## Mountain View, 2010-07-19

GitHub src repo:
  http://github.com/ceteri/ceteri-mapred

Presentation slides available here in Keynote format or online at SlideShare:
  doc/enron.key
  http://www.slideshare.net/pacoid/getting-started-on-hadoop

See the "WordCount" example at:
  bin/run_wc.sh

See the "Enron Email Dataset" demo at:
  bin/run_enron.sh

R statistics demo:
  thresh.R, thresh.tsv

Gephi graph demo:
  graph.gephi

## to run your own code on Elastic MapReduce

1. create a bucket in S3
2. copy the Python scripts into a "src" folder there
3. determine some subset of the email message input
     cat msgs.tsv | head -1000 > input
4. copy "input" to your S3 "src" folder
5. follow examples in slide deck, based on params below

## Hadoop job flow 1 on Elastic MapReduce

  -input s3n://ceteri-mapred/enron/src/input
  -output s3n://ceteri-mapred/enron/src/output
  -mapper '"python map_parse.py http://ceteri-mapred.s3.amazonaws.com/ stopwords"'
  -reducer '"python red_idf.py 2500"'
  -cacheFile s3n://ceteri-mapred/enron/src/map_parse.py#map_parse.py
  -cacheFile s3n://ceteri-mapred/enron/src/red_idf.py#red_idf.py
  -cacheFile s3n://ceteri-mapred/enron/src/stopwords#stopwords

## Hadoop job flow 2 on Elastic MapReduce

  -input s3n://ceteri-mapred/enron/src/output
  -output s3n://ceteri-mapred/enron/src/filter
  -mapper '"python map_filter.py"'
  -reducer '"python red_filter.py 0.0633"'
  -cacheFile s3n://ceteri-mapred/enron/src/map_filter.py#map_filter.py
  -cacheFile s3n://ceteri-mapred/enron/src/red_filter.py#red_filter.py

## after downloading the partition file named "filter" from S3, then
## run the following command to build a lexicon:

  cat filter/part-* | sort -k1 -k4 -nr > lexicon
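
## sketch: the mapper/reducer contract used by the job flows above

The job flows above hand Python scripts to Hadoop Streaming as the mapper and reducer. As a rough illustration of that contract (each stage reads lines and writes tab-separated key/value records, and the reducer sees its input sorted by key), here is a minimal word-count sketch. It is not the code in map_parse.py, red_idf.py, or bin/run_wc.sh; the function names map_words, reduce_counts, and run_local are hypothetical, chosen only for this example.

```python
from itertools import groupby


def map_words(lines):
    """Mapper stage: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_counts(records):
    """Reducer stage: sum the counts for each key.

    Assumes the records arrive sorted by key, as they do after the
    shuffle/sort step in a streaming job.
    """
    pairs = (rec.rstrip("\n").split("\t", 1) for rec in records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> sort -> reduce in a single process, for testing without a cluster."""
    return list(reduce_counts(sorted(map_words(lines))))
```

To try the idea without a cluster, call run_local on a few lines of text. In a real streaming job the two stages would live in separate scripts that read sys.stdin and print each record, with the framework doing the sort between them.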