prawin777 / mapreduce
Brief: Running MapReduce jobs in fully distributed mode to analyze the Airline performance data set.

Scenario: Design, implement, and run an Oozie workflow to find:
a. the 3 airlines with the highest and the 3 with the lowest probability of being on schedule;
b. the 3 airports with the longest and the 3 with the shortest average taxi time per flight;
c. the most common reason for flight cancellations.

Run the workflow on the entire data set, covering October 1987 to April 2008, on two VMs for at least 5 increment steps, and measure the workflow execution time at each step.
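As a rough illustration of task (a), the map and reduce steps can be sketched in plain Python, without Hadoop or Oozie. This is not the repository's implementation. The record fields `carrier` and `arr_delay`, the function names, and the 10-minute threshold for "on schedule" are all assumptions made for the sketch; the description above does not define them.

```python
from collections import defaultdict

# Assumption: a flight counts as "on schedule" if it arrives at most
# this many minutes late. The project description does not define this.
ON_TIME_THRESHOLD_MIN = 10


def map_on_schedule(record):
    """Map step: emit (carrier, (on_time_flag, 1)) for one flight record.

    `record` is a hypothetical dict with keys "carrier" and "arr_delay".
    """
    delay = record.get("arr_delay")
    if delay is None:
        # Skip flights with no recorded arrival delay (e.g. cancelled).
        return []
    on_time = 1 if delay <= ON_TIME_THRESHOLD_MIN else 0
    return [(record["carrier"], (on_time, 1))]


def reduce_on_schedule(pairs):
    """Reduce step: sum flags and counts per carrier, return probabilities."""
    totals = defaultdict(lambda: [0, 0])
    for carrier, (on_time, count) in pairs:
        totals[carrier][0] += on_time
        totals[carrier][1] += count
    return {carrier: on / n for carrier, (on, n) in totals.items()}


def top_and_bottom(probabilities, k=3):
    """Return the k highest and the k lowest carriers by probability."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    highest = ranked[:k]
    lowest = ranked[-k:][::-1]  # lowest probability first
    return highest, lowest
```

Tasks (b) and (c) would follow the same shape: key by airport and average the taxi times, or key by cancellation reason and count occurrences.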