Apache Pig is a platform for analyzing large data sets. It consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
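The parallelization claim above rests on the fact that many Pig operations, such as counting records per key, can be split up. Each chunk of input can be processed independently, and the partial results can then be merged. For algebraic functions like `COUNT`, this is roughly what Pig's combiner and reduce stage do. The Python sketch below only illustrates that split-and-merge structure and is not how Pig is implemented. The function names are hypothetical, and the threads merely stand in for cluster nodes (Python threads do not give true CPU parallelism).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partial_count(chunk):
    # Work a single "mapper" would do on its own split: count keys locally.
    return Counter(chunk)


def parallel_count(records, n_splits=3):
    # Divide the input into contiguous chunks, loosely like HDFS input splits.
    size = max(1, -(-len(records) // n_splits))  # ceiling division
    splits = [records[i:i + size] for i in range(0, len(records), size)]

    # Process every split independently; no split needs data from another.
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        partials = list(pool.map(partial_count, splits))

    # Merge step: combine the partial counts per key, as a reducer would.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


cities = ["Hyderabad", "Chennai", "Hyderabad", "Pune", "Chennai", "Hyderabad"]
# The split-and-merge result matches a plain sequential count.
assert parallel_count(cities) == Counter(cities)
```

Because the merge only adds counts, the number of splits does not change the answer. That independence between chunks is what lets such a program spread across many machines.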
- In-Class Instruction: 4 Hours
- In-Class Code-Along Datasets: orders_Pig, student_details
- Installation of Pig
- Hands-on exercise with datasets
- Pig lets you write Pig Latin scripts that express complex MapReduce jobs more easily than hand-written MapReduce code.
- Hortonworks has an introductory tutorial.
- Understand what Pig is and where it fits in the Hadoop ecosystem
- Components of Pig
- Extending Pig where required (for example, with user-defined functions) when the built-in operators are not enough
- Running a Pig script
- Why we need Pig
- Pig vs. MapReduce
- Pig Latin Processing
- Pig Execution Modes
- Pig Assignments
- Review an older Twitter slide deck on the benefits of Pig.
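To make "Pig Latin Processing" and "Pig vs. MapReduce" more concrete, the sketch below imitates in plain Python what a short Pig Latin dataflow does. The Pig Latin it mirrors is shown in the comments. It assumes a comma-separated `student_details` file with the fields `id, firstname, lastname, age, phone, city`. That schema, the sample rows, the age threshold, and the helper names are all hypothetical and are not taken from the course dataset. The Python roughly models the map, shuffle, and reduce phases Pig would generate for a `FILTER` followed by `GROUP` and `COUNT`. In Pig you write only the four statements, and Pig plans the MapReduce job for you.

```python
from collections import defaultdict

# Hypothetical rows in the assumed layout: id,firstname,lastname,age,phone,city
SAMPLE = """\
1,Asha,Rao,21,555-0101,Hyderabad
2,Ben,Cole,23,555-0102,Chennai
3,Chen,Li,22,555-0103,Hyderabad
4,Dara,Kim,24,555-0104,Chennai
5,Eli,Ng,25,555-0105,Pune
"""


def load(text):
    # students = LOAD 'student_details.txt' USING PigStorage(',')
    #     AS (id:int, firstname:chararray, lastname:chararray,
    #         age:int, phone:chararray, city:chararray);
    rows = []
    for line in text.strip().splitlines():
        id_, first, last, age, phone, city = line.split(",")
        rows.append({"id": int(id_), "firstname": first, "lastname": last,
                     "age": int(age), "phone": phone, "city": city})
    return rows


def map_phase(rows, min_age):
    # adults = FILTER students BY age >= 22;  (applied map-side)
    # Emit (group key, 1) pairs so the next step can group by city.
    return [(row["city"], 1) for row in rows if row["age"] >= min_age]


def shuffle(pairs):
    # by_city = GROUP adults BY city;  (the shuffle collects values per key)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # counts = FOREACH by_city GENERATE group AS city, COUNT(adults) AS n;
    return sorted((city, sum(ones)) for city, ones in groups.items())


if __name__ == "__main__":
    students = load(SAMPLE)
    # DUMP counts;
    for city, n in reduce_phase(shuffle(map_phase(students, 22))):
        print(city, n)
```

Writing the same job directly in Java MapReduce would require separate mapper and reducer classes plus driver code. That gap in effort is the usual argument for Pig.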