Apache Spark is an open-source big data processing framework built around speed, ease of use, and sophisticated analytics.
# What is Big Data?
- Large Volume of Data
- Structured
- Unstructured
- Volume
- Variety
- Velocity
By the end of this lecture, you will be able to:
- Create RDDs to distribute data across a cluster
- Use the Spark shell to compose and execute Spark commands
- Use Spark to analyze an Apache access.log file
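
As a preview of these objectives, here is a minimal sketch of how an access.log analysis might look in the PySpark shell. It assumes the log follows the Common Log Format; the names `LOG_PATTERN`, `parse_access_line`, and `status_counts` are hypothetical helpers introduced for illustration, not part of Spark.

```python
import re
from operator import add

# Assumed layout (Common Log Format):
#   host ident user [timestamp] "request" status size
# Adjust the pattern if your access.log uses a different format.
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
)


def parse_access_line(line):
    """Return (host, timestamp, request, status, size), or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, timestamp, request, status, size = match.groups()
    # A size of "-" means no response body was sent; treat it as 0 bytes.
    return host, timestamp, request, int(status), 0 if size == "-" else int(size)


def status_counts(sc, path):
    """Count requests per HTTP status code using RDD transformations.

    `sc` is a SparkContext, such as the one the PySpark shell predefines.
    """
    lines = sc.textFile(path)  # RDD of raw log lines, split into partitions
    records = lines.map(parse_access_line).filter(lambda r: r is not None)
    pairs = records.map(lambda r: (r[3], 1))  # (status, 1) for each request
    return dict(pairs.reduceByKey(add).collect())
```

In the PySpark shell, where `sc` is already defined, this could be run as `status_counts(sc, "access.log")`, with the path pointing at your own log file. An RDD can also be created directly from a local collection, for example `sc.parallelize(range(10))`.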