This project forked from chandnipateltw/basic-transformations

This repository has been built to help people learn how to do basic transformations on a single DataFrame in Spark + Scala.

Basic Transformations

Work through the katas in this codebase to learn basic transformations on a single DataFrame/Dataset with Spark + Scala.

Analyze the diamonds dataset and write code for the following operations:

count number of records
remove duplicates
calculate average price
calculate min and max price
filter flawless diamonds
group by clarity and calculate the average price
add a column "grade" computed from cut and clarity
drop a column
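
As a rough orientation, the operations listed above map onto standard DataFrame calls roughly as follows. This is a minimal sketch, not the solution the katas expect: the object and method names are hypothetical, the grading rule is invented for illustration (the real rule is defined by the katas and their tests), and the sample rows in `main` are made up rather than taken from diamonds.csv. It assumes Spark SQL is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, max, min, when}

// Hypothetical helper object; names are illustrative, not taken from the katas.
object DiamondTransformations {

  // count number of records
  def countRecords(df: DataFrame): Long = df.count()

  // remove rows that are identical in every column
  def removeDuplicates(df: DataFrame): DataFrame = df.dropDuplicates()

  // average price; not safe for an empty DataFrame, where avg yields null
  def averagePrice(df: DataFrame): Double =
    df.agg(avg("price")).first().getDouble(0)

  // single-row DataFrame holding the min and max price
  def minMaxPrice(df: DataFrame): DataFrame =
    df.agg(min("price").as("min_price"), max("price").as("max_price"))

  // assumption: "flawless" means clarity code FL, as in the metadata
  def flawless(df: DataFrame): DataFrame =
    df.filter(col("clarity") === "FL")

  // average price per clarity value
  def averagePriceByClarity(df: DataFrame): DataFrame =
    df.groupBy("clarity").agg(avg("price").as("avg_price"))

  // invented grading rule, for illustration only
  def addGrade(df: DataFrame): DataFrame =
    df.withColumn(
      "grade",
      when(col("cut") === "Ideal" && col("clarity").isin("FL", "IF"), "A")
        .when(col("cut").isin("Premium", "Very Good"), "B")
        .otherwise("C")
    )

  // drop a column by name (a no-op if the column does not exist)
  def dropColumn(df: DataFrame, name: String): DataFrame = df.drop(name)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("diamonds-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // made-up sample rows with a subset of the columns
    val sample = Seq(
      (1, "Ideal", "FL", 400),
      (2, "Premium", "SI1", 300),
      (2, "Premium", "SI1", 300),
      (3, "Good", "VS2", 200)
    ).toDF("index", "cut", "clarity", "price")

    println(countRecords(sample))
    println(countRecords(removeDuplicates(sample)))
    println(averagePrice(sample))
    minMaxPrice(sample).show()
    flawless(sample).show()
    averagePriceByClarity(sample).show()
    dropColumn(addGrade(sample), "index").show()

    spark.stop()
  }
}
```

To try the same functions against the real data, the sample could be replaced with something like `spark.read.option("header", "true").option("inferSchema", "true").csv("src/main/resources/diamonds.csv")`.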

Goal: Implement the operations above and make sure all the test cases pass.

Dataset: src/main/resources/diamonds.csv

Metadata: (Ref: https://www.kaggle.com/shivam2503/diamonds)

Column   Description

index: counter
carat: Carat weight of the diamond
cut: Cut quality of the diamond, in increasing order: Fair, Good, Very Good, Premium, Ideal
color: Color of the diamond, with D being the best and J the worst
clarity: How obvious inclusions are within the diamond, in order from best to worst (FL = flawless, I3 = level 3 inclusions): FL, IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1, I2, I3
depth: Depth %: the height of the diamond, measured from the culet to the table, divided by its average girdle diameter
table: Table %: the width of the diamond's table, expressed as a percentage of its average diameter
price: the price of the diamond
x: length mm
y: width mm
z: depth mm
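
The column list above could also be written down as an explicit Spark schema, which avoids relying on type inference when reading the CSV. This is a sketch only: the object name `DiamondsSchema` is hypothetical, and the column types are assumptions inferred from the descriptions (integer index and price, decimal measurements), so they should be checked against the actual file.

```scala
import org.apache.spark.sql.types.{DoubleType, IntegerType, StringType, StructField, StructType}

// Hypothetical schema object; the types are assumptions, not taken from the repository.
object DiamondsSchema {
  val schema: StructType = StructType(Seq(
    StructField("index", IntegerType),   // counter
    StructField("carat", DoubleType),    // carat weight
    StructField("cut", StringType),      // Fair .. Ideal
    StructField("color", StringType),    // D (best) .. J (worst)
    StructField("clarity", StringType),  // FL (best) .. I3 (worst)
    StructField("depth", DoubleType),    // depth %
    StructField("table", DoubleType),    // table %
    StructField("price", IntegerType),   // price of the diamond
    StructField("x", DoubleType),        // length in mm
    StructField("y", DoubleType),        // width in mm
    StructField("z", DoubleType)         // depth in mm
  ))
}
```

It would then be passed to the reader with `spark.read.option("header", "true").schema(DiamondsSchema.schema).csv(...)`.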

How to run a Spark program through IntelliJ?

Set the main class to org.apache.spark.deploy.SparkSubmit
Set the program arguments to --master local --class <main_class> target/scala-2.12/<jar_name>
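
For context, the `<main_class>` placeholder refers to an object with a `main` method packaged in the jar. A minimal sketch of such an entry point might look like the following; the name `DiamondsApp` and the optional path argument are hypothetical and not part of this repository. The master is deliberately not set in code, on the assumption that it arrives through the `--master local` argument.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point; pass its fully qualified name as <main_class>.
object DiamondsApp {

  // Use the first program argument after the jar as the CSV path, if given;
  // otherwise fall back to the dataset location named in this README.
  def inputPath(args: Array[String]): String =
    args.headOption.getOrElse("src/main/resources/diamonds.csv")

  def main(args: Array[String]): Unit = {
    // No .master(...) here: it is expected to come from --master local.
    val spark = SparkSession.builder()
      .appName("basic-transformations")
      .getOrCreate()

    val diamonds = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(inputPath(args))

    println(s"Number of records: ${diamonds.count()}")

    spark.stop()
  }
}
```

Any arguments placed after the jar path in the program arguments are handed to `main` as `args`.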
