Basic Transformations

Work through the katas in this codebase to learn basic transformations, that is, transformations on a single DataFrame/Dataset, with Spark and Scala.

Analyze the diamonds dataset and write code for the following operations:
  • count the number of records
  • remove duplicates
  • calculate the average price
  • calculate the min and max price
  • filter flawless diamonds
  • group by clarity and calculate the average price
  • add a column "grade" with a computation based on cut and clarity
  • drop a column
Goal: Implement the operations above and make sure all the test cases pass.
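
The operations above map onto standard Spark DataFrame calls. The following is a minimal sketch, not the kata solutions: the object name DiamondsSketch and its method names are hypothetical, the column names (price, cut, clarity) are taken from the metadata in this README, and the "grade" rule is an invented placeholder because the README does not define one. The tests in the codebase remain the authority on the expected behavior.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, max, min, when}

// Hypothetical helper object; names are illustrative, not the repository's API.
object DiamondsSketch {
  // Count the number of records.
  def countRecords(df: DataFrame): Long = df.count()

  // Remove rows that are identical across all columns.
  def removeDuplicates(df: DataFrame): DataFrame = df.dropDuplicates()

  // Average price; assumes "price" is a numeric, non-decimal column
  // and that the DataFrame is not empty.
  def averagePrice(df: DataFrame): Double =
    df.agg(avg("price")).first().getDouble(0)

  // Single-row DataFrame holding the min and max price.
  def minMaxPrice(df: DataFrame): DataFrame =
    df.agg(min("price").as("min_price"), max("price").as("max_price"))

  // Flawless diamonds are the ones with clarity "FL".
  def flawless(df: DataFrame): DataFrame =
    df.filter(col("clarity") === "FL")

  // Average price per clarity level.
  def averagePriceByClarity(df: DataFrame): DataFrame =
    df.groupBy("clarity").agg(avg("price").as("avg_price"))

  // ASSUMPTION: placeholder rule, "A" for Ideal cut with FL or IF clarity,
  // otherwise "B". The real grading rule is defined by the katas.
  def withGrade(df: DataFrame): DataFrame =
    df.withColumn(
      "grade",
      when(col("cut") === "Ideal" && col("clarity").isin("FL", "IF"), "A")
        .otherwise("B")
    )

  // Drop a column by name.
  def dropColumn(df: DataFrame, name: String): DataFrame = df.drop(name)
}
```

Each method takes a DataFrame and returns either a new DataFrame or a plain value, so the steps can be chained or tested one at a time.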

Dataset: src/main/resources/diamonds.csv

Metadata: (Ref: https://www.kaggle.com/shivam2503/diamonds)

Column   Description
  1. index: counter
  2. carat: Carat weight of the diamond
  3. cut: Cut quality of the diamond, in increasing order: Fair, Good, Very Good, Premium, Ideal
  4. color: Color of the diamond, with D being the best and J the worst
  5. clarity: How obvious the inclusions within the diamond are, in order from best to worst: FL (flawless), IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1, I2, I3 (level 3 inclusions)
  6. depth: Depth %: the height of the diamond, measured from the culet to the table, divided by its average girdle diameter
  7. table: Table %: the width of the diamond's table expressed as a percentage of its average diameter
  8. price: the price of the diamond
  9. x: length mm
  10. y: width mm
  11. z: depth mm
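
One way to load the dataset into a DataFrame is sketched below. It assumes the CSV file has a header row whose names match the column list above, which is not confirmed here, and it lets Spark infer the column types instead of declaring a schema. The object name DiamondsLoader is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical loader; the katas may read the file differently.
object DiamondsLoader {
  // Reads a headered CSV and infers column types (for example, price as an integer).
  def load(
      spark: SparkSession,
      path: String = "src/main/resources/diamonds.csv"
  ): DataFrame =
    spark.read
      .option("header", "true")      // ASSUMPTION: first line holds column names
      .option("inferSchema", "true") // costs an extra pass over the file
      .csv(path)
}
```

Calling printSchema() on the result is a quick way to check that the inferred types match the descriptions above.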
How to run the Spark program through IntelliJ?
  • Set the main class to org.apache.spark.deploy.SparkSubmit
  • Set the program arguments to --master local --class <main_class> target/scala-2.12/<jar_name>
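
For reference, a class passed as <main_class> could look like the sketch below. The object name DiamondsApp and the application name are hypothetical, and the repository's real entry point may differ. The builder does not set a master, because SparkSubmit supplies it from the --master local argument.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical entry point; not the repository's actual main class.
object DiamondsApp {
  // Counts the rows of a headered CSV file.
  def run(spark: SparkSession, path: String): Long =
    spark.read.option("header", "true").csv(path).count()

  def main(args: Array[String]): Unit = {
    // No .master(...) call here: the launcher passes it via --master.
    val spark = SparkSession.builder().appName("basic-transformations").getOrCreate()
    try {
      // ASSUMPTION: an optional first argument overrides the dataset path.
      val path = args.headOption.getOrElse("src/main/resources/diamonds.csv")
      println(s"records: ${run(spark, path)}")
    } finally {
      spark.stop()
    }
  }
}
```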
