observatory's Introduction

Functional Programming in Scala Capstone (Coursera)

Repository with implementations of all 6 modules of the final capstone (https://www.coursera.org/learn/scala-capstone). Some tips and code snippets are shown here.

Module 1 - Extraction

Implemented with Spark. I wouldn't recommend implementing any other module with Spark: the later modules need highly efficient implementations (a few milliseconds per iteration) to pass the grader tests, and Spark adds too much overhead for non-distributed data. Also, be very careful to follow all the requirements when preparing both data sets, and especially in your JOIN logic (null keys must also compare as equal):

val joined = stations.joinWith(temperatures,
     stations("stnId") <=> temperatures("stnId") &&
       stations("wbanId") <=> temperatures("wbanId")
   )

   /*
   Alternative using string literals:
   import org.apache.spark.sql.Column
   import org.apache.spark.sql.functions._

   def compareColumns(c: String): Column =
      coalesce(stations(c), lit("*")) === coalesce(temperatures(c), lit("*"))

   val joined = stations.joinWith(temperatures,
    compareColumns("stnId") && compareColumns("wbanId")
   )
    */

Module 2 - Visualization

Probably the hardest module. You need to implement an algorithm that interpolates a Temperature for every single Location on the map, and it has to be FAST: you need to get down to a few (single-digit) milliseconds per interpolated temperature or the grader will throw an annoying Timeout error. Most people in the forums suggest using only the closest data points rather than all data points of a year (using "par" is good enough as long as your implementation is correct). To reduce the number of data points you can sort with scala.util.Sorting.quickSort (by far the fastest way to sort that I found) and then take the first few. My implementation (without helper methods) is:

import scala.collection.parallel.ParIterable

def predictTemperature(temperatures: Iterable[(Location, Temperature)],
                       location: Location): Temperature = {

    // calculate distances
    val distanceTempPairs: ParIterable[(Double, Temperature)] =
      temperatures.par.map {
        case (l: Location, t: Temperature) => (getDistance(location, l), t)
      }

    /*
    Optional implementation using quicksort and fewer distances:
    val arrTest = distanceTempPairs.toArray
    scala.util.Sorting.quickSort(arrTest)
    val arrTest2 = arrTest.take(40)
     */

    // No interpolation needed if distance < 1km.
    val closeP = distanceTempPairs.filter(_._1 < 1).take(1)
    if (closeP.nonEmpty) closeP.head._2
    // else implement the Inverse Distance Weighting formula
    else getInverseDistanceWeight(distanceTempPairs)
}

I use all distances and still pass all the tests just by using parallel computation.
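The two helper methods left out above could look roughly like the sketch below. This is not the code from the repository: the great-circle formula, the Earth radius of 6371 km and the power parameter of 2 are assumptions, the `Location` case class only mirrors the course template, and the weighting takes a plain `Iterable` instead of a `ParIterable` to keep the example self-contained.

```scala
object IdwSketch {
  type Temperature = Double
  final case class Location(lat: Double, lon: Double)

  val EarthRadiusKm = 6371.0 // assumed mean Earth radius
  val Power = 2.0            // assumed IDW power parameter

  /** Great-circle distance in km, using the spherical law of cosines. */
  def getDistance(a: Location, b: Location): Double =
    if (a == b) 0.0
    else {
      val lat1 = math.toRadians(a.lat)
      val lat2 = math.toRadians(b.lat)
      val dLon = math.toRadians(b.lon - a.lon)
      val cosAngle =
        math.sin(lat1) * math.sin(lat2) +
          math.cos(lat1) * math.cos(lat2) * math.cos(dLon)
      // Clamp to [-1, 1] to guard against rounding errors before acos.
      EarthRadiusKm * math.acos(math.max(-1.0, math.min(1.0, cosAngle)))
    }

  /** Inverse Distance Weighting over (distance, temperature) pairs.
    * Distances must be > 0; the caller handles the "very close" case.
    */
  def getInverseDistanceWeight(pairs: Iterable[(Double, Temperature)]): Temperature = {
    val (weightedSum, weightTotal) = pairs.foldLeft((0.0, 0.0)) {
      case ((ws, wt), (distance, temperature)) =>
        val w = 1.0 / math.pow(distance, Power)
        (ws + w * temperature, wt + w)
    }
    weightedSum / weightTotal
  }
}
```

Accumulating both sums in a single pass avoids building intermediate collections, which matters when this runs once per pixel.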

Module 3 - Interaction

Not too difficult once you understand how to generate Location coordinates from Tile coordinates.
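As a rough illustration of that conversion, here is a minimal sketch based on the standard Web Mercator ("slippy map") tile formulas. The `Tile` and `Location` case classes only mirror the course template, and `tileLocation` and `pixelTiles` are hypothetical names, not the ones used in this repository.

```scala
object TileSketch {
  final case class Location(lat: Double, lon: Double)
  final case class Tile(x: Int, y: Int, zoom: Int)

  /** Location of the top-left corner of a tile (Web Mercator). */
  def tileLocation(tile: Tile): Location = {
    val n = math.pow(2.0, tile.zoom) // number of tiles per axis at this zoom
    val lon = tile.x / n * 360.0 - 180.0
    val lat = math.toDegrees(math.atan(math.sinh(math.Pi * (1.0 - 2.0 * tile.y / n))))
    Location(lat, lon)
  }

  /** The 256 x 256 pixels of a tile, expressed as tiles 8 zoom levels deeper,
    * in row-major order (top row first).
    */
  def pixelTiles(tile: Tile): Seq[Tile] =
    for {
      py <- 0 until 256
      px <- 0 until 256
    } yield Tile(256 * tile.x + px, 256 * tile.y + py, tile.zoom + 8)
}
```

Treating each pixel as a sub-tile means the same `tileLocation` function gives the Location of every pixel, so no separate pixel math is needed.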

Module 4 - Manipulation

For this module I use memoization at the Grid level only. I use the following helper method as a wrapper:

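// MMAP is assumed here to be an alias for scala.collection.mutable.Map
def memoizeFnc[K, V](dict: MMAP[K, V])(f: K => V): K => V =
  // compute and store f(k) only on the first lookup of k
  k => dict.getOrElseUpdate(k, f(k))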

This wrapper is used together with a local Map to cache all computed temperatures:

val GRID_MAPS: MMAP[Int, MMAP[GridLocation, Temperature]] = MMAP.empty

  /**
    * @param temperatures Known temperatures
    * @return A function that, given a latitude in [-89, 90] and a longitude in [-180, 179],
    *         returns the predicted temperature at this location
    */
  def makeGrid(temperatures: Iterable[(Location, Temperature)]):
  GridLocation => Temperature = {

    val tempsHash = temperatures.hashCode()
    if (!GRID_MAPS.contains(tempsHash))
      GRID_MAPS.update(tempsHash, MMAP.empty[GridLocation, Temperature])

    val memoizedGetTemperature = memoizeFnc(GRID_MAPS(tempsHash))(
      (gridLoc: GridLocation) => predictTemperature(temperatures, gridLoc.toLocation)
    )

    memoizedGetTemperature
  }

You can use this approach or you could just pre-compute all temperatures.
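The pre-computing alternative could look something like the sketch below. It is only an illustration: `precomputeGrid` is a hypothetical name, the prediction function is passed in as a parameter so the example stays self-contained, and the case classes only mirror the course template.

```scala
object GridSketch {
  type Temperature = Double
  final case class Location(lat: Double, lon: Double)
  final case class GridLocation(lat: Int, lon: Int)

  private val Width = 360  // longitudes -180 to 179
  private val Height = 180 // latitudes 90 down to -89

  // Row-major index: row 0 is latitude 90, column 0 is longitude -180.
  private def index(lat: Int, lon: Int): Int = (90 - lat) * Width + (lon + 180)

  /** Evaluates `predict` once for every grid point and returns a lookup function. */
  def precomputeGrid(predict: Location => Temperature): GridLocation => Temperature = {
    val values = new Array[Temperature](Width * Height)
    for (lat <- -89 to 90; lon <- -180 to 179)
      values(index(lat, lon)) = predict(Location(lat.toDouble, lon.toDouble))
    gl => values(index(gl.lat, gl.lon))
  }
}
```

Compared with memoization, this pays the full cost of 64,800 predictions up front, but lookups afterwards only read from an array that is never written again, which also avoids concurrent updates to a mutable map when the grid function is called from a parallel collection.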

Module 5 - Visualization2

The hardest part is the conversion of Location to CellPoint. Other than that the implementation is similar to Interaction. My approach in visualizeGrid:

val allColors: Array[Color] = allLocations.par
      .map { loc =>
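        // floor (not toInt) so the offsets below stay in [0, 1) for negative coordinates
        val lat = math.floor(loc.lat).toInt
        val lon = math.floor(loc.lon).toInt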
        bilinearInterpolation(
          CellPoint(loc.lon - lon, loc.lat - lat),
          grid(GridLocation(lat, lon)),
          grid(GridLocation(lat + 1, lon)),
          grid(GridLocation(lat, lon + 1)),
          grid(GridLocation(lat + 1, lon + 1)))
      }
      .map(temperature => interpolateColor(colors, temperature))
      .toArray
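For reference, the Location to CellPoint conversion and the interpolation itself can be sketched as below. This is an assumption-laden illustration, not the repository's helper: `cellOf` is a hypothetical name, and the argument order of `bilinearInterpolation` simply follows the call in the snippet above (d01 is the neighbour one step along the latitude axis, d10 one step along the longitude axis). The sketch uses `math.floor` rather than truncation so that the fractional offsets stay in [0, 1) for negative coordinates.

```scala
object BilinearSketch {
  type Temperature = Double
  final case class CellPoint(x: Double, y: Double)

  /** Standard bilinear interpolation on the unit square. */
  def bilinearInterpolation(p: CellPoint,
                            d00: Temperature,
                            d01: Temperature,
                            d10: Temperature,
                            d11: Temperature): Temperature =
    d00 * (1 - p.x) * (1 - p.y) +
      d10 * p.x * (1 - p.y) +
      d01 * (1 - p.x) * p.y +
      d11 * p.x * p.y

  /** Splits a coordinate into its integer grid cell and the offset inside it. */
  def cellOf(lat: Double, lon: Double): (Int, Int, CellPoint) = {
    val lat0 = math.floor(lat).toInt
    val lon0 = math.floor(lon).toInt
    (lat0, lon0, CellPoint(lon - lon0, lat - lat0))
  }
}
```

At the four corners of the cell the formula returns the corresponding corner value exactly, which is a quick sanity check for the argument order.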

Module 6 - Interaction2

Implemented using Reactive Programming. Probably the easiest module.
