
imdbapp's Introduction

Assumptions

  • We use JPA to represent the IMDB movie model. Each DAO maps one-to-one to an IMDB table. However, we create a few relationship tables to maintain one-to-many relationships.
  • To load the data, we use a queue-based pattern: each upload request is queued, and workers process it. This approach was chosen so that the import can be parallelized in the future.
  • One of the requirements was to import only 2017 movie data. I tried to find the best way to do that. Initially, I thought the title file was sorted by year, so I could scan through it and pick out the titles for a particular date range. Later I found that this is not the case, so for now I am importing everything. In addition, the other tables, such as names, principals, and ratings, have no way to filter by year. We would have to maintain a map of all the movies from 2017 and then scan all associated items. However, since the scope of this exercise was to demonstrate Java coding and best practices, I didn't think it made sense to implement a distributed application that would break this task down and complete it in a scalable way.
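The queue-based loading described above could look roughly like the following. This is a minimal sketch, not the code from this repository: the class name `ImportQueueSketch`, the stop sentinel, and the use of plain file names as requests are all assumptions made for illustration, and the actual import work is replaced by a stand-in that only records the file name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names are illustrative, not taken from the imdbapp source.
class ImportQueueSketch {
    // Sentinel value that tells a worker to stop.
    private static final String POISON_PILL = "__STOP__";

    private final BlockingQueue<String> requests = new LinkedBlockingQueue<>();
    private final List<String> imported = new CopyOnWriteArrayList<>();
    private final ExecutorService workers;
    private final int workerCount;

    ImportQueueSketch(int workerCount) {
        this.workerCount = workerCount;
        this.workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.execute(this::drain);
        }
    }

    // Queue an upload request (here just a file name).
    void submit(String fileName) {
        requests.add(fileName);
    }

    // Worker loop: take requests until the stop sentinel arrives.
    private void drain() {
        try {
            while (true) {
                String fileName = requests.take();
                if (POISON_PILL.equals(fileName)) {
                    return;
                }
                imported.add(fileName); // stand-in for the real import work
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Send one sentinel per worker, wait for the workers, and return what was processed.
    List<String> shutdownAndGetImported() {
        for (int i = 0; i < workerCount; i++) {
            requests.add(POISON_PILL);
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                workers.shutdownNow();
            }
        } catch (InterruptedException e) {
            workers.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(imported);
    }
}
```

With a single worker the files are processed in submission order. Raising the worker count is what would allow the import to run in parallel, provided the files do not depend on each other.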

Code Design

This is a Spring Boot application. Based on the parameter passed, it executes one of the following tasks:

  • ImportAll
  • Update Rating
  • GetRating
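Selecting a task from the first command-line parameter might be sketched as below. The enum, the accepted spellings, and the returned messages are assumptions for illustration. In a Spring Boot application this logic would typically sit in a `CommandLineRunner` bean, but the sketch uses plain Java so that it runs on its own.

```java
import java.util.Locale;

// Hypothetical sketch: the task names mirror the list above, everything else is illustrative.
class TaskDispatchSketch {
    enum Task { IMPORT_ALL, UPDATE_RATING, GET_RATING }

    // Map a command-line argument to a task, ignoring case.
    static Task parse(String arg) {
        switch (arg.toLowerCase(Locale.ROOT)) {
            case "importall":
                return Task.IMPORT_ALL;
            case "updaterating":
                return Task.UPDATE_RATING;
            case "getrating":
                return Task.GET_RATING;
            default:
                throw new IllegalArgumentException("Unknown task: " + arg);
        }
    }

    // Run the selected task; the returned strings stand in for the real work.
    static String run(String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Expected a task name as the first argument");
        }
        switch (parse(args[0])) {
            case IMPORT_ALL:
                return "importing all files";
            case UPDATE_RATING:
                return "updating ratings";
            case GET_RATING:
                return "fetching rating";
            default:
                throw new IllegalStateException("Unhandled task");
        }
    }
}
```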

ImdbFile

ImdbFile represents the IMDB offline entity. It holds the unzipped file in memory, which the DAO streams to create entries in the data store. Since we are using JPA and Hibernate, the data store can be changed easily. For this exercise, I am using SQLite.
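For reference, reading a gzipped, tab-separated IMDB file row by row can be sketched as follows. This is not the ImdbFile implementation: it decompresses on the fly instead of holding the whole file in memory, and the class and method names are assumptions. The `gzip` helper exists only to build test input.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: names are illustrative, not taken from the imdbapp source.
class GzipTsvStreamSketch {
    // Passes each data row (split on tabs) to the handler, skipping the header line.
    // Returns the number of data rows seen.
    static long forEachRow(InputStream gzipped, Consumer<String[]> rowHandler) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line = reader.readLine(); // header line, ignored
            long rows = 0;
            while ((line = reader.readLine()) != null) {
                rowHandler.accept(line.split("\t", -1)); // -1 keeps trailing empty columns
                rows++;
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: gzip a string in memory so the sketch can be tried without a real file.
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```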

DAO

For data access, we use JPA and Hibernate. We use Spring to generate all the necessary CRUD classes. For import, ImdbFile calls the appropriate method in ImdbEntityDAO. These methods are guarded by transactions, so an import either completes or fails as a whole. The ImdbEntityDAO implementation also flushes and clears the cache to avoid excessive memory consumption.
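The flush-and-clear idea can be illustrated with a small batching loop. This is a sketch under stated assumptions, not the ImdbEntityDAO code: the `PersistenceContext` interface is a hypothetical stand-in for the `persist`, `flush`, and `clear` methods of a JPA `EntityManager`, the batch size is arbitrary, and transaction handling is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not taken from the imdbapp source.
class BatchImportSketch {
    // Stand-in for the three EntityManager operations used during import.
    interface PersistenceContext {
        void persist(Object entity);
        void flush();
        void clear();
    }

    // Persists every entity, flushing and clearing after each full batch so the
    // persistence context does not keep all imported entities in memory.
    static int importAll(Iterable<?> entities, PersistenceContext ctx, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int count = 0;
        for (Object entity : entities) {
            ctx.persist(entity);
            count++;
            if (count % batchSize == 0) {
                ctx.flush();
                ctx.clear();
            }
        }
        // Final flush for any remainder smaller than a full batch.
        ctx.flush();
        ctx.clear();
        return count;
    }

    // Fake context that records calls, useful for trying the loop without a database.
    static final class RecordingContext implements PersistenceContext {
        final List<String> calls = new ArrayList<>();

        @Override public void persist(Object entity) { calls.add("persist"); }
        @Override public void flush() { calls.add("flush"); }
        @Override public void clear() { calls.add("clear"); }
    }
}
```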

Mapper

This class maps the CSV values to the ImdbEntity. The mapper is simple and handles most of the translation.
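As an illustration of this kind of mapping, the sketch below converts one row of the ratings file into a small value object. It assumes the column order `tconst`, `averageRating`, `numVotes` and the `\N` marker that the IMDB dataset files use for missing values. The `Rating` class and the mapper name are hypothetical and are not the ImdbEntity classes from this project.

```java
// Hypothetical sketch: names are illustrative, not taken from the imdbapp source.
class RatingMapperSketch {
    // Marker for a missing value in the data files (backslash followed by N).
    static final String MISSING = "\\N";

    // Simple value object standing in for a JPA entity.
    static final class Rating {
        final String titleId;
        final Double averageRating; // null when the column is missing
        final Integer numVotes;     // null when the column is missing

        Rating(String titleId, Double averageRating, Integer numVotes) {
            this.titleId = titleId;
            this.averageRating = averageRating;
            this.numVotes = numVotes;
        }
    }

    // Maps the three columns of a ratings row to a Rating.
    static Rating map(String[] columns) {
        if (columns.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns but got " + columns.length);
        }
        Double average = MISSING.equals(columns[1]) ? null : Double.valueOf(columns[1]);
        Integer votes = MISSING.equals(columns[2]) ? null : Integer.valueOf(columns[2]);
        return new Rating(columns[0], average, votes);
    }
}
```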

Testing

Testing is broken down into unit tests and integration tests. The integration tests use an in-memory database to test the import process end to end, with fake data and files created at run time.

Run

I was able to run this with all the files except title.principals.tsv.gz, because this file is 300 MB zipped and the import took over an hour.

imdbapp's People

Contributors

debajyotimahanta
