DataEngineerTask

For this assignment you will create a Python program that does the following:

Load the data from https://raw.githubusercontent.com/localytics/data-viz-challenge/master/data.json programmatically (no manual download).

Process the data as follows:

  • Output the following columns: age, device, date, count, amount_sum.

  • date has the format YYYY-MM-DD and is based on client_time.

  • count is the number of entries.

  • amount_sum is the sum of the values in the amount field.

  • Only entries from users who are both female ("gender": "F") and Californian ("state": "CA") should be considered.

Write the result as a CSV file total_events.csv to AWS S3.
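The processing steps above can be sketched with the standard library alone. This is a minimal sketch, not the repo's actual code (the notebook may well use pandas instead). The function names `load_events`, `total_events` and `to_csv` are hypothetical. It assumes the JSON file keeps its events under a top-level "data" key, that client_time is a Unix timestamp in seconds, and that dates should be taken in UTC.

```python
import csv
import io
import json
import urllib.request
from collections import defaultdict
from datetime import datetime, timezone

DATA_URL = "https://raw.githubusercontent.com/localytics/data-viz-challenge/master/data.json"


def load_events(url=DATA_URL):
    """Download and parse the JSON file programmatically."""
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    # Assumption: events sit under a top-level "data" key; fall back to a bare list.
    return payload["data"] if isinstance(payload, dict) else payload


def total_events(events):
    """Aggregate female Californian events by (age, device, date)."""
    totals = defaultdict(lambda: [0, 0])  # (age, device, date) -> [count, amount_sum]
    for event in events:
        if event.get("gender") != "F" or event.get("state") != "CA":
            continue
        # Assumption: client_time is Unix seconds; the date is computed in UTC.
        date = datetime.fromtimestamp(event["client_time"], tz=timezone.utc).strftime("%Y-%m-%d")
        key = (event["age"], event["device"], date)
        totals[key][0] += 1
        totals[key][1] += event["amount"]
    return [
        {"age": age, "device": device, "date": date, "count": count, "amount_sum": amount_sum}
        for (age, device, date), (count, amount_sum) in sorted(totals.items())
    ]


def to_csv(rows):
    """Render the aggregated rows as CSV text in the required column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["age", "device", "date", "count", "amount_sum"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Typical use would be `csv_text = to_csv(total_events(load_events()))`. Keeping the download separate from the aggregation lets the filtering and grouping logic be checked on a few hand-written records without network access.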

Installation

You will want to have Anaconda installed. More information here: https://docs.anaconda.com/anaconda/install/

The environment has been saved to the file: packagesADD.yml

  1. On the command line, go to the directory where you have placed this repo.

  2. Create the environment from this file with:

$ conda env create --file packagesADD.yml

This environment was created on the win-64 platform.

  3. Activate the environment with:

$ conda activate envADD

  4. With envADD active, open Jupyter Notebook from the command line with:

$ jupyter notebook
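To confirm that the active interpreter really belongs to envADD and has the libraries the task needs, a quick check like the following can be run. It is only a sketch: the module list below is a hypothetical guess, because the contents of packagesADD.yml are not reproduced here, so adjust it to match that file.

```python
import importlib.util
import sys

# Hypothetical list: packagesADD.yml is not shown here, so adjust to match it.
REQUIRED_MODULES = ["pandas", "boto3", "notebook"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The interpreter path should point inside ...\envs\envADD when the env is active.
    print("Interpreter:", sys.executable)
    missing = missing_modules()
    if missing:
        print("Missing in this environment:", ", ".join(missing))
    else:
        print("All required modules are available.")
```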

Run on local machine

  • Activate the environment with:

$ conda activate envADD

  • Place the file 'secrets.py' in the local folder of the environment, usually: C:\Users\{UserName}\.conda\envs\envADD\lib

  • Open the file DataEngineeringTask.ipynb and run all cells

  • You can also run the file DataEngineeringTask.py. Make sure you are using the provided environment or that the necessary libraries have been installed.

  • To write the CSV file to the S3 bucket you will need the secrets. As you can imagine, I will send the secrets to you directly and will not upload them to the git repo ;)
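The S3 write step could look roughly like this. It is a sketch under stated assumptions: how secrets.py exposes the credentials and the bucket name is not shown in this README, so the function `upload_csv_to_s3` (a hypothetical name) simply takes them as arguments, and "my-bucket" below is a placeholder. It uses the real boto3 calls `boto3.client("s3", ...)` and `put_object`. Taking the credentials as arguments also avoids relying on `import secrets`, which can clash with the standard library's own `secrets` module depending on `sys.path`.

```python
def upload_csv_to_s3(csv_text, bucket, key="total_events.csv", client=None,
                     aws_access_key_id=None, aws_secret_access_key=None):
    """Upload CSV text to s3://<bucket>/<key> and return the S3 response."""
    if client is None:
        # Imported lazily so the function can be exercised with a stub client.
        import boto3
        # With None credentials, boto3 falls back to its default credential chain.
        client = boto3.client(
            "s3",
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
        )
    return client.put_object(
        Bucket=bucket,
        Key=key,
        Body=csv_text.encode("utf-8"),
        ContentType="text/csv",
    )
```

A call such as `upload_csv_to_s3(csv_text, "my-bucket", aws_access_key_id=..., aws_secret_access_key=...)` would write total_events.csv at the root of the bucket. Allowing an injected client keeps the upload testable without AWS access.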

Contributors

albertodiazdurana