porterehunley / onereview_data_collection
License: MIT License
Prepare the application for deployment with WSGI and Gunicorn
Add a year toggle for the collection on the front end.
When a movie has multiple parts, the collector sometimes files a review under the second movie when it belongs to the first.
Right now the startservercontroller is open, and anyone can start the data collection by hitting that API endpoint. The application has a token system. I need to first validate that the user is logged in, then get their token, then call the protected endpoint.
We need to store our data somewhere. Google Cloud is a pretty good option since we are using YouTube's API.
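The check described above could be sketched as a decorator that refuses to run a handler unless the caller presents a valid token. This is a minimal sketch under stated assumptions: `VALID_TOKENS`, `Unauthorized`, `require_token`, and `start_collection` are hypothetical names, and the real application presumably stores tokens in its database rather than in a dictionary.

```python
import functools
import hmac

# Hypothetical token store; assumed here only for illustration.
VALID_TOKENS = {"alice": "s3cr3t-token"}


class Unauthorized(Exception):
    """Raised when a call carries no valid token."""


def require_token(handler):
    """Run the handler only if (user, token) matches a stored token."""
    @functools.wraps(handler)
    def wrapper(user, token, *args, **kwargs):
        expected = VALID_TOKENS.get(user)
        # compare_digest avoids leaking information through timing differences.
        if expected is None or token is None or not hmac.compare_digest(expected, token):
            raise Unauthorized(f"invalid token for user {user!r}")
        return handler(user, *args, **kwargs)
    return wrapper


@require_token
def start_collection(user):
    # Stand-in for the protected endpoint that starts the collector.
    return f"collection started by {user}"
```

In a web framework the same check would normally read the token from a request header instead of a function argument.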
Documentation below
https://cloud.google.com/storage/docs/
Add a page to the front end that documents the API usage and Auth system
We do not have unlimited API queries, and some cost more than others. We should not go through a video's closed captions (CC) unless we have to.
Should we look at comments, closed captions, or other things? What data do we need to get from the videos to train our algorithms? Answering this will require training some crude ML models on clean data.
Do an in-depth reading of chapter 2. This chapter has an example of an end-to-end ML project. It is all pretty useful and helps us frame our project.
Right now, multiple authenticated users can each start their own collection process. Add a check for whether a controller is already running at that moment.
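One way to sketch such a check is a non-blocking lock that only one collection run can hold at a time. The class and method names below (`CollectionController`, `try_start`, `is_running`) are hypothetical, and the sketch assumes the collector runs in a background thread inside a single process.

```python
import threading


class CollectionController:
    """Allows at most one collection run at a time within this process."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_start(self, work):
        """Start work in a background thread; return False if a run is active."""
        # A non-blocking acquire returns False immediately if the lock is held.
        if not self._lock.acquire(blocking=False):
            return False

        def run():
            try:
                work()
            finally:
                # Release even if work() raises, so later runs are not blocked.
                self._lock.release()

        threading.Thread(target=run, daemon=True).start()
        return True

    def is_running(self):
        return self._lock.locked()
```

A lock like this only guards one process. If the application is served by several worker processes, the "running" flag would need to live somewhere shared, such as the database.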
This is a large issue with a couple of parts. First, we need to be able to click on a movie on the front end and have it display the YouTube videos associated with that movie. Then we need to be able to enter a score for those videos if there is not one already.
Remove the internal API calls inside the application to increase speed and reduce complexity. This also makes it more configurable.
Add a channel to both the SQL database and Firestore.
Write a script integrated with the application that transfers data from the database and puts it in Firestore.
The data collector will sometimes skip over parts of the data pipeline. I do not know why.
Let the user select an entry, then clear it.
Have the front end color the entries that don't have captions.
Let the user know on the front end whether a media item contains all the data entries needed.
Add a button that recollects incomplete data. First complete the preceding issue, which allows detection of media items with an incomplete data pipeline.
Create an algorithm that gets the movie titles from IMDb.
Make the number of videos that the controller collects configurable from the front end.
Create Logs for specified directories
Have an email approval system: if an applicant is approved from the email, it registers them in the database with a new token.
Add a configuration file that makes the app name configurable for better routing.
Provision and set up a server to host an Nginx deployment.
Gather some example data from a couple YouTube videos relating to different products and see what it all looks like and how we should clean it.
Mark it up using Python and go ahead and commit it.
Have the application not totally crash when it hits a quota limit.
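A minimal sketch of failing gracefully on a quota limit: catch the quota error, stop the run, and report what is left so it can be resumed later. `QuotaExceeded` and `collect_all` are hypothetical names. The real API client raises its own error type, which the application would need to translate into something like this.

```python
class QuotaExceeded(Exception):
    """Hypothetical error standing in for the API client's quota failure."""


def collect_all(titles, fetch):
    """Fetch each title until the quota runs out.

    Returns (collected, remaining) so the caller can resume later
    instead of crashing partway through.
    """
    collected = []
    for index, title in enumerate(titles):
        try:
            collected.append(fetch(title))
        except QuotaExceeded:
            # Stop cleanly and hand back everything not yet collected.
            return collected, titles[index:]
    return collected, []
```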
Set up the data transformation pipeline for YouTube data. Dirty data in, clean data out.
Let the user select a current entry, then add it.
Add a user registration page that submits to an email registered with TrueReview.
Set up an authorized account that has access to YouTube's API.
Make sure the data collector does not crash when it tries to collect a media item that is already in the database. Let the user know, then go to the next entry.
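The skip-and-continue behavior could look like the sketch below, which relies on a uniqueness constraint and catches the resulting error per item. The `media` table, its `title` column, and `insert_media` are assumptions for illustration. The project's actual schema is not shown here.

```python
import sqlite3


def insert_media(conn, titles):
    """Insert each title; return the ones skipped as duplicates."""
    skipped = []
    for title in titles:
        try:
            # "with conn" commits on success and rolls back on error.
            with conn:
                conn.execute("INSERT INTO media (title) VALUES (?)", (title,))
        except sqlite3.IntegrityError:
            # Already in the database: note it and move on to the next entry.
            skipped.append(title)
    return skipped


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (title TEXT PRIMARY KEY)")
skipped = insert_media(conn, ["Alien", "Aliens", "Alien"])
```

The returned list can be shown to the user so they know which items were already collected.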
Add a button that will stop the data collection
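A stop button could be backed by a flag that the collection loop checks between items. This is a sketch assuming the collector runs in a thread: `run_collection` and its parameters are hypothetical, and the endpoint behind the button would simply call `stop_event.set()`.

```python
import threading


def run_collection(titles, collect, stop_event):
    """Collect titles in order, stopping early once stop_event is set."""
    done = []
    for title in titles:
        # Check between items so the current item always finishes cleanly.
        if stop_event.is_set():
            break
        done.append(collect(title))
    return done
```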
Write a script, in Ansible or a similar tool, that installs and provisions the server and sets up the database. It should SSH into the Ubuntu host: [email protected] and set everything up.
Add the current year of the movie titles to the server status API so the frontend can update correctly.
Install Anaconda and set up an environment in Sublime Text.