cortext-pytheas-youtube's Introduction

YoutubeExplorer

Disclaimer: this project is a work in progress (WIP).

Intended goal

  • Explore YouTube from a "data point of view"
  • Download the requested data in a comprehensible format
  • Make the data analyzable by the CORTEXT platform

Features

  • Explore a single video, or videos from a query, playlist or channel
  • Get comments, captions and metrics from a list of videos
  • Download data as JSON

Built on

  • Requests/MongoDB: build the data
  • Flask/Jinja: front end and back end
  • Docker: deployment

Install

You can use Docker for an environment-agnostic deployment (this requires a working Docker installation), or launch the application directly with Python (this requires installing all prerequisites yourself: Python 3 and MongoDB).

  • docker service (with docker-compose) :
docker-compose up 

Once everything has built correctly and you have verified that it is running, stop it. From then on, start it with:

docker-compose start
  • python environment :

For this variant, please make sure that Python 3 and MongoDB are installed, then:

cd cortext-pytheas-youtube
virtualenv env3 -p python3
source ./env3/bin/activate

Then, in two separate terminals, do the following for each of ./main.py and rest/rest.py (the new Rest branch has its own app):

pip install -r requirements.txt
python main.py

Other requirements :

Configuration files

  • Upper-case keys are required for the application to work
  • Lower-case keys are for debugging purposes

conf/conf.json

{
    "DATA_DIR": "data/",
    "PORT": 8080,
    "REDIRECT_URI": "http://localhost:8080/auth",
    "GRANT_HOST_URL": "https://auth.cortext.net",
    
    "MONGO_HOST": "mongo",
    "MONGO_DBNAME": "youtube",
    "MONGO_PORT": 27017,
    
    "REST_HOST": "rest",
    "REST_PORT": 5002,
    
    "api_key": "",
    "oauth_status": "True",
    "debug_level": "False"
}

rest/conf/conf.json

{
    "LOG_DIR": "log/",
    "PORT": 5002,
    "MONGO_HOST": "localhost",
    "MONGO_DBNAME": "youtube",
    "MONGO_PORT": 27017
}
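
The key convention above (upper-case keys required, lower-case keys for debugging) can be checked with a few lines of Python. This is only a sketch: the helper names `split_config` and `check_required` are hypothetical and are not part of Pytheas, and the set of expected keys is simply copied from the rest/conf/conf.json example.

```python
import json


def split_config(conf):
    """Separate required (upper-case) keys from debug (lower-case) keys."""
    required = {k: v for k, v in conf.items() if k.isupper()}
    debug = {k: v for k, v in conf.items() if not k.isupper()}
    return required, debug


def check_required(conf, expected):
    """Return the sorted list of expected keys that are missing from conf."""
    return sorted(set(expected) - set(conf))


# Inline copy of the rest/conf/conf.json example, so the sketch runs
# without touching the file system.
raw = """
{
    "LOG_DIR": "log/",
    "PORT": 5002,
    "MONGO_HOST": "localhost",
    "MONGO_DBNAME": "youtube",
    "MONGO_PORT": 27017
}
"""
conf = json.loads(raw)
required, debug = split_config(conf)

# Assumption: these are the keys the REST app needs, as listed in the example.
expected_keys = {"LOG_DIR", "PORT", "MONGO_HOST", "MONGO_DBNAME", "MONGO_PORT"}
missing = check_required(conf, expected_keys)
if missing:
    raise SystemExit("Missing required config keys: " + ", ".join(missing))
```

In a real run you would replace the inline string with `json.load(open("rest/conf/conf.json"))`.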

API Key from Google :

  1. Obtain an API key from Google and activate the YouTube Data API from console.developers.google.com
  2. Enter the API key in the Pytheas web interface, or store it persistently in the config file
  3. Start exploration
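
To give an idea of what the API key is used for, here is a minimal sketch that builds a YouTube Data API v3 search URL with the standard library. It does not show how Pytheas itself issues requests; the function name `build_search_url` and the `YOUR_API_KEY` placeholder are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 search.list method.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(api_key, query, max_results=50, page_token=None):
    """Build a search.list URL that returns videos matching `query`."""
    params = {
        "part": "snippet",
        "type": "video",
        "q": query,
        "maxResults": max_results,  # the API accepts at most 50 per page
        "key": api_key,
    }
    if page_token:
        # Token returned by a previous page as `nextPageToken`.
        params["pageToken"] = page_token
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# Placeholder key: replace it with the key obtained in step 1.
url = build_search_url("YOUR_API_KEY", "open data")
print(url)
```

The resulting URL can be fetched with Requests (`requests.get(url).json()`) once a valid key is in place.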

Limitations

  • YouTube search results (API and browser) are capped at about 500 results per query (you can still get longer video lists from a channel, a playlist, an arbitrary list of videos, or date-bounded search queries)
  • Automatic captions cannot be fully retrieved via the API (this requires a workaround with XML requests and the undocumented YouTube front-end API...)
  • Comments are retrieved with only one level of replies
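
One way to read the "date-bounded search" workaround is to split a long period into smaller windows and run the same query once per window, using the `publishedAfter` and `publishedBefore` parameters of the search endpoint. The sketch below only generates the windows; the function name `date_windows` is hypothetical, and whether Pytheas slices time this way is an assumption.

```python
from datetime import datetime, timedelta, timezone

RFC3339 = "%Y-%m-%dT%H:%M:%SZ"  # timestamp format expected by the API


def date_windows(start, end, step):
    """Yield (publishedAfter, publishedBefore) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be a positive duration")
    current = start
    while current < end:
        upper = min(current + step, end)  # clamp the last window to `end`
        yield current.strftime(RFC3339), upper.strftime(RFC3339)
        current = upper


# Three weeks split into one-week windows (dates are in UTC).
windows = list(
    date_windows(
        datetime(2020, 1, 1, tzinfo=timezone.utc),
        datetime(2020, 1, 22, tzinfo=timezone.utc),
        timedelta(days=7),
    )
)
for after, before in windows:
    print(after, before)
```

Each pair can be added to a search request, so that every window gets its own result cap instead of sharing a single one.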

Good to know

  • A search is a list of videos
  • A playlist is a list of videos
  • Author and channel are different things

A very basic REST API is implemented

  • /queries/
  • /queries/query_id
  • /queries/query_id/videos/
  • /videos/video_id
  • /videos/video_id/comments/
  • /comments/comment_id
  • /captions/caption_id
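
The routes above can be addressed from a client with a small URL helper. This is a sketch under assumptions: the helper `rest_url` is hypothetical, the `http` scheme and `localhost` host are guesses for a local setup, the port 5002 comes from the rest/conf/conf.json example, and `QUERY_ID` and `VIDEO_ID` are placeholders.

```python
from urllib.parse import quote


def rest_url(*segments, host="localhost", port=5002, trailing_slash=False):
    """Join path segments into a URL for the REST service."""
    # Percent-encode each segment so an id containing "/" cannot change the route.
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    url = f"http://{host}:{port}/{path}"
    return url + "/" if trailing_slash else url


# Collection routes end with a slash, item routes do not (as in the list above).
print(rest_url("queries", trailing_slash=True))
print(rest_url("queries", "QUERY_ID", "videos", trailing_slash=True))
print(rest_url("videos", "VIDEO_ID"))
```

With the service running, any of these URLs can be passed to `requests.get` to fetch the corresponding JSON.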

Next to do

  • Multithreading / parallelism / queuing (Celery again?)
  • Continue refactoring: more dynamic, self-contained functions
  • Integrate the error handler directly from Flask
  • Integrate API openSpec
  • Continue to integrate methods (mainly the recommendation network)
  • ...

Contributors

  • breucker
  • ikario404