smappnyu / youtube-data-api
A Python client for collecting and parsing public data from the YouTube Data API
Home Page: https://youtube-data-api.readthedocs.io/en/latest/index.html
License: MIT License
As of v0.0.21, we're deprecating the class YoutubeDataApi in favor of YouTubeDataAPI. These classes are functionally equivalent!
Hi! First of all, thank you so much for creating this awesome package. I am using the most recent version available via pip, and while trying to get recommended videos I hit my YouTube quota limit within one minute.
I have used the YouTube API before and was previously able to collect this much data within the quota limit, which is why I think this might be a bug. Here's my code:
def get_related(seed_id, max_results=25):
    all_rels = []
    seed = yt.get_recommended_videos(seed_id, max_results=max_results)
    for rank, i in enumerate(seed):
        i['rank'] = rank + 1
        i['seed'] = seed_id
        seed_id = i['video_id']
        all_rels.append(i)
        seed = yt.get_recommended_videos(seed_id, max_results=max_results)
        for rank1, i1 in enumerate(seed):
            i1['rank'] = rank1 + 1
            i1['seed'] = seed_id
            seed_id = i1['video_id']
            all_rels.append(i1)
    return all_rels

res = get_related('4Y1lZQsyuSQ')
{
  "error": {
    "errors": [
      {
        "domain": "usageLimits",
        "reason": "dailyLimitExceeded",
        "message": "Daily Limit Exceeded. The quota will be reset at midnight Pacific Time (PT). You may monitor your quota usage and adjust limits in the API Console."
      }
    ],
    "code": 403,
    "message": "Daily Limit Exceeded. The quota will be reset at midnight Pacific Time (PT). You may monitor your quota usage and adjust limits in the API Console."
  }
}
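For context, here is a back-of-envelope quota estimate for the snippet above. The assumptions are mine, not taken from the library: get_recommended_videos issues one search.list call, search.list costs roughly 100 quota units, the default daily quota is 10,000 units, and the follow-up lookup runs once per recommended video.

```python
# Rough quota estimate for the snippet above (all figures assumed,
# see the lead-in): 1 seed lookup + one follow-up per recommended
# video, each billed as a search.list call.
COST_PER_SEARCH_CALL = 100   # assumed quota price per search.list call
DAILY_QUOTA = 10_000         # assumed default daily quota

def estimated_quota_spend(max_results=25):
    calls = 1 + max_results
    return calls * COST_PER_SEARCH_CALL

spend = estimated_quota_spend(25)
print(spend)                 # 2600
print(spend > DAILY_QUOTA)   # False
```

Under these assumptions a single run costs about 2,600 units, well under the default daily quota, so exhausting the quota within a minute would point to either repeated runs or a project whose effective quota is lower than the default.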
On Python 3.7.3, using search returns:
File "C:\Program Files\Python37\lib\site-packages\youtube_api\youtube_api.py", line 740, in search
    if len(videos) >= max_results:
TypeError: '>=' not supported between instances of 'int' and 'str'
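The error message indicates that max_results reached the comparison as a string. Until the library validates its arguments, a caller-side guard can coerce early; this helper is hypothetical, not part of the library:

```python
def normalize_max_results(value, default=5):
    # The traceback above shows `len(videos) >= max_results` failing
    # because max_results was a str; coerce to int before calling
    # search() so the comparison inside the library stays int vs int.
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

print(normalize_max_results("20"))  # 20
print(normalize_max_results(None))  # 5
```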
Describe the bug
When using the library in a 32-bit environment (Raspberry Pi), the following error occurs:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/src/app/server.py", line 11, in
import youtube
File "/usr/src/app/youtube.py", line 9, in
from youtube_api import YoutubeDataApi
File "/opt/venv/lib/python3.8/site-packages/youtube_api/__init__.py", line 1, in
from youtube_api.youtube_api import YoutubeDataApi, YouTubeDataAPI
File "/opt/venv/lib/python3.8/site-packages/youtube_api/youtube_api.py", line 24, in
class YouTubeDataAPI:
File "/opt/venv/lib/python3.8/site-packages/youtube_api/youtube_api.py", line 580, in YouTubeDataAPI
published_before=datetime.datetime.timestamp(datetime.datetime(3000,1,1)),
OverflowError: timestamp out of range for platform time_t
Expected behavior
No error should occur. The datetime(3000, 1, 1) default is too large for a 32-bit time_t; capping it at 2038 (the 32-bit limit) or earlier would be enough.
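One possible guard, sketched under the assumption that the library only needs some "far future" upper bound for published_before; the helper name is hypothetical:

```python
import datetime

FAR_FUTURE = datetime.datetime(2037, 12, 31)  # inside 32-bit time_t range

def safe_timestamp(dt):
    # datetime.timestamp() raises OverflowError (or OSError on some
    # platforms) when the value does not fit the platform time_t,
    # e.g. datetime(3000, 1, 1) on 32-bit systems. Clamp instead of
    # crashing at class-definition time.
    try:
        return dt.timestamp()
    except (OverflowError, OSError):
        return FAR_FUTURE.timestamp()
```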
Describe the bug
get_channel_metadata() fails when channel_id is a list instead of a string. Cardinality of the list doesn't matter, meaning it fails even on a list of length 1.
Expected behavior
Instead of crashing, it should return an iterable (probably a dict) of metadata for each of the channel_ids.
Traceback
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 209, in get_channel_metadata
# **kwargs):
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 165, in get_channel_metadata_gen
# response_json = self._http_request(http_endpoint)
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 109, in _http_request
# response_json = _load_response(response)
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api_utils.py", line 48, in _load_response
# response.raise_for_status()
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
# raise HTTPError(http_error_msg, response=self)
# requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://www.googleapis.com/youtube/3/channels?part=id,snippet,contentDetails,statistics,topicDetails,brandingSettings&id=UCwdVOry0oNF9WIe_3uCfz9Q&key={key_goes_here}&maxResults=50
# >>>
Additional context
## Here's a minimal testbed for reproducing this bug:
import os
from youtube_api import YoutubeDataApi as yt_api
YT_KEY = os.environ.get('YT_KEY')
yt = yt_api(YT_KEY, verbose = True)
## Here we try it with channel_id as a string. This test case works perfectly. I turned on verbose mode so we can see the endpoint URL fully composed with all vars subbed in.
yt.get_channel_metadata(channel_id='UCwdVOry0oNF9WIe_3uCfz9Q')
# https://www.googleapis.com/youtube/v3/channels?part=id,snippet,contentDetails,statistics,topicDetails,brandingSettings&id=UCwdVOry0oNF9WIe_3uCfz9Q&key={API_KEY_PLACEHOLDER}&maxResults=50
## This URL is almost identical to the one printed by the failing case below; the only difference is 'youtube/v3/' here versus 'youtube/3/' there.
# {'channel_id': 'UCwdVOry0oNF9WIe_3uCfz9Q', 'title': "Matt's Off Road Recovery", 'account_creation_date': datetime.datetime(2019, 3, 28, 13, 50, 25), 'keywords': '"matt\'s off road recovery" "matt\'s towing" "matt\'s towing and recovery" "matts off road recovery" "winder towing" "off road recovery" "matts towing" "matt towing and recovery" "matts recovery" "matt off road recovery" "wheels on the bus" "matt recovery" "off road" "you took your hyundai where" "stuck in the mud" "stuck in the sand" "stuck truck" "stuck 4x4" "sand recovery" "4x4 recovery" "side by side" "jeep cherokee" "jeep xj"', 'description': 'Off road towing, recoveries and rescues. We cover beautiful southern Utah, near Zion National park. We have a unique way to do off road recovery with our Jeep XJ affectionately named, the yellow banana. We have the infamous Ed with his postive outlook on life.', 'view_count': '27534880', 'video_count': '137', 'subscription_count': '147000', 'playlist_id_likes': None, 'playlist_id_uploads': 'UUwdVOry0oNF9WIe_3uCfz9Q', 'topic_ids': None, 'country': 'US', 'collection_date': datetime.datetime(2020, 4, 7, 20, 35, 24, 687072)}
# >>>
## On the other hand, with the same channel_id as a list containing a single string, we get a failure. The same is true for longer lists, of course. As above, I turned on verbose mode so we can see the endpoint URL fully composed with all vars subbed in.
yt.get_channel_metadata(channel_id=['UCwdVOry0oNF9WIe_3uCfz9Q'])
# https://www.googleapis.com/youtube/3/channels?part=id,snippet,contentDetails,statistics,topicDetails,brandingSettings&id=UCwdVOry0oNF9WIe_3uCfz9Q&key={API_KEY_PLACEHOLDER}&maxResults=50
## Note the endpoint is 'youtube/3/' rather than 'youtube/v3/', which would explain the 404 below.
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 209, in get_channel_metadata
# **kwargs):
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 165, in get_channel_metadata_gen
# response_json = self._http_request(http_endpoint)
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 109, in _http_request
# response_json = _load_response(response)
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api_utils.py", line 48, in _load_response
# response.raise_for_status()
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
# raise HTTPError(http_error_msg, response=self)
# requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://www.googleapis.com/youtube/3/channels?part=id,snippet,contentDetails,statistics,topicDetails,brandingSettings&id=UCwdVOry0oNF9WIe_3uCfz9Q&key={key_goes_here}&maxResults=50
# >>>
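Until the library accepts lists, a caller-side workaround is to normalize channel_id into the comma-separated string the channels.list endpoint expects. The helper name is mine, not the library's:

```python
def normalize_channel_ids(channel_id):
    # Accept a single ID string or an iterable of IDs and return the
    # comma-separated value for the `id` query parameter.
    if isinstance(channel_id, str):
        return channel_id
    return ",".join(channel_id)

print(normalize_channel_ids(['UCwdVOry0oNF9WIe_3uCfz9Q']))  # UCwdVOry0oNF9WIe_3uCfz9Q
```

Calling yt.get_channel_metadata(channel_id=normalize_channel_ids(ids)) sidesteps the crash for the single-ID case; whether a multi-ID string returns all items still depends on the parser handling a multi-item response.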
Thanks for the great library by the way, it's quite handy :)
get_captions
Located in test_captions_methods.py
def get_captions(self, video_id, lang_code='en', parser=P.parse_caption_track, **kwargs)
get_channel_id_from_user
Located in test_channel_methods.py
def get_channel_id_from_user(self, username, **kwargs)
get_channel_metadata_gen
Located in test_channel_methods.py
def get_channel_metadata_gen(self, channel_id, parser=P.parse_channel_metadata, part=["id", "snippet", "contentDetails", "statistics", "topicDetails", "brandingSettings"], **kwargs)
get_channel_metadata
Located in test_channel_methods.py
def get_channel_metadata(self, channel_id, parser=P.parse_channel_metadata, part=["id", "snippet", "contentDetails", "statistics", "topicDetails", "brandingSettings"], **kwargs)
get_subscriptions
Located in test_channel_methods.py
def get_subscriptions(self, channel_id, next_page_token=False, parser=P.parse_subscription_descriptive, part=['id', 'snippet'], **kwargs)
get_featured_channels
Located in test_channel_methods.py
def get_featured_channels(self, channel_id, parser=P.parse_featured_channels, **kwargs)
get_featured_channels_gen
Located in test_channel_methods.py
def get_featured_channels_gen(self, channel_id, parser=P.parse_featured_channels, part=["id", "brandingSettings"], **kwargs)
__init__
Located in test_initialization.py
def __init__(self, key, api_version='3')
verify_key
Located in test_initialization.py
def verify_key(self)
get_playlists
Located in test_playlist_methods.py
def get_playlists(self, channel_id, next_page_token=False, parser=P.parse_playlist_metadata, part=['id','snippet','contentDetails'], **kwargs)
get_videos_from_playlist_id
Located in test_playlist_methods.py
def get_videos_from_playlist_id(self, playlist_id, next_page_token=None, published_after=datetime.datetime(1990,1,1), parser=P.parse_video_url, part=['snippet'], **kwargs)
search
Located in test_search_args.py
def search(self, q=None, channel_id=None, max_results=5, order_by="relevance", next_page_token=None, published_after=datetime.datetime(2000,1,1), published_before=datetime.datetime(3000,1,1), location=None, location_radius='1km', region_code=None, safe_search=None, relevance_language=None, event_type=None, topic_id=None, video_duration=None, search_type="video", parser=P.parse_rec_video_metadata, part=['snippet'], **kwargs)
get_video_metadata_gen
Located in test_video_methods.py
def get_video_metadata_gen(self, video_id, parser=P.parse_video_metadata, part=['statistics','snippet'], **kwargs)
get_video_metadata
Located in test_video_methods.py
def get_video_metadata(self, video_id, parser=P.parse_video_metadata, part=['statistics','snippet'], **kwargs)
get_video_comments
Located in test_video_methods.py
def get_video_comments(self, video_id, get_replies=True, max_results=None, next_page_token=False, parser=P.parse_comment_metadata, part = ['snippet'], **kwargs)
get_recommended_videos
Located in test_video_methods.py
def get_recommended_videos(self, video_id, max_results=5, parser=P.parse_rec_video_metadata, **kwargs)
test_videos_from_playlist.py
test_playlist_methods.py
Located in test_parsers.py
def raw_json(item)
def raw_json_with_datetime(item)
def parse_video_metadata(item)
def parse_video_url(item)
def parse_channel_metadata(item)
def parse_subscription_descriptive(item)
def parse_featured_channels(item)
def parse_playlist_metadata(item)
def parse_comment_metadata(item)
def parse_rec_video_metadata(item)
def parse_caption_track(item)
Located in test_utils.py
def _chunker(l, chunksize)
def _load_response(response)
def _text_from_html(html_body)
def parse_yt_datetime(date_str)
def strip_video_id_from_url(url)
def get_upload_playlist_id(channel_id)
def get_liked_playlist_id(channel_id)
def is_user(channel_url)
def strip_youtube_id(channel_url)
def get_channel_id_from_custom_url(url)
def get_url_from_video_id(video_id)
I had originally added some convenience functions to deal with in-the-wild URLs and to collect comments, but these functions aren't actually part of the YouTube API.
I'd like to set a deprecation date on those functions. The utils might be added to gists for convenience or ported to a separate library of helper functions.
Thoughts? @mabrownnyu
Describe the bug
When a mix playlist ID is passed to YoutubeDataApi.get_videos_from_playlist_id(), the function does not respond (it hangs rather than returning or raising).
Expected behavior
either return the videos in the playlist or raise an exception
Traceback
None
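One client-side guard worth considering: mix ("radio") playlists are generated per viewer and, in practice, are not served by playlistItems.list, and their IDs conventionally begin with 'RD'. That prefix is a heuristic, not an official guarantee, but it lets a caller refuse them up front instead of hanging:

```python
def looks_like_mix_playlist(playlist_id):
    # Heuristic: YouTube mix/radio playlist IDs start with 'RD'
    # (e.g. 'RDQeP5WCM2oz0'); regular playlists start with 'PL',
    # uploads playlists with 'UU', and so on.
    return playlist_id.startswith('RD')

print(looks_like_mix_playlist('RDQeP5WCM2oz0'))            # True
print(looks_like_mix_playlist('UUaeO5vkdj5xOQHp4UmIN6dw')) # False
```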
Thanks for the module!
I'm looking for caching support, but I believe the module itself does not offer caching functionality. Correct?
Is it possible to add caching functionality via https://pypi.org/project/requests-cache/ without touching the youtube module?
I tried it like so:
from youtube_api import YoutubeDataApi
import requests_cache
requests_cache.install_cache()
yt = YoutubeDataApi(APIKEY)
yt.get_video_metadata("QeP5WCM2oz0")
yt.get_video_metadata("QeP5WCM2oz0")
But this didn't seem to work. Both requests seem to hit the YouTube API over the network.
Any hints? Thanks!
I'm re-running code that worked in the spring of 2019.
Most of the functions work fine but I'm having issues with the one listed above. When I call
yt.get_videos_from_playlist_id("UUaeO5vkdj5xOQHp4UmIN6dw")
I get the following message.
TypeError Traceback (most recent call last)
in ()
----> 1 yt.get_videos_from_playlist_id("UUaeO5vkdj5xOQHp4UmIN6dw")
C:\Users\kevin\Anaconda3\lib\site-packages\youtube_api\youtube_api.py in get_videos_from_playlist_id(self, playlist_id, next_page_token, published_after, parser)
265 for item in response_json.get('items'):
266 publish_date = parse_yt_datetime(item['snippet'].get('publishedAt'))
--> 267 if publish_date <= published_after:
268 run=False
269 break
TypeError: '<=' not supported between instances of 'NoneType' and 'datetime.datetime'
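The comparison fails because the library's date parser returned None for an item whose snippet lacked a publishedAt value. A guard along these lines (a sketch, not the library's actual code) skips the early-exit check when no date was parsed:

```python
import datetime

def should_stop(publish_date, published_after):
    # publish_date may be None when the API item carried no
    # 'publishedAt'; comparing None <= datetime raises TypeError on
    # Python 3, so apply the cutoff only when a date was parsed.
    return publish_date is not None and publish_date <= published_after

cutoff = datetime.datetime(1990, 1, 1)
print(should_stop(None, cutoff))                           # False
print(should_stop(datetime.datetime(1980, 1, 1), cutoff))  # True
```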
Right now "part" is hardcoded for every request.
strip_video_id_from_url keeps the URL fragment. Ex: 'https://youtube.com/8qi8ixn6-Vo#t=1397' returns '8qi8ixn6-Vo#t=1397' instead of '8qi8ixn6-Vo'.
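A fix sketch using the standard library: parsing the URL properly keeps the fragment (and query string) out of the returned ID. The helper name here is hypothetical:

```python
from urllib.parse import urlparse

def strip_video_id(url):
    # urlparse separates the fragment ('t=1397') and query string
    # from the path, so the last path segment is the bare video ID.
    path = urlparse(url).path
    return path.rstrip('/').split('/')[-1]

print(strip_video_id('https://youtube.com/8qi8ixn6-Vo#t=1397'))  # 8qi8ixn6-Vo
print(strip_video_id('https://youtu.be/8qi8ixn6-Vo/'))           # 8qi8ixn6-Vo
```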
Hi,
After obtaining API keys following steps 1 to 6 of the tutorial, and running the Python code in a Jupyter notebook just as described in this GitHub repo, I get the following error in the notebook:
HTTP Error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/search?part=snippet&type=video&maxResults=50&order=date&key=AIzaSyAeQv2hbTDAWDVeoo_dJmMMrk-woNBmSMo&q=social%20justice%7Csjw&publishedAfter=2018-03-01T00:00:00Z&publishedBefore=2018-03-31T23:59:00Z&verbose=1&pageToken=CKwCEAA
Right before that error pops up, the Jupyter notebook shows a pink warning line.
I thought my problem might be that I exceeded my daily quota, but I'm afraid that is not the cause, because I got the same result after creating two new projects on the same day and one more new project the day after (apparently quotas reset daily). I believe my problem might be that I followed the wrong instructions when generating the API keys. My goal was to replicate the whole Jupyter notebook linked above. I would appreciate any help in this respect.
Describe the bug
cannot import the module
Expected behavior
module import works
Traceback
>>> from youtube_api import YoutubeDataApi
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'YoutubeDataApi' from 'youtube_api' (/home/netllama/stuff/flask/lib64/python3.11/site-packages/youtube_api/__init__.py)
Describe the bug
In the current implementation of get_video_metadata(), the function fails for me with default parameters. The error is related to a missing statistics key in the parsed JSON result. While I think I can trace this issue to the default parser used in get_video_metadata(), even if I provide "statistics" as part of the requested data parts, the error persists.
Expected behavior
By default, this function should return video metadata that includes both statistics and snippet data. The parser should work with this default data.
Does anyone want to try to implement this theme in readthedocs/sphinx?
https://userstyles.org/styles/159458/read-the-docs-dark?utm_campaign=stylish_stylepage
why was this removed?
Describe the bug
I am getting the error "name 'urllib' is not defined" when using the search function with a location. The error occurs at urllib.parse despite my including import urllib.parse at the top of my code.
Expected behavior
The output should be similar to as outlined here.
Traceback
The error occurs when I run the following:
test_search = yt.search(location=(2.09319, 96.647463),
                        locationRadius="50km",
                        order_by="viewCount",
                        topic_id="/m/06bvp",
                        max_results=5)
I want to receive the list of video IDs that are in a given playlist. I use get_videos_from_playlist_id(id), but I get an AttributeError.
vds = yt.get_videos_from_playlist_id(id)
Traceback (most recent call last):
File "<pyshell#141>", line 1, in <module>
vds = yt.get_videos_from_playlist_id(pls[2])
File "C:\Python3\lib\site-packages\youtube_api\youtube_api.py", line 380, in get_videos_from_playlist_id
timeout_in_n_seconds=20)
File "C:\Python3\lib\site-packages\youtube_api\youtube_api.py", line 100, in _http_request
with timeout(seconds=timeout_in_n_seconds):
File "C:\Python3\lib\site-packages\youtube_api\youtube_api_utils.py", line 33, in __enter__
signal.signal(signal.SIGALRM, self.handle_timeout)
AttributeError: module 'signal' has no attribute 'SIGALRM'
What does it actually mean, and how do I solve it?
Thank you!
468
469 if isinstance(video_id, str):
--> 470 captions = _get_captions(video_id, **kwargs)
471 else:
472 captions = []
NameError: name 'kwargs' is not defined
Describe the bug
When the string is formatted for the http request, two arguments are placed in the wrong order.
See: https://github.com/SMAPPNYU/youtube-data-api/blob/master/youtube_api/youtube_api.py#L402
Expected behavior
The order of these keys should be switched: channelId should receive the channel ID and part should receive the part list, not the other way around.
Traceback
HTTPError: 400 Client Error: Bad Request for url: https://www.googleapis.com/youtube/v3/subscriptions?channelId=id,snippet&part=UC[...]&maxResults=50&key=AIza[...]
Desktop (please complete the following information):
Linux
Looks like something broke the read the docs API documentation. Was this from the merge request?
File "/Users/michael/Code/youtube-data-api/youtube_api/parsers.py", line 66, in parse_video_url
video_id = item['snippet']['resourceId'].get('videoId')
KeyError: 'resourceId'
We might want to consider explicit datatypes for each parameter.
This issue arose from overlooking this:
#17
Like this:
def func(a: int, b: int) -> int:
    return a + b
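As a concrete example of what annotating one of the existing utils could look like, here is a typed sketch mirroring _chunker from the utils listing; the annotations are my assumption about the intended types, not current library code:

```python
from typing import List, Sequence, TypeVar

T = TypeVar('T')

def chunker(seq: Sequence[T], chunksize: int) -> List[Sequence[T]]:
    # Split a sequence into consecutive chunks of at most
    # `chunksize` items (typed sketch of the _chunker util).
    return [seq[i:i + chunksize] for i in range(0, len(seq), chunksize)]

print(chunker([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```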
Hi we need a test for:
search
get_captions
get_playlists
get_videos_from_playlist
get_subscriptions
get_featured_channels
get_video_captions
get_recommended_videos
Search has a lot of arguments, and get_captions does not use the API!
Hi, there is an error in the documentation, as follows:
https://youtube-data-api.readthedocs.io/en/latest/usage/quickstart.html
import os
import pandas as pd
from youtube_api import YoutubeDataApi <----- because the import is YoutubeDataApi
YT_KEY = os.environ.get('YOUTUBE_API_KEY') # you can hardcode this, too.
yt = YouTubeDataAPI(YT_KEY) <----------- typo here: YouTubeDataAPI should be YoutubeDataApi
This is a great tool, but this error might leave some beginners stuck at this stage.
Thank you for your brilliant work.
Hit the API to return comments for a video that has zero comments, and one that has comments disabled.
Do we change backend code when we hit the API?