smappnyu / youtube-data-api
A Python client for collecting and parsing public data from the YouTube Data API
Home Page: https://youtube-data-api.readthedocs.io/en/latest/index.html
License: MIT License
As of v0.0.21, we're deprecating the class YoutubeDataApi in favor of YouTubeDataAPI. These classes are functionally equivalent!
Hi! First of all, thank you so much for creating this awesome package. I am using the most recent version available via pip, and while trying to get recommended videos I hit my YouTube quota limit within one minute.
I have used the YouTube API before and was previously able to collect this much data within the quota limit, which is why I think this might be a bug. Here's my code:
def get_related(seed_id, max_results=25):
    all_rels = []
    seed = yt.get_recommended_videos(seed_id, max_results=max_results)
    for rank, i in enumerate(seed):
        i['rank'] = rank + 1
        i['seed'] = seed_id
        seed_id = i['video_id']
        all_rels.append(i)
        seed = yt.get_recommended_videos(seed_id, max_results=max_results)
        for rank1, i1 in enumerate(seed):
            i1['rank'] = rank1 + 1
            i1['seed'] = seed_id
            seed_id = i1['video_id']
            all_rels.append(i1)
    return all_rels

res = get_related('4Y1lZQsyuSQ')
{
  "error": {
    "errors": [
      {
        "domain": "usageLimits",
        "reason": "dailyLimitExceeded",
        "message": "Daily Limit Exceeded. The quota will be reset at midnight Pacific Time (PT). You may monitor your quota usage and adjust limits in the API Console."
      }
    ],
    "code": 403,
    "message": "Daily Limit Exceeded. The quota will be reset at midnight Pacific Time (PT). You may monitor your quota usage and adjust limits in the API Console."
  }
}
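For context, here is a back-of-envelope quota estimate for the snippet above. The assumptions are mine, not taken from the library: get_recommended_videos issues one search.list call, search.list costs roughly 100 quota units, the default daily quota is 10,000 units, and the follow-up lookup runs once per recommended video.

```python
# Rough quota estimate for the snippet above (all figures assumed,
# see the lead-in): 1 seed lookup + one follow-up per recommended
# video, each billed as a search.list call.
COST_PER_SEARCH_CALL = 100   # assumed quota price per search.list call
DAILY_QUOTA = 10_000         # assumed default daily quota

def estimated_quota_spend(max_results=25):
    calls = 1 + max_results
    return calls * COST_PER_SEARCH_CALL

spend = estimated_quota_spend(25)
print(spend)                 # 2600
print(spend > DAILY_QUOTA)   # False
```

Under these assumptions a single run costs about 2,600 units, well under the default daily quota, so exhausting the quota within a minute would point to either repeated runs or a project whose effective quota is lower than the default.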
On Python 3.7.3, using search returns:
File "C:\Program Files\Python37\lib\site-packages\youtube_api\youtube_api.py", line 740, in search
    if len(videos) >= max_results:
TypeError: '>=' not supported between instances of 'int' and 'str'
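The error message indicates that max_results reached the comparison as a string. Until the library validates its arguments, a caller-side guard can coerce early; this helper is hypothetical, not part of the library:

```python
def normalize_max_results(value, default=5):
    # The traceback above shows `len(videos) >= max_results` failing
    # because max_results was a str; coerce to int before calling
    # search() so the comparison inside the library stays int vs int.
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

print(normalize_max_results("20"))  # 20
print(normalize_max_results(None))  # 5
```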
Describe the bug
When using the library in a 32-bit environment (Raspberry Pi), the following error occurs:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/usr/src/app/server.py", line 11, in
import youtube
File "/usr/src/app/youtube.py", line 9, in
from youtube_api import YoutubeDataApi
File "/opt/venv/lib/python3.8/site-packages/youtube_api/__init__.py", line 1, in
from youtube_api.youtube_api import YoutubeDataApi, YouTubeDataAPI
File "/opt/venv/lib/python3.8/site-packages/youtube_api/youtube_api.py", line 24, in
class YouTubeDataAPI:
File "/opt/venv/lib/python3.8/site-packages/youtube_api/youtube_api.py", line 580, in YouTubeDataAPI
published_before=datetime.datetime.timestamp(datetime.datetime(3000,1,1)),
OverflowError: timestamp out of range for platform time_t
Expected behavior
No error should occur. The datetime(3000, 1, 1) default is too large for a 32-bit time_t; capping it at 2038 (the 32-bit limit) or earlier would be enough.
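One possible guard, sketched under the assumption that the library only needs some "far future" upper bound for published_before; the helper name is hypothetical:

```python
import datetime

FAR_FUTURE = datetime.datetime(2037, 12, 31)  # inside 32-bit time_t range

def safe_timestamp(dt):
    # datetime.timestamp() raises OverflowError (or OSError on some
    # platforms) when the value does not fit the platform time_t,
    # e.g. datetime(3000, 1, 1) on 32-bit systems. Clamp instead of
    # crashing at class-definition time.
    try:
        return dt.timestamp()
    except (OverflowError, OSError):
        return FAR_FUTURE.timestamp()
```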
Describe the bug
get_channel_metadata() fails when channel_id is a list instead of a string. Cardinality of the list doesn't matter, meaning it fails even on a list of length 1.
Expected behavior
Instead of crashing, it should return an iterable (probably a dict) of metadata for each of the channel_ids.
Traceback
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 209, in get_channel_metadata
# **kwargs):
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 165, in get_channel_metadata_gen
# response_json = self._http_request(http_endpoint)
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 109, in _http_request
# response_json = _load_response(response)
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api_utils.py", line 48, in _load_response
# response.raise_for_status()
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
# raise HTTPError(http_error_msg, response=self)
# requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://www.googleapis.com/youtube/3/channels?part=id,snippet,contentDetails,statistics,topicDetails,brandingSettings&id=UCwdVOry0oNF9WIe_3uCfz9Q&key={key_goes_here}&maxResults=50
# >>>
Additional context
## Here's a minimal testbed for reproducing this bug:
import os
from youtube_api import YoutubeDataApi as yt_api
YT_KEY = os.environ.get('YT_KEY')
yt = yt_api(YT_KEY, verbose = True)
## Here we try it with channel_id as a string. This test case works perfectly. I turned on verbose mode so we can see the endpoint URL fully composed with all vars subbed in.
yt.get_channel_metadata(channel_id='UCwdVOry0oNF9WIe_3uCfz9Q')
# https://www.googleapis.com/youtube/v3/channels?part=id,snippet,contentDetails,statistics,topicDetails,brandingSettings&id=UCwdVOry0oNF9WIe_3uCfz9Q&key={API_KEY_PLACEHOLDER}&maxResults=50
## This URL is almost identical to the one printed by the failing case below; the only difference is 'youtube/v3/' here versus 'youtube/3/' there.
# {'channel_id': 'UCwdVOry0oNF9WIe_3uCfz9Q', 'title': "Matt's Off Road Recovery", 'account_creation_date': datetime.datetime(2019, 3, 28, 13, 50, 25), 'keywords': '"matt\'s off road recovery" "matt\'s towing" "matt\'s towing and recovery" "matts off road recovery" "winder towing" "off road recovery" "matts towing" "matt towing and recovery" "matts recovery" "matt off road recovery" "wheels on the bus" "matt recovery" "off road" "you took your hyundai where" "stuck in the mud" "stuck in the sand" "stuck truck" "stuck 4x4" "sand recovery" "4x4 recovery" "side by side" "jeep cherokee" "jeep xj"', 'description': 'Off road towing, recoveries and rescues. We cover beautiful southern Utah, near Zion National park. We have a unique way to do off road recovery with our Jeep XJ affectionately named, the yellow banana. We have the infamous Ed with his postive outlook on life.', 'view_count': '27534880', 'video_count': '137', 'subscription_count': '147000', 'playlist_id_likes': None, 'playlist_id_uploads': 'UUwdVOry0oNF9WIe_3uCfz9Q', 'topic_ids': None, 'country': 'US', 'collection_date': datetime.datetime(2020, 4, 7, 20, 35, 24, 687072)}
# >>>
## On the other hand, with the same channel_id as a list containing a single string, we get a failure. The same is true for longer lists, of course. As above, I turned on verbose mode so we can see the endpoint URL fully composed with all vars subbed in.
yt.get_channel_metadata(channel_id=['UCwdVOry0oNF9WIe_3uCfz9Q'])
# https://www.googleapis.com/youtube/3/channels?part=id,snippet,contentDetails,statistics,topicDetails,brandingSettings&id=UCwdVOry0oNF9WIe_3uCfz9Q&key={API_KEY_PLACEHOLDER}&maxResults=50
## Note the endpoint is 'youtube/3/' rather than 'youtube/v3/', which would explain the 404 below.
# Traceback (most recent call last):
# File "<stdin>", line 1, in <module>
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 209, in get_channel_metadata
# **kwargs):
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 165, in get_channel_metadata_gen
# response_json = self._http_request(http_endpoint)
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api.py", line 109, in _http_request
# response_json = _load_response(response)
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/youtube_api/youtube_api_utils.py", line 48, in _load_response
# response.raise_for_status()
# File "/Users/anavratil/.virtualenvs/influencer_fraud/lib/python3.7/site-packages/requests/models.py", line 941, in raise_for_status
# raise HTTPError(http_error_msg, response=self)
# requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://www.googleapis.com/youtube/3/channels?part=id,snippet,contentDetails,statistics,topicDetails,brandingSettings&id=UCwdVOry0oNF9WIe_3uCfz9Q&key={key_goes_here}&maxResults=50
# >>>
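Until the library accepts lists, a caller-side workaround is to normalize channel_id into the comma-separated string the channels.list endpoint expects. The helper name is mine, not the library's:

```python
def normalize_channel_ids(channel_id):
    # Accept a single ID string or an iterable of IDs and return the
    # comma-separated value for the `id` query parameter.
    if isinstance(channel_id, str):
        return channel_id
    return ",".join(channel_id)

print(normalize_channel_ids(['UCwdVOry0oNF9WIe_3uCfz9Q']))  # UCwdVOry0oNF9WIe_3uCfz9Q
```

Calling yt.get_channel_metadata(channel_id=normalize_channel_ids(ids)) sidesteps the crash for the single-ID case; whether a multi-ID string returns all items still depends on the parser handling a multi-item response.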
Thanks for the great library by the way, it's quite handy :)
get_captions
Located in test_captions_methods.py
def get_captions(self, video_id, lang_code='en', parser=P.parse_caption_track, **kwargs)
get_channel_id_from_user
Located in test_channel_methods.py
def get_channel_id_from_user(self, username, **kwargs)
get_channel_metadata_gen
Located in test_channel_methods.py
def get_channel_metadata_gen(self, channel_id, parser=P.parse_channel_metadata, part=["id", "snippet", "contentDetails", "statistics", "topicDetails", "brandingSettings"], **kwargs)
get_channel_metadata
Located in test_channel_methods.py
def get_channel_metadata(self, channel_id, parser=P.parse_channel_metadata, part=["id", "snippet", "contentDetails", "statistics", "topicDetails", "brandingSettings"], **kwargs)
get_subscriptions
Located in test_channel_methods.py
def get_subscriptions(self, channel_id, next_page_token=False, parser=P.parse_subscription_descriptive, part=['id', 'snippet'], **kwargs)
get_featured_channels
Located in test_channel_methods.py
def get_featured_channels(self, channel_id, parser=P.parse_featured_channels, **kwargs)
get_featured_channels_gen
Located in test_channel_methods.py
def get_featured_channels_gen(self, channel_id, parser=P.parse_featured_channels, part=["id", "brandingSettings"], **kwargs)
__init__
Located in test_initialization.py
def __init__(self, key, api_version='3')
verify_key
Located in test_initialization.py
def verify_key(self)
get_playlists
Located in test_playlist_methods.py
def get_playlists(self, channel_id, next_page_token=False, parser=P.parse_playlist_metadata, part=['id','snippet','contentDetails'], **kwargs)
get_videos_from_playlist_id
Located in test_playlist_methods.py
def get_videos_from_playlist_id(self, playlist_id, next_page_token=None, published_after=datetime.datetime(1990,1,1), parser=P.parse_video_url, part=['snippet'], **kwargs)
search
Located in test_search_args.py
def search(self, q=None, channel_id=None, max_results=5, order_by="relevance", next_page_token=None, published_after=datetime.datetime(2000,1,1), published_before=datetime.datetime(3000,1,1), location=None, location_radius='1km', region_code=None, safe_search=None, relevance_language=None, event_type=None, topic_id=None, video_duration=None, search_type="video", parser=P.parse_rec_video_metadata, part=['snippet'], **kwargs)
get_video_metadata_gen
Located in test_video_methods.py
def get_video_metadata_gen(self, video_id, parser=P.parse_video_metadata, part=['statistics','snippet'], **kwargs)
get_video_metadata
Located in test_video_methods.py
def get_video_metadata(self, video_id, parser=P.parse_video_metadata, part=['statistics','snippet'], **kwargs)
get_video_comments
Located in test_video_methods.py
def get_video_comments(self, video_id, get_replies=True, max_results=None, next_page_token=False, parser=P.parse_comment_metadata, part = ['snippet'], **kwargs)
get_recommended_videos
Located in test_video_methods.py
def get_recommended_videos(self, video_id, max_results=5, parser=P.parse_rec_video_metadata, **kwargs)
test_videos_from_playlist.py
test_playlist_methods.py
Located in test_parsers.py
def raw_json(item)
def raw_json_with_datetime(item)
def parse_video_metadata(item)
def parse_video_url(item)
def parse_channel_metadata(item)
def parse_subscription_descriptive(item)
def parse_featured_channels(item)
def parse_playlist_metadata(item)
def parse_comment_metadata(item)
def parse_rec_video_metadata(item)
def parse_caption_track(item)
Located in test_utils.py
def _chunker(l, chunksize)
def _load_response(response)
def _text_from_html(html_body)
def parse_yt_datetime(date_str)
def strip_video_id_from_url(url)
def get_upload_playlist_id(channel_id)
def get_liked_playlist_id(channel_id)
def is_user(channel_url)
def strip_youtube_id(channel_url)
def get_channel_id_from_custom_url(url)
def get_url_from_video_id(video_id)
I had originally added some convenience functions to deal with in-the-wild URLs and to collect comments, but these functions aren't actually part of the YouTube API.
I'd like to set a deprecation date on those functions. The utils might be added to gists for convenience or ported to a separate library of helper functions.
Thoughts? @mabrownnyu
Describe the bug
When a mix playlist ID is passed to YoutubeDataApi.get_videos_from_playlist_id(), the function does not respond (it hangs rather than returning or raising).
Expected behavior
either return the videos in the playlist or raise an exception
Traceback
None
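One client-side guard worth considering: mix ("radio") playlists are generated per viewer and, in practice, are not served by playlistItems.list, and their IDs conventionally begin with 'RD'. That prefix is a heuristic, not an official guarantee, but it lets a caller refuse them up front instead of hanging:

```python
def looks_like_mix_playlist(playlist_id):
    # Heuristic: YouTube mix/radio playlist IDs start with 'RD'
    # (e.g. 'RDQeP5WCM2oz0'); regular playlists start with 'PL',
    # uploads playlists with 'UU', and so on.
    return playlist_id.startswith('RD')

print(looks_like_mix_playlist('RDQeP5WCM2oz0'))            # True
print(looks_like_mix_playlist('UUaeO5vkdj5xOQHp4UmIN6dw')) # False
```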
Thanks for the module!
I'm looking for caching support, but I believe the module itself does not offer caching functionality. Correct?
Is it possible to add caching functionality via https://pypi.org/project/requests-cache/ without touching the youtube module?
I tried it like so:
from youtube_api import YoutubeDataApi
import requests_cache
requests_cache.install_cache()
yt = YoutubeDataApi(APIKEY)
yt.get_video_metadata("QeP5WCM2oz0")
yt.get_video_metadata("QeP5WCM2oz0")
But this didn't seem to work. Both requests seem to hit the YouTube API over the network.
Any hints? Thanks!
I'm re-running code that worked in the spring of 2019.
Most of the functions work fine but I'm having issues with the one listed above. When I call
yt.get_videos_from_playlist_id("UUaeO5vkdj5xOQHp4UmIN6dw")
I get the following message.
TypeError Traceback (most recent call last)
in ()
----> 1 yt.get_videos_from_playlist_id("UUaeO5vkdj5xOQHp4UmIN6dw")
C:\Users\kevin\Anaconda3\lib\site-packages\youtube_api\youtube_api.py in get_videos_from_playlist_id(self, playlist_id, next_page_token, published_after, parser)
265 for item in response_json.get('items'):
266 publish_date = parse_yt_datetime(item['snippet'].get('publishedAt'))
--> 267 if publish_date <= published_after:
268 run=False
269 break
TypeError: '<=' not supported between instances of 'NoneType' and 'datetime.datetime'
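The comparison fails because the library's date parser returned None for an item whose snippet lacked a publishedAt value. A guard along these lines (a sketch, not the library's actual code) skips the early-exit check when no date was parsed:

```python
import datetime

def should_stop(publish_date, published_after):
    # publish_date may be None when the API item carried no
    # 'publishedAt'; comparing None <= datetime raises TypeError on
    # Python 3, so apply the cutoff only when a date was parsed.
    return publish_date is not None and publish_date <= published_after

cutoff = datetime.datetime(1990, 1, 1)
print(should_stop(None, cutoff))                           # False
print(should_stop(datetime.datetime(1980, 1, 1), cutoff))  # True
```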
Right now "part" is hardcoded for every request.
strip_video_id_from_url keeps the URL fragment. Ex: 'https://youtube.com/8qi8ixn6-Vo#t=1397' returns '8qi8ixn6-Vo#t=1397' instead of '8qi8ixn6-Vo'.
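A fix sketch using the standard library: parsing the URL properly keeps the fragment (and query string) out of the returned ID. The helper name here is hypothetical:

```python
from urllib.parse import urlparse

def strip_video_id(url):
    # urlparse separates the fragment ('t=1397') and query string
    # from the path, so the last path segment is the bare video ID.
    path = urlparse(url).path
    return path.rstrip('/').split('/')[-1]

print(strip_video_id('https://youtube.com/8qi8ixn6-Vo#t=1397'))  # 8qi8ixn6-Vo
print(strip_video_id('https://youtu.be/8qi8ixn6-Vo/'))           # 8qi8ixn6-Vo
```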
Hi,
After obtaining API keys following steps 1 to 6 of the tutorial, and running the Python code in a Jupyter notebook just as described in this GitHub repo, I get the following error in the notebook:
HTTP Error: 403 Client Error: Forbidden for url: https://www.googleapis.com/youtube/v3/search?part=snippet&type=video&maxResults=50&order=date&key=AIzaSyAeQv2hbTDAWDVeoo_dJmMMrk-woNBmSMo&q=social%20justice%7Csjw&publishedAfter=2018-03-01T00:00:00Z&publishedBefore=2018-03-31T23:59:00Z&verbose=1&pageToken=CKwCEAA
Right before that error pops up, the Jupyter notebook shows a pink warning line.
I thought my problem might be that I exceeded my daily quota, but I'm afraid that is not the cause, because I got the same result after creating two new projects on the same day and one more new project the day after (apparently quotas reset daily). I believe my problem might be that I followed the wrong instructions when generating the API keys. My goal was to replicate the whole Jupyter notebook linked above. I would appreciate any help in this respect.
Describe the bug
cannot import the module
Expected behavior
module import works
Traceback
>>> from youtube_api import YoutubeDataApi
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'YoutubeDataApi' from 'youtube_api' (/home/netllama/stuff/flask/lib64/python3.11/site-packages/youtube_api/__init__.py)
Describe the bug
In the current implementation of get_video_metadata(), the function fails for me with default parameters. The error is related to a missing statistics key in the parsed JSON result. While I think I can trace this issue to the default parser used in get_video_metadata(), even if I provide "statistics" as part of the requested data parts, the error persists.
Expected behavior
By default, this function should return video metadata that includes both statistics and snippet data. The parser should work with this default data.
Does anyone want to try to implement this theme in readthedocs/sphinx?
https://userstyles.org/styles/159458/read-the-docs-dark?utm_campaign=stylish_stylepage
why was this removed?
Describe the bug
I am getting the error "name 'urllib' is not defined" when using the search function with a location. The error occurs at urllib.parse despite my including import urllib.parse at the top of my code.
Expected behavior
The output should be similar to as outlined here.
Traceback
The error occurs when I run the following:
test_search = yt.search(location=(2.09319, 96.647463),
                        locationRadius="50km",
                        order_by="viewCount",
                        topic_id="/m/06bvp",
                        max_results=5)
I want to receive the list of video IDs that are in a given playlist. I use get_videos_from_playlist_id(id), but I get an AttributeError.
vds = yt.get_videos_from_playlist_id(id)
Traceback (most recent call last):
File "<pyshell#141>", line 1, in <module>
vds = yt.get_videos_from_playlist_id(pls[2])
File "C:\Python3\lib\site-packages\youtube_api\youtube_api.py", line 380, in get_videos_from_playlist_id
timeout_in_n_seconds=20)
File "C:\Python3\lib\site-packages\youtube_api\youtube_api.py", line 100, in _http_request
with timeout(seconds=timeout_in_n_seconds):
File "C:\Python3\lib\site-packages\youtube_api\youtube_api_utils.py", line 33, in __enter__
signal.signal(signal.SIGALRM, self.handle_timeout)
AttributeError: module 'signal' has no attribute 'SIGALRM'
What does it actually mean, and how do I solve it?
Thank you!
468
469 if isinstance(video_id, str):
--> 470 captions = _get_captions(video_id, **kwargs)
471 else:
472 captions = []
NameError: name 'kwargs' is not defined
Describe the bug
When the string is formatted for the http request, two arguments are placed in the wrong order.
See: https://github.com/SMAPPNYU/youtube-data-api/blob/master/youtube_api/youtube_api.py#L402
Expected behavior
The order of these keys should be switched: channelId should receive the channel ID and part should receive the part list, not the other way around.
Traceback
HTTPError: 400 Client Error: Bad Request for url: https://www.googleapis.com/youtube/v3/subscriptions?channelId=id,snippet&part=UC[...]&maxResults=50&key=AIza[...]
Desktop (please complete the following information):
Linux
Looks like something broke the read the docs API documentation. Was this from the merge request?
File "/Users/michael/Code/youtube-data-api/youtube_api/parsers.py", line 66, in parse_video_url
video_id = item['snippet']['resourceId'].get('videoId')
KeyError: 'resourceId'
We might want to consider explicit datatypes for each parameter.
This issue arose from overlooking this:
#17
Like this:
def func(a: int, b: int) -> int:
    return a + b
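As a concrete example of what annotating one of the existing utils could look like, here is a typed sketch mirroring _chunker from the utils listing; the annotations are my assumption about the intended types, not current library code:

```python
from typing import List, Sequence, TypeVar

T = TypeVar('T')

def chunker(seq: Sequence[T], chunksize: int) -> List[Sequence[T]]:
    # Split a sequence into consecutive chunks of at most
    # `chunksize` items (typed sketch of the _chunker util).
    return [seq[i:i + chunksize] for i in range(0, len(seq), chunksize)]

print(chunker([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]
```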
Hi we need a test for:
search
get_captions
get_playlists
get_videos_from_playlist
get_subscriptions
get_featured_channels
get_video_captions
get_recommended_videos
Search has a lot of arguments, and get_captions does not use the API!
Hi, there is an error in the documentation, as follows:
https://youtube-data-api.readthedocs.io/en/latest/usage/quickstart.html
import os
import pandas as pd
from youtube_api import YoutubeDataApi <----- because the import is YoutubeDataApi
YT_KEY = os.environ.get('YOUTUBE_API_KEY') # you can hardcode this, too.
yt = YouTubeDataAPI(YT_KEY) <----------- typo here: YouTubeDataAPI should be YoutubeDataApi
This is a great tool, but this error might leave some beginners stuck at this stage.
Thank you for your brilliant work.
Hit the API to return comments for a video that has zero comments, and one that has comments disabled.
Do we change backend code when we hit the API?