To help determine whether a video is original, it can be useful to look for semantically similar videos. Start by entering the link of the video you want to compare against, along with your Google API key, in the Streamlit app.
Streamlit Cloud version can be accessed here: https://share.streamlit.io/anderson2805/youtubeplus
Developed and tested on Python 3.8/3.9.
For local installation:
- Get a free API Key at https://developers.google.com/youtube/v3/getting-started
- Clone the repo
git clone https://github.com/anderson2805/YoutubePlus.git
cd YoutubePlus
- Install packages in requirements.txt
pip install -r requirements.txt
- To start on local machine
streamlit run streamlit_app.py
Access local version: http://localhost:8501/
Access Streamlit Cloud: https://share.streamlit.io/anderson2805/youtubeplus
- Insert the YouTube API key obtained from the credentials page (see the Guide).
- Enter the URL of the video of interest, which will serve as the seed for video comparison. (Do not include additional parameters after "v=xxxxx", such as "&t=3228s" or "&ab_channel=CNA".)
- Title and Description will be extracted from seed video and shown.
- You can edit the "Processed Description" to manually remove call-to-action text or restore text that was wrongly removed.
- Suggested Query Keywords will be generated based on Title and Processed Description
- Pick the best suggested query keyword; more videos will be queried based on it.
- Select the number of result pages to query. Results are currently sorted by relevance.
- Choose whether to include or exclude Related videos, which are based on YouTube's internal algorithms.
- (Work in progress: this button currently does nothing.) Choose whether to include or exclude channel data.
- Click the button "Call data from YT APIs"
- During this time, the YouTube API is queried for information on more videos, based on the "Query Keywords" you have entered.
- Video information such as Title, Description, Closed Captioning, and No. of Likes is collected (for more information, refer to Data Description).
- Related videos (based on YouTube's algorithms) are then collected.
- Word embeddings of the English Closed Captioning (automatically generated or manually uploaded) are computed.
- Each embedding is then scored for similarity against the seed video's caption.
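As a rough illustration of two of the steps above (reducing the seed URL to its video ID, and ranking candidate videos by similarity to the seed caption), here is a minimal Python sketch. It is not the code used in this repo: the function names `extract_video_id`, `cosine_similarity`, and `rank_by_similarity` are hypothetical, cosine similarity is an assumed choice of similarity score, and the short vectors stand in for real caption embeddings.

```python
import math
from urllib.parse import urlparse, parse_qs


def extract_video_id(url):
    """Return the value of the `v` query parameter of a YouTube watch URL.

    Hypothetical helper: other parameters (e.g. `t`, `ab_channel`) are ignored.
    """
    query = parse_qs(urlparse(url).query)
    if "v" not in query:
        raise ValueError(f"no 'v' parameter in URL: {url}")
    return query["v"][0]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero embedding as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_similarity(seed_embedding, candidates):
    """Sort candidate videos by similarity to the seed, highest first.

    `candidates` maps a video ID to its caption embedding.
    """
    scored = [
        (video_id, cosine_similarity(seed_embedding, embedding))
        for video_id, embedding in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data only: real embeddings would come from the caption text.
    print(extract_video_id("https://www.youtube.com/watch?v=abc123&t=3228s"))
    toy_candidates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for video_id, score in rank_by_similarity([1.0, 0.0], toy_candidates):
        print(video_id, round(score, 3))
```

The sketch shows how trailing URL parameters could be dropped programmatically; the app itself still expects a clean URL as described in the steps above.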
For more examples, please refer to the Documentation
Distributed under the Apache-2.0 License. See LICENSE.txt for more information.