ytranscript

ytranscript is a Rust crate for fetching YouTube video transcripts. It can request a transcript in a specific language, and it reports failures such as unavailable videos, disabled transcripts, and rate limiting through a dedicated error type.

Features

  • Extracts YouTube video IDs from URLs or strings.
  • Fetches transcripts for YouTube videos.
  • Supports fetching transcripts in specific languages.
  • Handles common errors such as video unavailability, transcript unavailability, and too many requests.

Installation

Add ytranscript to your Cargo.toml:
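
A minimal sketch of the dependency entries, assuming the crate is published on crates.io under the name ytranscript. The version requirement is a placeholder rather than a known release, and the tokio entry is an assumption based on the #[tokio::main] attribute in the usage example:

```toml
[dependencies]
# Placeholder requirement: replace "*" with the release you actually want.
ytranscript = "*"
# Assumed: an async runtime providing #[tokio::main] for the usage example.
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
```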

Usage

The following binary crate takes a video ID on the command line, fetches the transcript in the default language, and prints each entry:

use ytranscript::YoutubeTranscript;
use std::env;

#[tokio::main]
async fn main() {
    // Get the video ID from command line arguments
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        eprintln!("Usage: ytranscript_bin <video_id>");
        return;
    }
    let video_id = &args[1];

    // Fetch the transcript
    match YoutubeTranscript::fetch_transcript(video_id, None).await {
        Ok(transcript) => {
            for entry in transcript {
                println!("{:?}", entry);
            }
        }
        Err(e) => {
            eprintln!("Error: {}", e);
        }
    }
}

Functionality

YoutubeTranscript::fetch_transcript

Fetches the transcript for a given YouTube video ID or URL.

  • Arguments:

    • video_id: A string slice representing the YouTube video URL or ID.
    • config: An optional TranscriptConfig specifying the desired language for the transcript.
  • Returns:

    • Ok(Vec<TranscriptResponse>): A vector of TranscriptResponse entries, one per transcript segment, if the transcript is successfully fetched.
    • Err(YoutubeTranscriptError): An error if the transcript cannot be fetched.

Error Handling

The crate defines a set of errors that might occur while fetching transcripts:

use thiserror::Error;

#[derive(Error, Debug)]
pub enum YoutubeTranscriptError {
    #[error("YouTube is receiving too many requests from this IP and now requires solving a captcha to continue")]
    TooManyRequests,
    #[error("The video is no longer available ({0})")]
    VideoUnavailable(String),
    #[error("Transcript is disabled on this video ({0})")]
    TranscriptDisabled(String),
    #[error("No transcripts are available for this video ({0})")]
    TranscriptNotAvailable(String),
    #[error("No transcripts are available in {0} for this video ({2}). Available languages: {1:?}")]
    TranscriptNotAvailableLanguage(String, Vec<String>, String),
    #[error("Impossible to retrieve Youtube video ID.")]
    InvalidVideoId,
}
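
As a sketch of how a caller might react to these variants, the example below maps each error to a follow-up action. The enum is a local stand-in that mirrors the variants listed above so the code compiles without the crate. NextStep and next_step are hypothetical names and are not part of ytranscript. The field order of TranscriptNotAvailableLanguage (requested language, available languages, video ID) is inferred from its error message.

```rust
// Local stand-in mirroring the variants listed above, so this sketch
// compiles on its own. Real code would import the crate's enum instead.
#[derive(Debug)]
#[allow(dead_code)]
enum YoutubeTranscriptError {
    TooManyRequests,
    VideoUnavailable(String),
    TranscriptDisabled(String),
    TranscriptNotAvailable(String),
    // (requested language, available languages, video ID)
    TranscriptNotAvailableLanguage(String, Vec<String>, String),
    InvalidVideoId,
}

// Hypothetical caller-side policy; not part of the crate.
#[derive(Debug, PartialEq)]
enum NextStep {
    RetryLater,
    TryLanguage(String),
    GiveUp,
}

fn next_step(err: &YoutubeTranscriptError) -> NextStep {
    match err {
        // Rate limiting is the only case where waiting may help.
        YoutubeTranscriptError::TooManyRequests => NextStep::RetryLater,
        // Fall back to the first language the error reports as available.
        YoutubeTranscriptError::TranscriptNotAvailableLanguage(_, available, _) => {
            match available.first() {
                Some(lang) => NextStep::TryLanguage(lang.clone()),
                None => NextStep::GiveUp,
            }
        }
        // Retrying the same request will not change these outcomes.
        YoutubeTranscriptError::VideoUnavailable(_)
        | YoutubeTranscriptError::TranscriptDisabled(_)
        | YoutubeTranscriptError::TranscriptNotAvailable(_)
        | YoutubeTranscriptError::InvalidVideoId => NextStep::GiveUp,
    }
}

fn main() {
    let err = YoutubeTranscriptError::TranscriptNotAvailableLanguage(
        "fr".to_string(),
        vec!["en".to_string(), "de".to_string()],
        "abcdefghijk".to_string(), // placeholder video ID
    );
    println!("{:?}", next_step(&err));
}
```

Listing every variant explicitly, rather than using a wildcard arm, means the compiler flags this match if the enum gains a new variant.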

Regex Patterns

The crate uses regex patterns to extract YouTube video IDs and to parse XML transcripts, and defines the user-agent string alongside them:

pub const RE_YOUTUBE: &str =
    r#"(?:youtube\.com\/(?:[^\/]+\/.+\/|(?:v|e(?:mbed)?)\/|.*[?&]v=)|youtu\.be\/)([^"&?\/\s]{11})"#;

pub const USER_AGENT: &str = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36,gzip(gfe)";

pub const RE_XML_TRANSCRIPT: &str = r#"<text start="([^"]*)" dur="([^"]*)">([^<]*)<\/text>"#;
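
To illustrate what RE_XML_TRANSCRIPT captures (start, dur, and the element text), here is a hand-rolled approximation that uses only the standard library instead of a regex. It is not the crate's implementation. The function name parse_text_elements and the sample XML are made up for illustration, and the sample is shaped only to fit the pattern above.

```rust
/// Returns (start, dur, text) for each `<text start="…" dur="…">…</text>`
/// element. A std-only approximation of the three regex capture groups.
fn parse_text_elements(xml: &str) -> Vec<(f64, f64, String)> {
    const OPEN: &str = "<text start=\"";
    const MID: &str = "\" dur=\"";
    const END_ATTRS: &str = "\">";
    const CLOSE: &str = "</text>";

    let mut out = Vec::new();
    let mut rest = xml;
    while let Some(i) = rest.find(OPEN) {
        rest = &rest[i + OPEN.len()..];
        // Stop scanning as soon as an expected delimiter is missing.
        let Some(j) = rest.find(MID) else { break };
        let start = &rest[..j];
        rest = &rest[j + MID.len()..];
        let Some(k) = rest.find(END_ATTRS) else { break };
        let dur = &rest[..k];
        rest = &rest[k + END_ATTRS.len()..];
        let Some(m) = rest.find(CLOSE) else { break };
        let text = &rest[..m];
        rest = &rest[m + CLOSE.len()..];
        // Skip elements whose attributes are not valid numbers.
        if let (Ok(s), Ok(d)) = (start.parse::<f64>(), dur.parse::<f64>()) {
            out.push((s, d, text.to_string()));
        }
    }
    out
}

fn main() {
    // Made-up sample input that fits the pattern.
    let xml = r#"<transcript><text start="0.5" dur="2.1">Hello</text><text start="2.6" dur="1.4">world</text></transcript>"#;
    for (start, dur, text) in parse_text_elements(xml) {
        println!("{start} +{dur}: {text}");
    }
}
```

Like the regex, this sketch only recognises elements whose attributes appear in exactly the order start, dur, and it leaves any XML entities in the text undecoded.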

Types

The crate defines the following types:

#[derive(Debug)]
pub struct TranscriptConfig {
    pub lang: Option<String>,
}

#[derive(Debug)]
pub struct TranscriptResponse {
    pub text: String,
    pub duration: f64,
    pub offset: f64,
    pub lang: String,
}
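
As a sketch of how these fields might be consumed, the example below renders one entry with mm:ss timestamps. The struct is a local copy of the definition above so the code compiles on its own. format_offset and render are hypothetical helpers, and the code assumes that offset and duration are expressed in seconds, which the source does not state.

```rust
// Local copy of the struct shown above, so this sketch compiles on its own.
#[derive(Debug)]
struct TranscriptResponse {
    text: String,
    duration: f64,
    offset: f64,
    lang: String,
}

/// Formats a time as mm:ss. ASSUMPTION: the value is in seconds.
fn format_offset(offset_secs: f64) -> String {
    // Clamp negatives to zero and drop the fractional part.
    let total = offset_secs.max(0.0) as u64;
    format!("{:02}:{:02}", total / 60, total % 60)
}

/// Renders one entry as "[start - end] (lang) text".
fn render(entry: &TranscriptResponse) -> String {
    format!(
        "[{} - {}] ({}) {}",
        format_offset(entry.offset),
        format_offset(entry.offset + entry.duration),
        entry.lang,
        entry.text
    )
}

fn main() {
    let entry = TranscriptResponse {
        text: "Hello".to_string(),
        duration: 2.1,
        offset: 75.4,
        lang: "en".to_string(),
    };
    println!("{}", render(&entry)); // prints "[01:15 - 01:17] (en) Hello"
}
```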

Testing

Run the crate's test suite with:

cargo test

License

This project is licensed under the MIT License. See the LICENSE file for details.
