youtube-comment-scraper-cli's Introduction

YouTube Comment Scraper

Deprecation Notice

This package has been deprecated and no longer works.

About

Command line utility for scraping YouTube comments.

If you prefer a simple-to-use online solution, please go to http://ytcomments.klostermann.ca.

Installation

  1. Download and install Node.js (at least v6.11.4): https://nodejs.org/
  2. In a terminal window, type npm install -g youtube-comment-scraper-cli
  3. The program can be run from the command line with youtube-comment-scraper <VideoID>
  4. Read the rest of the docs or check out youtube-comment-scraper --help; a sample install-and-run session is sketched below.
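For reference, a typical session might look roughly like this (a sketch only; the VideoID and the output file name are placeholders, not real values):

npm install -g youtube-comment-scraper-cli

youtube-comment-scraper --help

youtube-comment-scraper --format csv --outputFile comments.csv <VideoID>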

Usage

Command

$ youtube-comment-scraper [options] <VideoID>

Tutorial

For more detailed instructions suitable for beginners, take a look at the NetLab Tutorial.

Where's the VideoId?

It's part of the video URL: https://www.youtube.com/watch?v=<VideoID>.

Examples:

Video URL                                      Video ID
https://www.youtube.com/watch?v=abc123def45   abc123def45
https://youtu.be/abc123def45                  abc123def45
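If you are scripting around the tool, the ID can be pulled out of a watch URL with ordinary shell tools. A rough sketch, which only handles the ?v= form shown in the first row above:

    # Strip everything up to and including "v=" to isolate the VideoID.
    VIDEO_URL="https://www.youtube.com/watch?v=abc123def45"
    VIDEO_ID="${VIDEO_URL##*v=}"
    youtube-comment-scraper "$VIDEO_ID"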

Options

All command line options are optional (d'oh); only the VideoID argument is required.

Option                           Description
-f, --format <format>            output format (json or csv)
-o, --outputFile <outputFile>    write results to the given file
-d, --stdout                     write results to stdout
-c, --collapseReplies            collapse replies and treat them the same as regular comments
-s, --stream                     output comments one-at-a-time as they are being scraped
-V, --version                    output the version number
-h, --help                       output usage information

Options explained

Format

-f --format <format>

The comments can be formatted as either JSON or CSV data. Defaults to JSON if not specified.

Examples

youtube-comment-scraper -f csv <VideoID>

youtube-comment-scraper --format json <VideoID>


Output File

-o --outputFile <outputFile>

The comments can be written directly to a file. In that case they will not be written to stdout (the terminal window). If you want both file and stdout output use the --stdout flag in addition to --outputFile.

Examples

youtube-comment-scraper -o ./path/to/some/file.json <VideoID>

youtube-comment-scraper --outputFile some-file.csv --stdout --format csv <VideoID>


Stdout

-d --stdout

By default comments are always written to stdout (even without the --stdout flag). However, when using --outputFile, they will only be written to the file. If you want output to both, use --stdout.

Examples

youtube-comment-scraper -d <VideoID>

youtube-comment-scraper --outputFile ./some/file --stdout <VideoID>


Collapse Replies

-c --collapseReplies

By default, replies to comments are kept nested under that comment. If --collapseReplies is set, replies are treated the same as regular comments. An additional field, replyTo, is added to each reply; it contains the ID of the comment to which the reply belongs.
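For illustration, a collapsed reply could look roughly like this in the JSON output (only replyTo is taken from the description above; the other field names are illustrative placeholders, not the tool's documented schema):

    {
        "id": "<reply-id>",
        "text": "This is a reply.",
        "replyTo": "<parent-comment-id>"
    }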

Examples

youtube-comment-scraper -c <VideoID>

youtube-comment-scraper --collapseReplies --format csv <VideoID>


Stream

-s --stream

By default the program will scrape all comments without outputting any of them until the scrape is complete. When the --stream flag is set, comments are written one at a time as soon as they are scraped, while still maintaining the original order of comments (newest first). This works for both the JSON and CSV format.

Examples

youtube-comment-scraper -s <VideoID>

youtube-comment-scraper --stream --format csv <VideoID>

youtube-comment-scraper --stream --format csv --outputFile some-file.csv <VideoID>

youtube-comment-scraper --stream <VideoID> > json-processing-tool


Version

-V --version

Output the current version of the program.

Examples

youtube-comment-scraper -V

youtube-comment-scraper --version


Help

-h --help

Output usage help.

Examples

youtube-comment-scraper -h

youtube-comment-scraper --help


youtube-comment-scraper-cli's Issues

Error API response does not contain a "content_html" field

I tried to download the comments from this video We tried Android 11!, which has 3,330 comments, using Node.js LTS version 12.18.0 running under Debian. The scraper starts fetching the comments, but after some time it throws an error:
API response does not contain a "content_html" field
The only solution I found is to retry five or ten times in the hope that one attempt will not throw the error (a crude retry wrapper is sketched below).
The parameters I used are: youtube-comment-scraper -f json -o comments.json 05X0RRmUtE0
The error also happens with other videos that have more than 1,000 comments; videos with fewer than 1,000 comments do not throw the error.
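A retry wrapper along those lines, assuming the CLI exits with a non-zero status when this error occurs (not verified against the source):

    # Retry the scrape up to five times, stopping at the first success.
    for attempt in 1 2 3 4 5; do
        youtube-comment-scraper -f json -o comments.json 05X0RRmUtE0 && break
        echo "Attempt $attempt failed, retrying..." >&2
    done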

Receiving "✕ This video does not exist" error

>youtube-comment-scraper --version
1.0.1
> youtube-comment-scraper 'https://www.youtube.com/watch?v=-KYfnYYFMYg'
✕ This video does not exist.
> node --version
v13.12.0
> youtube-comment-scraper KYfnYYFMYg
✕ This video does not exist. 

Does anybody have the same issue?

Add custom parameters to URL requests or session cookies

Are there options to add custom parameters to URL requests or session cookies? I need to set YouTube preferences such as language or country. Without them I can't scrape comments from some YouTube videos which are blocked in my country.

Command not found

Hello,

I followed the instructions, but when I run youtube-comment-scraper abc123def12 I get an error that the youtube-comment-scraper command is not found.

Can you please suggest something?

Does not work in Windows 10

I installed with npm but got this error:

E:\cs>youtube-comment-scraper
'youtube-comment-scraper' is not recognized as an internal or external command,
operable program or batch file.

Also did not work:
E:\cs>C:\Users\user\AppData\Roaming\npm\node_modules\youtube-comment-scraper-cli\bin\youtube-comment-scraper --help
'C:\Users\user\AppData\Roaming\npm\node_modules\youtube-comment-scraper-cli\bin\youtube-comment-scraper' is not recognized as an internal or external command,
operable program or batch file.

Resume/retry capabilities

The YouTube API may sometimes return error pages (such as in #47), or a simple network error may break the whole fetching process.

To overcome this, I'd like to propose the following approach:

  1. Turn the scheme into something like this:

    {
        "comments": [
            // ...
        ],
        "numberOfTotalComments": 2300,
        "nextPageToken": null
    }
  2. If an error is encountered at some point, save the partial data to the file with "nextPageToken" being set.

  3. If the user runs the same command again, check if the "nextPageToken" is not empty, and if so resume the operation using it.

And some notes about this approach:

  • "nextPageToken": null means everything is successfully downloaded.
  • On step 2 above, when there is an error, the script can actually make a second try before failing and saving partial data.
  • To decrease the chance of error, there may be delay between requests (in fact I looked at the source code to see if this is already done but failed; because JS code, to me, is very hard to follow)

Handle VideoIDs that start with a "-"

In trying to scrape the following video, I discovered that it's treating the VideoID like another command-line option (one that doesn't exist):

https://youtu.be/-GNO_zEeiYU

youtube-comment-scraper -f csv -o hannity.csv "-GNO_zEeiYU"
error: unknown option `-G'

I'm trying to send a patch to fix it, but there's so much magic going on there that it looks like yargs isn't even explicitly called XD
I'll keep trying, but I'm more of a back-end dev, so this is all a bit of a mystery to me...
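As a later issue on this page notes, the -- option terminator reportedly works around this by stopping option parsing before the ID, e.g.:

youtube-comment-scraper -f csv -o hannity.csv -- "-GNO_zEeiYU"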

Ignore pinned comments

How can I ignore pinned comments? When scraping, the pinned comment is always picked up as the newest comment when scraping the next page.

Download does not work if '-' in videoid

Hello,
Great tool, thanks! Light and efficient.

I have an issue to report: downloading comments fails when the character "-" is the first character of a VideoID.

For example: youtube-comment-scraper '-CmadmM5cOk'

The character cannot be escaped with \

Many thanks for the fix

videos with ID starting with -

Great project!
Is there a way to wrap the video ID so videos starting with - (e.g. -2SB-pdzTHQ) can be downloaded?
Thanks.

Too many comments?

I am trying to scrape comments from a video with many comments (e.g. this one, IZXgjR9INsA) and I keep getting errors like the one below:

<--- Last few GCs --->

[31198:0x103ec9000]  2834281 ms: Mark-sweep 2041.5 (2067.8) -> 2039.2 (2067.5) MB, 983.2 / 0.0 ms  (average mu = 0.277, current mu = 0.306) allocation failure scavenge might not succeed
[31198:0x103ec9000]  2835233 ms: Mark-sweep 2041.3 (2067.5) -> 2039.2 (2067.8) MB, 939.7 / 0.0 ms  (average mu = 0.166, current mu = 0.014) allocation failure scavenge might not succeed


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x1006cfa8d]
Security context: 0x384a2c3c0921 <JSObject>
    1: toString [0x384af47b43f1] [buffer.js:~753] [pc=0xd95ac9259f2](this=0x384a50c0dff9 <Uint8Array map = 0x384a6c3e5151>,0x384a6ee804b9 <undefined>,0x384a6ee804b9 <undefined>,0x384a6ee804b9 <undefined>)
    2: arguments adaptor frame: 1->3
    3: /* anonymous */ [0x384a50c07e09] [/usr/local/lib/node_modules/youtube-comment-scraper-cli/node_modules/request/req...

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0x100b8ff9a node::Abort() (.cold.1) [/usr/local/bin/node]
 2: 0x1000832c4 node::FatalError(char const*, char const*) [/usr/local/bin/node]
 3: 0x1000833ec node::OnFatalError(char const*, char const*) [/usr/local/bin/node]
 4: 0x1001728fd v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 5: 0x1001728a7 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/usr/local/bin/node]
 6: 0x10028c7e7 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/usr/local/bin/node]
 7: 0x10028db6c v8::internal::Heap::MarkCompactPrologue() [/usr/local/bin/node]
 8: 0x10028b73a v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [/usr/local/bin/node]
 9: 0x10028a219 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [/usr/local/bin/node]
10: 0x100291bb8 v8::internal::Heap::AllocateRawWithLightRetrySlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
11: 0x100291c0e v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [/usr/local/bin/node]
12: 0x10026edad v8::internal::Factory::AllocateRawWithImmortalMap(int, v8::internal::AllocationType, v8::internal::Map, v8::internal::AllocationAlignment) [/usr/local/bin/node]
13: 0x1002712ff v8::internal::Factory::NewRawTwoByteString(int, v8::internal::AllocationType) [/usr/local/bin/node]
14: 0x100271282 v8::internal::Factory::NewStringFromUtf8(v8::internal::Vector<char const> const&, v8::internal::AllocationType) [/usr/local/bin/node]
15: 0x10018a7c7 v8::String::NewFromUtf8(v8::Isolate*, char const*, v8::NewStringType, int) [/usr/local/bin/node]
16: 0x100102fb3 node::StringBytes::Encode(v8::Isolate*, char const*, unsigned long, node::encoding, v8::Local<v8::Value>*) [/usr/local/bin/node]
17: 0x10006a8d6 void node::Buffer::(anonymous namespace)::StringSlice<(node::encoding)1>(v8::FunctionCallbackInfo<v8::Value> const&) [/usr/local/bin/node]
18: 0x1006cfa8d Builtins_CallApiCallback [/usr/local/bin/node]
19: 0xd95ac9259f2 
20: 0x1006c8c19 Builtins_ArgumentsAdaptorTrampoline [/usr/local/bin/node]
21: 0x1006cecdb Builtins_InterpreterEntryTrampoline [/usr/local/bin/node]
22: 0xd95ac91af16 
Abort trap: 6
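A later issue on this page works around a similar memory ceiling by raising the V8 heap limit and invoking the installed script through node directly; adapted as a sketch here (the install path may differ on your system, and whether --stream also lowers peak memory use is not verified):

node --max-old-space-size=10000 /usr/local/bin/youtube-comment-scraper --stream IZXgjR9INsA > comments.json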

Invalid response from YouTube. Missing body,

Hello,
I tried to run it from the Node.js CLI but I could not make it work...

youtube-comment-scraper -f json --outputFile C:\Users\Andrino\Desktop\YoutubeDownloads/Youtube.json RJQWdKS7vZg

Error:Invalid response from YouTube. Missing body,

Saving BOTH to json and csv in the same grab/run

Suggestion for new feature/functionality.

Saving BOTH to json and csv in the same grab/run, without having to re-fetch comments from YouTube (which may be "hairy" if there are very many of them). Alternatively, a separate tool to convert the json to csv after the json is retrieved. This is only an issue when running the script oneself, not when running from the site (when the site isn't blocked by YouTube, of course), as the site first retrieves the comments from YouTube and then allows one to save first in one format and then in the other without having to re-download from YouTube.

One possible solution would be additional parameters to the scraper script, allowing both formats to be specified and saved to separate files. youtube-comment-scraper-cli-master/lib/csv.js is already aware of both the json and the csv, so perhaps that's the script that would become responsible for the actual output to two formats, if this is the solution opted for.

The perhaps easier solution is a separate, simple converter from json to csv, possibly using the already existing youtube-comment-scraper-cli-master/lib/csv.js.

Parameter to add a count limit to the scraper

I need only a limited number of comments from a particular video. Right now what I am doing is storing the stream into a txt file and then killing the process when the length goes above what I need. It would be great if you added this as a parameter.
PS. The scraper works as advertised! 👍
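Until such a parameter exists, one rough stopgap is to combine --stream with head, which closes the pipe after N lines. A sketch, assuming one comment per CSV row (comments containing line breaks will span more rows) and that the scraper exits once its output pipe closes (not verified):

youtube-comment-scraper --stream --format csv <VideoID> | head -n 100 > first-comments.csv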

fails on some VideoIDs

Doesn't allow VideoIDs whose first character is a dash, as it thinks you're passing it an option; quotes around the VideoID don't help. My experience using a valid ID:

youtube-comment-scraper -s -f json -d "-2E18bwbbHk"
error: unknown option `-2'

"--only-new" for fetching only the new comments from the last time

This argument makes the script check the output file first (if it exists) to find the timestamp of the latest comment and stop fetching new comments once it reaches that timestamp.

I believe this feature would save time and bandwidth for videos with 1000s of comments. Also less DDoS to the Youtube API :-)

One issue that comes to mind is that, if I'm not mistaken, this feature will miss new replies made to old comments. If there is no easy fix for this, that's fine with me.

Thank you and questions

Dear Phil,

Thank you so much for sharing this program. I found it very easy to implement and helpful. I am working on a research project that requires content analysis about those comments from YouTube videos.

When I retrieve comments, I found some of them written in several languages other than English. Do you know any easy ways to auto-translate them to English?

Looking forward to your reply.

Many thanks,
Shuo

Investigate "unknown error" on koPmuEyP3a0

Note that this issue is now blocked by #47


Prompted by Suggestion: Comment limiting + Don't discard on fail. I've been experimenting with downloading the comments for koPmuEyP3a0 using --stream.

I am using youtube-comment-scraper 1.0.1 and node v10.19.0 in a Debian 64bit (stable) environment within a VirtualBox guest on a Windows 10 host.

All attempts have failed with "unknown error", and all attempts have resulted in a file with a different size.

Either I need advice on how to troubleshoot further, or debugging functionality would need to be implemented to learn more. Perhaps a counter of time and amount of data collected could help; I might be able to implement that on the user side of things (perhaps using some combination of watch and du manual logging? I don't know.)


Suspicions

  • Flaky internet connection somewhere along the chain
  • Throttling or limitations by YouTube
    • Note that I'm running this on a non-proxy IP which also has a browser logged into an account, so IP-based spam protection shouldn't be the issue, but usage still might.
  • Invalid data? - I don't think this is something like invalid data (like a font) within the stream because different file sizes result. There is nothing obvious when I look at the tail data.

CSV tests

youtube-comment-scraper --format csv --stream -- koPmuEyP3a0 | tee output.csv
✕ unknown error
  • Test 1 - a 13,652 kB file
  • Test 2 - a 1,244 kB file (using just a redirect instead of tee)
  • Test 3 - a 36,632 kB file

JSON tests

youtube-comment-scraper --stream -- koPmuEyP3a0 | tee output.json
✕ unknown error
  • Test 1 - a 2,440 kB file
  • Test 2 - a 57,436 kB file

For Test 3, with:

node --max-old-space-size=10000 /usr/local/bin/youtube-comment-scraper --stream -- koPmuEyP3a0 | tee output3.json
  • Test 3 - a 68,644 kB file

Special Characters

Hi, really nice project. Congrats!
There are several special characters (depending on the language) within the comments in a youtube video, such as '´' or 'ñ', which are not being properly shown in the output files. What can one do, in this case, in order to have the original corresponding characters within the output file?

Getting unknown error

I have installed the scraper as per the instructions, and when I run the following command I get an unknown error message:

youtube-comment-scraper -f json --outputFile C:\comments11.json 62PKuCrxDSo

Video IDs starting with dash = 'unknown option' error

Fantastic script!

Have been backing up my video comments with it and discovered that video IDs prefixed with a dash are mistaken for a group of short flag options and result in an error.

The command:
npx youtube-comment-scraper "-e5PICdUp44"

Results in the error message:
error: unknown option '-e'

Upon investigation I found that a solution is to add an option terminator:
npx youtube-comment-scraper -- "-e5PICdUp44"

Just wanted to note this here for anyone else writing scraper scripts like myself.

For future reference though, the culprit is in node_modules\commander\index.js in the Command.prototype.normalize function, where the short flag split occurs.

Some suggestions/requests

Thank you for providing this scraper. I have a few feature suggestions.

  • Recognize playlist and channel IDs and write the videos listed to separate files
  • Show the actual date/time the comment was made, not the shorthand (not sure if this is possible)
  • Option to omit certain information columns when exporting to CSV (for me, some are unnecessary, but exporting batch amounts makes it challenging to remove them manually)
