snarfed / huffduff-video Goto Github PK

View Code? Open in Web Editor NEW

95.0 10.0 6.0 146 KB

📺 Extract the audio from videos on YouTube, Vimeo, and other sites and send it to Huffduffer.

Home Page: https://huffduff-video.snarfed.org/

Python 70.68% HTML 18.61% JavaScript 4.83% Dockerfile 5.87%

huffduff-video's Introduction

huffduff-video

Extracts the audio from videos on YouTube, Vimeo, and many more sites and sends it to Huffduffer.

See huffduff-video.snarfed.org for bookmarklet and usage details.

Uses yt-dlp to download the video and extract its audio track. Stores the resulting MP3 file in Backblaze B2.

License: this project is placed in the public domain. Alternatively, you may use it under the CC0 license.

Related projects

Podify is a self-hosted app that also serves feeds of the generated MP3 files. Backed by youtube-dl.
youtube-dl-api-server is a web front-end that uses youtube-dl to extract and return a video's metadata.
Flask webapp and Chrome extension for using youtube-dl to download a video to local disk.
iOS workflow that does the same thing as huffduff-video, except all client side: downloads a YouTube video, converts it to MP3, uploads the MP3 to Dropbox, and passes it to Huffduffer.

Requirements

huffduff-video has a few specific requirements that make it a bit harder than usual to find a host, so right now it's on a full VM, on AWS EC2. I'd love to switch to a serverless/containerized host instead, but I haven't found one that satisfies all of the requirements yet:

Python 3 WSGI application server
able to install and use ffmpeg, generally as a system package
long-running HTTP requests, often over 60s
streaming HTTP responses aka "hanging GETs"
= 1G memory
= 2G disk (largest output file in Dec 2019 was 1.7G)
Lots of egress bandwidth, often >200G/mo

Many of the major serverless PaaS hosts didn't/don't support all of these, especially streaming HTTP responses, since they often have a frontend in front of the application server that buffers entire HTTP responses before returning them.

AWS Lambda: only 512MB disk but can use EFS for more; ~~only Java~~ supports streaming, other languages unclear; expensive-ish egress bandwidth ($.09/G)
Google Cloud Run: ~~no streaming~~ now supports streaming! In memory file system though, which gets expensive; expensive-ish egress bandwidth ($.085/G)
App Engine Standard: no streaming or system packages
App Engine Flexible: pricing is a bit prohibitive, ~$40/mo minimum
Azure Functions and App Service: seems like no streaming or system packages, but hard to tell for sure

Most other smaller serverless hosts (eg Heroku, Zeit, Serverless) don't allow installing system packages like ffmpeg or support streaming HTTP responses either.

Cost and storage

I track monthly costs here. They come from this B2 billing page, and before that, this AWS billing page. The B2 bucket web UI shows the current total number of files and total bytes stored in the huffduff-video bucket.

I've configured the bucket's lifecycle to hide files after 31 days, and delete them 1 day after that. I also configured the bucket settings to send the Cache-Control: max-age=210240 HTTP header to let clients cache files for up to a year.

I originally used AWS S3 instead of B2, but S3 eventually got too expensive. As of 11/21/2019, huffduff-video was storing ~200GB steady state, and downloads were using well over 2T/month of bandwidth, so my S3 bill alone was >$200/month.

System setup

Currently on an AWS EC2 t2.micro instance on Ubuntu 20. unattended-upgrades is on, with the default configuration; logs are in /var/log/unattended-upgrades/.

I started it originally on a t2.micro. I migrated it to a t2.nano on 2016-03-24, but usage outgrew the nano's CPU quota, so I migrated back to a t2.micro on 2016-05-25.

I did both migrations by making an snapshot of the t2.micro's EBS volume, making an AMI from the snapshot, then launching a new t2.nano instance using that AMI. Details.

Here's how I set it up:

# set up swap
sudo dd if=/dev/zero of=/var/swapfile bs=1M count=4096
sudo chmod 600 /var/swapfile
sudo mkswap /var/swapfile
sudo swapon /var/swapfile

# add my dotfiles
mkdir src
cd src
git clone [email protected]:snarfed/dotfiles.git
cd
ln -s src/dotfiles/.cshrc
ln -s src/dotfiles/.gitconfig
ln -s src/dotfiles/.git_excludes
ln -s src/dotfiles/.python

# install core system packages and config
sudo apt-get update
sudo apt-get install apache2 libapache2-mod-wsgi-py3 tcsh python3 python3-pip ffmpeg
sudo pip3 install -U pip
sudo chsh ubuntu
# enter /bin/tcsh

# install and set up huffduff-video
cd ~/src
git clone https://github.com/snarfed/huffduff-video.git
cd huffduff-video
sudo pip3 install -r requirements.txt

# add these lines to /etc/httpd/conf/httpd.conf
#
# # rest is for huffduff-video!
# Options FollowSymLinks
# WSGIScriptAlias /get /var/www/cgi-bin/app.py
# LogLevel info
#
# # tune number of prefork server processes
# StartServers       8
# ServerLimit        12
# MaxClients         12
# MaxRequestsPerChild  4000

# start apache
sudo service apache2 start
systemctl status apache2.service
sudo systemctl enable apache2.service
sudo chmod a+rx /var/log/apache2
sudo chmod -R a+r /var/log/apache2

# on local laptop
cd ~/src/huffduff-video/
scp b2_* aws_* ubuntu@[IP]:src/huffduff-video/

# back on EC2
cd /var/www/
sudo mkdir cgi-bin
cd cgi-bin
sudo ln -s ~/src/huffduff-video/app.py
cd /var/www/html
sudo ln -s ~/src/huffduff-video/static/index.html
sudo ln -s ~/src/huffduff-video/static/robots.txt
sudo ln -s ~/src/huffduff-video/static/util.js

# install cron jobs
cd
cat > ~/crontab << EOF
# clean up /tmp every hour
0 * * * *  find /tmp/ -user www-data -not -newermt yesterday | xargs rm
# auto upgrade yt-dlp daily
10 10 * * *  sudo pip3 install -U yt-dlp; sudo service apache2 restart
# recopy robots.txt to S3 since our bucket expiration policy deletes it monthly
1 2 3 * *  aws s3 cp --acl=public-read ~/src/huffduff-video/s3_robots.txt s3://huffduff-video/robots.txt
EOF
crontab crontab

Local development

It's possible to set Apache up on macOS to run Python like the production Linux setup, eg with Homebrew Apache and UWSGI, but it's a bit complicated. The simpler approach is to make a virtualenv, install requirements.txt and gunicorn in it, and then run appp.py under gunicorn with eg :

gunicorn --workers 1 --threads 10 -b :8080 app

The app will serve on localhost:8080. Run with eg http://localhost:8080/?url=...

Upgrading OS

huffduff-video is pretty small and simple, it doesn't have many unusual dependencies or needs, so I've generally had good luck using Ubuntu's do-release-upgrade tool to upgrade from one Ubuntu LTS version to the next (more, even more):

sudo apt-get update
sudo apt-get upgrade
sudo do-release-upgrade

Python installed packages may disappear, make sure to reinstall those with sudo! Otherwise Apache's mod_wsgi won't see them, or will see older versions.

sudo pip3 install -r requirements.txt

SSL

I followed the Certbot Apache instructions to mint an SSL certificate, install it, and set up a cron job to renew it every 3 months:

sudo snap install core; sudo snap refresh core
sudo snap install --classic certbot
sudo certbot --apache
# answer questions; domain is huffduff-video.snarfed.org

Monitoring

I use Honeycomb to monitor huffduff-video with black box HTTP probes to its home page. If enough of them fail in a given time window, it emails me.

I use CloudWatch to monitor and alert on EC2 instance system checks and CPU quota. When alarms fire, it emails me.

System metrics

To get system-level custom metrics for memory, swap, and disk space, set up Amazon's custom monitoring scripts.

sudo yum install perl-DateTime perl-Sys-Syslog perl-LWP-Protocol-https
wget http://aws-cloudwatch.s3.amazonaws.com/downloads/CloudWatchMonitoringScripts-1.2.1.zip
unzip CloudWatchMonitoringScripts-1.2.1.zip
rm CloudWatchMonitoringScripts-1.2.1.zip
cd aws-scripts-mon

cp awscreds.template awscreds.conf
# fill in awscreds.conf
./mon-put-instance-data.pl --aws-credential-file ~/aws-scripts-mon/awscreds.conf --mem-util --swap-util --disk-space-util --disk-path=/ --verify

crontab -e
# add this line:
# * * * * *	./mon-put-instance-data.pl --aws-credential-file ~/aws-scripts-mon/awscreds.conf --mem-util --swap-util --disk-space-util --disk-path=/ --from-cron

Log collection

To set up HTTP and application level monitoring, I had to:

add an IAM policy
install the logs agent with sudo yum install awslogs
add my IAM credentials to /etc/awslogs/awscli.conf and set region to us-west-2
add these lines to /etc/awslogs/awslogs.conf:

[/var/log/httpd/access_log]
file = /var/log/httpd/access_log*
log_group_name = /var/log/httpd/access_log
log_stream_name = {instance_id}
datetime_format = %d/%b/%Y:%H:%M:%S %z

[/var/log/httpd/error_log]
file = /var/log/httpd/error_log*
log_group_name = /var/log/httpd/error_log
log_stream_name = {instance_id}
datetime_format = %b %d %H:%M:%S %Y

# WSGI writes Python exception stack traces to this log file across multiple
# lines, and I'd love to collect them multi_line_start_pattern or something
# similar, but each line is prefixed with the same timestamp + severity + etc
# prefix as other lines, so I can't.

start the agent and restart it on boot:

sudo service awslogs start
sudo service awslogs status
sudo chkconfig awslogs on

wait a while, then check that the logs are flowing:

aws --region us-west-2 logs describe-log-groups
aws --region us-west-2 logs describe-log-streams --log-group-name /var/log/httpd/access_log
aws --region us-west-2 logs describe-log-streams --log-group-name /var/log/httpd/error_log

define a few metric filters so we can graph and query HTTP status codes, error messages, etc:

aws logs put-metric-filter --region us-west-2 \
  --log-group-name /var/log/httpd/access_log \
  --filter-name HTTPRequests \
  --filter-pattern '[ip, id, user, timestamp, request, status, bytes]' \
  --metric-transformations metricName=count,metricNamespace=huffduff-video,metricValue=1

aws logs put-metric-filter --region us-west-2 \
  --log-group-name /var/log/httpd/error_log \
  --filter-name PythonErrors \
  --filter-pattern '[timestamp, error_label, prefix = "ERROR:root:ERROR:", ...]' \
  --metric-transformations metricName=errors,metricNamespace=huffduff-video,metricValue=1

aws --region us-west-2 logs describe-metric-filters --log-group-name /var/log/httpd/access_log
aws --region us-west-2 logs describe-metric-filters --log-group-name /var/log/httpd/error_log

Understanding bandwidth usage

Back in April 2015, I did a bit of research to understand who was downloading huffduff-video files, to see if I could optimize its bandwidth usage by blocking non-human users.

As always, measure first, then optimize. To learn a bit more about who's downloading these files, I turned on S3 access logging, waited 24h, then ran these commands to collect and aggregate the logs:

aws --profile personal s3 sync s3://huffduff-video/logs .
grep -R REST.GET.OBJECT . | grep ' 200 ' | grep -vE 'robots.txt|logs/20' \
  | sed -E 's/[A-Za-z0-9\/+=_-]{32,76}/X/g' | cut -d' ' -f8,20- | sort | uniq -c | sort -n -r > user_agents
grep -R REST.GET.OBJECT . | grep ' 200 ' | grep -vE 'robots.txt|logs/20' \
  | cut -d' ' -f5 | sort | uniq -c | sort -n -r > ips

This gave me some useful baseline numbers. Over a 24h period, there were 482 downloads, 318 of which came from bots. (That's 2/3!) Out of the six top user agents by downloads, five were bots. The one exception was the Overcast podcast app.

Flipboard Proxy (142 downloads)
Googlebot (67)
Twitterbot (39)
Overcast (47)
Yahoo! Slurp (36)
Googlebot-Video (34)

(Side note: Googlebot-Video is polite and includes Etag or If-Modified-Since when it refetches files. It sent 68 requests, but exactly half of those resulted in an empty 304 response. Thanks Googlebot-Video!)

I switched huffduff-video to use S3 URLs on the huffduff-video.s3.amazonaws.com virtual host, added a robots.txt file that blocks all bots, waited 24h, and then measured again. The vast majority of huffduff-video links on Huffduffer are still on the s3.amazonaws.com domain, which doesn't serve my robots.txt, so I didn't expect a big difference...but I was wrong. Twitterbot had roughly the same number, but the rest were way down:

Overcast (76)
Twitterbot (36)
FlipboardProxy (33)
iTunes (OS X) (21)
Yahoo! Slurp (20)
libwww-perl (18)
Googlebot (14)

(Googlebot-Video was way farther down the chart with just 4 downloads.)

This may have been due to the fact that my first measurement was Wed-Thurs, and the second was Fri-Sat, which are slower social media and link sharing days. Still, I'm hoping some of it was due to robots.txt. Fingers crossed the bots will eventually go away altogether!

To update the robots.txt file:

aws --profile personal s3 cp --acl=public-read ~/src/huffduff-video/s3_robots.txt s3://huffduff-video/robots.txt

I put this in a cron job to run every 30d. I had to run aws configure first and give it the key id and secret.

To find a specific bot's IPs:

$ grep -R FlipboardProxy . | cut -d' ' -f5 |sort |uniq
34.207.219.235
34.229.167.12
34.229.216.231
52.201.0.135
52.207.240.171
54.152.58.154
54.210.190.43
54.210.24.16

...and then to block them, add them to the bucket policy:

{
  "Version": "2012-10-17",
  "Id": "Block IPs",
  "Statement": [
    {
      "Sid": "Block FlipboardProxy (IPs collected 1/25-26/2017)",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::huffduff-video/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [
            "34.207.219.235/32",
            "34.229.167.12/32",
            "34.229.216.231/32",
            "52.201.0.135/32",
            "52.207.240.171/32",
            "54.152.58.154/32",
            "54.210.190.43/32",
            "54.210.24.16/32"
          ]
        }
      }
    }
  ]
}

While doing this, I discovered something a bit interesting: Huffduffer itself seems to download a copy of every podcast that gets huffduffed, ie the full MP3 file. It does this with no user agent, from 146.185.159.94, which reverse DNS resolves to huffduffer.com.

I can't tell that any Huffduffer feature is based on the actual audio from each podcast, so I wonder why they download them. I doubt they keep them all. Jeremy probably knows why!

Something also downloads a lot from 54.154.42.3 (on Amazon EC2) with user agent Ruby. No reverse DNS there though.

huffduff-video's People

Contributors

Stargazers

Watchers

Forkers

kevinmarks fpcmotif damenleeturks ascandroli project-renard-survey kroqbandit

huffduff-video's Issues

ssl issue on c-span.org

https://www.c-span.org/video/?191400-1/depth-francis-fukuyama
-->
http://huffduff-video.snarfed.org/get?url=https%3A%2F%2Fwww.c-span.org%2Fvideo%2F%3F191400-1%2Fdepth-francis-fukuyama

huffduff-video

Fetching https://www.c-span.org/video/?191400-1/depth-francis-fukuyama ...
ERROR: Unable to download webpage: (caused by URLError(SSLError(1, '_ssl.c:493: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure'),))

Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!

<!DOCTYPE html>
<html>
<head>
<title>huffduff-video</title>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="mobile-web-app-capable" content="yes">
</head>
<style> #progress span {display:none;}
        #progress span:last-of-type {display:inline;}
</style>
<body>
<h1><a href="http://huffduff-video.snarfed.org/" target="_blank">huffduff-video</a></h1>
<div id="progress">
Fetching https://www.c-span.org/video/?191400-1/depth-francis-fukuyama ...<br /><p>ERROR: Unable to download webpage: <urlopen error [Errno 1] _ssl.c:493: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure> (caused by URLError(SSLError(1, '_ssl.c:493: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure'),))</p>
Here are the <a href="http://rg3.github.io/youtube-dl/supportedsites.html">
supported sites</a>. If this site isn't supported, it may also post
its videos on YouTube. Try there!
</body>
</html>

C-Span is on the list of supported sites.

Using the current youtube-dl binary on windows the download proceeds properly.

youtube-dl.exe version 2016.05.16 (and 2016.09.04.1) from the download page, the following command line successfully downloads an mp4 file::

youtube-dl.exe https://www.c-span.org/video/?191400-1/depth-francis-fukuyama

[CSpan] 191400: Downloading webpage
[CSpan] 171746: Downloading JSON metadata
[CSpan] 171746: Downloading XML
[download] Destination: In Depth with Francis Fukuyama-171746.mp4
[download]   3.0% of 1.18GiB at  8.00MiB/s ETA 02:25

I am not sure if this is a problem with the version of youtube-dl used as a library (it doesn't seem to be checked into the huffduff-video codebase) or some sort of SSL issue on the server executing the huffduff-video code.

Gets stuck at Fetching...

For the last several days I can't get huffduff-video to get beyond "Fetching". All I get for output is, for example:
Fetching https://www.youtube.com/watch?v=HcedcEr27ZU ...

I tried letting it go for hours and there's no sign of progress.
I've tried it through the bookmarklet (my usual method) and the web interface.

Has Vimeo blocked the IP addresses for huffduff-video?

I've been having difficulty huffduffing Vimeo videos for a few months now. When I attempt to huffduff, I get error messages similar to this one:

Fetching https://vimeo.com/278439003 ...
ERROR: Unable to extract info section (caused by ExtractorError(u'Unable to download webpage: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!

I'm not sure exactly what is going on, but I can think of two plausible scenarios:

Huffduff-video has triggered some automated spam blocker in Vimeo by sending lots of requests from the same IP range; or
Vimeo has changed the URL format from which they serve their videos.

I suspect the former is more likely, but I don't actually know what's going on.

YouTube Returning Bot Error

I've started getting an error on YouTube videos:

ERROR: [youtube] BpkLhRubZkw: Sign in to confirm you're not a bot. This helps protect our community. Learn more

port to AWS lambda or t2.nano

lambda docs. it'd be pretty nice to not have to run the EC2 instance, and it would cut our cost down a bit once our free tier year is up.

it would only cut our cost down by ~1/3, though. our bill is currently ~$17/mo, 90% of which is bandwidth, and our t2.micro will only add ~$9/mo.)

also, i'm not sure if lambda is ready for us yet. notably, there's a 5m request deadline and 512MB disk quota. i think most requests finish in <5m, but i'm not sure. i don't have good metrics on request latency. :/ and our output files are pretty much all <512MB, but i'm not sure how much more space the intermediate files take up.

an alternative is the new t2.nano instance type.

just collecting data here for now.

support RTMP videos

...by installing rtmpdump. background in ytdl-org/youtube-dl#1797. the difficulty is that rtmpdump isn't in amazon's yum repo, so i'd have to build, install, and maintain it manually. bleh.

example RTMP video: http://www.bbc.co.uk/programmes/b05zktnk

requested by @kevinmarks on #indiewebcamp.

block crawlers/bots on backblaze, robots.txt or otherwise

when we used AWS S3 to serve files, we got our own subdomain, eg https://huffduff-video.s3-us-west-2.amazonaws.com/ , so we could add and serve our own /robots.txt to block crawlers and other bots.

backblaze URLs don't give us our own subdomain, though. the bucket is in the path instead of the domain. example: https://f000.backblazeb2.com/file/huffduff-video/foo.mp3 . so we can't serve a /robots.txt. and i think we've started to see the crawlers and bots come back. April bandwidth usage was 4x March's, and May is on track to be a 5.5x increase over April, to 13TB!!! ugh.

number of files and file size has stayed constant.

Youtube fetching appears broken

If you give a URL to a youtube video (such as https://www.youtube.com/watch?v=joSUFaLcKWc), you get:

huffduff-video
Fetching https://www.youtube.com/watch?v=joSUFaLcKWc ...

and then it stops. No further updates, no loading. The page load just finishes.

Doesn't capture from inside a YouTube playlist

If content is playing in the 'Watch Later' playlist on YouTube, huffduff-video returns this error...

huffduff-video
Fetching https://www.youtube.com/watch?v=QPc0wyeuYW4&list=WL&index=1 ...
ERROR: Unsupported URL: https://www.youtube.com/watch?v=QPc0wyeuYW4&list=WL&index=1
Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!

If I search for the video title and play it outside the list, it works as expected.

This doesn't apply to videos within a playlist that I created in YouTube.

403 from s3 bucket

The last couple of videos I snagged, cmdln at huff duff, return 403 when I try to download them.

Instagram not working

I tried a few Instagram videos and get: "ERROR: Unable to extract video url"

My theory was that it isn't using the latest youtube-dl release (the last two changes were to fix Instagram problems), but then saw you wrote on another issue that it updates automatically.

embedded youtube videos don't work

...they almost work, but the filename gets confused.

e.g. https://snarfed.org/2015-02-26_13320 embeds https://www.youtube.com/watch?v=ei2JbYeHl-I . we think the download filename is based on the web site domain, https_-_snarfed.org_2015-02-26_13320.mp3, but it actually still uses youtube.com, ie https_-_www.youtube.com_watchv=ei2JbYeHl-I.mp3, so key.set_contents_from_filename(filename) fails.

evidently vimeo embeds work fine though. odd.

add a way to delete truncated/bad mp3 files

I had a Youtube video (3 videos actually) fail to download (or maybe transcode?) completely. (I didn't save the error message, sorry.) I tried it again, and it seemed to work. However, the mp3 files are incomplete, and only have the first ~20 minutes of a 2-hour video.

When I try it again, it just gives me the same truncated mp3 file. Is there some way to delete the incomplete mp3 and start over?

Here's the video: https://www.youtube.com/watch?v=3OdyygKqTW0
and here's the resulting mp3: https://huffduff-video.s3-us-west-2.amazonaws.com/youtube.com_watchv=3OdyygKqTW0.mp3

add video link to description

...and maybe also link to huffduff-video. thanks to christian for the nudge.

Failure to download

Trying to download from https://monocle.com/radio/shows/the-entrepreneurs/301/ I get this error:

"could not convert string to float: Unknown"

Vimeo no longer working

Parsing of Vimeo URLs seems to have stopped a few weeks back. I've tried with multiple Vimeo videos and keep getting the error:

"ERROR: No known codec found; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output."

S3 rule to expire files after 30d

Unable to extract video data

Currently Huffduff-video seems to be unable to download from YouTube:

Currently returning an error message like so:

huffduff-video

Fetching https://www.youtube.com/watch?v=AnZ0uTOerUI ...
ERROR: AnZ0uTOerUI: YouTube said: Unable to extract video data

Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!

I'm getting a Service Unavailable error

Here is the error I've been getting the past several days:

huffduff-video

Fetching https://www.youtube.com/watch?v=si-31hsxrGI&feature=em-uploademail ...
ERROR: Unable to download webpage: HTTP Error 503: Service Unavailable (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!

ERROR: Signature extraction failed

I tried to use the tool for a 50-minute YouTube video, and it errors out pretty badly without accomplishing anything. Any idea what's up?

Video: https://www.youtube.com/watch?v=KEt5_LELTd4

Error message:
Fetching https://www.youtube.com/watch?v=KEt5_LELTd4 ...

ERROR: Signature extraction failed: Traceback (most recent call last): File "/usr/local/lib/python2.7/site-packages/youtube_dl/extractor/youtube.py", line 1070, in _decrypt_signature video_id, player_url, s File "/usr/local/lib/python2.7/site-packages/youtube_dl/extractor/youtube.py", line 958, in _extract_signature_function raise ExtractorError('Cannot identify player %r' % player_url) ExtractorError: Cannot identify player u'https://www.youtube.com/yts/jsbin/player-vflppxuSE/en_US/base.js'; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output. (caused by ExtractorError(u"Cannot identify player u'https://www.youtube.com/yts/jsbin/player-vflppxuSE/en_US/base.js'; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.",)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!

truncate download filenames when they're too long

saw this in the wild on http://www.lightreading.com/mobile/carrier-wifi/whats-up-with-wifi/v/d-id/714374:

DownloadError: ERROR: unable to open for writing: [Errno 36] File name too long: '/tmp/http_-_c.brightcove.com_services_viewer_htmlFederatedplayerID=3639434099001_%40videoPlayer=4104889826001_playerKey=AQ%7E%7E%2CAAADPjcRqUE%7E%2CTq347XBYAnbIZWisdUFfTPqlIVJWTQel_youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.lightreading.com%2Fmobile%2Fcarrier-wifi%2Fwhats-up-with-wifi%2Fv%2Fd-id%2F714374%22%7D.mp4.part'

we need to truncate the download filename when it gets too long like this. may not be easy with outtmpl, but hopefully possible.

The website seems to be completely down.

As of 5:36AM PT (9/2/2019), the website huffduff-video.snarfed.org is down. Not just from my end, but I also checked https://downforeveryoneorjustme.com/huffduff-video.snarfed.org, whom also say the website's down.

Perhaps it's just a temporary thing, but I thought I may as well report it here.

Soundcloud errors

Howdy!

It looks like anything on Soundcloud is no longer working with huffduff-video:

ERROR: Unable to download JSON metadata: HTTP Error 401: Unauthorized (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Before I submit a bug report to youtube-dl I just wanted to check that huffduff-video was all up to date.

Cheers,

Jeremy

Error in output template: unsupported format character

happens when the video title has a % in it. i think this is ytdl-org/youtube-dl#5006 ?

example:

mod_wsgi (pid=10284): Exception occurred processing WSGI script '/var/www/cgi-bin/app.py'.,
  referer: http://www.polygon.com/a/2016-game-preview-100-games/introduction-and-video-special
Traceback (most recent call last):
  File "/var/www/cgi-bin/app.py", line 128, in run
    youtube_dl.YoutubeDL(options).download([url])
  File "/usr/local/lib/python2.6/site-packages/youtube_dl/YoutubeDL.py", line 1677, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "/usr/local/lib/python2.6/site-packages/youtube_dl/YoutubeDL.py", line 676, in extract_info
    return self.process_ie_result(ie_result, download, extra_info)
...
  File "/usr/local/lib/python2.6/site-packages/youtube_dl/YoutubeDL.py", line 589, in prepare_filename
    self.report_error('Error in output template: ' + str(err) + ' (encoding: ' + repr(preferredencoding()) + ')')
  File "/usr/local/lib/python2.6/site-packages/youtube_dl/YoutubeDL.py", line 540, in report_error
    self.trouble(error_message, tb)
  File "/usr/local/lib/python2.6/site-packages/youtube_dl/YoutubeDL.py", line 510, in trouble
    raise DownloadError(message, exc_info)
DownloadError: ERROR: Error in output template: unsupported format character 'B' (0x42) at index 102 (encoding: 'ANSI_X3.4-1968')

Error - This site can’t be reached

At 3am Central on Monday, 6-21-2021 getting "This site can’t be reached" when trying to access http://huffduff-video.snarfed.org/get

Soundcloud audio is failing

The site is failing to pull files from Soundcloud. I notice that yt-dl had an update recently. Maybe huffduff-video needs to update its version?

Error message:

huffduff-video
Fetching https://soundcloud.com/post-traditional-buddhism/111-imperfect-buddha-buddhism-goes-post-traditional ...

ERROR: Unable to download JSON metadata: HTTP Error 401: Unauthorized (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!

Youtube broken again

Attempting to snarf Youtube content just now, I receive the following error:

**ERROR: Unsupported URL: https://www.youtube.com/

Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!**

set up cloudwatch monitoring and alerts

details in the readme. this is maybe half done: cloudwatch is up and running and ingesting the apache logs, but we don't yet have meaningful monitoring or alerting rules yet, either app level or system level.

Switch to yt-dlp?

youtube-dl is venerable and awesome and the workhorse behind huffduff-video, but its development progress may have stalled. It cut weekly releases for years, but the last release was 2 mos ago, 6/6/2021, and the two main maintainers now seem inactive on GitHub as a whole. Background in ytdl-org/youtube-dl#29753 and ytdl-org/youtube-dl#26462.

A couple active forks have popped up, https://github.com/yt-dlp/yt-dlp and https://github.com/blackjack4494/yt-dlc. Should we switch to one of them?

Downloads file, then nothing

So this is an irritatingly generic bug, but on Chrome 43.0.2357.65 (64-bit) on OS X, the bookmarklet seems to download YouTube and Vimeo files just fine, but after that nothing happens. The regular huffduffer bookmarklet is working, so I'm signed in and all that.

How should I go about debugging this?

Twitch.tv

When I try to huffduff-video a recorded twitch page, I just get a huffduffer popup with no audio link. I couldn't tell by http://rg3.github.io/youtube-dl/supportedsites.html if it was supposed to be supported or not.

Let me know if I should open this issue on youtube-dl instead.

Cannot Huffduff Twitter Videos

I get the following error when trying to huffduff a twitter video:

huffduff-video

Fetching https://twitter.com/Seahawks/status/922246436753936386 ...
ERROR: Unable to extract guest token; please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Here are the supported sites. If this site isn't supported, it may also post its videos on YouTube. Try there!

I have pulled a new copy of the bookmarklet just to make sure I have the latest. I have also tried pasting the link into the site and I get the same error.

Error - No space left on device

I'm getting this error when trying to archive an audio via the huffduff-video.snarfed.org site:

ERROR: unable to write data: [Errno 28] No space left on device

Vimeo?

My apologies if this isn't the right place to report this, but It appears that HuffDuffer bookmarlet is no longer working with Vimeo. Try this one:

https://vimeo.com/268653081

I see this in the popup window:

Fetching https://vimeo.com/268653081 ...

ERROR: Unable to extract info section (caused by ExtractorError(u'Unable to download webpage: HTTP Error 404: Not Found (caused by HTTPError()); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.',)); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; see https://yt-dl.org/update on how to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.

Is it possible that youtube-dl needs to be updated where it is running?

ERROR: unable to write data: [Errno 28] No space left on device

Got this error while trying to get a simple 20-minute video.

Video:
https://www.youtube.com/watch?v=VPoX74kSHsE

Stuck on "Extracting audio"?

Using huffduffer, it seems to just hang and never get past "Extracting Audio (this can take a while)...".

I've let it run for over 30 minutes, and it doesn't complete. It's never been like this before. Any idea what's up?
Here's the vid I tried it on: https://www.youtube.com/watch?v=-yH6Z9m2vsg . Also tried another vid and it didn't work.

Youtube: too many requests

When I try to use huffduffer to add a Youtube video I get this error:

nice error message for unsupported sites

log full exception stack traces

SSL issue pulling soundcloud audio

When I try to pull an audio file from soundcloud, I'm seeing the following SSL handshake error:

huffduff-video
Fetching https://soundcloud.com/motherboard/cryptoparty-in-harlem ...

ERROR: Unable to download JSON metadata: (caused by URLError(SSLError(1, '_ssl.c:493: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure'),))

ERROR: unable to write data: [Errno 28] No space left on device

I am getting the following error when trying to huffduff-video a youtube talk:

ERROR: unable to write data: [Errno 28] No space left on device

Thanks for the incredibly valueable script.

ERROR: AmS0KpQL1HU: YouTube said: Unable to extract video data

I have been getting this error all day today:
ERROR: AmS0KpQL1HU: YouTube said: Unable to extract video data
ERROR: nXdV46UD6Uo: YouTube said: Unable to extract video data

ERROR: unable to write data: [Errno 28] No space left on device

I'm getting the following error when trying to use huffduff-video bookmarklet. It seems to point to the server being out of space.

ERROR: unable to write data: [Errno 28] No space left on device

Thanks for your incredibly useful script.

Stuck on Opening Huffduffer dialog

That's it, really. Not the first time this has happened recently. From https://fieldlabearth.libsyn.com/sustainable-agriculture-programs-in-the-usa-and-eu-with-dr-scott-hutchins

The audio downloaded OK and then no dialog from Huffduffer.

I wonder, is it possible to show the url of the audio? Then I could add it to Huffduffer manually.

Thanks,

Jeremy

Provide form field for pasting a URL

It appears the only way to use this is via the bookmarklet, which means people who don't use bookmarks or the bookmark toolbar don't have a good way to use this. I was trying to walk someone through converting a video for huffduffer, and this was a stumbling block. It would have been clearer to explain to paste a Youtube URL into this app.

occasionally shows no output text until the very end

not sure why. WSGI + HTTP streaming + some output getting buffered somewhere? ¯_(ツ)_/¯

i'm not exactly jumping at the chance to debug this myself, but it's definitely a bad user experience. cc @kevinmarks in case you have any idea what's going on!

Updated graphics on pop-up is very nice

The progress bar looks a lot more modern and polished. Feels like I'm using a whole new product 😅

Random comment I'll be closing out shortly, but just wanted to praise the update

snarfed / huffduff-video Goto Github PK

huffduff-video's Introduction

huffduff-video

Related projects

Requirements

Cost and storage

System setup

Local development

Upgrading OS

SSL

Monitoring

System metrics

Log collection

Understanding bandwidth usage

huffduff-video's People

Contributors

Stargazers

Watchers

Forkers

huffduff-video's Issues

huffduff-video

Recommend Projects

Recommend Topics

Recommend Org