melroy89 / bitcoin-core-web-scraper


Web Spider for mirroring Bitcoin Core bin folder. Live: https://bitcoin.melroy.org/bin/ (mirror of bitcoincore.org/bin)

License: MIT License

Dockerfile 3.98% Python 95.12% Shell 0.90%

bitcoin-core-web-scraper's Introduction

Bitcoin Core Web Scraper

A script for web scraping and downloading the Bitcoin Core bin directory.
Ideal for creating your own mirror!

Usage

Dependencies

Run-time dependencies:

Python3 + pip (python3 python3-dev python3-pip)
Additional libs for Scrapy (libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev)

Additional packages are installed via pip; see the next section.

Prepare

I advise you to use a Python virtual environment. Create and activate one via:

python3 -m venv env
source env/bin/activate

Next, install the required packages via:

pip install -r requirements.txt

Run scraper

Run the scraper and start downloading:

scrapy crawl bitcoincore

Or by running: ./start_spider.py

Note: Files are stored in the bin sub-folder of this project's root folder.
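
Once files have landed in the bin folder, you may want to check that the mirrored copies are intact. The following is a minimal sketch, not part of this project: it assumes a release folder contains a checksum list named SHA256SUMS in the usual `sha256sum` format (hex digest, whitespace, file name). The function names `sha256_of` and `verify_release_dir` are made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_release_dir(release_dir: Path, sums_name: str = "SHA256SUMS") -> dict:
    """Compare files in release_dir against the checksum list found there.

    Assumption: the list uses the sha256sum layout "<hex digest>  <file name>".
    Returns a mapping of file name -> "ok", "mismatch" or "missing".
    """
    results = {}
    sums_file = release_dir / sums_name
    for line in sums_file.read_text().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        expected, name = parts
        name = name.lstrip("*")  # sha256sum marks binary mode with '*'
        target = release_dir / name
        if not target.is_file():
            results[name] = "missing"
        elif sha256_of(target) == expected.lower():
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Call it with the path of one release folder inside bin, for example `verify_release_dir(Path("bin") / "<release-folder>")`. A "missing" entry only means the file named in the list is not in that folder. This checks file integrity only; it does not verify any signature on the checksum list itself.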

Optionally, run the scraper and write the metadata to a "feed" file (e.g. a JSON file):

scrapy crawl bitcoincore -O bitcoincore.json
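
A JSON feed written by Scrapy is a single JSON array with one object per scraped item, so it can be inspected with the standard library. The sketch below makes no assumption about which fields this spider emits; it only counts the items and tallies the field names it finds. The helper name `summarize_feed` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_feed(feed_path):
    """Return (item_count, Counter of field names) for a Scrapy JSON feed."""
    # The feed is expected to be a JSON array of item objects.
    items = json.loads(Path(feed_path).read_text(encoding="utf-8"))
    field_counts = Counter()
    for item in items:
        field_counts.update(item.keys())
    return len(items), field_counts
```

For example, `count, fields = summarize_feed("bitcoincore.json")` gives the number of items and how often each field occurs, which is a quick way to learn the feed's layout before processing it further.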

Docker Image

The Docker image is available on DockerHub.

Note: The Docker image starts the crawler using a cron job, so the Bitcoin spider runs automatically once a week.

I provided a docker-compose file for convenience.

Building Docker image

Create a Docker image locally using:

docker build -t danger89/bitcoinscraper .

Learn & Debug

You can use the Scrapy shell to help with debugging, or to learn how to extract data with Scrapy:

scrapy shell 'https://bitcoincore.org/bin/'

Check the response object for data, just an example:

response.css('pre a')[3].get()
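
In the Scrapy shell, `response.css('pre a')` selects every `<a>` element nested inside a `<pre>` element, and `.get()` returns the matched element as an HTML string; `response.css('pre a::attr(href)').getall()` would return just the link targets. To illustrate what that selector picks out without starting a crawl, here is a standard-library-only sketch. It is not Scrapy code, and the sample HTML is a made-up directory listing, not the real page.

```python
from html.parser import HTMLParser


class PreLinkCollector(HTMLParser):
    """Collect href values of <a> tags nested inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0  # how many <pre> elements are currently open
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
        elif tag == "a" and self.pre_depth > 0:
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1


# Hypothetical sample of a directory listing page, not the real site output.
SAMPLE = """
<html><body>
<a href="/">home</a>
<pre><a href="../">../</a>
<a href="release-a/">release-a/</a>
<a href="release-b/">release-b/</a>
</pre>
</body></html>
"""

collector = PreLinkCollector()
collector.feed(SAMPLE)
print(collector.hrefs)  # ['../', 'release-a/', 'release-b/']
```

The link outside the `<pre>` block is ignored, which mirrors why a descendant selector such as `pre a` is a convenient way to target only the listing entries.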

External Links

More info:

Scrapy homepage
Scrapy Tutorial docs (ideal for beginners)
APScheduler Cron docs
