A script for web scraping and downloading the Bitcoin Core bin directory.
Ideal for creating your own mirror!
Run-time dependencies:
- Python3 + pip (python3 python3-dev python3-pip)
- Additional libs for Scrapy (libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev)
More packages will be downloaded via pip, see next section.
I advise you to use a Python virtual environment. Create and activate one via:
python3 -m venv env
source env/bin/activate
Next, install the required packages via:
pip install -r requirements.txt
Execute the scraper and start downloading:
scrapy crawl bitcoincore
Or by running: ./start_spider.py
Note: Files are stored in the bin sub-folder of this project's root folder.
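To check what a crawl has downloaded so far, a small script can walk the bin folder and report the file count and total size. This is only a sketch and not part of the project; `summarize_downloads` is a hypothetical helper name, and the script assumes it is run from the project root.

```python
from pathlib import Path


def summarize_downloads(root):
    """Return (file_count, total_bytes) for all files below root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    bin_dir = Path("bin")  # download folder named in the note above
    if bin_dir.is_dir():
        count, size = summarize_downloads(bin_dir)
        print(f"{count} files, {size / 1e6:.1f} MB")
    else:
        print("No bin folder found; run the spider first.")
```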
Optionally, execute the scraper and write the metadata to a "feed" file (e.g. a JSON file):
scrapy crawl bitcoincore -O bitcoincore.json
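Scrapy's JSON feed export writes a single JSON array with one object per scraped item. The sketch below loads such a file and reports how many items it holds and which fields appear. The item fields depend on the spider, so none are assumed here; `summarize_feed` is a hypothetical helper name, and the file name is taken from the command above.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_feed(path):
    """Return (item_count, Counter of field names) for a Scrapy JSON feed."""
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    fields = Counter()
    for item in items:
        # Count in how many items each field name occurs.
        fields.update(item.keys())
    return len(items), fields


if __name__ == "__main__":
    feed = Path("bitcoincore.json")
    if feed.exists():
        count, fields = summarize_feed(feed)
        print(f"{count} items")
        for name, seen in fields.most_common():
            print(f"  {name}: present in {seen} items")
```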
The Docker image is available on DockerHub.
Note: The Docker image starts the crawler via a cron job, so the Bitcoin spider runs automatically once a week.
I provided a docker-compose file for convenience.
Building Docker image
Create a Docker image locally using:
docker build -t danger89/bitcoinscraper .
You can use the Scrapy shell to help with debugging or to learn how to extract data with Scrapy:
scrapy shell 'https://bitcoincore.org/bin/'
Inspect the response object for data, for example:
response.css('pre a')[3].get()
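The selector above returns the fourth link element inside the `<pre>` listing. To experiment with the same idea without starting the Scrapy shell, here is a stdlib-only sketch that collects the `href` of every `<a>` nested inside a `<pre>` element. It is not the spider's actual code: the `PreLinkParser` class, the `extract_pre_links` helper, and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class PreLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0  # > 0 while the parser is inside a <pre>
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1
        elif tag == "a" and self.pre_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1


def extract_pre_links(html):
    """Return the hrefs of all links found inside <pre> blocks."""
    parser = PreLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample resembling a plain directory listing.
SAMPLE = """<html><body><a href="/">home</a>
<pre><a href="../">../</a>
<a href="dir-a/">dir-a/</a>
<a href="dir-b/">dir-b/</a>
</pre></body></html>"""

if __name__ == "__main__":
    print(extract_pre_links(SAMPLE))
```

Inside the Scrapy shell, the comparable call would be `response.css('pre a::attr(href)').getall()`, which returns the link targets rather than the full element markup.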
More info:
- Scrapy homepage
- Scrapy Tutorial docs (ideal for beginners)
- APScheduler Cron docs