
node-site-downloader's Introduction


NodeJS based website downloader

Download a website locally, without any configuration, right from your terminal.

Note: The script is based entirely on node-website-scraper, an awesome website scraper library :)

Requirements

  • Node.js version >= 8

Installation

npm install -g node-site-downloader

Usage

node-site-downloader download DOMAIN START_POINT OUTPUT_FOLDER [VERBOSE] [OUTPUT_FOLDER_SUFFIX] [INCLUDE_IMAGES]

Example

# Download all of the English Jest documentation
node-site-downloader download -s https://jestjs.io/docs/en/getting-started -d https://jestjs.io/docs/en/ -o jest-docs -v --include-images

For more information, run:

node-site-downloader --help
node-site-downloader download --help

Docker support

You can run the downloader straight from a Docker container. This way there is no need to install Node.js or node-site-downloader.

Instead, pull the image from Docker Hub:

docker pull gnird/node-site-downloader

Then run the container, passing all of the relevant options to the script (see the Options section) except for --output-folder.

--output-folder isn't passed to the container because the script saves the site inside the container.

Instead, mount a folder on your computer as a volume at /data in the container:

docker run -v /some/path:/data ...

Docker example

docker run -v /tmp/mysite:/data gnird/node-site-downloader download -d https://jestjs.io/docs/en/ -s https://jestjs.io/docs/en/getting-started -v 

NOTICE: The first -v configures the volume for the container and the second -v (at the end of the command) is passed to the script in order to make it verbose.
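
To keep the two -v flags apart, the command can be assembled by a small shell helper. This is only a sketch: the image name, the /data mount point and the -d, -s and -v flags come from this README, while the function name docker_download_cmd and its argument order are hypothetical. It prints the command rather than running it, and it does not quote paths that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of node-site-downloader): print the docker
# command for a host folder, a domain and a start point.
docker_download_cmd() {
  host_dir=$1   # folder on your computer, mounted at /data (docker's own -v)
  domain=$2     # passed to the script as -d
  start=$3      # passed to the script as -s
  # The trailing -v belongs to the script and makes it verbose.
  printf 'docker run -v %s:/data gnird/node-site-downloader download -d %s -s %s -v\n' \
    "$host_dir" "$domain" "$start"
}

docker_download_cmd /tmp/mysite https://jestjs.io/docs/en/ https://jestjs.io/docs/en/getting-started
```

Review the printed command before running it. The downloaded site should then appear in the host folder given as the first argument.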

Options

  • domain (-d) - The script downloads every URL under the specified URL.
  • start point (-s) - The page from which the script starts scraping.
  • include-images (--include-images) - Whether the script should also download the relevant images.
  • output folder (--output-folder) - The folder in which the script saves the downloaded assets. Note: The folder must not already exist!
  • verbose (-v) - If the flag is present, the script prints every URL that was downloaded.
  • output folder suffix (--output-folder-suffix) - The suffix that is added to OUTPUT_FOLDER. Defaults to: .site
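
The relationship between the domain and the start point can be pictured as a prefix check: the start point is the first page fetched, and a linked URL is followed only if it falls under the domain. The sketch below assumes that "under" means a plain string-prefix match. That is an assumption for illustration, not a confirmed description of the script's internals. The in_scope function is hypothetical, and the blog URL is a made-up out-of-scope example.

```shell
#!/bin/sh
# Hypothetical illustration of the domain (-d) option as a prefix filter.
# in_scope DOMAIN URL: exit status 0 when URL starts with DOMAIN, 1 otherwise.
in_scope() {
  case "$2" in
    "$1"*) return 0 ;;   # quoted "$1" is matched literally, * matches the rest
    *)     return 1 ;;
  esac
}

domain=https://jestjs.io/docs/en/
for url in \
  https://jestjs.io/docs/en/getting-started \
  https://jestjs.io/blog/
do
  if in_scope "$domain" "$url"; then
    echo "in scope:     $url"
  else
    echo "out of scope: $url"
  fi
done
```

Under this assumption, the start point itself should sit under the domain, as it does in the example commands above.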

node-site-downloader's People

Contributors

gnir-work
