memgatorbulkdownload's Introduction

Memgator Bulk TimeMap Downloader

Have you ever needed to download 100, or even 1 million, TimeMaps using oduwsdl/MemGator, and to do it in a timely manner?

If so, you are in luck: this project has you covered.

Requirements

Requires Python 3.

Be sure to install the dependencies first:

  • [sudo] pip install -r requirements.txt

You also need a running instance of oduwsdl/MemGator.

If you do not have one, you can get one at oduwsdl/MemGator/releases.

Usage

Basic usage

$ python download.py -m {MGURL} {FORMAT2} -d {DUMPDIR} -u {LIST}
# MGURL   => http://localhost:1208
# FORMAT  => link|json|cdxj
# FORMAT2 => (-l, --link)|(-j, --json)|(-c, --cdxj)
# DUMPDIR => Path to directory where TimeMaps will be dumped
# LIST    => Path to URL list
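
For orientation, the request that reaches MemGator for each URL in the list can be sketched as follows. This is a minimal sketch, not the code in download.py: it assumes MemGator serves TimeMaps at {MGURL}/timemap/{FORMAT}/{URL}, and the function name timemap_url is hypothetical.

```python
def timemap_url(mgurl: str, fmt: str, target: str) -> str:
    """Build a TimeMap request URL (assumed layout: <mgurl>/timemap/<fmt>/<target>)."""
    if fmt not in ("link", "json", "cdxj"):
        raise ValueError(f"unknown TimeMap format: {fmt!r}")
    # Drop any trailing slash on the base URL so the path is not doubled up.
    return f"{mgurl.rstrip('/')}/timemap/{fmt}/{target}"


print(timemap_url("http://localhost:1208", "json", "http://example.com/"))
```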

Full Usage

$ python download.py --help
usage: download [-h] [-m MEMURL] [-w WORKERS] [-r REQUESTS] [-d DUMP] -u URLS
                [-k KEY] [-j | -l | -c]

Bulk download TimeMaps using a local memgator instance

optional arguments:
  -h, --help            show this help message and exit
  -m MEMURL, --memurl MEMURL
                        URL for running memgator instance. Defaults to
                        http://localhost:1208/timemap/json
  -w WORKERS, --workers WORKERS
                        Max number of worker processes spawned. Defaults to 5
  -r REQUESTS, --requests REQUESTS
                        How many requests should be queued per chunk. Defaults
                        to 10
  -d DUMP, --dump DUMP  Directory to dump the TimeMaps in. Defaults to
                        <cwd>/timemaps
  -u URLS, --urls URLS  Path to file (.txt, .csv, .json) containing list of
                        URLs. File type detected by considering extension. If
                        .csv must supply -k <key> so we know where to get the
                        url
  -k KEY, --key KEY     The csv key for the urls
  -j, --json            Download TimeMaps in json format. Default format
  -l, --link            Download TimeMaps in link format
  -c, --cdxj            Download TimeMaps in cdxj format
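
The interplay of -w (workers) and -r (requests per chunk) can be illustrated roughly as below. This is only a sketch under stated assumptions, not the tool's implementation: the names chunked and download_all are hypothetical, fetch stands in for whatever actually requests a TimeMap, and a thread pool is used in place of worker processes to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Iterator, List


def chunked(urls: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` URLs."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    chunk: List[str] = []
    for url in urls:
        chunk.append(url)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def download_all(urls: Iterable[str], fetch: Callable[[str], str],
                 workers: int = 5, requests: int = 10) -> Dict[str, str]:
    """Call fetch(url) for every URL, handing one chunk at a time to each worker."""
    def run_chunk(chunk: List[str]):
        return [(url, fetch(url)) for url in chunk]

    results: Dict[str, str] = {}
    # Threads are an assumption made for brevity; the tool itself reports worker processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for pairs in pool.map(run_chunk, chunked(urls, requests)):
            results.update(pairs)
    return results
```

With the defaults shown in the help text (5 workers, 10 requests per chunk), a sketch like this would have at most 50 URLs in flight or queued inside workers at any moment.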

URL List Format

  • .txt: 1 URL per line
  • .csv: Requires -k or --key {KEY} argument. KEY is the csv column containing the URL
  • .json: List of URLs
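
The three list formats above could be read along these lines. This is a hedged sketch rather than the loader used by download.py: the function name load_urls is hypothetical, and it assumes the .csv file has a header row naming its columns and the .json file holds a top-level array of strings.

```python
import csv
import json
from pathlib import Path
from typing import List, Optional


def load_urls(path: str, key: Optional[str] = None) -> List[str]:
    """Read a URL list, picking the parser from the file extension."""
    p = Path(path)
    ext = p.suffix.lower()
    if ext == ".txt":
        # One URL per line; splitlines() copes with \n and \r\n, blank lines are skipped.
        lines = p.read_text(encoding="utf-8").splitlines()
        return [line.strip() for line in lines if line.strip()]
    if ext == ".csv":
        if key is None:
            raise ValueError("a .csv URL list needs a key (the URL column name)")
        with p.open(newline="", encoding="utf-8") as fh:
            return [row[key] for row in csv.DictReader(fh)]
    if ext == ".json":
        data = json.loads(p.read_text(encoding="utf-8"))
        if not isinstance(data, list):
            raise ValueError("a .json URL list must be a JSON array")
        return [str(url) for url in data]
    raise ValueError(f"unsupported URL list type: {ext!r}")
```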

License

MIT

memgatorbulkdownload's People

Contributors

n0tan3rd

memgatorbulkdownload's Issues

Format must be specified

The README seems to imply that there is a "default format". When I run:
python3 download.py -l uris.txt
...I am told, "download: error: one of the arguments -j/--json -n/--link -c/--cdxj is required". If a format parameter is required, then there is no "default". Which is it?

Empty directory after running trivial example

Using f5b3376 and Python 3.6.4. I created a \r\n delimited list of 3 URIs. I started my local MemGator 1.0-rc7 (on the default port) and ran python3 download.py -j -l uris.txt. A timemaps directory is created, but it contains no files. The command returns very quickly, and MemGator shows no sign of being interacted with when invoked via memgator -V server.
