node-warc

Parse Web Archive (WARC) files or create WARC files using Electron or chrome-remote-interface

Run npm install node-warc or yarn add node-warc to get started.

API

Full API documentation available at n0tan3rd.github.io/node-warc

Example usage

Example 1: Both .warc and .warc.gz

const AutoWARCParser = require('node-warc')

const parser = new AutoWARCParser('<path-to-warcfile>')
parser.on('record', record => { console.log(record) })
parser.on('done', finalRecord => { console.log(finalRecord) })
parser.on('error', error => { console.error(error) })
parser.start()

Example 2: Only .warc.gz

const WARCGzParser = require('node-warc').WARCGzParser

const parser = new WARCGzParser('<path-to-warcfile>')
parser.on('record', record => { console.log(record) })
parser.on('done', finalRecord => { console.log(finalRecord) })
parser.on('error', error => { console.error(error) })
parser.start()

Example 3: Only .warc

const WARCParser = require('node-warc').WARCParser

const parser = new WARCParser('<path-to-warcfile>')
parser.on('record', record => { console.log(record) })
parser.on('done', finalRecord => { console.log(finalRecord) })
parser.on('error', error => { console.error(error) })
parser.start()
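
The three parsers above share the same event interface (record, done, error, followed by a call to start()), so they can be wrapped in a Promise when a callback style is inconvenient. The following is a minimal sketch, not part of node-warc: collectRecords is a hypothetical helper name, and it relies only on the events shown in the examples above. Whether the final record is also delivered through a record event is not stated here, so the sketch returns it separately. Buffering every record in memory is only reasonable for small archives.

```javascript
// Hypothetical helper (not part of node-warc): gathers everything a parser
// emits and resolves once the parser signals that it is done.
// `parser` is assumed to be any object exposing on('record' | 'done' | 'error')
// and start(), as in the examples above.
function collectRecords (parser) {
  return new Promise((resolve, reject) => {
    const records = []
    // Keep each record as it arrives
    parser.on('record', record => { records.push(record) })
    // 'done' receives the final record; return it alongside the collected ones
    parser.on('done', finalRecord => { resolve({ records, finalRecord }) })
    // Any parse error rejects the promise
    parser.on('error', reject)
    // Attach listeners first, then begin parsing
    parser.start()
  })
}

// Possible usage, assuming node-warc is installed:
//   const WARCParser = require('node-warc').WARCParser
//   collectRecords(new WARCParser('<path-to-warcfile>'))
//     .then(({ records }) => console.log(records.length))
//     .catch(console.error)
```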

Benchmark

UN-GZIPPED

  • 145.9MB (8,026 records) took 2s. Max node process memory usage 22 MiB
  • 268MB (852 records) took 2s. Max node process memory usage 77 MiB
  • 2GB (76,980 records) took 21s. Max node process memory usage 100 MiB
  • 4.8GB (185,662 records) took 1m. Max node process memory usage 144.3 MiB

GZIPPED

  • 7.7MB (1,269 records) took 297ms. Max node process memory usage 7.1 MiB
  • 819.1MB (34,253 records) took 16s. Max node process memory usage 190.3 MiB
  • 2.3GB (68,020 records) took 45s. Max node process memory usage 197.6 MiB
  • 5.3GB (269,464 records) took 4m. Max node process memory usage 198.2 MiB
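
To gather comparable numbers for your own archives, a small timing harness can be written around the parser events. The sketch below is an assumption about one way to do this, not the method used to produce the figures above: benchmarkParser is a hypothetical helper, it samples the resident set size reported by Node's process.memoryUsage() on every record event, and it counts only record events (it makes no claim about whether the final record is included in that count).

```javascript
// Hypothetical benchmarking helper (not part of node-warc).
// `parser` is assumed to expose on('record' | 'done' | 'error') and start().
function benchmarkParser (parser) {
  return new Promise((resolve, reject) => {
    const startedAt = process.hrtime.bigint()
    let recordCount = 0
    let peakRssBytes = process.memoryUsage().rss

    parser.on('record', () => {
      recordCount += 1
      // Sample memory on each record and remember the highest value seen
      const rss = process.memoryUsage().rss
      if (rss > peakRssBytes) peakRssBytes = rss
    })

    parser.on('done', () => {
      // hrtime.bigint() is in nanoseconds; convert the difference to milliseconds
      const elapsedMs = Number(process.hrtime.bigint() - startedAt) / 1e6
      resolve({
        recordCount,
        elapsedMs,
        peakRssMiB: peakRssBytes / (1024 * 1024)
      })
    })

    parser.on('error', reject)
    parser.start()
  })
}

// Possible usage, assuming node-warc is installed:
//   const WARCGzParser = require('node-warc').WARCGzParser
//   benchmarkParser(new WARCGzParser('<path-to-warcfile>')).then(console.log)
```

Sampling memory on every record adds some overhead of its own, so for very large archives it may be preferable to sample every Nth record instead.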
