Cairn


    //   ) )                              
   //         ___     ( )  __       __    
  //        //   ) ) / / //  ) ) //   ) ) 
 //        //   / / / / //      //   / /  
((____/ / ((___( ( / / //      //   / /   

Cairn is an npm package and CLI tool for saving a web page as a single HTML file. It is a TypeScript implementation of Obelisk.

Features

Usage

As CLI tool

npm install -g @wabarc/cairn
$ cairn -h

Usage: cairn [options] url1 [url2]...[urlN]

CLI tool for saving web page as single HTML file

Options:
  -v, --version              output the current version
  -o, --output <string>      path to save archival result
  -u, --user-agent <string>  set custom user agent
  -t, --timeout <number>     maximum time (in second) request timeout
  --no-js                    disable JavaScript
  --no-css                   disable CSS styling
  --no-embeds                remove embedded elements (e.g iframe)
  --no-medias                remove media elements (e.g img, audio)
  -h, --help                 display help for command
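
When the CLI is driven from a script, the flags listed above can be assembled programmatically. The following is a minimal sketch, not part of Cairn: `CliOptions` and `buildCairnArgs` are hypothetical names introduced here, and only flags that appear in the help output are emitted.

```typescript
// Hypothetical option bag; the field names are illustrative, not Cairn's API.
interface CliOptions {
  output?: string;
  userAgent?: string;
  timeout?: number; // seconds, per the help text
  noJs?: boolean;
  noCss?: boolean;
  noEmbeds?: boolean;
  noMedias?: boolean;
}

// Build the argument list for `cairn`: options first, then the URLs,
// matching `cairn [options] url1 [url2]...[urlN]`.
function buildCairnArgs(urls: string[], opts: CliOptions = {}): string[] {
  if (urls.length === 0) {
    throw new Error('at least one URL is required');
  }
  const flags: string[] = [];
  if (opts.output !== undefined) flags.push('--output', opts.output);
  if (opts.userAgent !== undefined) flags.push('--user-agent', opts.userAgent);
  if (opts.timeout !== undefined) flags.push('--timeout', String(opts.timeout));
  if (opts.noJs) flags.push('--no-js');
  if (opts.noCss) flags.push('--no-css');
  if (opts.noEmbeds) flags.push('--no-embeds');
  if (opts.noMedias) flags.push('--no-medias');
  return flags.concat(urls);
}

const exampleArgs = buildCairnArgs(['https://www.github.com'], { timeout: 30, noJs: true });
console.log(['cairn'].concat(exampleArgs).join(' '));
// prints "cairn --timeout 30 --no-js https://www.github.com"
```

The resulting array can be passed to a process spawner as separate arguments, which avoids shell quoting problems with user agents that contain spaces.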

As npm package

npm install @wabarc/cairn

import { Cairn } from '@wabarc/cairn';
// const cairn = require('@wabarc/cairn');

// `url` is the archival target.
const url = 'https://www.github.com';
const cairn = new Cairn();

cairn
  .request({ url: url })
  .options({ userAgent: 'Cairn/2.0.0' })
  .archive()
  .then((archived) => {
    console.log(archived.url, archived.webpage.html());
  })
  .catch((err) => console.warn(`${url} => ${JSON.stringify(err)}`));

Instance methods

cairn#request({ url: string }): this
cairn#options({}): this
  • userAgent?: string;
  • disableJS?: boolean;
  • disableCSS?: boolean;
  • disableEmbeds?: boolean;
  • disableMedias?: boolean;
  • timeout?: number;
cairn#archive(): Promise<Archived>
cairn#Archived
  • url: string;
  • webpage: cheerio.Root;
  • status: 200 | 400 | 401 | 403 | 404 | 500 | 502 | 503 | 504;
  • contentType: 'text/html' | 'text/plain' | 'text/*';
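
The `: this` return type on `request` and `options` is what makes the chained call style possible. The sketch below illustrates that pattern with a stand-in class. It is not Cairn's implementation: `FluentArchiverSketch`, `ArchiveOptions`, and `snapshot` are hypothetical names, and the merge behavior of repeated `options` calls is an assumption of this sketch.

```typescript
// Option names copied from the list above; the class is a hypothetical stand-in.
interface ArchiveOptions {
  userAgent?: string;
  disableJS?: boolean;
  disableCSS?: boolean;
  disableEmbeds?: boolean;
  disableMedias?: boolean;
  timeout?: number;
}

class FluentArchiverSketch {
  private target = '';
  private opts: ArchiveOptions = {};

  // Returning `this` allows `.request(...).options(...)` to chain.
  request(req: { url: string }): this {
    this.target = req.url;
    return this;
  }

  // Later calls override earlier values key by key (an assumption of this sketch).
  options(opts: ArchiveOptions): this {
    this.opts = { ...this.opts, ...opts };
    return this;
  }

  // Stand-in for `archive()`: reports the collected state instead of fetching anything.
  snapshot(): { url: string; options: ArchiveOptions } {
    return { url: this.target, options: { ...this.opts } };
  }
}

const sketch = new FluentArchiverSketch()
  .request({ url: 'https://www.github.com' })
  .options({ userAgent: 'Cairn/2.0.0' })
  .options({ timeout: 30 });

console.log(sketch.snapshot());
```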

Request Params

request
{
  // `url` is the archival target.
  url: 'https://www.github.com'
}
options
{
  userAgent: 'Cairn/2.0.0',

  disableJS: true,
  disableCSS: false,
  disableEmbeds: false,
  disableMedias: true,

  timeout: 30
}
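
Before handing these objects to the library, a caller may want to sanity-check them. The following is a minimal sketch of such a pre-flight check. The rules are this sketch's own assumptions, since the source does not describe how Cairn validates its input, and `RequestParams`, `OptionParams`, and `checkParams` are hypothetical names.

```typescript
// Local mirrors of the two parameter objects shown above.
interface RequestParams {
  url: string;
}

interface OptionParams {
  userAgent?: string;
  disableJS?: boolean;
  disableCSS?: boolean;
  disableEmbeds?: boolean;
  disableMedias?: boolean;
  timeout?: number;
}

// Returns a list of problems; an empty list means the parameters look usable.
function checkParams(request: RequestParams, options: OptionParams = {}): string[] {
  const problems: string[] = [];
  // Assumption: only absolute http(s) URLs are meaningful archival targets.
  if (!/^https?:\/\/\S+$/i.test(request.url)) {
    problems.push('url must be an absolute http(s) URL');
  }
  // Assumption: a timeout, when given, should be a positive finite number.
  if (options.timeout !== undefined && !(isFinite(options.timeout) && options.timeout > 0)) {
    problems.push('timeout must be a positive number');
  }
  return problems;
}

console.log(checkParams({ url: 'https://www.github.com' }, { timeout: 30 }));
console.log(checkParams({ url: 'github.com' }, { timeout: 0 }));
```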

Response Schema

For v1.x:

The archive method returns the webpage body as a string.

For v2.x, the promise returned by the archive method resolves to an object of this shape:

{
  url: 'https://github.com/',
  webpage: cheerio.Root,
  status: 200,
  contentType: 'text/html'
}
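
Code that has to work with both major versions can normalize the two response shapes into a single HTML string. The following is a minimal sketch under that assumption: `ArchivedV2`, `ArchiveResult`, and `toHtml` are hypothetical names, and `webpage` is typed as a small stand-in for `cheerio.Root` so the example runs without installing anything.

```typescript
// Stand-in for the v2.x response; only `html()` is needed from `cheerio.Root` here.
interface ArchivedV2 {
  url: string;
  webpage: { html(): string };
  status: number;
  contentType: string;
}

// v1.x yields the body as a string, v2.x yields an object.
type ArchiveResult = string | ArchivedV2;

// Return the archived HTML regardless of which version produced the result.
function toHtml(result: ArchiveResult): string {
  if (typeof result === 'string') {
    return result; // v1.x: already the webpage body
  }
  return result.webpage.html(); // v2.x: serialize the cheerio root
}

// Usage with hand-built values in place of real archive results.
const v1Result: ArchiveResult = '<html><body>v1</body></html>';
const v2Result: ArchiveResult = {
  url: 'https://github.com/',
  webpage: { html: () => '<html><body>v2</body></html>' },
  status: 200,
  contentType: 'text/html',
};

console.log(toHtml(v1Result));
console.log(toHtml(v2Result));
```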

License

This software is released under the terms of the GNU General Public License v3.0. See the LICENSE file for details.
