tall's Introduction

tall

Promise-based, no-dependency URL unshortener (expander) module for Node.js 16+.

Note: This library is written in TypeScript and type definitions are provided.

Install

Using npm

npm install --save tall

or with yarn

yarn add tall

Usage

ES6+ usage:

import { tall } from 'tall'

tall('http://www.loige.link/codemotion-rome-2017')
  .then((unshortenedUrl) => console.log('Tall url', unshortenedUrl))
  .catch((err) => console.error('AAAW 👻', err))

With async/await:

import { tall } from 'tall'

async function someFunction() {
  try {
    const unshortenedUrl = await tall(
      'http://www.loige.link/codemotion-rome-2017'
    )
    console.log('Tall url', unshortenedUrl)
  } catch (err) {
    console.error('AAAW 👻', err)
  }
}

someFunction()

CommonJS:

var { tall } = require('tall')
tall('http://www.loige.link/codemotion-rome-2017')
  .then(function (unshortenedUrl) {
    console.log('Tall url', unshortenedUrl)
  })
  .catch(function (err) {
    console.error('AAAW 👻', err)
  })

Options

You can pass an options object as the second parameter to the tall function.

Available options are the following:

  • method (default "GET"): any available HTTP method
  • maxRedirects (default 3): the maximum number of redirects that will be followed when a URL redirects more than once
  • headers (default {}): change request headers - e.g. {'User-Agent': 'your-custom-user-agent'}
  • timeout (default 120000): timeout in milliseconds after which the request will be cancelled
  • plugins (default [locationHeaderPlugin]): a list of plugins for adding advanced behaviours

In addition, any other options available on http.request() or https.request() are accepted. This includes, for example, rejectUnauthorized, which can be set to false to disable certificate checks.

Example:

import { tall } from 'tall'

tall('http://www.loige.link/codemotion-rome-2017', {
  method: 'HEAD',
  maxRedirects: 10
})
  .then((unshortenedUrl) => console.log('Tall url', unshortenedUrl))
  .catch((err) => console.error('AAAW 👻', err))

Plugins

Since tall v5, a plugin system for extending the default behaviour of tall is available.

tall ships with a single plugin, locationHeaderPlugin, which is enabled by default. This plugin follows redirects by looking at the location header in the HTTP response received from the source URL.

You might want to write your own plugins to have more sophisticated behaviours.

Some examples:

  • Normalise the final URL if the final page has a <link rel="canonical" href="http://example.com/page/" /> tag in the <head> of the document
  • Follow HTML meta refresh redirects (<meta http-equiv="refresh" content="0;URL='http://example.com/'" />)

Known plugins

Did you create a plugin for tall? Send us a PR to have it listed here!

How to write a plugin

A plugin is simply a function with a specific signature:

export interface TallPlugin {
  (url: URL, response: IncomingMessage, previous: Follow | Stop): Promise<
    Follow | Stop
  >
}

So the only thing you need to do is to write your custom behaviour following this interface. But let's discuss briefly what the different elements mean here:

  • url: the current URL being crawled
  • response: the actual HTTP response object received for the current URL
  • previous: the decision from the previous plugin execution (continue following a given URL or stop at a given URL)

Every plugin is executed asynchronously, so a plugin returns a Promise that needs to resolve to a Follow or a Stop decision.

Let's deep dive into these two concepts. Follow and Stop are defined as follows (touché):

export class Follow {
  follow: URL
  constructor(follow: URL) {
    this.follow = follow
  }
}

export class Stop {
  stop: URL
  constructor(stop: URL) {
    this.stop = stop
  }
}

Follow and Stop are effectively simple classes to express an intent: should we follow the follow URL or should we stop at the stop URL?

Plugins are executed following the middleware pattern (or chain of responsibility): they are executed in order and the information is propagated from one to the other.

For example, if we initialise tall with { plugins: [plugin1, plugin2] }, then for every URL plugin1 will be executed before plugin2, and the decision of plugin1 will be passed to plugin2 through the previous parameter.
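
The chaining described above can be sketched as a simple loop. This is an illustrative sketch and not tall's actual implementation: runPlugins, SketchPlugin and the simplified response type are hypothetical names, the starting decision is an assumption, and Follow and Stop are redeclared locally so the snippet runs on its own.

```typescript
// Local copies of the Follow and Stop classes shown above,
// so that this sketch is self-contained.
class Follow {
  follow: URL
  constructor(follow: URL) {
    this.follow = follow
  }
}

class Stop {
  stop: URL
  constructor(stop: URL) {
    this.stop = stop
  }
}

// Hypothetical, simplified plugin type: the real interface receives an IncomingMessage.
type SketchPlugin = (
  url: URL,
  response: unknown,
  previous: Follow | Stop
) => Promise<Follow | Stop>

// Assumption: the chain starts from a Stop at the current URL, and each plugin
// may keep or override the decision made by the plugin before it.
async function runPlugins(
  url: URL,
  response: unknown,
  plugins: SketchPlugin[]
): Promise<Follow | Stop> {
  let decision: Follow | Stop = new Stop(url)
  for (const plugin of plugins) {
    decision = await plugin(url, response, decision)
  }
  return decision
}

// plugin1 asks to follow a new URL; plugin2 receives that decision and keeps it.
const plugin1: SketchPlugin = async () =>
  new Follow(new URL('http://example.com/next'))
const plugin2: SketchPlugin = async (_url, _response, previous) => previous

const chainResult = runPlugins(new URL('http://example.com/start'), null, [
  plugin1,
  plugin2
])
```

Because each plugin receives the previous decision, a later plugin can act as a pure pass-through (like plugin2 here) or replace the decision entirely.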

How to write and enable a plugin

Let's say we want to add a plugin that follows HTML meta refresh redirects. The code could look like this:

// metarefresh-plugin.ts
import { IncomingMessage } from 'http'
import { Follow, Stop } from 'tall'

export async function metaRefreshPlugin(
  url: URL,
  response: IncomingMessage,
  previous: Follow | Stop
): Promise<Follow | Stop> {
  let html = ''
  for await (const chunk of response) {
    html += chunk.toString()
  }

  // note: This is just a dummy example to illustrate how to use the plugin API.
  // It's not a great idea to parse HTML using regexes.
  // If you are looking for a plugin that does this in a better way check out
  // https://npm.im/tall-plugin-meta-refresh
  const metaHttpEquivUrl = html.match(
    /meta +http-equiv="refresh" +content="\d;url=(http[^"]+)"/
  )?.[1]

  if (metaHttpEquivUrl) {
    return new Follow(new URL(metaHttpEquivUrl))
  }

  return previous
}

Then, this is how you would use your shiny new plugin:

import { tall, locationHeaderPlugin } from 'tall'
import { metaRefreshPlugin } from './metarefresh-plugin'

const finalUrl = await tall('https://loige.link/senior', {
  plugins: [locationHeaderPlugin, metaRefreshPlugin]
})

console.log(finalUrl)

Note that we have to explicitly pass the locationHeaderPlugin if we want to retain tall's original behaviour.
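
The canonical-link idea listed earlier could follow the same pattern. As a minimal sketch, the HTML inspection can be isolated in a small helper; findCanonicalUrl is a hypothetical name and not part of tall, and like the meta refresh example it uses a regex only for brevity.

```typescript
// Hypothetical helper: returns the canonical URL declared in an HTML document,
// or null if none is found. It only matches rel="canonical" written before href,
// so a real plugin should use a proper HTML parser instead.
function findCanonicalUrl(html: string, base: URL): URL | null {
  const href = html.match(/<link\s+rel="canonical"\s+href="([^"]+)"/i)?.[1]
  if (!href) {
    return null
  }
  try {
    // Resolve relative hrefs against the URL of the page being inspected.
    return new URL(href, base)
  } catch {
    // Ignore hrefs that cannot be parsed as a URL.
    return null
  }
}

const page =
  '<head><link rel="canonical" href="http://example.com/page/" /></head>'
const canonical = findCanonicalUrl(
  page,
  new URL('http://example.com/page/?utm_source=x')
)
```

Inside a plugin, one option would be to return new Stop(canonical) when the helper finds a URL and to return previous otherwise.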

Contributing

Everyone is very welcome to contribute to this project. You can contribute just by submitting bugs or suggesting improvements by opening an issue on GitHub.

Note: Since Tall v6, the project structure is a monorepo, so you'll need to use a recent version of npm that supports workspaces (e.g. npm 8.5+)

License

Licensed under MIT License. © Luciano Mammino.

tall's People

Contributors

aheissenberger, cawa-93, fiveboroughs, gusruss89, karlhorky, leftshift, lmammino, polilluminato

tall's Issues

Not throwing exception on "Error: getaddrinfo ENOTFOUND"

When using tall() inside a try/catch block, the following kind of error isn't thrown and catchable, but instead, the node process exits with:

Error: getaddrinfo ENOTFOUND XXXXXX XXXXXX
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:57:26)

TypeError: The "listener" argument must be of type Function. Received type object

I can't seem to get this to work. I have installed the package, and I am running nextjs.

Here is my code:

import { tall } from 'tall';
var airbnblink = 'https://abnb.me/YEO24YyMisb';
tall(airbnblink).then(function(response) {
console.log(response);
})
.catch(function(err) {
console.log(err);
})

But I can't get this to run - console keeps saying:
TypeError: The "listener" argument must be of type Function. Received type object

Any ideas why?

Support following <meta http-equiv="refresh" /> ?

Hi @lmammino 👋

First of all, thanks so much for creating and maintaining this module, super simple and works!

What are your thoughts on supporting unshortening the <meta http-equiv="refresh" /> redirects that pages can return in the response body?

For example, the html below would instantly redirect to https://example.com/abc/:

<html>
  <head>
    <meta http-equiv="refresh" content="0;url=https://example.com/abc/">
  </head>
  <body></body>
</html>

Youtube links not resolved in v3.0.0

Hey there,

I tried upgrading from v2 to v3 and youtube links are not resolved anymore. You may try this example I took from the README:

import { tall } from 'tall';

async function someFunction() {
  try {
    const LINK = 'https://youtu.be/TCB_RSlgTqY';
    const unshortenedUrl = await tall(LINK, {
      maxRedirect: 10
    });
    console.log('Tall url', unshortenedUrl);
  } catch (error) {
    console.error('AAAW 👻', error);
  }
}

someFunction();

Am I doing something wrong?

Thanks in advance,

Improve compatibility with more default headers?

Hey @lmammino 👋 hope things are going well!

In our experience with using tall against a large number of domains, some domains will refuse to send back a response without a certain set of headers.

Some of these headers could be set automatically by tall as default options to improve out-of-the-box compatibility (there are libraries to keep up to date with latest Accepts and User-Agents values):

const accepts = [
  // Chrome 112
  'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
  // Firefox 113
  'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
];

const userAgents = [
  // Chrome 112
  'Mozilla/5.0 (Linux; Android 13; Pixel 7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Mobile Safari/537.36',
  // Firefox 113
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/113.0'
]

tall(url, {
  headers: {
    'Accept-Encoding': 'gzip, deflate, br',
    Connection: 'keep-alive',
    // Multiply by the full length so that every entry, including the last, can be picked
    Accept: accepts[Math.floor(Math.random() * accepts.length)],
    'User-Agent':
      userAgents[Math.floor(Math.random() * userAgents.length)],
  },
});

What do you think about setting these as default headers options?

Tiktok links not resolving?

Hey, just using this package to resolve TikTok links and it appears that tall is not working for those links

Array of urls not able to resolved

Hi,

While investigating, I tried to get this working with an array of URLs, but it does not work.
If there is a way to do it, could you document it?

It seems simple, but I am not able to get it to work.
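
One way to handle a list of URLs is to map each one through the resolver and wait for all of the results. The sketch below is hypothetical: expandAll is not part of tall, and fakeResolve stands in for tall so that the snippet runs without network access. It assumes tall resolves to a string, as the README examples suggest, in which case tall itself could be passed as the resolver.

```typescript
// A resolver takes a short URL and returns a promise of the expanded URL.
type Resolver = (url: string) => Promise<string>

interface ExpandResult {
  shortUrl: string
  longUrl: string | null // null when the resolver rejected
  error?: unknown
}

// Promise.allSettled never rejects, so one failing URL does not discard the others.
async function expandAll(
  urls: string[],
  resolve: Resolver
): Promise<ExpandResult[]> {
  // Call resolve(url) explicitly so that the array index supplied by map()
  // is not passed along as a second argument.
  const settled = await Promise.allSettled(urls.map((url) => resolve(url)))
  return settled.map((outcome, i) =>
    outcome.status === 'fulfilled'
      ? { shortUrl: urls[i], longUrl: outcome.value }
      : { shortUrl: urls[i], longUrl: null, error: outcome.reason }
  )
}

// Stand-in for tall (an assumption for this demo only).
const fakeResolve: Resolver = async (url) => {
  if (url.includes('broken')) {
    throw new Error('ENOTFOUND')
  }
  return url.replace('short.example', 'long.example')
}

const expanded = expandAll(
  ['http://short.example/a', 'http://short.example/broken'],
  fakeResolve
)
```

Each entry in the result keeps its short URL next to either the expanded URL or the error, so failures can be reported per link.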

Certificate verification error

Hi @lmammino! 👋

It seems like running tall on the https://eyeondesign.aiga.org/the-era-of-nonchalant-web-design-is-here/ website on a deployed Ubuntu environment fails with a certificate validation error.

const {tall} = require('tall')
tall('https://eyeondesign.aiga.org/the-era-of-nonchalant-web-design-is-here/')

> Uncaught Error: unable to verify the first certificate
    at TLSSocket.onConnectSecure (node:_tls_wrap:1530:34)
    at TLSSocket.emit (node:events:390:28)
    at TLSSocket.emit (node:domain:537:15)
    at TLSSocket._finishInit (node:_tls_wrap:944:8)
    at TLSWrap.ssl.onhandshakedone (node:_tls_wrap:725:12)
    at TLSWrap.callbackTrampoline (node:internal/async_hooks:130:17) {
  code: 'UNABLE_TO_VERIFY_LEAF_SIGNATURE'

The certificate for the website is valid in the browser:

[Image: Screen Shot 2022-05-30 at 12 32 52]

Following the advice from Stack Overflow I tried installing [email protected] to install the root CAs, but this did not change anything for tall... 🤔

Similar to #5

Handling ENOTFOUND

Hello,

I know this was supposedly handled in #12, but I have 2.2.0 installed (the one with this PR) and it still seems to fail when fed a shortened URL pointing to an invalid URL. I'd like to know if I'm handling this poorly somehow or if the bug is still present.

How to reproduce:

  • Go to Twitter or bit.ly and ask it to shorten an invalid URL such as z.chat
  • Feed the shortened url to tall: tall("https://t.co/FqLTQ7TRYS")
  • The promise doesn't reject anything, instead the node process stops with error:
Error: getaddrinfo ENOTFOUND z.chat z.chat:80
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:58:26)
Emitted 'error' event at:
    at Socket.socketErrorListener (_http_client.js:397:9)
    at Socket.emit (events.js:193:13)
    at emitErrorNT (internal/streams/destroy.js:91:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
    at processTicksAndRejections (internal/process/task_queues.js:81:17)

Am I missing something, am I handling the promise rejection wrong?

  const promises = urls.map((shortUrl, idx) =>
    tall(shortUrl).then(longUrl => ({
      shortUrl,
      longUrl,
      idx
    }))
  );

  Promise.all(promises)
    .then(results => {
        // Handle success
    })
    .catch(e => {
      // THIS NEVER GETS CALLED
      log("The elusive buggerino!");
      log(`Total message: ${text}`);
      log(`URL we tried shortening: ${shortUrl}`);
      log(e);
    });

Thanks for your time
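
The "Emitted 'error' event at" line in the trace points to an error event that had no listener, which Node.js raises as an uncaught exception instead of a promise rejection. The sketch below shows the general pattern for turning such an event into a rejection. It is not tall's code: waitForResponse is a hypothetical helper, and fakeRequest is a plain EventEmitter standing in for an HTTP request.

```typescript
import { EventEmitter } from 'node:events'

// Hypothetical helper: settle a promise from an emitter's 'response' or 'error' event.
function waitForResponse<T>(request: EventEmitter): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    request.once('response', resolve)
    // Without this listener, emitting 'error' would throw and crash the process.
    request.once('error', reject)
  })
}

// Stand-in for a request whose DNS lookup fails (an assumption for this demo).
const fakeRequest = new EventEmitter()
const pending = waitForResponse<string>(fakeRequest)
  .then(() => 'resolved')
  .catch((err: Error) => `rejected: ${err.message}`)

fakeRequest.emit('error', new Error('getaddrinfo ENOTFOUND z.chat'))
```

With the error listener attached, the failure reaches the .catch() handler of the caller, which is the behaviour the code above expects from Promise.all.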

TypeError: tall is not a function

When using tall, I get the error mentioned in the title. I'm using Firebase to run my code.

Here's my package.json:

{
  "name": "functions",
  "description": "Cloud Functions for Firebase",
  "scripts": {
    "lint": "eslint .",
    "serve": "firebase emulators:start --only functions",
    "shell": "firebase functions:shell",
    "start": "npm run shell",
    "deploy": "firebase deploy --only functions",
    "logs": "firebase functions:log"
  },
  "engines": {
    "node": "14"
  },
  "main": "index.js",
  "dependencies": {
    "firebase-admin": "^9.2.0",
    "firebase-functions": "^3.11.0",
    "needle": "^2.6.0",
    "tall": "^4.0.1",
    "twilio": "^3.62.0"
  },
  "devDependencies": {
    "eslint": "^7.6.0",
    "eslint-config-google": "^0.14.0",
    "firebase-functions-test": "^0.2.0"
  },
  "private": true
}

Here's my code:

const tall = require("tall");

...

async function unshortenLink(link) {
  var fullLink = null;
  try {
    fullLink = await tall(link);
  } catch (err) {
    console.log("Error: tall did not unshorten link. Link: ", link);
    console.log("Error: tall did not unshorten link: ", err);
  }
  return fullLink;
}

Any help would be appreciated.
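
A likely cause, judging from the usage examples in the README, is that tall is a named export, so require("tall") returns the module object instead of the function. The sketch below illustrates the difference with a hypothetical fakeTallModule object in place of the real package. It shows the import shape only and is not a confirmed diagnosis.

```typescript
// Stand-in for what require('tall') is assumed to return: an object with a
// named `tall` export (the README imports it as `{ tall }`).
const fakeTallModule = {
  tall: async (url: string): Promise<string> => url
}

// Mirrors `const tall = require('tall')`: this binds the whole module object.
const wholeModule: unknown = fakeTallModule
const wholeModuleIsCallable = typeof wholeModule === 'function'

// Mirrors `const { tall } = require('tall')`: this binds the function itself.
const { tall } = fakeTallModule
const namedExportIsCallable = typeof tall === 'function'

console.log(wholeModuleIsCallable, namedExportIsCallable)
```

If this is the cause, changing the first line of the code above to const { tall } = require("tall") should bind the function.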
