veliovgroup / spiderable-middleware

🤖 Prerendering for JavaScript-powered websites. A great solution for PWAs (Progressive Web Apps), SPAs (Single Page Applications), and other websites built on top of front-end JavaScript frameworks.

Home Page: https://www.npmjs.com/package/spiderable-middleware

License: BSD 3-Clause "New" or "Revised" License

Spiderable middleware

Google, Facebook, Twitter, Yahoo, Bing, and all other crawlers and search engines are constantly trying to view your website. If your website is built on top of a JavaScript framework such as (but not limited to) Angular, Backbone, Ember, Meteor, React, or MEAN, the front end returns only basic HTML markup and script tags to crawlers, not the content of your page. The mission of spiderable-middleware and ostr.io is to boost your SEO experience without a headache.

Why Pre-render?

  • 🕸 Execute JavaScript, as web-crawlers and search engines can't run JS code;
  • 🏃‍♂️ Boost response rate and decrease response time with caching;
  • 🚀 Optimized HTML markup for best SEO score;
  • 🖥 Support for PWAs and SPAs;
  • 📱 Support for mobile-like crawlers;
  • 💅 Support styled-components;
  • ⚡️ Support AMP (Accelerated Mobile Pages);
  • 🤓 Works with Content-Security-Policy and other "complicated" front-end security;
  • ❤️ Search engines and social network crawlers love straightforward and pre-rendered pages;
  • 📱 Consistent link previews in messaging apps, like iMessage, Messages, Facebook, Slack, Telegram, WhatsApp, Viber, VK, Twitter, etc.;
  • 💻 Image, title, and description previews for posted links at social networks, like Facebook, Twitter, VK and others.

About Package

This package acts as middleware and intercepts requests to your Node.js application from web crawlers. All such requests are proxied to the prerendering service, which returns static, rendered HTML.

This is a SERVER-only package. For NPM, make sure you're importing the library only in Node.js. For Meteor.js, make sure the library is imported and executed only on the SERVER.

We made this package with developers in mind. It's written in a simple way, hackable, and easily tunable to meet your needs. It can also be used to turn dynamic pages into rendered, cached, and lightweight static pages: just set botsUA to ['.*']. This is the best option to offload servers, unless a website gets updated more than once every 4 hours.

  • Note: This package proxies real HTTP headers and response codes. To reduce overwhelming requests, try to avoid HTTP-redirect headers, like Location and others. Read how to return genuine status code and handle JS-redirects.
  • Note: This is a server-only package. For isomorphic environments, like Meteor.js, this package should be imported and initialized only in the server code base.

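This middleware was tested and works like a charm with Express, Connect, Meteor, and the vanilla Node.js HTTP(s) server (see the spiderable.handler examples in the API section). All other frameworks that follow Node's middleware convention will work too.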

This package was originally developed for the ostr.io service, but it's not limited to it and can proxy-pass requests to any other rendering endpoint.
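
To picture the interception step described in this section, here is a minimal sketch of User-Agent matching. It is not the package's actual implementation: buildBotMatcher and DEFAULT_BOTS are hypothetical names, the bot list is an illustrative subset rather than the package's default list, and treating botsUA entries as case-insensitive regular-expression sources is an inference from the ['.*'] example.

```javascript
// Hypothetical sketch: NOT the package's actual implementation.
// DEFAULT_BOTS is an illustrative subset, not the package's default list.
const DEFAULT_BOTS = ['googlebot', 'bingbot', 'facebookexternalhit', 'twitterbot'];

function buildBotMatcher(extraBots = []) {
  // ASSUMPTION: entries are case-insensitive regular-expression sources,
  // which is why ['.*'] would match every User-Agent.
  const pattern = new RegExp([...DEFAULT_BOTS, ...extraBots].join('|'), 'i');
  return (userAgent) => typeof userAgent === 'string' && pattern.test(userAgent);
}

const isCrawler = buildBotMatcher();
console.log(isCrawler('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')); // true
console.log(isCrawler('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15')); // false

// Serving pre-rendered pages to every visitor:
const matchEveryone = buildBotMatcher(['.*']);
console.log(matchEveryone('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15')); // true
```

A request that matches would be proxied to the rendering endpoint; anything else would fall through to next().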

Installation

Install the spiderable-middleware package from NPM for Node.js usage. Alternatively, see the Meteor.js-specific setup documentation.

npm install spiderable-middleware --save

Usage

Get ready in a few lines of code

Basic usage

See all examples.

First, add the fragment meta-tag to your HTML template:

<html>
  <head>
    <meta name="fragment" content="!">
    <!-- ... -->
  </head>
  <body>
    <!-- ... -->
  </body>
</html>

// Make sure this code isn't exported to the Browser bundle
// and executed only on SERVER (Node.js)
const express = require('express');
const Spiderable = require('spiderable-middleware');

const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  auth: 'APIUser:APIPass'
});

const app = express();
app.use(spiderable.handler).get('/', (req, res) => {
  res.send('Hello World');
});

app.listen(3000);

We provide various options for serviceURL as "Rendering Endpoints". Each endpoint has its own features to fit every project's needs.

Return genuine status code

To pass the expected response code from a front-end JavaScript framework to browsers and crawlers, create a specially formatted HTML comment. This comment can be placed in any part of the HTML page; the head or body tag is the best place for it.

Format

html:

<!-- response:status-code=404 -->

jade:

// response:status-code=404

This package supports any standard and custom status codes:

  • 201 - <!-- response:status-code=201 -->
  • 401 - <!-- response:status-code=401 -->
  • 403 - <!-- response:status-code=403 -->
  • 500 - <!-- response:status-code=500 -->
  • 514 - <!-- response:status-code=514 --> (non-standard)
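
As an illustration of the comment format, the sketch below extracts the status code from rendered HTML. The rendering service does this on its side; extractStatusCode is a hypothetical helper, and falling back to 200 when no comment is present is an assumption.

```javascript
// Hypothetical helper: the real extraction happens inside the rendering service.
const STATUS_COMMENT = /<!--\s*response:status-code=(\d{3})\s*-->/i;

function extractStatusCode(html, fallback = 200) {
  // ASSUMPTION: pages without the comment are treated as `fallback` (200 here).
  const match = STATUS_COMMENT.exec(html);
  return match ? Number.parseInt(match[1], 10) : fallback;
}

const notFoundPage = '<html><head><!-- response:status-code=404 --></head><body>Not found</body></html>';
console.log(extractStatusCode(notFoundPage)); // 404
console.log(extractStatusCode('<html><body>Hello</body></html>')); // 200
```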

Speed-up rendering

To speed up rendering, tell the Spiderable engine when your page is ready. Set window.IS_RENDERED to false, and once your page is ready, set this variable to true. Example:

<html>
  <head>
    <meta name="fragment" content="!">
    <script>
      window.IS_RENDERED = false;
    </script>
  </head>
  <body>
    <!-- ... -->
    <script type="text/javascript">
      //Somewhere deep in your app-code:
      window.IS_RENDERED = true;
    </script>
  </body>
</html>
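
In a real application "ready" usually means that several asynchronous steps have finished. One possible way to organize this is sketched below; createRenderTracker is a hypothetical helper (not part of the package), and the stand-in object replaces window only so the snippet also runs in Node.js.

```javascript
// Sketch: flips IS_RENDERED to true once every pending task has reported completion.
// `createRenderTracker` is a hypothetical name; in a browser, pass `window` as target.
function createRenderTracker(target, pendingTasks) {
  let remaining = pendingTasks;
  target.IS_RENDERED = remaining === 0;
  return function taskDone() {
    if (remaining > 0) remaining -= 1;
    if (remaining === 0) target.IS_RENDERED = true;
  };
}

// Usage with a stand-in object instead of `window`:
const fakeWindow = {};
const taskDone = createRenderTracker(fakeWindow, 2);
console.log(fakeWindow.IS_RENDERED); // false
taskDone(); // e.g. data fetched
console.log(fakeWindow.IS_RENDERED); // false
taskDone(); // e.g. view mounted
console.log(fakeWindow.IS_RENDERED); // true
```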

Detect request from Pre-rendering engine during runtime

The pre-rendering engine sets the window.IS_PRERENDERING global variable to true. Detecting requests from the pre-rendering engine is as easy as:

if (window.IS_PRERENDERING) {
  // This request is coming from Pre-rendering engine
}

Note: window.IS_PRERENDERING can be undefined on initial page load and may change during runtime. That's why we recommend pre-defining a setter for IS_PRERENDERING:

let isPrerendering = false;
Object.defineProperty(window, 'IS_PRERENDERING', {
  set(val) {
    isPrerendering = val;
    if (isPrerendering === true) {
      // This request is coming from Pre-rendering engine
    }
  },
  get() {
    return isPrerendering;
  }
});

JavaScript redirects

If you need to redirect the browser or crawler inside your application while a page is loading (to imitate navigation), you're free to use any of the classic JS redirects as well as your framework's navigation or History.pushState():

window.location.href = 'http://example.com/another/page';
window.location.replace('http://example.com/another/page');

Router.go('/another/page'); // framework's navigation (pseudo code)

Note: Only 4 redirects are allowed during one request. After 4 redirects, the session will be terminated.

API

Constructor new Spiderable([opts])

  • opts {Object} - Configuration options
  • opts.serviceURL {String} - Valid URL to Spiderable endpoint (local or foreign). Default: https://render.ostr.io. Can be set via environment variables: SPIDERABLE_SERVICE_URL or PRERENDER_SERVICE_URL
  • opts.rootURL {String} - Valid root URL of your website. Can be set via an environment variable: ROOT_URL
  • opts.auth {String} - [Optional] Auth string in the following format: user:pass. Can be set via environment variables: SPIDERABLE_SERVICE_AUTH or PRERENDER_SERVICE_AUTH. Default null
  • opts.sanitizeUrls {Boolean} - [Optional] Sanitize URLs in order to "fix" badly composed URLs. Default false
  • opts.botsUA {[String]} - [Optional] An array of strings (case insensitive) with additional User-Agent names of crawlers you would like to intercept. See default bot's names. Set to ['.*'] to match all browsers and robots, to serve static pages to all users/visitors
  • opts.ignoredHeaders {[String]} - [Optional] An array of strings (case insensitive) with HTTP header names to exclude from response. See default list of ignored headers. Set to ['.*'] to ignore all headers
  • opts.ignore {[String]} - [Optional] An array of strings (case sensitive) with ignored routes. Note: it's based on first match, so route /users will cause ignoring of /part/users/part, /users/_id and /list/of/users, but not /user/_id or /list/of/blocked-users. Default null
  • opts.only {[String|RegExp]} - [Optional] An array of strings (case sensitive) or regular expressions (can be mixed). Defines exclusive route rules for pre-rendering. Can be used together with opts.onlyRE rules. Note: To define "safe" rules as {RegExp}, each should start with ^ and end with $, for example: [/^\/articles\/?$/, /^\/article\/[A-Za-z0-9]{16}\/?$/]
  • opts.onlyRE {RegExp} - [Optional] Regular Expression with exclusive route rules for pre-rendering. Could be used with opts.only rules
  • opts.timeout {Number} - [Optional] Number, proxy-request timeout to rendering endpoint in milliseconds. Default: 180000
  • opts.requestOptions {Object} - [Optional] Options for request module (like: timeout, lookup, insecureHTTPParser), for all available options see http API docs
  • opts.debug {Boolean} - [Optional] Enable debug and extra logging, default: false

Note: Setting .onlyRE and/or .only rules is highly recommended. Otherwise, all routes, including those randomly generated by bots, will be subject to pre-rendering and may cause unexpectedly high usage.

// CommonJS
// const Spiderable = require('spiderable-middleware');

// ES6 import
// import Spiderable from 'spiderable-middleware';

const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  auth: 'APIUser:APIPass'
});

// More complex setup (recommended); use it in place of the basic setup above:
const spiderable = new Spiderable({
  rootURL: 'http://example.com',
  serviceURL: 'https://render.ostr.io',
  auth: 'APIUser:APIPass',
  only: [
    /^\/?$/, // Root of the website
    /^\/posts\/?$/, // "/posts" path with(out) trailing slash
    /^\/post\/[A-Za-z0-9]{16}\/?$/ // "/post/:id" path with(out) trailing slash
  ],
  // [Optionally] force ignore for secret paths:
  ignore: [
    '/account/', // Ignore all routes under "/account/*" path
    '/billing/' // Ignore all routes under "/billing/*" path
  ]
});
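
To make the only and ignore semantics easier to reason about, here is a rough sketch of how such rules could be evaluated for a request path. It is not the package's source: shouldPrerender is a hypothetical function, ignore is modeled as a case-sensitive substring match (which reproduces the /users examples from the option list), and exact equality for string entries in only is an assumption.

```javascript
// Hypothetical sketch of the documented rule semantics; not the package's source.
function shouldPrerender(path, { only = null, onlyRE = null, ignore = null } = {}) {
  // `ignore`: modeled as a case-sensitive substring match, per the documented examples.
  if (ignore && ignore.some((route) => path.includes(route))) return false;

  // No exclusive rules defined: every non-ignored route qualifies.
  if (!only && !onlyRE) return true;

  if (onlyRE && onlyRE.test(path)) return true;
  return Boolean(only) && only.some((rule) =>
    // ASSUMPTION: string rules are compared for exact equality.
    (rule instanceof RegExp ? rule.test(path) : rule === path)
  );
}

const rules = {
  only: [/^\/?$/, /^\/posts\/?$/, /^\/post\/[A-Za-z0-9]{16}\/?$/],
  ignore: ['/account/', '/billing/']
};
console.log(shouldPrerender('/', rules)); // true
console.log(shouldPrerender('/posts/', rules)); // true
console.log(shouldPrerender('/post/abcdef0123456789', rules)); // true
console.log(shouldPrerender('/account/settings', rules)); // false
console.log(shouldPrerender('/random/bot/path', rules)); // false

const ignoreUsers = { ignore: ['/users'] };
console.log(shouldPrerender('/list/of/users', ignoreUsers)); // false
console.log(shouldPrerender('/list/of/blocked-users', ignoreUsers)); // true
```

The last call in the first group shows why anchored only rules matter: a path invented by a bot matches nothing and is never sent for pre-rendering.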

Configuration via env.vars

The same configuration can be achieved by setting environment variables:

ROOT_URL='http://example.com'
SPIDERABLE_SERVICE_URL='https://render.ostr.io'
SPIDERABLE_SERVICE_AUTH='APIUser:APIPass'

Alternatively, if you're migrating from another pre-rendering service, you can keep using your existing variables; we support them for compatibility:

ROOT_URL='http://example.com'
PRERENDER_SERVICE_URL='https://render.ostr.io'
PRERENDER_SERVICE_AUTH='APIUser:APIPass'
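
The fallback between explicit options, environment variables, and defaults can be sketched as follows. resolveConfig is a hypothetical helper, not part of the package API; the variable names and the default serviceURL come from the option list above, while the precedence order (options first, then SPIDERABLE_*, then PRERENDER_*) is an assumption.

```javascript
// Hypothetical helper, not part of the package API.
function resolveConfig(opts = {}, env = process.env) {
  return {
    // ASSUMPTION: explicit options win, then SPIDERABLE_* variables, then PRERENDER_* ones.
    serviceURL: opts.serviceURL || env.SPIDERABLE_SERVICE_URL || env.PRERENDER_SERVICE_URL || 'https://render.ostr.io',
    rootURL: opts.rootURL || env.ROOT_URL || null,
    auth: opts.auth || env.SPIDERABLE_SERVICE_AUTH || env.PRERENDER_SERVICE_AUTH || null
  };
}

// Simulated environment of a project migrating from another pre-rendering service:
const config = resolveConfig({}, {
  ROOT_URL: 'http://example.com',
  PRERENDER_SERVICE_AUTH: 'APIUser:APIPass'
});
console.log(config.serviceURL); // https://render.ostr.io
console.log(config.rootURL); // http://example.com
console.log(config.auth); // APIUser:APIPass
```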

spiderable.handler(req, res, next)

Middleware handler. Alias: spiderable.handle.

// Express, Connect:
app.use(spiderable.handler);

// Meteor (pass the handler function, same as for Express and Connect):
WebApp.connectHandlers.use(spiderable.handler);

// HTTP(s) server:
const http = require('http');

http.createServer((req, res) => {
  spiderable.handler(req, res, () => {
    // Callback, triggered if this request
    // is not a subject of spiderable pre-rendering
    res.writeHead(200, {'Content-Type': 'text/plain; charset=UTF-8'});
    res.end('Hello vanilla NodeJS!');
    // Or do something else ...
  });
}).listen(3000);

AMP Support

To properly serve pages for Accelerated Mobile Pages (AMP), we support the following URI schemes:

# Regular URIs:
https://example.com/index.html
https://example.com/articles/article-title.html
https://example.com/articles/article-uniq-id/article-slug

# AMP optimized URIs (prefix):
https://example.com/amp/index.html
https://example.com/amp/articles/article-title.html
https://example.com/amp/articles/article-uniq-id/article-slug

# AMP optimized URIs (extension):
https://example.com/amp/index.amp.html
https://example.com/amp/articles/article-title.amp.html

All URLs with the .amp. extension or the /amp/ prefix will be optimized for AMP.
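
The two URI schemes can be told apart with a small check like the one below. isAmpUrl is a hypothetical helper written only to illustrate the rule; the actual detection happens on the rendering service.

```javascript
// Hypothetical helper mirroring the URI schemes listed above.
function isAmpUrl(url) {
  const { pathname } = new URL(url);
  const hasAmpPrefix = pathname.startsWith('/amp/');
  const hasAmpExtension = pathname.includes('.amp.');
  return hasAmpPrefix || hasAmpExtension;
}

console.log(isAmpUrl('https://example.com/articles/article-title.html')); // false
console.log(isAmpUrl('https://example.com/amp/articles/article-title.html')); // true
console.log(isAmpUrl('https://example.com/articles/article-title.amp.html')); // true
```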

Rendering Endpoints

  • render (default) - https://render.ostr.io - This endpoint has "optimal" settings and should fit 98% of cases. It respects the cache headers of both the crawler and your server.
  • render-bypass (devel/debug) - https://render-bypass.ostr.io - This endpoint bypasses the caching mechanism (almost no cache). Use it if you're experiencing an issue, or during development, to make sure you're not running into the intermediate cache. It is safe to use this endpoint in production, but it may result in higher usage and response time.
  • render-cache (under attack) - https://render-cache.ostr.io - This endpoint has the most aggressive caching mechanism. Use it if you're looking for the shortest response time and don't mind outdated pages staying in cache for 6-12 hours.

To change the default endpoint, grab the integration examples code and replace render.ostr.io with the endpoint of your choice. For NPM/Meteor integration, change the value of the serviceURL option.

Note: The described differences in caching behavior relate to intermediate proxy caching. The Cache-Control header will always be set to the value defined in "Cache TTL". Cached results at the "Pre-rendering Engine" end can be purged at any time.
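
One way to switch endpoints per environment is sketched below. The three URLs are the ones listed above; RENDERING_ENDPOINTS, pickServiceURL, and the NODE_ENV-based choice are hypothetical and only illustrate how a serviceURL value might be selected.

```javascript
// Endpoint URLs from the list above; the helper itself is hypothetical.
const RENDERING_ENDPOINTS = {
  render: 'https://render.ostr.io',
  'render-bypass': 'https://render-bypass.ostr.io',
  'render-cache': 'https://render-cache.ostr.io'
};

function pickServiceURL(name = 'render') {
  if (!Object.prototype.hasOwnProperty.call(RENDERING_ENDPOINTS, name)) {
    throw new Error(`Unknown rendering endpoint: ${name}`);
  }
  return RENDERING_ENDPOINTS[name];
}

// ASSUMPTION: bypass the intermediate cache everywhere except production.
const serviceURL = pickServiceURL(process.env.NODE_ENV === 'production' ? 'render' : 'render-bypass');
console.log(serviceURL);
```

The resulting value would be passed as the serviceURL option of the constructor.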

Debugging

To make sure a server can reach our rendering endpoint, run a cURL command or send a request via Node.js to https://test:test@render-bypass.ostr.io/?url=http://example.com (replace example.com with your domain name).

In this example we're using the render-bypass.ostr.io endpoint to avoid any possibly cached results; read more about rendering endpoints. As API credentials we're using test:test; this part of the URL can be replaced with the auth option from the Node.js example. Your API credentials and instructions can be found at the very bottom of the Pre-rendering Panel of a host; click on "Integration Guide".

# cURL example:
curl -v "https://test:test@render-bypass.ostr.io/?url=http://example.com"

// Node.js example:
const https = require('https');

https.get('https://test:test@render-bypass.ostr.io/?url=http://example.com', (resp) => {
  let data = '';

  resp.on('data', (chunk) => {
    data += chunk.toString('utf8');
  });

  resp.on('end', () => {
    console.log(data);
  });
}).on('error', (error) => {
  console.error(error);
});

Running Tests

  1. Clone this package
  2. In Terminal (Console), go to the directory where the package was cloned
  3. Then run:

Node.js/Mocha

# Install development NPM dependencies:
npm install --save-dev
# Install NPM dependencies:
npm install --save
# Run tests:
ROOT_URL=http://127.0.0.1:3003 npm test

# Run same tests with extra-logging
DEBUG=true ROOT_URL=http://127.0.0.1:3003 npm test
# http://127.0.0.1:3003 can be changed to any local address, PORT is required!

spiderable-middleware's Issues

TypeError: app.use() requires a middleware function

I added

import Spiderable from 'meteor/ostrio:spiderable-middleware';

WebApp.connectHandlers.use(
	new Spiderable({
		rootURL: 'https://a8a2-176-74-87-61.ngrok-free.app',
		serviceURL: 'https://render.ostr.io',
		auth: 'APIUser:APIPass'
	})
);

and see this error:

W20240310-15:29:13.171(4)? (STDERR) TypeError: app.use() requires a middleware function
W20240310-15:29:13.171(4)? (STDERR)     at Function.use (/home/daniel/.meteor/packages/webapp/.2.0.0-beta300.0.fqumlc.j85e5++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/express/lib/application.js:217:11)
W20240310-15:29:13.171(4)? (STDERR)     at module.wrapAsync.self (server/ssr.tsx:40:24)
W20240310-15:29:13.171(4)? (STDERR)     at Module.wrapAsync (/home/daniel/.meteor/packages/modules/.0.19.1-beta300.0.28ut9u.9egzk++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/@meteorjs/reify/lib/runtime/index.js:251:8)
W20240310-15:29:13.171(4)? (STDERR)     at module (/home/daniel/pro/mrr/.meteor/local/build/programs/server/app/app.js:53964:9)
W20240310-15:29:13.171(4)? (STDERR)     at fileEvaluate (packages/modules-runtime.js:335:7)
W20240310-15:29:13.171(4)? (STDERR)     at Module.require (packages/modules-runtime.js:237:14)
W20240310-15:29:13.171(4)? (STDERR)     at Module.mod.require (/home/daniel/.meteor/packages/modules/.0.19.1-beta300.0.28ut9u.9egzk++os+web.browser+web.browser.legacy+web.cordova/npm/node_modules/@meteorjs/reify/lib/runtime/index.js:30:33)
W20240310-15:29:13.171(4)? (STDERR)     at Object.require (packages/modules-runtime.js:257:21)
W20240310-15:29:13.171(4)? (STDERR)     at evaluateNextModule (packages/core-runtime.js:183:26)
W20240310-15:29:13.171(4)? (STDERR)     at evaluateNextModule (packages/core-runtime.js:218:7)
W20240310-15:29:13.171(4)? (STDERR)     at evaluateNextModule (packages/core-runtime.js:218:7)
W20240310-15:29:13.171(4)? (STDERR)     at evaluateNextModule (packages/core-runtime.js:218:7)
W20240310-15:29:13.171(4)? (STDERR)     at evaluateNextModule (packages/core-runtime.js:218:7)
W20240310-15:29:13.171(4)? (STDERR)     at evaluateNextModule (packages/core-runtime.js:218:7)
W20240310-15:29:13.171(4)? (STDERR)     at evaluateNextModule (packages/core-runtime.js:218:7)
W20240310-15:29:13.171(4)? (STDERR)     at evaluateNextModule (packages/core-runtime.js:218:7)
=> Exited with code: 1                                     

I am using Meteor 3

NextJS Guide?

I'd love to be the one to add it but I'm not so familiar with how this should be integrated.

I've integrated prerender in the past and their NextJS middleware is pretty straight forward, not sure if just doing a copy pasta for spiderable would work?

For reference, here are the prerender docs for Next:

Ideally, we edit Next middleware to check if the request is a bot, and if so we send it to Ostr for prerendering.

This is a touchy topic (regarding SEO and not screwing up page/domain scores), so that why I'm reticent to do it without the proper experience of how spiderable works.

This would potentially unlock a lot of new users, right now NextJS users have only one easy/default option.

Invalid argument on Meteor 2.5-beta

I'm getting the following error on Galaxy when running on Meteor 2.5 beta, which crashes the container:

t3y2m
2021-10-07 08:05:25+02:00TypeError [ERR_INVALID_ARG_TYPE] [ERR_INVALID_ARG_TYPE]: The first argument must be of type string or an instance of Buffer, ArrayBuffer, or Array or an Array-like Object. Received function onStreamEnd
t3y2m
2021-10-07 08:05:25+02:00 at Function.from (buffer.js:330:9)
t3y2m
2021-10-07 08:05:25+02:00 at toBuffer (/app/bundle/programs/server/npm/node_modules/meteor/webapp/node_modules/compression/index.js:286:14)
t3y2m
2021-10-07 08:05:25+02:00 at ServerResponse.end (/app/bundle/programs/server/npm/node_modules/meteor/webapp/node_modules/compression/index.js:115:22)
t3y2m
2021-10-07 08:05:25+02:00 at Curl.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/ostrio_spiderable-middleware/node_modules/request-libcurl/index.js:222:24)
t3y2m
2021-10-07 08:05:25+02:00 at Curl.emit (events.js:400:28)
t3y2m
2021-10-07 08:05:25+02:00 at Curl.emit (domain.js:470:12)
t3y2m
2021-10-07 08:05:25+02:00 at /app/bundle/programs/server/npm/node_modules/meteor/ostrio_spiderable-middleware/node_modules/node-libcurl/dist/Curl.js:198:22
t3y2m
2021-10-07 08:05:25+02:00 at wrapper (/app/bundle/programs/server/npm/node_modules/meteor/ostrio_spiderable-middleware/node_modules/node-libcurl/dist/Curl.js:188:23)
t3y2m
2021-10-07 08:05:25+02:00 at Curl.onEnd (/app/bundle/programs/server/npm/node_modules/meteor/ostrio_spiderable-middleware/node_modules/node-libcurl/dist/Curl.js:189:9)
t3y2m
2021-10-07 08:05:25+02:00 at Easy.<anonymous> (/app/bundle/programs/server/npm/node_modules/meteor/ostrio_spiderable-middleware/node_modules/node-libcurl/dist/Curl.js:47:22)

After removing the package things go back to normal.

Crashing My App

code

import Spiderable from "meteor/ostrio:spiderable-middleware";

WebApp.connectHandlers.use(
  new Spiderable({ // this line is where error occurs line 4
    rootURL: "https://site.org",
    serviceURL: "https://render.ostr.io",
    auth: "APIUser:APIPass"
  })
);

TypeError: object is not a function
  W20170927-14:25:38.965(1)? (STDERR)     at meteorInstall.server.Spiderable.js (server/Spiderable.js:4:3)
  W20170927-14:25:38.967(1)? (STDERR)     at fileEvaluate (packages/modules-
  runtime/.npm/package/node_modules/install/install.js:153:1)
  W20170927-14:25:38.968(1)? (STDERR)     at require (packages/modules-
  runtime/.npm/package/node_modules/install/install.js:82:1)
  W20170927-14:25:38.969(1)? (STDERR)     at 
  C:\projects\app2\.meteor\local\build\programs\server\app\app.js:2933:1
  W20170927-14:25:38.970(1)? (STDERR)     at 
  C:\projects\app\.meteor\local\build\programs\server\boot.js:297:10
  W20170927-14:25:38.971(1)? (STDERR)     at Array.forEach (native)
  W20170927-14:25:38.972(1)? (STDERR)     at Function._.each._.forEach (C:\Users\DESKTOP-
  PU4PNV5\AppData\Local\.meteor\packages\meteor-tool\1.3.5_1\mt-
  os.windows.x86_32\dev_bundle\server-lib\node_modules\underscore\underscore.js:79:11)
  W20170927-14:25:38.975(1)? (STDERR)     at 
  C:\projects\app\.meteor\local\build\programs\server\boot.js:133:5

auth: 'APIUser:APIPass'

Where do I get this auth: 'APIUser:APIPass' ?

Is it meant to be on server (I implemented on server)

Check if prerendering

(Using Meteor) How can I check whether the code is currently being prerendered or not? Something like location.href.indexOf('?_escaped_fragment_=')? Thanks.
