microlinkhq / browserless

The headless Chrome/Chromium driver on top of Puppeteer.

Home Page: https://browserless.js.org

License: MIT License

browserless's Introduction

Highlights

Compatible with Puppeteer API (text, screenshot, html, pdf).
Built-in adblocker for canceling unnecessary requests.
Shell interaction via Browserless CLI.
Easy Google Lighthouse integration.
Automatic retry & error handling.
Sensible defaults.

Installation

You can install it via npm:

npm install browserless puppeteer --save

Browserless runs on top of Puppeteer, so you need that installed to get started.

You can choose between puppeteer, puppeteer-core, and puppeteer-firefox depending on your use case.

Usage

Here is a complete example showcasing some of Browserless's capabilities:

const createBrowser = require('browserless')
const termImg = require('term-img')

// First, create a browserless factory
// This is similar to opening a browser for the first time
const browser = createBrowser()

// Browser contexts are like browser tabs
// You can create as many as your resources can support
// Cookies/caches are limited to their respective browser contexts, just like browser tabs
const browserless = await browser.createContext()

// Perform your required browser actions.
// e.g., taking screenshots or fetching HTML markup
const buffer = await browserless.screenshot('http://example.com', {
  device: 'iPhone 6'
})

console.log(termImg(buffer))

// After your task is done, destroy your browser context
await browserless.destroyContext()

// At the end, gracefully shutdown the browser process
await browser.close()

As you can see, Browserless is implemented using a single browser process which allows you to create and destroy several browser contexts all within that process.

If you're already using Puppeteer in your project, you can layer Browserless on top of that by simply installing it.

You can also pull in additional Browserless packages for your specific needs, all of which work well with Puppeteer.

CLI

Using the Browserless command-line tool, you can interact with Browserless through a terminal window, or use it as part of an automated process.

Start by installing @browserless/cli globally on your system using your favorite package manager:

npm install -g @browserless/cli

Then run browserless in your terminal to see the list of available commands.

Initializing a browser

Initializing Browserless creates a headless browser instance.

const createBrowser = require('browserless')

const browser = createBrowser({
  timeout: 25000,
  lossyDeviceName: true,
  ignoreHTTPSErrors: true
})

This instance provides several high-level methods.

For example:

// Call `createContext` to create a browser tab
const browserless = await browser.createContext({ retry: 2 })

const buffer = await browserless.screenshot('https://example.com')

// Call `destroyContext` to close the browser tab.
await browserless.destroyContext()

The browser keeps running until you explicitly close it:

// At the end, gracefully shutdown the browser process
await browser.close()

.constructor(options)

The createBrowser method supports puppeteer.launch#options.

Browserless provides additional options you can use when creating a browser instance:

defaultDevice

This will set your browser viewport to that of the specified device:

type: string
default: 'Macbook Pro 13'

lossyDeviceName

type: boolean
default: false

This allows for a margin of error when setting the device name.

// Initialize browser instance
const browser = require('browserless')({ lossyDeviceName: true });

(async () => {
    // Create context/tab
    const tabInstance = await browser.createContext();

    // The device property is consistently set to that of a MacBook Pro even when misspelt
    console.log(tabInstance.getDevice({ device: 'MacBook Pro' }))
    console.log(tabInstance.getDevice({ device: 'macbook pro 13' }))
    console.log(tabInstance.getDevice({ device: 'MACBOOK PRO 13' }))
    console.log(tabInstance.getDevice({ device: 'macbook pro' }))
    console.log(tabInstance.getDevice({ device: 'macboo pro' }))
})()

The provided name will be resolved to the closest matching device.

This comes in handy in situations where the device name is set by a third-party.
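
To illustrate the idea of resolving a misspelt name to the closest known device, here is a minimal sketch. It is not Browserless's actual implementation: KNOWN_DEVICES, levenshtein, and resolveDeviceName are hypothetical names, and edit distance is only one plausible way to rank candidates.

```javascript
// Hypothetical device list; the real one comes from Puppeteer plus Browserless extras.
const KNOWN_DEVICES = ['Macbook Pro 13', 'Macbook Pro 15', 'iPhone 6', 'iPad']

// Classic dynamic-programming edit distance, keeping a single row in memory.
const levenshtein = (a, b) => {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0] // distance for (i - 1, j - 1)
    row[0] = i
    for (let j = 1; j <= b.length; j++) {
      const above = row[j] // distance for (i - 1, j)
      row[j] = Math.min(
        above + 1, // deletion
        row[j - 1] + 1, // insertion
        diagonal + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      )
      diagonal = above
    }
  }
  return row[b.length]
}

// Returns the known device whose lower-cased name is closest to the input.
// On a tie, the first device in the list wins.
const resolveDeviceName = (name, devices = KNOWN_DEVICES) => {
  const needle = name.toLowerCase()
  let best = devices[0]
  let bestDistance = Infinity
  for (const device of devices) {
    const distance = levenshtein(needle, device.toLowerCase())
    if (distance < bestDistance) {
      best = device
      bestDistance = distance
    }
  }
  return best
}

console.log(resolveDeviceName('macboo pro'))
console.log(resolveDeviceName('MACBOOK PRO 15'))
```

A real fuzzy search would likely also set a maximum distance so that unrelated names are rejected instead of being mapped to an arbitrary device.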

mode

type: string
default: launch
values: 'launch' | 'connect'

This specifies if the browser instance should be spawned using puppeteer.launch or puppeteer.connect.

timeout

type: number
default: 30000

This setting will change the default maximum navigation time.

puppeteer

type: Puppeteer
default: puppeteer|puppeteer-core|puppeteer-firefox

By default, it automatically detects which library is installed (puppeteer, puppeteer-core, or puppeteer-firefox) based on your installed dependencies.

.createContext(options)

After initializing the browser, you can create a browser context which is equivalent to opening a tab:

const browserless = await browser.createContext({
  retry: 2
})

Each browser context is isolated, so cookies and cache stay within their own context, just as with browser tabs. Each context can also be created with different options.

options

All of Puppeteer's browser.createBrowserContext#options are supported.

Browserless provides additional browser context options:

retry

type: number
default: 2

The number of retries that can be performed before considering a navigation as failed.
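
The retry behaviour can be pictured with a small, self-contained sketch. This is an illustration of the general pattern, not the library's code: withRetry and flakyNavigation are hypothetical names, and the sketch assumes that retry: 2 means one initial attempt plus up to two more.

```javascript
// Runs `task` once and retries it up to `retry` more times before giving up.
const withRetry = async (task, { retry = 2 } = {}) => {
  let lastError
  for (let attempt = 0; attempt <= retry; attempt++) {
    try {
      return await task(attempt)
    } catch (error) {
      lastError = error // remember the failure and try again
    }
  }
  throw lastError
}

// Simulated navigation that fails twice and then succeeds.
let calls = 0
const flakyNavigation = async () => {
  calls++
  if (calls < 3) throw new Error(`navigation failed (call ${calls})`)
  return 'loaded'
}

const retryDemo = withRetry(flakyNavigation, { retry: 2 })
retryDemo.then(result => console.log(result, 'after', calls, 'calls'))
```

With retry: 0 the same task would reject after its first failure, which is why the destroyContext example elsewhere in this document can use retry: 0 to fail fast.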

.browser()

It returns the internal Browser instance.

const headlessBrowser = await browser.browser()

console.log('My headless browser PID is', headlessBrowser.process().pid)

.respawn()

It will respawn the internal browser.

const getPID = async promise => (await promise).process().pid

console.log('Process PID:', await getPID(browser.browser()))

await browser.respawn()

console.log('Process PID:', await getPID(browser.browser()))

This method is an implementation detail; normally you don't need to call it.

.close()

Used to close the internal browser.

const { onExit } = require('signal-exit')
// automatically teardown resources after
// `process.exit` is called
onExit(browser.close)

Built-in

.html(url, options)

Used to serialize the content of a target url into HTML.

const html = await browserless.html('https://example.com')

console.log(html)
// => "<!DOCTYPE html><html><head>…"

options

Check out browserless.goto to see the full list of supported values and options.

.text(url, options)

Used to serialize the content from the target url into plain text.

const text = await browserless.text('https://example.com')

console.log(text)
// => "Example Domain\nThis domain is for use in illustrative…"

options

See browserless.goto to know all the options and values supported.

.pdf(url, options)

It generates the PDF version of a website behind a url.

const buffer = await browserless.pdf('https://example.com')

console.log(`PDF generated in ${buffer.byteLength} bytes`)

options

This method uses the following options by default:

{
  margin: '0.35cm',
  printBackground: true,
  scale: 0.65
}

Check out browserless.goto to see the full list of supported values and options.

Also, all of Puppeteer's page.pdf options are supported.

Additionally, you can setup:

margin

type: string | string[]
default: '0.35cm'

Used to set the page margins. Supported units include:

px for pixel.
in for inches.
cm for centimeters.
mm for millimeters.

You can set the margin properties by passing them in as an object:

const buffer = await browserless.pdf(url.toString(), {
  margin: {
    top: '0.35cm',
    bottom: '0.35cm',
    left: '0.35cm',
    right: '0.35cm'
  }
})

In case a single margin value is provided, this will be used for all sides:

const buffer = await browserless.pdf(url.toString(), {
  margin: '0.35cm'
})
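
As a rough sketch of how a single value could be expanded to all four sides, consider the helper below. normalizeMargin is a hypothetical name used for illustration only; the library may handle this differently.

```javascript
const SIDES = ['top', 'right', 'bottom', 'left']

// Expands a single margin string to every side; an object is returned as a copy.
const normalizeMargin = (margin = '0.35cm') =>
  typeof margin === 'string'
    ? Object.fromEntries(SIDES.map(side => [side, margin]))
    : { ...margin }

console.log(normalizeMargin('0.35cm'))
console.log(normalizeMargin({ top: '1cm', bottom: '1cm' }))
```

The resulting object has the top, right, bottom, and left keys that Puppeteer's page.pdf margin option accepts.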

.screenshot(url, options)

Used to generate screenshots based on a specified url.

const buffer = await browserless.screenshot('https://example.com')

console.log(`Screenshot taken in ${buffer.byteLength} bytes`)

options

This method uses the following options by default:

{
  device: 'macbook pro 13'
}

Check out browserless.goto to see the full list of supported values and options.

Also, all of Puppeteer's page.screenshot options are supported.

Additionally, Browserless provides the following options:

codeScheme

type: string
default: 'atom-dark'

Whenever the incoming response Content-Type is JSON, the payload is presented as a formatted JSON string, beautified using the provided codeScheme theme (atom-dark by default).

The color schemes are based on the Prism library.

The Prism repository offers a wide range of themes to choose from as well as a CDN option.

element

type: string

Captures the first DOM element matching the given CSS selector. The call waits until the element is displayed on screen or the maximum timeout is reached.

overlay

type: object

Once the screenshot has been taken, this option allows you to place it over an overlay (backdrop).

You can configure the overlay by specifying the following:

browser: Specifies the color of the browser stencil to use, either light or dark, for light and dark mode respectively.
background: Specifies the background to use. A number of value types are supported:
- Hexadecimal/RGB/RGBA color codes, eg. #c1c1c1.
- CSS gradients, eg. linear-gradient(225deg, #FF057C 0%, #8D0B93 50%, #321575 100%)
- Image URLs, eg. https://source.unsplash.com/random/1920x1080.

const buffer = await browserless.screenshot(url.toString(), {
  styles: ['.crisp-client, #cookies-policy { display: none; }'],
  overlay: {
    browser: 'dark',
    background:
      'linear-gradient(45deg, rgba(255,18,223,1) 0%, rgba(69,59,128,1) 66%, rgba(69,59,128,1) 100%)'
  }
})

.destroyContext(options)

Destroys the current browser context.

const browserless = await browser.createContext({ retry: 0 })

const content = await browserless.html('https://example.com')

await browserless.destroyContext()

options

force

type: string
default: 'force'

When force is set, it prevents the recreation of the context in case a browser action is being executed.

.getDevice(options)

Given a device descriptor, this method returns the settings (user agent and viewport) for that device.

browserless.getDevice({ device: 'Macbook Pro 15' })

// => {
//   userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safari/537.36',
//   viewport: {
//     width: 1440,
//     height: 900,
//     deviceScaleFactor: 2,
//     isMobile: false,
//     hasTouch: false,
//     isLandscape: false
//   }
// }

This method extends the Puppeteer.KnownDevices list by adding some missing devices.

options

device

type: string

The device descriptor name. It's used to fetch preset values associated with a device.

When lossyDeviceName is enabled, a fuzzy search rather than a strict search will be performed in order to maximize getting a result back.

viewport

type: object

Used to set extra viewport settings. These settings will be merged with the preset settings.

browserless.getDevice({
  device: 'iPad',
  viewport: {
    isLandscape: true
  }
})

headers

type: object

Extra headers that will be merged with the device presets.

browserless.getDevice({
  device: 'iPad',
  headers: {
    'user-agent': 'googlebot'
  }
})
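
The merging described for viewport and headers can be sketched as follows. This is a hypothetical model, not the library's code: the PRESETS values are placeholders, mergeDevice is an invented name, and the assumption that a 'user-agent' header overrides the preset user agent is made only for the sake of the example.

```javascript
// Placeholder preset; real values come from the device list.
const PRESETS = {
  iPad: {
    userAgent: 'example-ipad-user-agent',
    viewport: {
      width: 768,
      height: 1024,
      deviceScaleFactor: 2,
      isMobile: true,
      hasTouch: true,
      isLandscape: false
    }
  }
}

// Merges caller-provided viewport and headers on top of the device preset.
const mergeDevice = ({ device, viewport = {}, headers = {} }) => {
  const preset = PRESETS[device]
  if (!preset) throw new TypeError(`Unknown device: ${device}`)
  return {
    // Assumption: an explicit user-agent header wins over the preset.
    userAgent: headers['user-agent'] || preset.userAgent,
    // Extra viewport settings override only the keys they define.
    viewport: { ...preset.viewport, ...viewport }
  }
}

console.log(mergeDevice({ device: 'iPad', viewport: { isLandscape: true } }))
console.log(mergeDevice({ device: 'iPad', headers: { 'user-agent': 'googlebot' } }))
```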

.evaluate(fn, gotoOpts)

It exposes an interface for creating your own evaluate function, passing you the page and response.

The fn will receive page and response as arguments:

const ping = browserless.evaluate((page, response) => ({
  statusCode: response.status(),
  url: response.url(),
  redirectUrls: response.request().redirectChain()
}))

await ping('https://example.com')
// {
//   "statusCode": 200,
//   "url": "https://example.com/",
//   "redirectUrls": []
// }

You don't need to close the page; it will be closed automatically.

Internally, the method performs a browserless.goto, making it possible to pass extra arguments as a second parameter:

const serialize = browserless.evaluate(page => page.evaluate(() => document.body.innerText), {
  waitUntil: 'domcontentloaded'
})

await serialize('https://example.com')
// => the visible text of the page (document.body.innerText)

.goto(page, options)

It performs a page.goto with a lot of extra capabilities:

const page = await browserless.page()
const { response, device } = await browserless.goto(page, { url: 'http://example.com' })

options

Any option passed here is forwarded to page.goto.

Additionally, you can setup:

abortTypes

type: array
default: []

It sets the ability to abort requests based on the ResourceType.
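
As a hedged illustration, the sketch below shows how a list of resource types could drive an abort-or-continue decision. shouldAbort and the simulated requests are hypothetical; in a real page the resource type would come from Puppeteer's HTTPRequest.resourceType().

```javascript
// Decides whether a request should be aborted given the configured types.
const shouldAbort = (resourceType, abortTypes = []) => abortTypes.includes(resourceType)

// Options as they could be passed to goto, screenshot, html, and so on.
const gotoOptions = {
  url: 'https://example.com',
  abortTypes: ['image', 'stylesheet', 'font']
}

// Simulated requests standing in for what a request handler would receive.
const simulatedRequests = [
  { url: 'https://example.com/', resourceType: 'document' },
  { url: 'https://example.com/logo.png', resourceType: 'image' },
  { url: 'https://example.com/app.js', resourceType: 'script' }
]

const decisions = simulatedRequests.map(request => ({
  url: request.url,
  action: shouldAbort(request.resourceType, gotoOptions.abortTypes) ? 'abort' : 'continue'
}))

console.log(decisions)
```

With the default empty array, nothing is aborted by this option.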

adblock

type: boolean
default: true

It enables the built-in adblocker by Cliqz, which aborts unnecessary third-party requests associated with ad services.

animations

type: boolean
default: false

When false, CSS animations and transitions are disabled, and prefers-reduced-motion is set accordingly.

click

type: string | string[]

Click the DOM element matching the given CSS selector.

colorScheme

type: string
default: 'no-preference'

Sets prefers-color-scheme CSS media feature, used to detect if the user has requested the system use a 'light' or 'dark' color theme.

device

type: string
default: 'macbook pro 13'

It specifies the device descriptor used to retrieve userAgent and viewport.

headers

type: object

An object containing additional HTTP headers to be sent with every request.

// Here, `browserless` is a browser context created with browser.createContext()

const page = await browserless.page()
await browserless.goto(page, {
  url: 'http://example.com',
  headers: {
    'user-agent': 'googlebot',
    cookie: 'foo=bar; hello=world'
  }
})

hide

Hides the DOM elements matching the given CSS selectors. This sets visibility: hidden on the matched elements.

html

type: string

When HTML markup is provided, it is set with page.setContent instead of fetching the content from the target URL.

javascript

type: boolean
default: true

When it's false, it disables JavaScript on the current page.

mediaType

type: string
default: 'screen'

Changes the CSS media type of the page using page.emulateMediaType.

modules

type: string | string[]

Injects <script type="module"> into the browser page.

It can accept:

Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js').
Local file (e.g., 'local-file.js').
Inline code (e.g., "document.body.style.backgroundColor = 'red'").

const buffer = await browserless.screenshot(url.toString(), {
  modules: [
    'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js',
    'local-file.js',
    "document.body.style.backgroundColor = 'red'"
  ]
})

onPageRequest

type: function

Associates a handler with every request made by the page.

scripts

type: string | string[]

Injects <script> into the browser page.

It can accept:

Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/@microlink/[email protected]/src/browser.js').
Local file (e.g., 'local-file.js').
Inline code (e.g., "document.body.style.backgroundColor = 'red'").

const buffer = await browserless.screenshot(url.toString(), {
  scripts: [
    'https://cdn.jsdelivr.net/npm/[email protected]/dist/jquery.min.js',
    'local-file.js',
    "document.body.style.backgroundColor = 'red'"
  ]
})

Prefer to use modules whenever possible.

scroll

type: string

Scroll to the DOM element matching the given CSS selector.

styles

type: string | string[]

Injects <style> into the browser page.

It can accept:

Absolute URLs (e.g., 'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css').
Local file (e.g., 'local-file.css').
Inline code (e.g., "body { background: red; }").

const buffer = await browserless.screenshot(url.toString(), {
  styles: [
    'https://cdn.jsdelivr.net/npm/[email protected]/dist/dark.css',
    'local-file.css',
    'body { background: red; }'
  ]
})

timezone

type: string

It changes the timezone of the page.

url

type: string

The target URL.

viewport

Sets up a custom viewport using the page.setViewport method.

waitForSelector

type: string

Waits for an element matching the given selector, using page.waitForSelector.

waitForTimeout

type: number

Waits for the given amount of time, in milliseconds.

waitUntil

When to consider navigation successful.

If you provide an array of event strings, navigation is considered to be successful after all events have been fired.

Events can be either:

'auto': A combination of 'load' and 'networkidle2' in a smart way to wait the minimum time necessary.
'load': Consider navigation to be finished when the load event is fired.
'domcontentloaded': Consider navigation to be finished when the DOMContentLoaded event is fired.
'networkidle0': Consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
'networkidle2': Consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
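
The rule that an array of events requires all of them to fire can be expressed in a few lines. This is a conceptual sketch with hypothetical names (toArray, isNavigationDone); it does not model the Browserless-specific 'auto' value or how the events are actually detected.

```javascript
// Accepts either a single event name or an array of event names.
const toArray = value => (Array.isArray(value) ? value : [value])

// Navigation counts as successful once every requested event has fired.
const isNavigationDone = (waitUntil, firedEvents) =>
  toArray(waitUntil).every(event => firedEvents.has(event))

const fired = new Set(['domcontentloaded', 'load'])

console.log(isNavigationDone('load', fired))
console.log(isNavigationDone(['load', 'networkidle2'], fired))

fired.add('networkidle2')
console.log(isNavigationDone(['load', 'networkidle2'], fired))
```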

.context()

It returns the BrowserContext associated with your instance.

const browserContext = await browserless.context()

console.log({ isIncognito: browserContext.isIncognito() })
// => { isIncognito: true }

.withPage(fn, [options])

It returns a higher-order function as a convenient way to interact with a page:

const getTitle = browserless.withPage((page, goto) => async opts => {
  const result = await goto(page, opts)
  return page.title()
})

The function will be invoked in the following way:

const title = await getTitle({ url: 'https://example.com' })
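
To make the higher-order shape easier to follow, here is a self-contained sketch of the pattern using stand-in objects. Nothing here is the library's code: createFakePage, fakeGoto, and this local withPage are hypothetical, and the sketch assumes the page is closed once the inner function settles, as described for evaluate.

```javascript
// Stand-in for a real Puppeteer page; only what the example needs.
const createFakePage = async () => {
  const page = {
    closed: false,
    url: null,
    title: async () => `Title of ${page.url}`,
    close: async () => { page.closed = true }
  }
  return page
}

// Stand-in for goto: it only records the URL that was requested.
const fakeGoto = async (page, { url }) => {
  page.url = url
  return { response: null }
}

const pages = []

// Creates a page, hands it to `fn`, and always closes it afterwards.
const withPage = fn => async (...args) => {
  const page = await createFakePage()
  pages.push(page)
  try {
    return await fn(page, fakeGoto)(...args)
  } finally {
    await page.close() // released even when `fn` throws
  }
}

const getTitle = withPage((page, goto) => async opts => {
  await goto(page, opts)
  return page.title()
})

const titleDemo = getTitle({ url: 'https://example.com' })
titleDemo.then(title => console.log(title))
```

The outer function receives the resources and the inner function receives the per-call arguments, so one wrapped function can be reused for many URLs.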

fn

type: function

The function to be executed. It receives page, goto as arguments.

options

timeout

type: number
default: browserless.timeout

This setting will change the default maximum navigation time.

.page()

It returns a standalone Page associated with the current browser context.

const page = await browserless.page()
await page.content()
// => '<html><head></head><body></body></html>'

Extended

function

The @browserless/function package provides an isolated VM scope to run arbitrary JavaScript code with runtime access to a browser page:

const createFunction = require('@browserless/function')

const code = async ({ page }) => page.evaluate('jQuery.fn.jquery')

const version = createFunction(code)

const { isFulfilled, isRejected, value } = await version('https://jquery.com')

// => {
//   isFulfilled: true,
//   isRejected: false,
//   value: '1.13.1'
// }

options

Besides the following properties, any other argument provided will be available during the code execution.

vmOpts

The hosted code is also running inside a secure sandbox created via vm2.

gotoOpts

Any goto#options can be passed for tuning the internal URL resolution.

lighthouse

The @browserless/lighthouse package provides you the setup for running Lighthouse reports backed by browserless.

const createLighthouse = require('@browserless/lighthouse')
const createBrowser = require('browserless')
const { writeFile } = require('fs/promises')
const { onExit } = require('signal-exit')

const browser = createBrowser()
onExit(browser.close)

const lighthouse = createLighthouse(async teardown => {
  const browserless = await browser.createContext()
  teardown(() => browserless.destroyContext())
  return browserless
})

const report = await lighthouse('https://microlink.io')
await writeFile('report.json', JSON.stringify(report, null, 2))

The report will be generated for the provided URL. This extends the lighthouse:default settings. These settings are similar to the Google Chrome Audits reports on Developer Tools.

options

The Lighthouse configuration that will extend 'lighthouse:default' settings:

const report = await lighthouse(url, {
  onlyAudits: ['accessibility']
})

Also, you can extend from a different preset of settings:

const report = await lighthouse(url, {
  preset: 'desktop',
  onlyAudits: ['accessibility']
})

Additionally, you can setup:

The Lighthouse execution runs in a worker thread, so any worker#options are supported.

logLevel

type: string
default: 'error'
values: 'silent' | 'error' | 'info' | 'verbose'

The level of logging to enable.

output

type: string | string[]
default: 'json'
values: 'json' | 'csv' | 'html'

The type(s) of report output to be produced.

timeout

type: number
default: browserless.timeout

This setting will change the default maximum navigation time.

screencast

The @browserless/screencast package allows you to capture each frame of a browser navigation using puppeteer.

This API is similar to screenshots, but you have a more granular control over the frame and the output:

const createScreencast = require('@browserless/screencast')
const createBrowser = require('browserless')

const browser = createBrowser()
const browserless = await browser.createContext()
const page = await browserless.page()

const screencast = createScreencast(page, { 
  maxWidth: 1280, 
  maxHeight: 800 
})

const frames = []
screencast.onFrame(data => frames.push(data))

screencast.start()
await browserless.goto(page, { url, waitForTimeout: 300 })
await screencast.stop()

console.log(frames)

Check a full example generating a GIF as output.

page

type: object

The Page object.

options

See Page.startScreencast to know all the options and values supported.

Packages

browserless is internally divided into multiple packages, so you only use the code you need.

Package
browserless
@browserless/benchmark
@browserless/cli
@browserless/devices
@browserless/errors
@browserless/examples
@browserless/function
@browserless/goto
@browserless/lighthouse
@browserless/pdf
@browserless/screencast
@browserless/screenshot

FAQ

Q: Why use browserless over puppeteer?

browserless does not replace puppeteer; it complements it. It is a layer of syntactic sugar over official Headless Chrome, oriented toward production scenarios.

Q: Why do you block ads scripts by default?

Headless navigation is expensive compared to just fetching the content from a website.

To speed up the process, we block ad scripts by default because most of them are resource-intensive.

Q: My output is different from the expected

Probably browserless was too smart and blocked a request that you need.

You can activate debug mode by setting the DEBUG=browserless environment variable to see what is happening behind the scenes.

Consider opening an issue with the debug trace.

Q: I want to use browserless in an AWS Lambda-like project

Check chrome-aws-lambda to set up AWS Lambda with a compatible binary.

License

browserless © Microlink, released under the MIT License.
Authored and maintained by Microlink with help from contributors.

The logo has been designed by xinh studio.

microlink.io · GitHub microlinkhq · Twitter @microlinkhq

browserless's Issues

[goto] Add proxy support

Detect username and password from proxy server URI

const args = baseArgs.concat([`--proxy-server=${proxyUrl}`])

Auto authenticate a page based on that credentials

// if you're using an authenticated proxy
await page.authenticate({ username, password })
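
One way the two steps above could fit together is sketched below, using the standard WHATWG URL class to split credentials from the proxy address. parseProxy, the example proxy URL, and the baseArgs placeholder are hypothetical; only the --proxy-server flag and page.authenticate come from the issue itself.

```javascript
// Splits a proxy URI into the server address and its credentials.
const parseProxy = proxyUrl => {
  const { protocol, host, username, password } = new URL(proxyUrl)
  return {
    server: `${protocol}//${host}`,
    // URL keeps credentials percent-encoded, so decode them before use.
    username: decodeURIComponent(username),
    password: decodeURIComponent(password)
  }
}

const baseArgs = ['--disable-gpu'] // placeholder for the existing launch args
const { server, username, password } = parseProxy('http://user:p%40ss@proxy.example.com:8080')

// Chromium receives the proxy without credentials...
const args = baseArgs.concat([`--proxy-server=${server}`])
console.log(args)

// ...and the credentials would then be passed to page.authenticate({ username, password }).
console.log({ username, password })
```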

related

goto: Rename `media` into `mediaType`

Problem with screenshot.

Prerequisites

I'm using the last version.
My node version is the same as declared as package.json.

Subject of the issue

I couldn't try it.

Steps to reproduce

const browserless = require("browserless");

const saveBufferToFile = (buffer, fileName) => {
  const wstream = require("fs").createWriteStream(fileName);
  wstream.write(buffer);
  wstream.end();
};

browserless
  .screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
  .then((buffer) => {
    const fileName = "screenshot.png";
    saveBufferToFile(buffer, fileName);
    console.log(`your screenshot is here: `, fileName);
  });

Expected behaviour

Save a screenshot.

Actual behaviour

$ node index.js
/home/juliolima/projects/poc_browserless/index.js:10
.screenshot("https://bot.sannysoft.com", { device: "iPhone 6" })
^

TypeError: browserless.screenshot is not a function
    at Object.<anonymous> (/home/juliolima/projects/poc_browserless/index.js:10:4)
    at Module._compile (internal/modules/cjs/loader.js:1085:14)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1114:10)
    at Module.load (internal/modules/cjs/loader.js:950:32)
    at Function.Module._load (internal/modules/cjs/loader.js:790:12)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:76:12)
    at internal/main/run_main_module.js:17:47
error Command failed with exit code 1.

[goto] pretty JSON response

Inspired by pretty-json.now.sh.

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on all branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because we are using your CI build statuses to figure out when to notify you about breaking changes.

Since we did not receive a CI status on the greenkeeper/initial branch, we assume that you still need to configure it.

If you have already set up a CI for this repository, you might need to check your configuration. Make sure it will run on all new branches. If you don’t want it to run on every branch, you can whitelist branches starting with greenkeeper/.

We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

Once you have installed CI on this repository, you’ll need to re-trigger Greenkeeper’s initial Pull Request. To do this, please delete the greenkeeper/initial branch in this repository, and then remove and re-add this repository to the Greenkeeper integration’s white list on GitHub. You'll find this list on your repo or organization’s settings page, under Installed GitHub Apps.

browserless not using proxy server

Prerequisites

["browserless": "^8.7.11" ] I'm using the last version.
My node version is the same as declared as package.json.

browserless not using proxy server

Despite passing the proxy server details in the args, browserless doesn't launch puppeteer with my proxy server. I have tested the proxy server independently with puppeteer and puppeteer-extra, so the issue isn't the proxy server.

Steps to reproduce

import puppeteer from 'puppeteer-extra'
import StealthPlugin from 'puppeteer-extra-plugin-stealth'
import browserles from 'browserless'

puppeteer.use(StealthPlugin())
const browserless = browserles()

(async function () {
    try {
        let pageHtml = await browserless.html('https://httpbin.org/ip', {
            puppeteer:puppeteer,
            args:[`--proxy-server=${'my-proxy-deatils'}`,'--disable-gpu', '--single-process', '--no-zygote', '--no-sandbox', '--hide-scrollbars'],
            incognito: true
        })
        console.log(pageHtml)
    } catch(e){ 
        throw new Error(e.message)
    }
})()

Expected behaviour

ip address be the IP address of my proxy server

Actual behaviour

shows ip of my machine instead

[devices] adapt puppeteer 3.x interface

Puppeteer 2.x uses an array interface:

[
  {
    name: 'Blackberry PlayBook',
    userAgent: 'Mozilla/5.0 (PlayBook; U; RIM Tablet OS 2.1.0; en-US) AppleWebKit/536.2+ (KHTML like Gecko) Version/7.2.1.0 Safari/536.2+',
    viewport: {
      width: 600,
      height: 1024,
      deviceScaleFactor: 1,
      isMobile: true,
      hasTouch: true,
      isLandscape: false
    }
  },

while puppeteer 3.x is using an object map:

'Nexus 6 landscape': {
    name: 'Nexus 6 landscape',
    userAgent: 'Mozilla/5.0 (Linux; Android 7.1.1; Nexus 6 Build/N6F26U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3765.0 Mobile Safari/537.36',
    viewport: {
      width: 732,
      height: 412,
      deviceScaleFactor: 3.5,
      isMobile: true,
      hasTouch: true,
      isLandscape: true
    }
  },

The current implementation fails because concat is an array method.

That is actually a good thing: right now we convert the array into an object, so we just need to remove that conversion since it is no longer necessary 🙂
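
A small sketch of handling both shapes is shown below. toDeviceMap is a hypothetical helper and the device entries are trimmed to a few fields; the point is only that the array form needs converting while the object map can be used as-is.

```javascript
// Puppeteer 2.x style: an array of device descriptors.
const devicesArray = [
  { name: 'Blackberry PlayBook', viewport: { width: 600, height: 1024 } },
  { name: 'Nexus 6 landscape', viewport: { width: 732, height: 412 } }
]

// Puppeteer 3.x style: an object map keyed by device name.
const devicesMap = {
  'Nexus 6 landscape': { name: 'Nexus 6 landscape', viewport: { width: 732, height: 412 } }
}

// Converts the array form into a map; the map form is returned untouched.
const toDeviceMap = devices =>
  Array.isArray(devices)
    ? Object.fromEntries(devices.map(device => [device.name, device]))
    : devices

console.log(Object.keys(toDeviceMap(devicesArray)))
console.log(Object.keys(toDeviceMap(devicesMap)))
```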

Add ad-block

Need to add the ability to load ABP rules and parse them.

screenshot: Move some API parameters out of screenshot

Move:

hide
click
modules
scripts
styles
scrollTo

From screenshot prepare https://github.com/microlinkhq/browserless/blob/master/packages/screenshot/src/prepare.js

to the goto package, so the rest of the methods (like pdf) can take advantage of these query parameters.

Add visual tests

https://github.com/americanexpress/jest-image-snapshot

Improve PDF support

https://github.com/mikeal/snapkit/blob/master/index.js

Implement timeout on goto method

Puppeteer is going to implement a global timeout in the next breaking version:
puppeteer/puppeteer#3158

In the meantime, we need to control the timeout with enough granularity to ensure we don't waste resources.

The current implementation handles the timeout outside the library, which makes it impossible to close the page when the timeout is reached:
https://github.com/Kikobeats/html-get/blob/master/src/index.js#L89

Improve screenshot

Some suggestions:

Ability to inject jQuery or other scripts.
Ability to hide or remove elements.
Ability to disable animations.
Add scroll/click support.
Put the screenshot inside an overlay.

Inspiration

Using SocksProxyAgent fails

Prerequisites

I'm using the last version. (9.1.6)
My node version is the same as declared as package.json. (v14.17.5)

Subject of the issue

Trying to use proxy as described in documentation results in error for me. Could you advise what is the proper way to define it? In past issues I only found some other way, not described in the docs ( #259 ). Thanks

Steps to reproduce

const browserless = require('browserless')
const { SocksProxyAgent } = require('socks-proxy-agent')

function testf(url) {
    (async () => {
        const browserless_factory = browserless()
        const browser = await browserless_factory.createContext({
            // agent: undefined
            agent: new SocksProxyAgent({
                host: 'localhost',
                port: 9050
            })
        })
        page_content = await browser.html(url)
        await browser.destroyContext();
        console.log(page_content)
    })()
}

testf('http://ip-api.com/json')

Expected behaviour

Requests made through specified proxy.

Actual behaviour

Running the script above results in the following error:

(node:29627) UnhandledPromiseRejectionWarning: Error: Request is already handled!
    at Object.assert (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
    at HTTPRequest.continue (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPRequest.js:283:21)
    at PuppeteerBlocker.onRequest (<MY_PATH>/node_modules/@cliqz/adblocker-puppeteer/dist/cjs/adblocker.js:225:33)
    at BlockingContext.onRequest (<MY_PATH>/node_modules/@cliqz/adblocker-puppeteer/dist/cjs/adblocker.js:65:47)
    at <MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:226:52
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async HTTPRequest.finalizeInterceptions (<MY_PATH>/node_modules/puppeteer/lib/cjs/puppeteer/common/HTTPRequest.js:132:9)
(Use `node --trace-warnings ...` to show where the warning was created)
(node:29627) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). To terminate the node process on unhandled promise rejection, use the CLI flag `--unhandled-rejections=strict` (see https://nodejs.org/api/cli.html#cli_unhandled_rejections_mode). (rejection id: 2)
(node:29627) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Note that everything works ok if I change agent to undefined.

[screenshot] consider jimp alternatives

Especially https://github.com/nuxt-community/jimp-compact

goto: Rename `disableAnimations` to `animations`

Auto-close cookie banners

URLs for testing

Inspiration

Related

Looks like the plugin can't be loaded directly at the Puppeteer layer: puppeteer/puppeteer#1286

But I suppose we can do something at execution time.

[screenshot] image url overlay

emulateMedia → emulateMediaType

https://github.com/GoogleChrome/puppeteer/releases/tag/v2.0.0
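
A minimal sketch of how the rename could be bridged while both Puppeteer versions are in use, assuming the method was renamed from `page.emulateMedia` to `page.emulateMediaType` as the title suggests. The helper name `emulateMediaType` is hypothetical, and `page` stands for any Puppeteer page object.

```javascript
// Hypothetical compatibility helper: prefer the newer method name and
// fall back to the older one on Puppeteer versions that predate the rename.
const emulateMediaType = (page, type) =>
  typeof page.emulateMediaType === 'function'
    ? page.emulateMediaType(type)
    : page.emulateMedia(type)
```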

TypeScript definitions

Would love to have TypeScript type definitions.

cursor emulation support

https://github.com/Xetera/ghost-cursor?auto_subscribed=false

Example of pool?

I'm trying to launch 100 browser instances at the same time to load test my website. Is there an example of doing that with the browserless pool?
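
A minimal sketch of bounded concurrency, which is not the browserless pool API: the helper name `runWithLimit` and its parameters are assumptions for illustration. It keeps at most `limit` async tasks in flight, which is usually gentler on memory than starting 100 browsers at once.

```javascript
// Hypothetical helper: run async task functions with at most `limit`
// of them in flight. Results keep the order of the `tasks` array.
const runWithLimit = async (tasks, limit) => {
  const results = new Array(tasks.length)
  let next = 0

  // Each worker pulls the next pending task until none are left.
  const worker = async () => {
    while (next < tasks.length) {
      const index = next++
      results[index] = await tasks[index]()
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker)
  await Promise.all(workers)
  return results
}
```

Each task could create a context with `createContext()`, call `html(url)`, and finish with `destroyContext()`, as in the README usage example.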

Lighthouse: images for desktop reports returning mobile interface

Bug Report

Current Behavior
When I use the following MQL API call, result.data.insights.lighthouse.audits['final-screenshot'] is returned as a base64-encoded image. However, this image shows the mobile view of the website, not the desktop view.

const url = 'https://anywebsitehere.com';
const payload = {
  meta: false,
  insights: {
    lighthouse: {
      device: 'desktop',
      onlyCategories: ['performance', 'best-practices', 'accessibility', 'seo'],
    },
    technologies: false,
  },
};

const result = await mql(url, payload);

Expected behavior/code

I'd expect the above to return the desktop variation of the image and not a mobile version.

Additional context/Screenshots

Can be provided upon request.

Installing browserless via npm throws an error

Prerequisites

[x] I'm using the latest version.
[x] My node version is the same as declared in package.json.

Subject of the issue

Installing browserless via npm fails and throws an error:

> node scripts/postinstall

/Project/node_modules/hooman/hooman.js:14
const instance = got.extend({
                     ^

TypeError: got.extend is not a function
    at Object.<anonymous> (/Project/node_modules/hooman/hooman.js:14:22)
    at Module._compile (internal/modules/cjs/loader.js:1133:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)
    at Module.load (internal/modules/cjs/loader.js:977:32)
    at Function.Module._load (internal/modules/cjs/loader.js:877:14)
    at Module.require (internal/modules/cjs/loader.js:1019:19)
    at require (internal/modules/cjs/helpers.js:77:18)
    at Object.<anonymous> (/Project/node_modules/top-user-agents/scripts/postinstall.js:5:13)
    at Module._compile (internal/modules/cjs/loader.js:1133:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1153:10)

Steps to reproduce

Note: You can reproduce the code using interactive Node.js shell by Runkit.

npm i -S browserless

[goto] implement `waitUntil: 'auto'`

People don't care whether they need to wait for the 'load' or the 'networkidle*' event.

Implement an auto mode that follows this behavior:

Run the 'networkidle*' and 'load' events in parallel (maybe two tabs?).
Add a timeout over the network event to prevent infinite waiting (10s?).
If the timeout is not exceeded, use the 'networkidle*' event response. Otherwise, fall back to 'load'.
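
The behavior described above can be sketched as a race between the network wait and a timer. This is only an illustration under assumptions: the helper name `waitUntilAuto` and the `onLoad`/`onNetworkIdle` parameters are hypothetical, and each is expected to start its wait and return a promise.

```javascript
// Hypothetical helper sketching the proposed 'auto' mode.
// `onLoad` and `onNetworkIdle` start each wait and return a promise;
// `timeout` is in milliseconds.
const waitUntilAuto = async ({ onLoad, onNetworkIdle, timeout = 10000 }) => {
  // Start both waits in parallel.
  const load = onLoad()
  // Mark a rejection as handled for now; it is rethrown by `await load`
  // below if the fallback is needed.
  load.catch(() => {})

  let timer
  const expired = new Promise(resolve => {
    timer = setTimeout(resolve, timeout, null)
  })
  const idle = onNetworkIdle().then(
    value => ({ event: 'networkidle', value }),
    () => null // a failed network wait is treated like a timeout
  )

  const winner = await Promise.race([idle, expired])
  clearTimeout(timer)

  // Prefer the network idle result; otherwise fall back to 'load'.
  return winner || { event: 'load', value: await load }
}
```

With Puppeteer, `onLoad` might wrap `page.goto(url, { waitUntil: 'load' })` and `onNetworkIdle` a second tab's `page.goto(url, { waitUntil: 'networkidle2' })`, following the two-tabs idea from the list.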

Testing URLs

Related

puppeteer/puppeteer#1353 (comment)

Add pool support

Inspired by

Implement using

Better tracking support

The current tracking implementation is a poor port of disconnect rules

We need to implement a built-in solution.

Related:

Enable data saver

https://chrome.google.com/webstore/detail/data-saver/pfmgfdlgomnbgkofeojodiodmgpgmkac

v9 iteration

Wishlist

Better process management.
- keep running a single browser process.
- ensure the default viewport is set up.
- find a way to set up proxy requests without flags.
- use a connect/disconnect model.
Reduce I/O operations.
- use in-memory storage (/dev/shm or similar) for userDataDir & temporary files. (puppeteer/puppeteer#7243)
- ensure temporary files are removed after .destroy()

[goto] Disable JavaScript

When rendering PDFs server-side, it's often better to disable JavaScript execution. It consumes server resources, and most of the time it isn't needed to render the page.

It would be awesome if we could pass a property to /pdf where we can disable JavaScript in:
https://github.com/microlinkhq/browserless/blob/master/packages/pdf/src/index.js

Puppeteer supports this out of the box if you call page.setJavaScriptEnabled(false) before navigating to a page:
https://github.com/puppeteer/puppeteer/blob/master/docs/api.md#pagesetjavascriptenabledenabled
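
A minimal sketch of the requested behavior, assuming a Puppeteer `page` object. The helper name `pdfWithoutJavaScript` is hypothetical and is not part of the browserless API.

```javascript
// Hypothetical helper: `page` is a Puppeteer page and `pdfOptions` is
// passed through to page.pdf().
const pdfWithoutJavaScript = async (page, url, pdfOptions = {}) => {
  // Must run before navigation so the page's scripts never execute.
  await page.setJavaScriptEnabled(false)
  await page.goto(url)
  return page.pdf(pdfOptions)
}
```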

Evasion techniques

Libraries

URLs to test

Related

https://timvanscherpenzeel.github.io/detect-gpu/

Review default ads lists

https://my.nextdns.io/configuration/f1624c/lists

[screenshot] image url overlay

Consider using `jimp` over `sharp`

It is a more lightweight dependency and only jpeg/png support is necessary.

https://github.com/oliver-moran/jimp

Add Firefox support

Sample from documentation site not working out of the box

Prerequisites

I'm using the latest version.
My node version is the same as declared in package.json.

Subject of the issue

My environment:

nvm: 0.38.0
node: v12.22.6
npm: 6.14.15

I just want to test browserless by following the getting started guide from the documentation, but I am getting this error:

const browserless = await browserlessFactory.createContext()
                    ^^^^^

SyntaxError: await is only valid in async function
    at wrapSafe (internal/modules/cjs/loader.js:915:16)
    at Module._compile (internal/modules/cjs/loader.js:963:27)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:60:12)
    at internal/main/run_main_module.js:17:47

Steps to reproduce

Installing

mkdir test
cd test
nvm use v12.22.6
npm init -y
npm install browserless puppeteer --save

Sample project

touch index.js

`index.js` content

const createBrowserless = require('browserless')
const termImg = require('term-img')

// First, create a browserless factory
// that will keep a singleton process running
const browserlessFactory = createBrowserless()

// After that, you can create as many browser contexts
// as you need. The browser contexts won't share cookies/cache
// with other browser contexts.
const browserless = await browserlessFactory.createContext()

// Perform the action you want, e.g., taking a screenshot
const buffer = await browserless.screenshot('https://basecamp.com', {
  device: 'iPhone 6'
})

console.log(termImg(buffer))

// After your task is done, destroy your browser context
await browserless.destroyContext()

// At the end, gracefully shutdown the browser process
await browserless.close()

Running

node index.js

RunKit: https://runkit.com/embed/a7xjdfhnz7xi
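
The SyntaxError arises because top-level `await` is not available in a CommonJS script on Node.js 12, so the calls need to live inside an async function. A minimal sketch follows; the helper name `takeScreenshot` and passing the factory creator as an argument are assumptions for illustration. It closes the factory rather than the context, mirroring the README usage example.

```javascript
// Hypothetical helper: wraps the sample's steps in an async function so
// that `await` is valid on Node.js versions without top-level await.
const takeScreenshot = async (createBrowserless, url) => {
  const browserlessFactory = createBrowserless()
  const browserless = await browserlessFactory.createContext()
  try {
    return await browserless.screenshot(url, { device: 'iPhone 6' })
  } finally {
    // Clean up even if the screenshot fails.
    await browserless.destroyContext()
    await browserlessFactory.close()
  }
}
```

It could then be called as `takeScreenshot(require('browserless'), 'https://basecamp.com')`, with the resulting promise handled through `.then()` and `.catch()`.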

[screenshot] mobile overlay

Similar to

https://github.com/sindresorhus/capture-website/pull/27/files?short_path=f1d7f01#diff-f1d7f01715e29ea2a7cbaf4f2f8117cc

[screenshot] add url on top bar

Error in top-user-agents module

Prerequisites

I'm using the latest version.
My node version is the same as declared in package.json.

Subject of the issue

Dependency error

Steps to reproduce

Run browserless in multiple Docker containers at once.

Expected behaviour

Normal operation

Actual behaviour

When running multiple times at once, a 503 error occurs in the request sent by top-user-agents.

Presumably 'https://techblog.willshouse.com/2012/01/03/most-common-user-agents/', the page used by top-user-agents, is behind Cloudflare protection.

HTTPError: Response code 503 (Service Temporarily Unavailable)
crawler.4.@DESKTOP-2     |     at Request.<anonymous> (/home/webdriver/.../crawlers/node_modules/got/dist/source/as-promise/index.js:117:42)
crawler.4.@DESKTOP-2     |     at processTicksAndRejections (internal/process/task_queues.js:93:5) {
crawler.4.@DESKTOP-2     |   code: undefined,
crawler.4.@DESKTOP-2     |   timings: {
crawler.4.@DESKTOP-2     |     start: 1630547054142,
crawler.4.@DESKTOP-2     |     socket: 1630547054144,
crawler.4.@DESKTOP-2     |     lookup: 1630547054203,
crawler.4.@DESKTOP-2     |     connect: 1630547054241,
crawler.4.@DESKTOP-2     |     secureConnect: 1630547054285,
crawler.4.@DESKTOP-2     |     upload: 1630547054285,
crawler.4.@DESKTOP-2     |     response: 1630547054330,
crawler.4.@DESKTOP-2     |     end: 1630547054335,
crawler.4.@DESKTOP-2     |     error: undefined,
crawler.4.@DESKTOP-2     |     abort: undefined,
crawler.4.@DESKTOP-2     |     phases: {
crawler.4.@DESKTOP-2     |       wait: 2,
crawler.4.@DESKTOP-2     |       dns: 59,
crawler.4.@DESKTOP-2     |       tcp: 38,
crawler.4.@DESKTOP-2     |       tls: 44,
crawler.4.@DESKTOP-2     |       request: 0,
crawler.4.@DESKTOP-2     |       firstByte: 45,
crawler.4.@DESKTOP-2     |       download: 5,
crawler.4.@DESKTOP-2     |       total: 193
crawler.4.@DESKTOP-2     |     }
crawler.4.@DESKTOP-2     |   }
crawler.4.@DESKTOP-2     | }

Add types

Prerequisites

I'm using the latest version.
My node version is the same as declared in package.json.

Subject of the issue

When you import browserless in a TypeScript-based project, an error reports that it has no types. I also don't see them in the repo or in @types.

Steps to reproduce

Just create a new TypeScript project and import browserless.

import createBrowserless from 'browserless';

Expected behaviour

Browserless should have types, so that it can be used easily in TypeScript and help the user with IntelliSense.

Actual behaviour

It has no built-in types, and there is no information about installing them separately.

Failed with vercel deployments or pkg bundling

Prerequisites

I'm using the latest version.
My node version is the same as declared in package.json.

Failed to deploy apps relying on browserless to vercel

browserless uses prism-themes as a dependency, but prism-themes doesn't have a valid main entry (it should be a valid js file path). This caused deployment failures on Vercel. I also tried vercel/pkg, and it failed as well.

Steps to reproduce

I'm deploying next-imagegen-example to vercel with puppeteer provider (which is using browserless)

you can use vercel/pkg to bundle imagegen-puppeteer-provider as well

Expected behaviour

Deployment should work well

Actual behaviour

Deployment failed with error that they couldn't resolve module prism-themes

Ideal Workaround

Change require.resolve for prism-themes to something else, maybe path.resolve.

Cannot find module 'puppeteer'

Saw this linked on echo.js, decided to give a couple of the examples a shot.

Copy/pasted the screenshot example from the docs into a js file, added a package.json, installed and saved browserless, then ran node on the js file. I'm assuming that would be a standard use-case for the lib.

Here's the error. I will spend some time chasing it down when I get home from work later.

    throw err;
    ^

Error: Cannot find module 'puppeteer'
    at Function.Module._resolveFilename (module.js:557:15)
    at Function.Module._load (module.js:484:25)
    at Module.require (module.js:606:17)
    at require (internal/module.js:11:18)
    at Object.<anonymous> (/home/mike/dev/test/node_modules/browserless/index.js:6:19)
    at Module._compile (module.js:662:30)
    at Object.Module._extensions..js (module.js:673:10)
    at Module.load (module.js:575:32)
    at tryModuleLoad (module.js:515:12)
    at Function.Module._load (module.js:507:3)

buffer screenshot support

According to the Puppeteer documentation:

path: The file path to save the image to. The screenshot type will be inferred from file extension. If path is a relative path, then it is resolved relative to current working directory. If no path is provided, the image won't be saved to the disk.

So the temporary file could be moved out of the library.
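
A minimal sketch of that idea, assuming a Puppeteer `page` object; the helper name `screenshotBuffer` is hypothetical. Leaving out `path` means Puppeteer returns the image data without saving it, so callers who want a file can write it themselves.

```javascript
// Hypothetical helper: `page` is a Puppeteer page. No `path` option is
// forwarded, so nothing is written to disk and the image data is returned.
const screenshotBuffer = (page, options = {}) => {
  const { path, ...rest } = options // drop `path` if the caller supplied one
  return page.screenshot(rest)
}
```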