
parser's Introduction

Postlight Parser - Extracting content from chaos


Postlight's Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more.

Postlight Parser powers Postlight Reader, a browser extension that removes ads and distractions, leaving only text and images for a beautiful reading view on any site.

Postlight Parser allows you to easily create custom parsers using simple JavaScript and CSS selectors. This allows you to proactively manage parsing and migration edge cases. There are many examples available along with documentation.
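For instance, a custom extractor is a plain object keyed by domain, with CSS selectors for each field. Here is a minimal sketch, assuming the addExtractor API described in the project docs; the domain and selectors below are illustrative, not taken from a real site config:

import Parser from '@postlight/parser';

// Illustrative custom extractor; real selectors depend on the target site's markup
Parser.addExtractor({
  domain: 'www.example.com',
  title: {
    selectors: ['h1.article-title'],
  },
  author: {
    selectors: ['.byline a'],
  },
  content: {
    selectors: ['div.article-body'],
  },
});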

How? Like this.

Installation

# If you're using yarn
yarn add @postlight/parser

# If you're using npm
npm install @postlight/parser

Usage

import Parser from '@postlight/parser';

Parser.parse(url).then(result => console.log(result));

// NOTE: When used in the browser, you can omit the URL argument
// and simply run `Parser.parse()` to parse the current page.

The result looks like this:

{
  "title": "Thunder (mascot)",
  "content": "... <p><b>Thunder</b> is the <a href=\"https://en.wikipedia.org/wiki/Stage_name\">stage name</a> for the...",
  "author": "Wikipedia Contributors",
  "date_published": "2016-09-16T20:56:00.000Z",
  "lead_image_url": null,
  "dek": null,
  "next_page_url": null,
  "url": "https://en.wikipedia.org/wiki/Thunder_(mascot)",
  "domain": "en.wikipedia.org",
  "excerpt": "Thunder Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos",
  "word_count": 4677,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

If Parser is unable to find a field, that field will return null.
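If parsing fails entirely (for example, on a network timeout), the returned promise rejects, so it is worth attaching a catch handler. A minimal sketch:

Parser.parse(url)
  .then(result => console.log(result))
  .catch(error => console.error('Parse failed:', error));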

parse() Options

Content Formats

By default, Postlight Parser returns the content field as HTML. However, you can override this behavior by passing options to the parse function, specifying whether to scrape all pages of an article and what type of output to return (valid values are 'html', 'markdown', and 'text'). For example:

Parser.parse(url, { contentType: 'markdown' }).then(result =>
  console.log(result)
);

This returns the page's content as GitHub-flavored Markdown:

"content": "...**Thunder** is the [stage name](https://en.wikipedia.org/wiki/Stage_name) for the..."
Custom Request Headers

You can include custom headers in requests by passing name-value pairs to the parse function as follows:

Parser.parse(url, {
  headers: {
    Cookie: 'name=value; name2=value2; name3=value3',
    'User-Agent':
      'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1',
  },
}).then(result => console.log(result));
Pre-fetched HTML

You can use Postlight Parser to parse custom or pre-fetched HTML by passing an HTML string to the parse function as follows:

Parser.parse(url, {
  html:
    '<html><body><article><h1>Thunder (mascot)</h1><p>Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos</p></article></body></html>',
}).then(result => console.log(result));

Note that the URL argument is still supplied in order to identify the website and use its custom parser, if one exists, though it will not be used for fetching content.
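For example, if the page has already been fetched elsewhere, the HTML can be handed off directly instead of being downloaded twice. A sketch, assuming an environment with the Fetch API and an async context:

// Reuse HTML that has already been downloaded; Parser will not re-fetch it
const response = await fetch(url);
const html = await response.text();
const result = await Parser.parse(url, { html });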

The command-line parser

Postlight Parser also ships with a CLI, meaning you can use it from your command line like so:

Postlight Parser CLI Basic Usage

# Install Postlight Parser globally
yarn global add @postlight/parser
#   or
npm -g install @postlight/parser

# Then
postlight-parser https://postlight.com/trackchanges/mercury-goes-open-source

# Pass optional --format argument to set content type (html|markdown|text)
postlight-parser https://postlight.com/trackchanges/mercury-goes-open-source --format=markdown

# Pass optional --header.name=value arguments to include custom headers in the request
postlight-parser https://postlight.com/trackchanges/mercury-goes-open-source --header.Cookie="name=value; name2=value2; name3=value3" --header.User-Agent="Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1"

# Pass optional --extend argument to add a custom type to the response
postlight-parser https://postlight.com/trackchanges/mercury-goes-open-source --extend credit="p:last-child em"

# Pass optional --extend-list argument to add a custom type with multiple matches
postlight-parser https://postlight.com/trackchanges/mercury-goes-open-source --extend-list categories=".meta__tags-list a"

# Get the value of attributes by adding a pipe to --extend or --extend-list
postlight-parser https://postlight.com/trackchanges/mercury-goes-open-source --extend-list links=".body a|href"

# Pass optional --add-extractor argument to add a custom extractor at runtime.
postlight-parser https://postlight.com/trackchanges/mercury-goes-open-source --add-extractor ./src/extractors/fixtures/postlight.com/index.js
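The file passed to --add-extractor is expected to export a custom extractor object. A minimal sketch of such a file (the domain and selectors are illustrative, not taken from the bundled fixture):

// my-extractor.js — load with: postlight-parser <url> --add-extractor ./my-extractor.js
module.exports = {
  domain: 'www.example.com',
  title: { selectors: ['h1'] },
  content: { selectors: ['article'] },
};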

License

Licensed under either of the below, at your preference:

  • Apache License, Version 2.0
  • MIT License

Contributing

For details on how to contribute to Postlight Parser, including how to write a custom content extractor for any site, see CONTRIBUTING.md.

Unless it is explicitly stated otherwise, any contribution intentionally submitted for inclusion in the work, as defined in the Apache-2.0 license, shall be dual licensed as above without any additional terms or conditions.


🔬 A Labs project from your friends at Postlight. Happy coding!

parser's People

Contributors

adampash, austinmbrown, benubois, david0leong, dependabot[bot], droob, dviramontes, fdsimms, george-haddad, ginatrapani, greenkeeper[bot], jadtermsani, janetleekim, jbrayton, jfix, johnholdun, kev5873, kik0220, mtashley, mutewinter, nitinthewiz, ralphjbeily, sdoire, silasburton, spiffytoy, svenwiegand, touchred, toufic-m, wajeehzantout, zhemaituk


parser's Issues

[REQUEST] Parsing pre-fetched html WITHOUT url

  • Platform: Windows dev, Android emulator
  • Mercury Parser Version: 2.0.0

Continuing from #277

Supplying both url and html as params to Mercury.parse() works. BUT, it won't work without a url, or with any url that won't resolve. See the test below, which just truncates the url to its first 17 chars.

Is there a way to parse raw html to extract "content", without fetching the url at all? This would avoid calling and loading the url twice when it has obviously already been fetched elsewhere and the resulting html is in hand.

Mercury.parse(url.substr(0, 17), { html: rawHtml })
  .then(result => console.log('Mercury parse:', result))
  .catch(error => {
    console.log(index, 'Mercury error:', error)
  })

Chrome devtools logs something like this:

"Mercury parse:" Error: CORS request rejected: https://www.cnbc.
    at on_response (blob:http://localhos…3a5eac74cd03:144137)
    at XMLHttpRequest.on_state_change (blob:http://localhos…3a5eac74cd03:144120)
    at XMLHttpRequest.dispatchEvent (blob:http://localhos…-3a5eac74cd03:29193)
    at XMLHttpRequest.setReadyState (blob:http://localhos…-3a5eac74cd03:28931)
    at XMLHttpRequest.__didCompleteResponse (blob:http://localhos…-3a5eac74cd03:28773)
    at blob:http://localhos…-3a5eac74cd03:28883
    at RCTDeviceEventEmitter.emit (blob:http://localhos…6-3a5eac74cd03:8785)
    at MessageQueue.__callFunction (blob:http://localhos…6-3a5eac74cd03:8207)
    at blob:http://localhos…6-3a5eac74cd03:7980
    at MessageQueue.__guard (blob:http://localhos…6-3a5eac74cd03:8161)

An in-range update of eslint-plugin-import is breaking the build 🚨

The devDependency eslint-plugin-import was updated from 2.16.0 to 2.17.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

eslint-plugin-import is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: test-node: Your tests failed on CircleCI (Details).
  • ci/circleci: test-web: Your tests passed on CircleCI! (Details).

Commits

The new version differs by 61 commits.

  • 0499050 bump to v2.17.0
  • f479635 [webpack] v0.11.1
  • 8a4226d Merge pull request #1320 from bradzacher/export-ts-namespaces
  • 988e12b fix(export): Support typescript namespaces
  • 70c3679 [docs] make rule names consistent
  • 6ab25ea [Tests] skip a TS test in eslint < 4
  • 405900e [Tests] fix tests from #1319
  • 2098797 [fix] export: false positives for typescript type + value export
  • 70a59fe [fix] Fix overwriting of dynamic import() CallExpression
  • e4850df [ExportMap] fix condition for checking if block comment
  • 918567d [fix] namespace: add check for null ExportMap
  • 2d21c4c Merge pull request #1297 from echenley/ech/fix-isBuiltIn-local-aliases
  • 0ff1c83 [dev deps] lock typescript to ~, since it doesn’t follow semver
  • 40bf40a [*] [deps] update resolve
  • 28dd614 Merge pull request #1304 from bradennapier/feature/typescript-export-type

There are 61 commits in total.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Feature request: JSON output

Similar to #316, I would love to be able to specify a different output mode. In this case, JSON instead of HTML. Example usage:

Mercury.parse({ url: myURL, outputMode: 'json' }).then(result => console.log(result));

or

Mercury.parse(myURL, 'json').then(result => console.log(result));

Example result:

{
  "title": "Every developer should write a personal automation API",
  "content": [
    { "type": "header1", "value": "Why I love it" },
    { "type": "paragraph", "value": "A few years ago I fell in love..."},
    { "type": "image", "src": "https://..." },
    { "type": "unordered-list", "value": [
      { "type": "list-item", "value": "list item one" }
    ]}
  ],
  "author": "Wikipedia Contributors",
  "date_published": "2016-09-16T20:56:00.000Z",
  "lead_image_url": null,
  "dek": null,
  "next_page_url": null,
  "url": "https://www.anotherdevblog.net/posts/every-developer-should-write-a-personal-automation-api",
  "domain": "www.anotherdevblog.net",
  "excerpt": "Every developer should write a personal automation API",
  "word_count": 4677,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

Reddit URLs not supported

The following reddit url will not get parsed correctly

https://www.reddit.com/r/BetterEveryLoop/comments/9yz5o2/warmth/

curl -i -X GET \
   -H "Content-Type:application/json" \
   -H "x-api-key:xxxxxxxx" \
 'https://mercury.postlight.com/parser?url=https%3A%2F%2Fwww.reddit.com%2Fr%2FBetterEveryLoop%2Fcomments%2F9yz5o2%2Fwarmth%2F'

Returns

HTTP/1.1 502 Bad Gateway
Content-Length: 36
Content-Type: application/json
Date: Tue, 22 Jan 2019 09:19:41 GMT
Via: 1.1 501ad2910f631f0520a6d389d6f053e9.cloudfront.net (CloudFront)
X-Amz-Apigw-Id: T5f1iE-nIAMF8cg=
X-Amz-Cf-Id: HIkrESehbkfLFequ3G2yuVC0J2jf5nsGQeo2b1t_7_1IPV4B3JDuzA==
X-Amzn-Requestid: d3573917-1e26-11e9-8168-4922f69c2a07
X-Cache: Error from cloudfront
Age: 10
Connection: keep-alive
Server: Netlify
X-NF-Request-ID: 3006aa4c-a2b7-46c8-b9cf-57fb160bc529-581598

{"message": "Internal server error"}

Before and after screenshots using Mercury Reader were attached (merc_before, merc_after).

Using a prepared response buffer always throws "No children, likely a bad parse."

  • Platform: Darwin mickel 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64
  • Mercury Parser Version: 1.1.1
  • Node Version (if a Node bug): 8.12.0

Expected Behavior

When using Mercury.parse with a prepared html response, I expect it to be able to parse it.

Current Behavior

Throws error "No children, likely a bad parse."

Steps to Reproduce

const html = Buffer.from(`<html><head><meta charset="utf-8"></head><body><p id="PXeWGg">This is a test</p> <p id="NSYfDg">Hello!.</p></body></html>`, 'utf8')
const result = await Mercury.parse('https://example.com/page', { html }) // throws error
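A possible workaround until buffers are handled, assuming string input works as documented: decode the buffer back to a string before passing it in.

// Workaround sketch: hand the parser a string rather than a Buffer
const result = await Mercury.parse('https://example.com/page', { html: html.toString('utf8') })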

Bug: Whitespaces like CRLF are removed within <pre> tags

  • Platform:
    Darwin 16.7.0 Darwin Kernel Version 16.7.0: Thu Dec 20 21:53:35 PST 2018; root:xnu-3789.73.31~1/RELEASE_X86_64 x86_64
  • Mercury Parser Version:
    2.0.0
  • Node Version (if a Node bug):
    v11.10.1

Expected Behavior

When parsing a website containing a <pre> tag, whitespaces within the tag should not be changed. If there are new lines/CRLF, tabs or multiple spaces, these characters should remain unchanged.

Current Behavior

When parsing a website containing a <pre> tag, the whitespaces within the tag are removed, in particular newlines and tabs. This is breaking the contract of a pre tag and breaks e.g. code listings on websites.

Steps to Reproduce

Parse https://usr42.github.io/mercury-parser-pre-tag/ with following html code:

<!DOCTYPE html>
<html>
    <head>
        <title>Page With pre tag</title>
    </head>
    <body>
        <p>Here comes the code:</p>
        <pre>
#!/bin/sh

cd $(dirname "$0") && pwd
        </pre>
    </body>
</html>

Run:

mercury-parser https://usr42.github.io/mercury-parser-pre-tag/

Output (abridged):

{
  "content": "<body> <p>Here comes the code:</p> <pre>\n#!/bin/sh cd $(dirname &quot;$0&quot;) &amp;&amp; pwd </pre> </body>",
}

Expected output (abridged):

{
  "content": "<body> <p>Here comes the code:</p> <pre>\n#!/bin/sh\n\ncd $(dirname &quot;$0&quot;) &amp;&amp; pwd\n        </pre> </body>",
}

Possible Solution

This issue seems to be related to the usage of normalizeSpaces() in src/extractors/generic/content/extractor.js:83.
Here is a possible diff:

diff --git a/src/extractors/generic/content/extractor.js b/src/extractors/generic/content/extractor.js
index b4e57a7..05e6599 100644
--- a/src/extractors/generic/content/extractor.js
+++ b/src/extractors/generic/content/extractor.js
@@ -2,7 +2,6 @@ import cheerio from 'cheerio';

 import { nodeIsSufficient } from 'utils/dom';
 import { cleanContent } from 'cleaners';
-import { normalizeSpaces } from 'utils/text';

 import extractBestNode from './extract-best-node';

@@ -80,7 +79,7 @@ const GenericContentExtractor = {
       return null;
     }

-    return normalizeSpaces($.html(node));
+    return $.html(node);
   },
 };

After changing this line, the pre content is fine, but other whitespace is no longer removed either.
The parser then generates following output (abridged):

{
  "content": "<body>\n        <p>Here comes the code:</p>\n        <pre>\n#!/bin/sh\n\ncd $(dirname &quot;$0&quot;) &amp;&amp; pwd\n        </pre>\n    </body>",
}

Thanks and best regards,
Balthasar aka usr42

Can't resolve 'cheerio' in mercury-parser/dist'

  • Platform:
  • Mercury Parser Version:
  • Node Version (if a Node bug):
  • Browser Version (if a browser bug):

After importing import Mercury from '@postlight/mercury-parser'; in my React app, I get:
Failed to compile.

./node_modules/@postlight/mercury-parser/dist/mercury.js
Module not found: Can't resolve 'cheerio' in '/Users/farbodaprin/Desktop/MASTER THISES/Master resources/SALMON/SALMON-PROGRAMING/Private Salmon/FeryAndSOmy/node_modules/@postlight/mercury-parser/dist'

Expected Behavior

can use mercury as mentioned in readme.md

Current Behavior

Steps to Reproduce

Detailed Description

Possible Solution

Feature Request: paywall authentication

I'm not even sure Mercury Reader is the right place to add this feature, so I apologize in advance if this request is unreasonable.

I would like to see support for reading content from subscription-based websites, which put paywalls before their content. Obviously, this already works if I use Mercury Reader in a browser and manually log in to the paywalled site.
However, when using Mercury indirectly via reading apps like Feedbin, I'm only seeing partial content, or nothing at all.
Pocket provides such a feature: https://help.getpocket.com/article/1065-saving-from-subscription-based-sites
It would be amazing if this could be supported by other reading apps like Reeder, Instapaper, etc.
I'm pretty sure this is not at all easy to implement, and comes with a lot of issues like password storage and other security implications. I have zero background in web technology, so I can't imagine how difficult this would be to implement, but I imagine it would need a lot of custom solutions for different websites.
I just want to kindly ask for feedback here. Do you think the increasing amount of paywalled content on the internet could be accessed in the nice, clean way that Mercury is able to provide for unrestricted websites?
Or is this something that services like Feedbin would have to provide, with Mercury acting as the back-end?

Feature Request: Support .txt Files

And perhaps other common document types. I believe that Readability supported these with a hard-coded exception based on file extension or mime type.

curl -H "x-api-key: API_KEY_HERE" "https://mercury.postlight.com/parser?url=https://wordpress.org/plugins/about/readme.txt"

Returns null

Feature Request: Markdown output

Similar to #317, I would love to be able to specify a different output mode. In this case, markdown instead of HTML. Example usage:

Mercury.parse({ url: myURL, outputMode: 'markdown' }).then(result => console.log(result));

or

Mercury.parse(myURL, 'markdown').then(result => console.log(result));

Example result:

{
  "title": "Every developer should write a personal automation API",
  "content": "## Why I love it\n\nA few years ago I fell in love with [If This Then That](https://ifttt.com/) (IFTTT). It's a remarkable...",
  "author": "Wikipedia Contributors",
  "date_published": "2016-09-16T20:56:00.000Z",
  "lead_image_url": null,
  "dek": null,
  "next_page_url": null,
  "url": "https://www.anotherdevblog.net/posts/every-developer-should-write-a-personal-automation-api",
  "domain": "www.anotherdevblog.net",
  "excerpt": "Every developer should write a personal automation API",
  "word_count": 4677,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

Provide standalone parser package

I believe downloading documents is out of scope for a parser package; in most cases a project already has a downloading solution in place, and these vary widely, so pulling in additional dependencies that won't be used directly by the project seems redundant.

In my opinion, providing a standalone parser would greatly reduce the complexity of the package, reduce reliance on external packages, and allow easy inclusion in lightweight projects.

ENOENT error when deploying to Firebase Cloud Functions

  • Platform: Linux 4.15.0-1028-gcp (Firebase cloud function deployment)
  • Mercury Parser Version: 2.0.0
  • Node Version (if a Node bug): v6.16.0
  • Browser Version (if a browser bug):

Expected Behavior

I want to deploy my Firebase Cloud functions file with Mercury:

In package.json:

{
  "name": "functions",
  "description": "Cloud Functions for Firebase",
  "scripts": {
    "serve": "firebase serve --only functions",
    "shell": "firebase functions:shell",
    "start": "npm run shell",
    "deploy": "firebase deploy --only functions",
    "logs": "firebase functions:log"
  },
  "dependencies": {
    "@postlight/mercury-parser": "^2.0.0",
    "firebase-admin": "~7.0.0",
    "firebase-functions": "^2.2.0"
  },
  "private": true
}

In my index.js file:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const Mercury = require('@postlight/mercury-parser');

admin.initializeApp(functions.config().firebase);

exports.fetchArticleData = functions.database.ref('/articles/{articleId}')
	.onCreate(snapshot => {
		const article = snapshot.val();
		const { url } = article;

		console.log('New URL added. Fetching data for', url);

		return Mercury.parse(url)
			.then(data => {
				console.log('Received data', data);
				snapshot.ref.update(data);
			})
			.catch(err => {
				console.error('Could not fetch data for', url, err);
				snapshot.ref.update({
					parserError: true,
				});
			});
	});

Current Behavior

When I deploy to Firebase (firebase deploy --only functions) I get an npm build error:

⚠  functions[fetchArticleData(us-central1)]: Deployment error.
Build failed: exit status 1
npm WARN deprecated [email protected]: Use uuid module instead
npm WARN deprecated [email protected]: This version is no longer maintained. Please upgrade to the latest version.
npm WARN deprecated [email protected]: This version is no longer maintained. Please upgrade to the latest version.
npm WARN deprecated [email protected]: This version is no longer maintained. Please upgrade to the latest version.
functions@ /workspace
`-- (empty)

npm WARN In @postlight/[email protected] replacing bundled version of browser-request with [email protected]
npm ERR! Linux 4.15.0-1028-gcp
npm ERR! argv "/nodejs/bin/node" "/nodejs/bin/npm" "--global-style" "--production" "--fetch-retries=5" "--fetch-retry-factor=2" "--fetch-retry-mintimeout=1000" "install" "/workspace"
npm ERR! node v6.16.0
npm ERR! npm  v3.10.10
npm ERR! path /workspace/node_modules/.staging/jquery-59911f29
npm ERR! code ENOENT
npm ERR! errno -2
npm ERR! syscall rename

npm ERR! enoent ENOENT: no such file or directory, rename '/workspace/node_modules/.staging/jquery-59911f29' -> '/workspace/node_modules/@postlight/mercury-parser/node_modules/jquery'
npm ERR! enoent ENOENT: no such file or directory, rename '/workspace/node_modules/.staging/jquery-59911f29' -> '/workspace/node_modules/@postlight/mercury-parser/node_modules/jquery'
npm ERR! enoent This is most likely not a problem with npm itself
npm ERR! enoent and is related to npm not being able to find a file.
npm ERR! enoent

Steps to Reproduce

I'm not sure if this is reproducible without Firebase

Detailed Description

As far as I can tell, this is an issue with Mercury on Firebase (perhaps due to the Node version it uses?), as I am able to install and run it fine locally.

I have scoured around for Firebase-related fixes for this but can't find anything. Other packages I use in my Firebase functions file work fine.

Any suggestions for fixes or things to try are much appreciated!

Iconv-lite warning: decode()-ing strings is deprecated.

Installed mercury-parser from NPM v2.0.0

Works great, thanks!

Every execution, however, logs an "Iconv-lite warning: decode()-ing strings is deprecated." warning.

We are passing in strings per the documentation. This is fixable by converting the string to a buffer before passing it to Mercury, using something along the lines of Buffer.from(htmlSource, 'utf8').

Expected behavior would be to either disallow sending strings to Mercury, to convert strings to buffers automatically, or to override the iconv warning with iconv.skipDecodeWarning = true; per the iconv-lite author's recommendation.
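A sketch of both workarounds described above (skipDecodeWarning is the override documented in the iconv-lite README):

const iconv = require('iconv-lite');
iconv.skipDecodeWarning = true; // silence the deprecation warning globally

// ...or convert the string to a Buffer before handing it to Mercury
const html = Buffer.from(htmlSource, 'utf8');
Mercury.parse(url, { html }).then(result => console.log(result));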

src attributes on img tags are being returned as concatenated strings with all srcset sources

  • Platform: macOS 10.14.3
  • Mercury Parser Version: 2.0.0
  • Node Version (if a Node bug): v10.15.0

Expected Behavior

In content, the src attribute on img tags should be preserved when there is also a srcset attribute.

Current Behavior

The src on img is being returned as a concatenated string of all srcset sources.

Steps to Reproduce

Install Mercury globally:
npm -g install @postlight/mercury-parser

Then:
mercury-parser https://www.theverge.com/2019/2/13/18223847/minecraft-hackers-charged-united-kingdom-california-apophox-squad

When there is no srcset the img src are OK:
mercury-parser https://www.macrumors.com/2019/03/05/disney-bob-iger-apple-board-seat-at-risk/

Detailed Description

mercury-parser https://www.theverge.com/2019/2/13/18223847/minecraft-hackers-charged-united-kingdom-california-apophox-squad

{
  "title": "Two hackers charged with Minecraft-linked bomb threats that led to school evacuations",
  "content": "<div><div><figure class=\"e-image e-image--hero\">\n  <span class=\"e-image__inner\">\n    <span class=\"e-image__image \">\n      \n        <picture class=\"c-picture\">\n  \n\n\n  <source srcset=\"https://cdn.vox-cdn.com/thumbor/4WLlAQ7-Hi1PvxsUkI3_3H1hftQ=/0x0:1100x774/320x213/filters:focal(462x299:638x475):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 320w, https://cdn.vox-cdn.com/thumbor/CkUEuwH-zGxXjEvihzYrpHME6Y0=/0x0:1100x774/620x413/filters:focal(462x299:638x475):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 620w, https://cdn.vox-cdn.com/thumbor/vbEObpYmNOu9DCNzjUBCgwxiPOQ=/0x0:1100x774/920x613/filters:focal(462x299:638x475):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 920w, https://cdn.vox-cdn.com/thumbor/7Fcd-rAdcus0Gdk49qJ9ij5_ZtA=/0x0:1100x774/1220x813/filters:focal(462x299:638x475):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 1220w, https://cdn.vox-cdn.com/thumbor/UehIXdLBVtAgS6r2Fst1JTx9sLQ=/0x0:1100x774/1520x1013/filters:focal(462x299:638x475):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 1520w, https://cdn.vox-cdn.com/thumbor/gVxKb16-R8CXYgwoS1aR0q8CvNM=/0x0:1100x774/1820x1213/filters:focal(462x299:638x475):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 1820w, https://cdn.vox-cdn.com/thumbor/RxpERpxPmgG67EKTD8YIxg0IdA4=/0x0:1100x774/2120x1413/filters:focal(462x299:638x475):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 2120w, https://cdn.vox-cdn.com/thumbor/RW1infQmHP7hPZyWRmpX_IztYeg=/0x0:1100x774/2420x1613/filters:focal(462x299:638x475):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 2420w\">\n\n\n<img srcset=\"https://cdn.vox-cdn.com/thumbor/lYPgSQb-qmXamVMxZKxE_z-Hk-U=/0x0:1100x774/320x213/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 320w, https://cdn.vox-cdn.com/thumbor/nfMWsBpgxnLoEFvkzTcMGQdhYBE=/0x0:1100x774/620x413/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 620w, https://cdn.vox-cdn.com/thumbor/E7Vr0BumtnEBIxRQ8nLLLpPtcsU=/0x0:1100x774/920x613/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 920w, https://cdn.vox-cdn.com/thumbor/DT7-un_yVJl5f03Bnf_KeUYg-A0=/0x0:1100x774/1220x813/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 1220w, https://cdn.vox-cdn.com/thumbor/clPb8fv3wVRLilOiqTIoIanYYv0=/0x0:1100x774/1520x1013/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 1520w, https://cdn.vox-cdn.com/thumbor/4a1tof1AlujY5LOqz2Irr-vr2j8=/0x0:1100x774/1820x1213/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 1820w, https://cdn.vox-cdn.com/thumbor/qscK1LhlSlo8X_yeWIKhhvMD3iU=/0x0:1100x774/2120x1413/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 2120w, https://cdn.vox-cdn.com/thumbor/Zw__sdOMexcTr91TDCdam30fIeo=/0x0:1100x774/2420x1613/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg 2420w\" alt 
src=\"https://cdn.vox-cdn.com/thumbor/lYPgSQb-qmXamVMxZKxE_z-Hk-U=/0x0:1100x774/320x213/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg%20320w,%20https://cdn.vox-cdn.com/thumbor/nfMWsBpgxnLoEFvkzTcMGQdhYBE=/0x0:1100x774/620x413/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg%20620w,%20https://cdn.vox-cdn.com/thumbor/E7Vr0BumtnEBIxRQ8nLLLpPtcsU=/0x0:1100x774/920x613/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg%20920w,%20https://cdn.vox-cdn.com/thumbor/DT7-un_yVJl5f03Bnf_KeUYg-A0=/0x0:1100x774/1220x813/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg%201220w,%20https://cdn.vox-cdn.com/thumbor/clPb8fv3wVRLilOiqTIoIanYYv0=/0x0:1100x774/1520x1013/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg%201520w,%20https://cdn.vox-cdn.com/thumbor/4a1tof1AlujY5LOqz2Irr-vr2j8=/0x0:1100x774/1820x1213/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg%201820w,%20https://cdn.vox-cdn.com/thumbor/qscK1LhlSlo8X_yeWIKhhvMD3iU=/0x0:1100x774/2120x1413/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg%202120w,%20https://cdn.vox-cdn.com/thumbor/Zw__sdOMexcTr91TDCdam30fIeo=/0x0:1100x774/2420x1613/filters:focal(462x299:638x475)/cdn.vox-cdn.com/uploads/chorus_image/image/63049664/Minecraft-360.0.jpg%202420w\">\n\n</picture>\n\n      \n    </span>\n    \n  </span>\n  \n</figure><div class=\"c-entry-content\">\n  <p id=\"4cSpNl\">Two alleged hackers in Southern California <a href=\"https://www.courthousenews.com/hackers-charged-with-bomb-threats-against-lax-schools/\">have been arrested and charged</a> with multiple offenses following <a href=\"https://kotaku.com/someone-made-400-school-bomb-threats-because-they-were-1823905955\">widespread <em>Minecraft</em>-related hoaxes</a> that led to hundreds of schools in the UK and US being evacuated over false bomb threats. </p>\n<p id=\"ka0iif\">Timothy Dalton Vaughn and George Duke-Cohan, otherwise known by their respective hacker aliases &#x201C;wantedbyfeds&#x201D; and &#x201C;DigitalCrimes,&#x201D; <a href=\"https://www.courthousenews.com/wp-content/uploads/2019/02/Apophis-Squad-INDICTMENT-2.pdf\">were charged on February 8th</a> with hacking crimes, conspiracy, and &#x201C;interstate threats involving explosives,&#x201D; referring to the numerous bomb threats sent to different schools. Court documents claim that Vaughn and Duke-Cohan, acting within a hacker organization known as Apophis Squad, coordinated a series of &#x201C;bomb and school-shooting threats designed to cause fear of imminent danger and did cause the closure of hundreds of schools on two continents on multiple occasions.&#x201D; </p>\n<p id=\"6XMW3x\">Reports of Vaughn and Duke-Cohan&#x2019;s actions first sprung up in March 2018. Emails were sent out to schools, forcing evacuations, but a statement from the <a href=\"https://twitter.com/northumbriapol/status/975679757206638592\">Northumbria Police in the United Kingdom confirmed</a> it was a hoax that traced back to the United States. </p>\n<p id=\"pK2PIM\">&#x201C;Detectives have looked into the emails &#x2014; which appear to originate from the US &#x2014; and can confirm that there is no viable threat,&#x201D; the department wrote on Twitter. 
</p>\n<p id=\"UNnFNS\">Sky News <a href=\"https://news.sky.com/story/school-bomb-hoaxes-revealed-to-be-part-of-minecraft-gamer-feud-11297062?DCMP=afc-103504&amp;awc=11005_1521490496_148c618f92b92330a9f76e94723f7955\">later reported</a> that the emails were &#x201C;spoofed&#x201D; in an attempt to get the domain for VeltPvP, a popular <em>Minecraft </em>server, suspended. Prosecutors allege that Apophis Squad also used the emails to make it sound like the threats were coming from the Mayor of London and Zonix, a client often used for <em>Minecraft, </em>alongside VeltPvP. Vaughan and Duke-Cohen used Discord servers and IRC rooms with other members of the Apophis Squad, according to court documents, to coordinate emails to various schools. </p>\n<p id=\"NdYDjl\">Threats were sent over the course of several months, with numerous incidents mentioned in court filings. Apophis Squad would use Twitter to ask people who wanted a day off school to send the hacker squad cash, and would send out a hoax email in response. On April 28th, 2018, Duke-Cohen tweeted under the Apophis Squad Twitter handle that they were &#x201C;planning to hit as many schools as possible on Monday.&#x201D;</p>\n<p id=\"5cOvVx\">&#x201C;We hope anyone that just wants to have a day off or get out of that math test you have! will email us any schools,&#x201D; Duke-Cohen tweeted, according to documents. &#x201C;We are working hard on getting 100K emails.&#x201D;</p>\n</div></div></div>",
  "author": "Julia Alexander",
  "date_published": "2019-02-13T22:15:55.000Z",
  "lead_image_url": "https://cdn.vox-cdn.com/thumbor/ppJpih9nRVFLeBqsErdb8I1gVBE=/0x99:1100x675/fit-in/1200x630/cdn.vox-cdn.com/assets/1142867/Minecraft-360.jpg",
  "dek": null,
  "next_page_url": null,
  "url": "https://www.theverge.com/2019/2/13/18223847/minecraft-hackers-charged-united-kingdom-california-apophox-squad",
  "domain": "www.theverge.com",
  "excerpt": "Defendants belonged to Apophis Squad",
  "word_count": 383,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

mercury-parser https://www.macrumors.com/2019/03/05/disney-bob-iger-apple-board-seat-at-risk/

{
  "title": "Disney CEO's Board Seat 'at Risk' With Apple Planning to Launch Video Service",
  "content": "<div><div class=\"article\">\n          \n          \n          <div class=\"content\">Apple&apos;s upcoming video streaming service and its work on original TV content could spell trouble for Apple board member and Disney CEO Bob Iger, reports <em><a href=\"https://www.bloomberg.com/news/articles/2019-03-05/apple-s-video-plans-put-disney-chief-s-board-seat-at-risk\">Bloomberg</a></em>, citing the potential for competition between the two companies.\r<br>\n\r<br>\nIger is potentially at risk of losing his seat on Apple&apos;s board as Apple prepares to launch its streaming TV service. Apple already has more than two dozen original TV shows in the works and has purchased rights to several movies, with all of that content set to be offered via the upcoming service.\r<br>\n\r<br>\n<img src=\"https://cdn.macrumors.com/article-new/2019/03/disneybobigerbloomberg-800x533.jpg\" alt width=\"800\" class=\"aligncenter size-large wp-image-683121\"><em><center>Image via <a href=\"https://www.bloomberg.com/news/articles/2019-03-05/apple-s-video-plans-put-disney-chief-s-board-seat-at-risk\">Bloomberg</a></center></em>\r<br>\nApple&apos;s service, which it <a href=\"https://www.macrumors.com/2019/02/13/apple-tv-service-launch-march-25/\">plans to introduce</a> at a March 25 event but <a href=\"https://www.macrumors.com/2019/02/14/apple-tv-service-launch-months-off/\">launch later in the year</a>, will also incorporate add-on content from other providers like SHOWTIME.\r<br>\n\r<br>\nDisney, like Apple, is <a href=\"https://www.macrumors.com/2018/11/08/disney-streaming-service-late-2019/\">working on its own streaming service</a>, Disney+, and is potentially set to be one of Apple&apos;s competitors. Disney+ will offer Disney, Star Wars, and Marvel content (including content made just for Disney+), and like Apple&apos;s TV service, it will launch in 2019. Disney also recently acquired Fox&apos;s assets, giving it majority control over Hulu and other channels and film franchises.\r<br>\n\r<br>\nApple proxy filings that have detailed &quot;arms-length commercial dealings&quot; with Disney have specified that Iger does not have a &quot;material direct or indirect interest&quot; in the deals, but <em>Bloomberg</em> suggests that could change when both companies have launched their streaming services.\r<br>\n\r<br>\nJohn Coffee, director of the Center on Corporate Governance at Columbia Law School told <em>Bloomberg</em> that Disney and Apple &quot;might have to recognize that they will become active competitors in the near future.&quot; Both companies likely have legal advisers exploring whether Iger should continue to be on Apple&apos;s board, according to Coffee.\r<br>\n\r<br>\nIger, who was a good friend of Steve Jobs, has been on Apple&apos;s board since 2011, but there is precedent for a board member leaving due to increasing competition. Former Google CEO Eric Schmidt was previously on Apple&apos;s board, but resigned in 2009 after Google entered the smartphone market.<br><br></div>\n          \n        </div></div>",
  "author": "Juli Clover",
  "date_published": "2019-03-05T21:21:00.000Z",
  "lead_image_url": "https://cdn.macrumors.com/article-new/2019/03/disneybobigerbloomberg.jpg?retina",
  "dek": null,
  "next_page_url": null,
  "url": "https://www.macrumors.com/2019/03/05/disney-bob-iger-apple-board-seat-at-risk/",
  "domain": "www.macrumors.com",
  "excerpt": "Apple's upcoming video streaming service and its work on original TV content could spell trouble for Apple board member and Disney CEO Bob Iger,...",
  "word_count": 326,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

Possible Solution

Seems to be something around here but I'm not sure how to fix this:

https://github.com/postlight/mercury-parser/blob/15f7fa1e27fe6b47c87da40ba4fce9b2db7934ec/src/utils/dom/make-links-absolute.js#L45-L49

install instructions do not work

first off, big fan of the api - i've been using it for a side project for a while. disappointed that the web api is breaking down / being taken down, but i suppose it must just be a big money sink.

second, js is not the world i usually live in, so i'm probably doing something wrong. maybe you can help me out.

  • Platform: Darwin WOOFWOOFMBP2016.fios-router.home 18.2.0 Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64 x86_64
  • Mercury Parser Version: version "2.0.0"
  • Node Version (if a Node bug): v11.11.0
  • Browser Version (if a browser bug):

Expected Behavior

Follow install instructions to run usage example

Current Behavior

unable to run code in example.
unable to build project via yarn and webpack.

there is no mention of a build step needed, but also i've been seeing reports that suggest building is needed to run the es6 in the example. so maybe a build step is needed?

i'm trying to build in webpack after installing via yarn (yarn add @postlight/mercury-parser). i am getting errors where webpack cannot resolve the dependencies cheerio and iconv-lite referenced in dist/mercury.js

ERROR in ./node_modules/@postlight/mercury-parser/dist/mercury.js
Module not found: Error: Can't resolve 'cheerio' in '/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/dist'
resolve 'cheerio' in '/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/dist'
  Parsed request is a module
  using description file: /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/package.json (relative path: ./dist)
    aliased from description file /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/package.json with mapping 'cheerio' to './src/shims/cheerio-query'
      using description file: /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/package.json (relative path: .)
        using description file: /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/package.json (relative path: ./src/shims/cheerio-query)
          no extension
            /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/cheerio-query doesn't exist
          .js
            /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/cheerio-query.js doesn't exist
          as directory
            /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/cheerio-query doesn't exist
[/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/cheerio-query]
[/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/cheerio-query.js]
 @ ./node_modules/@postlight/mercury-parser/dist/mercury.js 10:30-48
 @ ./src/index.js

ERROR in ./node_modules/@postlight/mercury-parser/dist/mercury.js
Module not found: Error: Can't resolve 'iconv-lite' in '/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/dist'
resolve 'iconv-lite' in '/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/dist'
  Parsed request is a module
  using description file: /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/package.json (relative path: ./dist)
    aliased from description file /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/package.json with mapping 'iconv-lite' to './src/shims/iconv-lite'
      using description file: /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/package.json (relative path: .)
        using description file: /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/package.json (relative path: ./src/shims/iconv-lite)
          no extension
            /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/iconv-lite doesn't exist
          .js
            /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/iconv-lite.js doesn't exist
          as directory
            /Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/iconv-lite doesn't exist
[/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/iconv-lite]
[/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/src/shims/iconv-lite.js]
 @ ./node_modules/@postlight/mercury-parser/dist/mercury.js 12:28-49
 @ ./src/index.js

from what i'm seeing here, it looks sort of like i might need to run some more project specific build steps? the package.json has some build scripts that i tried to run, like rollup, but then i got

ENOENT: no such file or directory, lstat '/Users/alexchoi/hnfd/js/node_modules/@postlight/mercury-parser/rollup.config.js'

which i see in the repo but is missing in the tgz yarn downloaded

resolved "https://registry.yarnpkg.com/@postlight/mercury-parser/-/mercury-parser-2.0.0.tgz#1f5f123a6ed4df58731f96f2cf6dd97336f689e0"

anyhow, i'd love to just get the

import Mercury from '@postlight/mercury-parser';

Mercury.parse(url).then(result => console.log(result));

example going - maybe you could help me out.

ultimately i am trying to get this built out into some javascript i can put into an ios app either

  • in a jscontext to call from swift (executed in a corejs engine)
    or
  • shoved into a webview ( executed in a browser)

thanks!

Steps to Reproduce

try to do this

yarn add @postlight/mercury-parser

import Mercury from '@postlight/mercury-parser';

Mercury.parse(url).then(result => console.log(result));

Detailed Description

Possible Solution

Parsing a website that has its content inside a cleaned element

I'm having an issue with getting this website parsed: https://www.mckinsey.com/industries/financial-services/our-insights/blockchains-occam-problem

After running Mercury, everything that gets extracted is the cookie warning. I think the problem is that they decided to place the article content inside a <form> element...

For example the xpath expression to get to the headline is //*[@id="form1"]/div[3]/main/section[1]/div[3]/div/div[1]/h1. The actual article content is similarly nested inside this form.

It seems that this <form> element gets filtered out during a cleanup step. This filtering apparently happens here: https://github.com/postlight/mercury-parser/blob/262dda94b30cf94c180972cbcd7671758c65c9a4/src/resource/utils/dom/constants.js#L6

I started generating a custom extractor but I'm stuck because the actual content is not included in the downloaded html in fixtures/... either.

It seems that in the extractor config it's already too late, because by the time the extraction happens the markup has already been cleaned up.

Is there a way to tell mercury to not clean up specific tags from a given extractor? Or do you have any ideas on how to approach this?

An in-range update of @octokit/rest is breaking the build 🚨

The devDependency @octokit/rest was updated from 16.23.3 to 16.23.4.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

@octokit/rest is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: test-node: Your tests failed on CircleCI (Details).
  • ci/circleci: test-web: Your tests passed on CircleCI! (Details).

Release Notes for v16.23.4

16.23.4 (2019-04-09)

Bug Fixes

  • package: update @octokit/request to version 3.0.0 (a09060d)
Commits

The new version differs by 2 commits.

  • bb55d90 chore(package): update lockfile package-lock.json
  • a09060d fix(package): update @octokit/request to version 3.0.0

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Web API Support discontinued ?

Has Mercury Reader discontinued support for the Web API? If so, can you please tell me how I can run Mercury Reader on iOS?

Stopped working

Hello,

I am a regular user on Windows 10, and it simply stopped working...

Any ideas on how to fix it?

Thanks,

Phil.

Duplicate meta entries --> fail

I'm having trouble parsing attributes for this page:

https://cosmonaut.blog/2019/02/20/no-bernie/

This might very well be down to my non-existent JS/CSS skills, so feel free to close, and sorry for the disturbance.
The problem I have is with the lead_image_url selectors. The "default" (for most extractors) for this one would be [['meta[property="og:image"]', 'content']] or [['meta[name="twitter:image"]','value']], but both of those, when executed, return two near-identical entries, causing the whole thing to fall apart (because if I read the tutorial correctly, they'd need to return exactly one item).

The other idea would be to query the image directly from the page, using [['img.wp-post-image', 'src']], but this is an image with srcset and so the result ends up being a concatenation with multiple URLs (each of which would be acceptable to me) which I cannot further process in the simple selector: [...] setting.

Am I missing something here?

  • Platform: Linux my-desktop 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  • Mercury Parser Version: master (2a3ade706dc445ecb09cce552b087c850d2cb817)

Error: ETIMEDOUT

  • Platform: Ubuntu 16.04 LTS
  • Mercury Parser Version: 2.0.0
  • Node Version (if a Node bug): 10.15.3
  • Browser Version (if a browser bug):

Expected Behavior

Current Behavior

Sometimes it returns this error:

{ Error: ETIMEDOUT
at Timeout._onTimeout (/home/mitrandir/node_modules/request/request.js:760:15)
at ontimeout (timers.js:436:11)
at tryOnTimeout (timers.js:300:5)
at listOnTimeout (timers.js:263:5)
at Timer.processTimers (timers.js:223:10) code: 'ETIMEDOUT', connect: false }

Steps to Reproduce

Detailed Description

Possible Solution

I think this is due to how long the operation takes.

Error when trying to build react app with Mercury installed

  • Platform: Darwin Johns-MacBook-Pro.local 18.2.0 Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64 x86_64
  • Mercury Parser Version: ^2.0.0
  • Node Version (if a Node bug): v11.1.0
  • Browser Version (if a browser bug): N/A

Expected Behavior

Build should succeed, with mercury-parser successfully accessing the cheerio package.

Current Behavior

Both npm run build and yarn build output the following error

Failed to compile.

./node_modules/@postlight/mercury-parser/dist/mercury.js
Cannot find module: 'cheerio'. Make sure this package is installed.

You can install this package by running: yarn add cheerio.

error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Steps to Reproduce

Open terminal, use the following:

  1. npx create-react-app new-chrome-extension
  2. cd new-chrome-extension
  3. Adjust manifest.json to represent a chrome-extension
    (e.g.
    {
      "manifest_version": 2,
      "name": "New Chrome Extension",
      "author": "ToBeCreated",
      "version": "1.0.1",
      "description": "Example manifest for recreating error.",
      "incognito": "split",
      "icons": {
        "16": "16.png",
        "48": "48.png",
        "128": "128.png"
      },
      "permissions": ["storage"]
    })
    3.1. OPTIONAL: Replace "build" in package.json with "INLINE_RUNTIME_CHUNK=false react-scripts build"
  4. Install mercury-parser with npm install @postlight/mercury-parser
  5. npm install
  6. npm run build
  7. Error produced

Detailed Description

Prevents use of mercury-parser inside a react-based chrome extension.

Possible Solution

Russian Webpage parsing support.

  • Platform: Mac
  • Mercury Parser Version: Web based api (at moment)
  • Node Version (if a Node bug):
  • Browser Version (if a browser bug):

Expected Behavior

Proper encoding for Russian language.

Current Behavior

When parsing this link https://www.finam.ru/analysis/newsitem/putin-nagradil-grefa-ordenom-20190208-203615/?utm_source=rss&utm_medium=new_compaigns&utm_campaign=news_to_finamb it doesn't give properly encoded output, and hence the formatting is messed up when rendered in html.

Steps to Reproduce

  1. Parse link https://www.finam.ru/analysis/newsitem/putin-nagradil-grefa-ordenom-20190208-203615/?utm_source=rss&utm_medium=new_compaigns&utm_campaign=news_to_finamb
  2. Check the content output
  3. Try to render that content with Cyrillic font
  4. You will see that instead of the proper format it shows a bunch of '�'

Detailed Description

I use this API for parsing articles in my reader app. There are some Russian news feeds I try to use, and I'm not able to get properly formatted output.

Possible Solution
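
A possible client-side workaround, sketched here on the assumption that the page is served as windows-1251: fetch the raw bytes yourself, decode them with iconv-lite, and pass the decoded HTML to the parser.

const fetch = require('node-fetch');
const iconv = require('iconv-lite');
const Mercury = require('@postlight/mercury-parser');

const url = 'https://www.finam.ru/analysis/newsitem/putin-nagradil-grefa-ordenom-20190208-203615/';

fetch(url)
  .then(res => res.buffer()) // raw bytes, so no charset guessing happens
  .then(buf => iconv.decode(buf, 'win1251')) // assumed charset for this site
  .then(html => Mercury.parse(url, { html }))
  .then(result => console.log(result.content));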

Empty content is returned when parsing `https://www.ncbi.nlm.nih.gov/pubmed/28978556?dopt=Abstract`

  • Platform: Mac
  • Mercury Parser Version: 2.0.0

Empty content is returned when parsing https://www.ncbi.nlm.nih.gov/pubmed/28978556

Current Behavior

When I run the mercury-parser cli as follows:
$ mercury-parser https://www.ncbi.nlm.nih.gov/pubmed/28978556

This is what I receive:

{
  "title": "Genomic Heterogeneity as a Barrier to Precision Medicine in Gastroesophageal Adenocarcinoma. - PubMed - NCBI",
  "author": null,
  "date_published": null,
  "dek": null,
  "lead_image_url": null,
  "content": "<div class=\"grid\"> </div>",
  "next_page_url": null,
  "url": "https://www.ncbi.nlm.nih.gov/pubmed/28978556",
  "domain": "www.ncbi.nlm.nih.gov",
  "excerpt": "",
  "word_count": 1,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}

An in-range update of rollup is breaking the build 🚨

The devDependency rollup was updated from 1.9.2 to 1.9.3.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

rollup is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: test-node: Your tests failed on CircleCI (Details).
  • ci/circleci: test-web: Your tests failed on CircleCI (Details).

Release Notes for v1.9.3

2019-04-10

Bug Fixes

  • Simplify return expressions that are evaluated before the surrounding function is bound (#2803)

Pull Requests

  • #2803: Handle out-of-order binding of identifiers to improve tree-shaking (@lukastaegert)
Commits

The new version differs by 3 commits.

  • 516a06d 1.9.3
  • a5526ea Update changelog
  • c3d73ff Handle out-of-order binding of identifiers to improve tree-shaking (#2803)

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

decoding

  • Platform:
  • Mercury Parser Version:
  • Node Version (if a Node bug):
  • Browser Version (if a browser bug):

Expected Behavior

Current Behavior

Steps to Reproduce

Detailed Description

Possible Solution

Feature proposal - parsing metadata

I have a feature proposal:

Being able to extract metadata from websites. The problem is defining a schema for the various types of data. For example, from IMDb you can extract the movie release date, actors, director, even movie trailer links, etc. From YouTube you can extract duration, votes (likes and dislikes), and info about the soundtrack used in the video (artist and title). From a game review website you can extract the rating, PEGI rating, and screenshots.

Do you like the idea, or do you want to focus on just making websites clear to read?

bash: mercury-parser: command not found

Platform: Linux s2e24de3f 2.6.32-042stab134.3 #1 SMP Sun Oct 14 12:26:01 MSK 2018 x86_64 GNU/Linux
Node Version: v10.15.1

I installed yarn and Node v10, then executed the command:

yarn global add @postlight/mercury-parser
root@s2e24de3f:~/mercury-parser# yarn global add @postlight/mercury-parser
yarn global v1.13.0
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
success Installed "@postlight/mercury-parser@2.0.0" with binaries:
       - mercury-parser
Done in 2.62s.

And then:

mercury-parser https://postlight.com/trackchanges/mercury-goes-open-source
The message appears:
-bash: mercury-parser: command not found

Command-line `mercury-parser` tool fails with UnhandledPromiseRejectionWarning on timeout with https://www.washingtonpost.com/education/2019/02/06/woman-thought-she-had-ghost-her-closet-she-found-man-wearing-her-clothes

  • Platform: Darwin pefbook.local 18.2.0 Darwin Kernel Version 18.2.0: Mon Nov 12 20:24:46 PST 2018; root:xnu-4903.231.4~2/RELEASE_X86_64 x86_64
  • Mercury Parser Version: [installed five minutes ago]
  • Node Version (if a Node bug): 11.6.0
  • Browser Version (if a browser bug):

Expected Behavior

bash> mercury-parser https://www.washingtonpost.com/education/2019/02/06/woman-thought-she-had-ghost-her-closet-she-found-man-wearing-her-clothes

[parsed output]

Current Behavior

bash> mercury-parser https://www.washingtonpost.com/education/2019/02/06/woman-thought-she-had-ghost-her-closet-she-found-man-wearing-her-clothes

(node:74795) UnhandledPromiseRejectionWarning: Error: ETIMEDOUT
at Timeout._onTimeout (/Users/ford/.config/yarn/global/node_modules/request/request.js:760:15)
at listOnTimeout (timers.js:324:15)
at processTimers (timers.js:268:5)
(node:74795) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:74795) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Steps to Reproduce

See above; just bails.

Detailed Description

Possible Solution

It would be worth catching timeouts explicitly and delivering an error message instead of an unhandled promise rejection.
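
Until the CLI does that itself, a small wrapper script can catch the rejection and exit cleanly. A minimal sketch:

// parse.js — usage: node parse.js <url>
const Mercury = require('@postlight/mercury-parser');

Mercury.parse(process.argv[2])
  .then(result => console.log(JSON.stringify(result, null, 2)))
  .catch(error => {
    console.error('Failed to parse:', error.message);
    process.exit(1);
  });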

Cannot parse a javascript-rendered page

  • Platform: Linux 66784da07322 4.9.125-linuxkit #1 SMP Fri Sep 7 08:20:28 UTC 2018 x86_64 Linux (node-alpine docker image)
  • Mercury Parser Version: @postlight/mercury-parser@^2.0.0
  • Node Version (if a Node bug): v10.15.3

Expected Behavior

Content should be parsed

Current Behavior

Content is returned blank

{ title:
   'News Feature: Deadly deficiency at the heart of an environmental mystery',
  author: null,
  date_published: '2018-10-16T00:00:00.000Z',
  dek: null,
  lead_image_url:
   'https://www.pnas.org/sites/default/files/highwire/pnas/115/42.cover-source.jpg',
  content: '',
  next_page_url: null,
  url: 'https://www.pnas.org/content/115/42/10532',
  domain: 'www.pnas.org',
  excerpt:
   'Researchers are puzzling over a widespread vitamin B shortage that appears to be killing wildlife . During spring and summer, busy colonies of a duck called the common eider ( Somateria mollissima )&hellip;',
  word_count: 1,
  direction: 'ltr',
  total_pages: 1,
  rendered_pages: 1 }

Steps to Reproduce

Attempt to parse https://www.pnas.org/content/115/42/10532

Possible Solution

Honestly, I'm not sure there is one besides rendering the page in PhantomJS (or similar) and parsing the result.
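One possible workaround along those lines, sketched under the assumption that a headless browser is acceptable: render the page first (here with puppeteer) and hand the rendered HTML to the parser via the documented `html` option.

// Pre-render a JavaScript-heavy page, then parse the rendered HTML.
const puppeteer = require('puppeteer');
const Mercury = require('@postlight/mercury-parser');

async function parseRendered(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const html = await page.content();
  await browser.close();
  // The URL is still passed so any domain-specific extractor applies.
  return Mercury.parse(url, { html });
}

parseRendered('https://www.pnas.org/content/115/42/10532').then(result =>
  console.log(result.content)
);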

Content not fully displaying for hablarcondios.org

The site http://www.hablarcondios.org/pt/meditacaodiaria.aspx is not being parsed correctly and may need a custom parser; a sketch of one follows the response below.

curl -i -X GET \
   -H "Content-Type:application/json" \
   -H "x-api-key:xxxxxxxxx" \
 'https://mercury.postlight.com/parser?url=http://www.hablarcondios.org/pt/meditacaodiaria.aspx'

Returns

HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Content-Length: 953
Content-Type: application/json
Date: Tue, 22 Jan 2019 09:21:50 GMT
Via: 1.1 2afd697fc5d0058ea30d6c4b939e714d.cloudfront.net (CloudFront)
X-Amz-Apigw-Id: T5gLSEd5IAMFvKA=
X-Amz-Cf-Id: THzwiih-XtwXqX07rvIZKM-y6ITbVSKE67O0x0eVC9RbkXyu3mJt_g==
X-Amzn-Requestid: 264eb18c-1e27-11e9-a684-0974c98fb9ae
X-Amzn-Trace-Id: Root=1-5c46e0ae-90d042bb04f12751ad47257f;Sampled=0
X-Cache: Miss from cloudfront
Age: 0
Connection: keep-alive
Server: Netlify
X-NF-Request-ID: 2045b1b0-69b6-46b9-835a-716abf9101e4-7155

{"title":"Hablar con Dios, Francisco Fernández-Carvajal","author":null,"date_published":null,"dek":null,"lead_image_url":null,"content":"<div><p class=\"cookie-contenedor\" id=\"cookie\"> <span>Esta web utiliza cookies propias y de terceros para analizar su navegaci&#xF3;n y ofrecerle un servicio m&#xE1;s personalizado y publicidad acorde a sus intereses. Continuar navegando implica la aceptaci&#xF3;n de nuestra <a class=\"cookie-enlace\" href=\"http://www.hablarcondios.org/pt/cookies.aspx\">Pol&#xED;tica de Cookies</a> </span> </p></div>","next_page_url":null,"url":"http://www.hablarcondios.org/pt/meditacaodiaria.aspx","domain":"www.hablarcondios.org","excerpt":"Esta web utiliza cookies propias y de terceros para analizar su navegación y ofrecerle un servicio más personalizado y publicidad acorde a sus intereses. Continuar navegando implica la aceptación de&hellip;","word_count":34,"direction":"ltr","total_pages":1,"rendered_pages":1}

[Screenshots comparing the page before and after Mercury parsing omitted]

Feature request: might Mercury reader work together with Vimium C?

Some users of Vimium (an extension for controlling webpages and Chrome actions with the keyboard) have requested support for Mercury Reader's page: philc/vimium#3221, philc/vimium#3285. But because Mercury Reader creates iframes with a src of "chrome-extension://...", Vimium cannot, in principle, work inside your iframes.

However, I've made a customized Vimium (https://github.com/gdh1995/vimium-c) that supports injection into other extensions, provided the target extension declares a CSP allowing Vimium C and adds <script src="[Vimium C injector]"> to its HTML files.

Therefore, I wonder if your extension (Mercury Reader) might add an option letting users enable Vimium C's injection.

If not, could I publish a customized version of Mercury Reader that supports injection of Vimium C (I'll follow all the licenses)?

Thanks a lot for this helpful project!

Firefox Port

  • Platform:
  • Mercury Parser Version:
  • Node Version (if a Node bug):
  • Browser Version (if a browser bug):

Expected Behavior

Mercury Reader appears in the search results, and I can install it.

Current Behavior

Mercury Reader is not found.

Steps to Reproduce

  1. Go to http://addons.mozilla.org/
  2. Search for mercury reader

Detailed Description

Possible Solution

Feature proposal - support Microdata

I tested mercury-parser against my website out of curiosity and found that it doesn't currently support extracting even author and datePublished from Microdata. I believe this feature could improve the tool's reach, and it shouldn't be hard to provide initial support, since extractors already rely on selectors to extract metadata.

I say initial support because, for example, a page could have an article and multiple comments, each of which could also include metadata. In that case, multiple authors and publish dates would be found, and a heuristic would be needed.

It seems to me that the current heuristic for selectors is to use the first matched element (correct me if I'm wrong). If so, that approach seems fine here too. If a stricter version is desired, there are libraries that extract Microdata, like microdata-node, that could be used to query for the main content's author and publish date, among other information.
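As a rough illustration of the selector-based route with the first-match heuristic described above (a sketch using cheerio; the function name is hypothetical):

const cheerio = require('cheerio');

// First-match heuristic for Microdata, similar in spirit to the
// existing selector-based extractors.
function extractMicrodata(html) {
  const $ = cheerio.load(html);
  return {
    author: $('[itemprop="author"]').first().text().trim() || null,
    date_published:
      $('[itemprop="datePublished"]').first().attr('content') ||
      $('[itemprop="datePublished"]').first().text().trim() ||
      null,
  };
}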

If this feature seems desirable for the project, I would like to work on a PR.

Feature request: Support pastebin sites

  • Platform:
  • Mercury Parser Version:
  • Node Version (if a Node bug):
  • Browser Version (if a browser bug):

Expected Behavior

Current Behavior

Steps to Reproduce

Pastebin sites, e.g. hastebin.com, del.dog, or pastebin.com, currently produce pretty broken results.

Examples (screenshots of the broken output omitted):

https://hastebin.com/about.md
https://del.dog/about.md
https://pastebin.com/pipKY6pX

Detailed Description

I am the creator of dogbin and would obviously love it if it showed up beautifully here. This does, however, seem to apply generally to formatted code blocks, and pastebin.com seems to be broken in other ways as well.

Possible Solution

TypeError: $ is not a function

  • Platform:
    Darwin Mac-Pro.local 18.2.0 Darwin Kernel Version 18.2.0: Thu Dec 20 20:46:53 PST 2018; root:xnu-4903.241.1~1/RELEASE_X86_64 x86_64

    Linux rss 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 x86_64 x86_64 GNU/Linux

  • Mercury Parser Version:
    Latest. Installed with yarn global add @postlight/mercury-parser

  • Node Version (if a Node bug):
    v11.10.0 (I don't think this is a node bug)

    Also shows in serverless

  • Browser Version (if a browser bug):
    n.a.

Expected Behavior

A parsed page and not an error

Current Behavior

$ mercury-parser https://tweakers.net/nieuws/149154/nederlandse-supercomputerbouwer-clustervision-is-failliet.html

Mercury Parser encountered a problem trying to parse that resource.

TypeError: $ is not a function
    at /Users/fred/.config/yarn/global/node_modules/@postlight/mercury-parser/dist/mercury.js:6079:12
    at Array.find (<anonymous>)
    at detectByHtml (/Users/fred/.config/yarn/global/node_modules/@postlight/mercury-parser/dist/mercury.js:6078:46)
    at getExtractor (/Users/fred/.config/yarn/global/node_modules/@postlight/mercury-parser/dist/mercury.js:6090:60)
    at Object._callee$ (/Users/fred/.config/yarn/global/node_modules/@postlight/mercury-parser/dist/mercury.js:6458:27)
    at tryCatch (/Users/fred/.config/yarn/global/node_modules/regenerator-runtime/runtime.js:62:40)
    at Generator.invoke [as _invoke] (/Users/fred/.config/yarn/global/node_modules/regenerator-runtime/runtime.js:288:22)
    at Generator.prototype.(anonymous function) [as next] (/Users/fred/.config/yarn/global/node_modules/regenerator-runtime/runtime.js:114:21)
    at asyncGeneratorStep (/Users/fred/.config/yarn/global/node_modules/@babel/runtime-corejs2/helpers/asyncToGenerator.js:5:24)
    at _next (/Users/fred/.config/yarn/global/node_modules/@babel/runtime-corejs2/helpers/asyncToGenerator.js:27:9)

If you believe this was an error, please file an issue at:

    https://github.com/postlight/mercury-parser/issues/new

Steps to Reproduce

mercury-parser https://tweakers.net/nieuws/149154/nederlandse-supercomputerbouwer-clustervision-is-failliet.html

Also in API

Not certain if this is helpful, but the problem also happens when using the API variant of Mercury. As far as I know, pages from this site DID work with the online version.

content seemingly parsed - render of parsed content broken

  • Platform: Windows 10, 64-bit

  • Mercury Parser Version: whatever the Mercury Reader Chrome plug-in is using today

  • Node Version (if a Node bug): n/a

  • Browser Version (if a browser bug): Google Chrome Version 72.0.3626.121 (Official Build) (64-bit)

  • Stack from Chrome dev tools
    redesign
    tinypass.min.js:1 TP: Invalid containerSelector
    (unknown) ANB
    (unknown) --[executed]--> MPS Head Additions (1)
    (unknown) --[executed]--> MPS Header Additions (2)
    (unknown) Bento Feed Desktop & Tablet
    2
    pubads_impl_319.js:1 updateCorrelator has been deprecated. Please see the Google Ad Manager help page on "Pageviews in GPT" for more information: https://support.google.com/admanager/answer/183281?hl=en
    loader.js:405 Error while loading taboola feed: Cannot read property 'placement' of undefined
    impl.350-59-RELEASE.js:3 Exit TRCRBox.loadScriptCallback(retry=0): no items in response - thumbnails-i
    2
    loader.js:405 Error while loading taboola feed: Cannot read property 'placement' of undefined
    (unknown) --[executed]--> MPS Footer Additions (3)
    VM235:39 >>> Sailthru tracked with URL PAGE
    VM237:3 Vilynx Recommendation plugin loaded
    tinypass.min.js:1 TP: Invalid containerSelector
    (unknown) ANB
    /.well-known/pubvendors.json:1 Failed to load resource: the server responded with a status of 404 ()
    tinypass.min.js:1 TP: Invalid containerSelector
    (unknown) [Deprecation] 'window.webkitStorageInfo' is deprecated. Please use 'navigator.webkitTemporaryStorage' or 'navigator.webkitPersistentStorage' instead.
    0.hola/:1 Failed to load resource: net::ERR_NAME_NOT_RESOLVED
    (unknown) ###
    socialize.js?apikey=…dQ7gYFCESWKMM1sP:24 proxy request timeout
    VM108:17 Uncaught TypeError: Cannot read property 'dataset' of null
    at Object.mps.refreshDetect (:17:33)
    at Object.mps.refreshAds (:181:22)
    at Object.mps.responsiveApply (:105:11)
    at main-31bb0c2….js:12
    at Object.refreshValue (main-31bb0c2….js:12)
    at main-31bb0c2….js:12
    at main-31bb0c2….js:12
    at Array.forEach ()
    at Object.trigger (main-31bb0c2….js:12)
    at main-31bb0c2….js:12

  • Original URL: https://www.cnbc.com/2019/03/18/heres-how-cybersecurity-vendors-drive-the-hacking-news-cycle.html

Expected Behavior

Parsed content should be displayable

Current Behavior

[screenshot of the broken reading view]

I can see what looks like transformed content in the resulting page source after Mercury Reader is applied. This content is not being rendered in a way that's displayable.

Steps to Reproduce

  1. Install the Mercury Reader plug-in in Chrome
  2. Visit this article: https://www.cnbc.com/2019/03/18/heres-how-cybersecurity-vendors-drive-the-hacking-news-cycle.html
  3. Open the article in Mercury Reader

Detailed Description

https://www.cnbc.com/2019/03/18/heres-how-cybersecurity-vendors-drive-the-hacking-news-cycle.html

Possible Solution

Can't launch a parser

Platform: macOS High Sierra 10.13.6
Mercury-Parser Version: "@postlight/mercury-parser": "^2.0.0"
Node Version: v11.11.0

Hello.

I'm having trouble starting a script in the browser. When it starts, the console shows:
Unexpected identifier 'Mercury'. import call expects exactly one argument.

Is this a Yarn problem, or do I have to fix some part of the code?
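That error typically means an ES-module import statement ran in a classic (non-module) script. A minimal sketch of the module-context usage, assuming a bundler or <script type="module"> resolves the package name:

// Only valid in a module context (a bundler build or
// <script type="module">); a bare import in a classic <script>
// produces exactly this kind of syntax error.
import Mercury from '@postlight/mercury-parser';

// In the browser, the URL argument can be omitted to parse the
// current page.
Mercury.parse().then(result => console.log(result));

Alternatively, the package ships a prebuilt browser bundle (dist/mercury.web.js, if memory serves) that can be loaded from a plain script tag and used via a global, with no import statement at all.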

Error when trying to launch an application that has mercury-parser installed

Platform: 64-bit Windows 10
Mercury-Parser Version: "^2.0.0"
Node Version: v10.15.3

I have installed mercury-parser through npm in an Angular application. When running the application, it fails during npm start with the following:
ERROR in ./node_modules/@postlight/mercury-parser/dist/mercury.js
Module not found: Error: Can't resolve 'cheerio' in 'C:\Projects\furry-garbanzo\link-manager\node_modules\@postlight\mercury-parser\dist'
ERROR in ./node_modules/@postlight/mercury-parser/dist/mercury.js
Module not found: Error: Can't resolve 'iconv-lite' in 'C:\Projects\furry-garbanzo\link-manager\node_modules\@postlight\mercury-parser\dist'

I have tried installing cheerio and iconv-lite, both in the application and globally, but it still fails. Below is the log:

0 info it worked if it ends with ok
1 verbose cli [ 'C:\\Program Files\\nodejs\\node.exe',
1 verbose cli   'C:\\Program Files\\nodejs\\node_modules\\npm\\bin\\npm-cli.js',
1 verbose cli   'start' ]
2 info using npm@6.4.1
3 info using node@v10.15.3
4 verbose run-script [ 'prestart', 'start', 'poststart' ]
5 info lifecycle link-manager@…~prestart: link-manager@…
6 info lifecycle link-manager@…~start: link-manager@…
7 verbose lifecycle link-manager@…~start: unsafe-perm in lifecycle true
8 verbose lifecycle [email protected]~start: PATH: C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\node-gyp-bin;C:\Projects\furry-garbanzo\link-manager\node_modules\.bin;C:\Program Files\Git\mingw64\bin;C:\Program Files\Git\usr\bin;C:\Users\mhaspert\bin;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0;C:\Windows\System32\OpenSSH;C:\Program Files (x86)\AMD\ATI.ACE\Core-Static;C:\Program Files\dotnet;C:\Program Files\Microsoft SQL Server\130\Tools\Binn;C:\Program Files (x86)\Microsoft SQL Server\Client SDK\ODBC\130\Tools\Binn;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn;C:\Program Files (x86)\Microsoft SQL Server\140\DTS\Binn;C:\Program Files (x86)\Microsoft SQL Server\140\Tools\Binn\ManagementStudio;C:\Program Files\Microsoft VS Code\bin;C:\Program Files (x86)\ATI Technologies\ATI.ACE\Core-Static;C:\Program Files\Git\cmd;C:\Program Files\nodejs;C:\Users\mhaspert\AppData\Local\Microsoft\WindowsApps;C:\Users\mhaspert\AppData\Roaming\npm
9 verbose lifecycle link-manager@…~start: CWD: C:\Projects\furry-garbanzo\link-manager
10 silly lifecycle link-manager@…~start: Args: [ '/d /s /c', 'ng build && node ./server/app.js' ]
11 silly lifecycle link-manager@…~start: Returned: code: 1  signal: null
12 info lifecycle link-manager@…~start: Failed to exec start script
13 verbose stack Error: link-manager@… start: `ng build && node ./server/app.js`
13 verbose stack Exit status 1
13 verbose stack     at EventEmitter.<anonymous> (C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\index.js:301:16)
13 verbose stack     at EventEmitter.emit (events.js:189:13)
13 verbose stack     at ChildProcess.<anonymous> (C:\Program Files\nodejs\node_modules\npm\node_modules\npm-lifecycle\lib\spawn.js:55:14)
13 verbose stack     at ChildProcess.emit (events.js:189:13)
13 verbose stack     at maybeClose (internal/child_process.js:970:16)
13 verbose stack     at Process.ChildProcess._handle.onexit (internal/child_process.js:259:5)
14 verbose pkgid link-manager@…
15 verbose cwd C:\Projects\furry-garbanzo\link-manager
16 verbose Windows_NT 10.0.17134
17 verbose argv "C:\\Program Files\\nodejs\\node.exe" "C:\\Program Files\\nodejs\\node_modules\\npm\\bin\\npm-cli.js" "start"
18 verbose node v10.15.3
19 verbose npm  v6.4.1
20 error code ELIFECYCLE
21 error errno 1
22 error link-manager@… start: `ng build && node ./server/app.js`
22 error Exit status 1
23 error Failed at the link-manager@… start script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 1, true ]

Expected Behavior

The application runs properly and the dependencies of mercury-parser work

Thank you in advance
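For context, cheerio and iconv-lite are Node-oriented dependencies, so a browser-targeted webpack build can fail to resolve them. A hedged sketch of one workaround, assuming the parsing can run in a Node (server-side) build where a custom webpack config is available:

// webpack.config.js (sketch): build for Node and leave the
// Node-only modules unbundled; they are then loaded from
// node_modules at runtime. Not applicable to a pure browser build,
// where the prebuilt web bundle is the safer route.
module.exports = {
  // ...existing configuration...
  target: 'node',
  externals: {
    cheerio: 'commonjs cheerio',
    'iconv-lite': 'commonjs iconv-lite',
  },
};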

Can't install

npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! …@… install: `node install.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the …@… install script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! /home/mitrandir/.npm/_logs/2019-03-12T09_10_45_527Z-debug.log

I am trying to install this module. Please help.

Impossible to install

  • Platform: Debian GNU/Linux 9
  • Node Version (if a Node bug): v10.11.0

Expected Behavior

Not able to complete installation.

Current Behavior

If I run sudo npm -g install @postlight/mercury-parser, I get

npm ERR! code 128
npm ERR! Command failed: /usr/bin/git clone --depth=1 -q -b feat-add-headers-to-response git://github.com/postlight/browser-request.git /root/.npm/_cacache/tmp/git-clone-56594296
npm ERR! fatal: could not create leading directories of '/root/.npm/_cacache/tmp/git-clone-56594296': Permission denied
npm ERR!

npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2019-02-19T12_32_47_619Z-debug.log

How can I solve this?

Thank you
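For what it's worth, the clone is failing inside root's npm cache, which points to a permissions mismatch in the sudo environment. Two common workarounds, offered as assumptions about this setup rather than confirmed fixes: install without sudo into a user-writable prefix (`npm config set prefix ~/.npm-global`), or pass `--unsafe-perm` to the sudo install.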

How do you run it on AWS Lambda?

As far as I can see, you seem to use AWS Lambda to run the parser, but I can't find any handlers or serverless/SAM configuration.

So how do you deploy it to AWS? Are you using some extra configuration, like another repository that encapsulates this one?
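For what it's worth, a minimal sketch of how one could wrap the parser in a Lambda handler behind API Gateway; this is an assumption about a possible setup, not Postlight's actual deployment:

// Hypothetical Lambda handler: parse the URL passed as a query
// string parameter and return the result as JSON.
const Mercury = require('@postlight/mercury-parser');

exports.handler = async event => {
  const { url } = event.queryStringParameters || {};
  const result = await Mercury.parse(url);
  return {
    statusCode: 200,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(result),
  };
};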
