parser-api's Introduction

Postlight Parser API

This repo provides a drop-in replacement for the Postlight Parser API. In fact, this AWS Lambda-based API for running the Postlight Parser is the same code and serverless infrastructure that powered the Postlight Parser API.

Installation

# Clone the Postlight Parser API repo if you haven't already
git clone https://github.com/postlight/parser-api.git

# Install dependencies
yarn install

API Gateway-like local dev server

To spin up a local dev server that will more closely match the API Gateway endpoint/experience:

yarn serve

Deploy

Assuming you've already set up your default AWS credentials (or have set a different AWS profile via the profile field), simply run:

yarn deploy

yarn deploy deploys to the "dev" environment. You can deploy to stage or prod with:

yarn deploy:stage

# -- or --

yarn deploy:prod

After you've deployed, the output of the deploy script will give you the API endpoint for your deployed function(s), so you should be able to test the deployed API via that URL.
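
As a rough sketch of testing the deployed endpoint from Node.js (18 or later, for the global fetch): the /parser path and url query parameter are taken from the issue reports further down this page, while the host name and the helper names below are placeholders invented for illustration. Substitute the endpoint that your own deploy prints.

```javascript
// Hypothetical helper: build the GET URL for the deployed parser function.
// The article URL is percent-encoded by URLSearchParams.
function buildParserRequestUrl(apiEndpoint, articleUrl) {
  const requestUrl = new URL(apiEndpoint);
  requestUrl.searchParams.set('url', articleUrl);
  return requestUrl.href;
}

// Hypothetical helper: call the endpoint and return the parsed JSON result.
// Not invoked here, because it needs a real deployed endpoint.
async function parseArticle(apiEndpoint, articleUrl) {
  const response = await fetch(buildParserRequestUrl(apiEndpoint, articleUrl));
  if (!response.ok) {
    throw new Error(`Parser API returned HTTP ${response.status}`);
  }
  return response.json();
}

// Placeholder endpoint; replace it with the URL from your deploy output.
const exampleRequestUrl = buildParserRequestUrl(
  'https://api.example.com/dev/parser',
  'https://example.com/article?id=1'
);
console.log(exampleRequestUrl);
```

Building the query string with URLSearchParams rather than string concatenation keeps characters such as ? and = in the article URL from being confused with the API's own query string.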

License

Licensed under either of the below, at your preference:

Contribution

Unless it is explicitly stated otherwise, any contribution intentionally submitted for inclusion in the work, as defined in the Apache-2.0 license, shall be dual licensed as above without any additional terms or conditions.

parser-api's People

Contributors

adampash, greenkeeper[bot], johnholdun, wajeehzantout

parser-api's Issues

An in-range update of eslint-plugin-react is breaking the build 🚨

The devDependency eslint-plugin-react was updated from 7.14.0 to 7.14.1.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

eslint-plugin-react is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: test: CircleCI is running your tests (Details).
  • ci/circleci: build: Your tests failed on CircleCI (Details).

Release Notes for v7.14.1

Fixed

  • Fix prop-types crash on multiple destructuring (#2319 @golopot)

Commits

The new version differs by 3 commits.

  • 62255af Update CHANGELOG and bump version
  • 655eb01 Merge pull request #2320 from golopot/issue-2319
  • 9639d82 [Fix] prop-types: fix crash on multiple destructuring

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

An in-range update of webpack is breaking the build 🚨

The devDependency webpack was updated from 4.35.3 to 4.36.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

webpack is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: test: Your tests failed on CircleCI (Details).
  • ci/circleci: build: Your tests failed on CircleCI (Details).

Release Notes for v4.36.0

Features

  • SourceMapDevToolPlugin append option now supports the default placeholders in addition to [url]
  • Arrays in resolve and parser options (Rule and Loader API) support backreferences with "..." when overriding options.

Commits

The new version differs by 42 commits.

  • 95d21bb 4.36.0
  • aa1216c Merge pull request #9422 from webpack/feature/dot-dot-dot-merge
  • b3ec775 improve merging of resolve and parsing options
  • 53a5ae2 Merge pull request #9419 from vankop/remove-valid-jsdoc-rule
  • ab75240 Merge pull request #9413 from webpack/dependabot/npm_and_yarn/ajv-6.10.2
  • 0bdabf4 Merge pull request #9418 from webpack/dependabot/npm_and_yarn/eslint-plugin-jsdoc-15.5.2
  • f207cdc remove valid jsdoc rule in favour of eslint-plugin-jsdoc
  • 31333a6 chore(deps-dev): bump eslint-plugin-jsdoc from 15.3.9 to 15.5.2
  • 036adf0 Merge pull request #9417 from webpack/dependabot/npm_and_yarn/eslint-plugin-jest-22.8.0
  • 37d4480 Merge pull request #9411 from webpack/dependabot/npm_and_yarn/simple-git-1.121.0
  • ce2a183 chore(deps-dev): bump eslint-plugin-jest from 22.7.2 to 22.8.0
  • 0beeb7e Merge pull request #9391 from vankop/create-hash-typescript
  • bf1a24a #9391 resolve super call discussion
  • bd7d95b #9391 resolve discussions, AbstractMethodError
  • 4190638 chore(deps): bump ajv from 6.10.1 to 6.10.2

There are 42 commits in total.

See the full diff

Self-hosting?

Hi! I'm interested in self-hosting Postlight Parser API on my local machine. Would you be accepting a pull request that adds:

  • a simple web server (maybe express?) that serves the request; and
  • Dockerfile?

(You might consider introducing some GitHub actions to deploy the official Docker image, but that's another story.)

An in-range update of eslint-plugin-import is breaking the build 🚨

The devDependency eslint-plugin-import was updated from 2.16.0 to 2.17.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

eslint-plugin-import is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: test: Your tests failed on CircleCI (Details).
  • ci/circleci: build: Your tests passed on CircleCI! (Details).

Commits

The new version differs by 61 commits.

  • 0499050 bump to v2.17.0
  • f479635 [webpack] v0.11.1
  • 8a4226d Merge pull request #1320 from bradzacher/export-ts-namespaces
  • 988e12b fix(export): Support typescript namespaces
  • 70c3679 [docs] make rule names consistent
  • 6ab25ea [Tests] skip a TS test in eslint < 4
  • 405900e [Tests] fix tests from #1319
  • 2098797 [fix] export: false positives for typescript type + value export
  • 70a59fe [fix] Fix overwriting of dynamic import() CallExpression
  • e4850df [ExportMap] fix condition for checking if block comment
  • 918567d [fix] namespace: add check for null ExportMap
  • 2d21c4c Merge pull request #1297 from echenley/ech/fix-isBuiltIn-local-aliases
  • 0ff1c83 [dev deps] lock typescript to ~, since it doesn’t follow semver
  • 40bf40a [*] [deps] update resolve
  • 28dd614 Merge pull request #1304 from bradennapier/feature/typescript-export-type

There are 61 commits in total.

See the full diff

Action required: Greenkeeper could not be activated 🚨

🚨 You need to enable Continuous Integration on Greenkeeper branches of this repository. 🚨

To enable Greenkeeper, you need to make sure that a commit status is reported on all branches. This is required by Greenkeeper because it uses your CI build statuses to figure out when to notify you about breaking changes.

Since we didn’t receive a CI status on the greenkeeper/initial branch, it’s possible that you don’t have CI set up yet. We recommend using Travis CI, but Greenkeeper will work with every other CI service as well.

If you have already set up a CI for this repository, you might need to check how it’s configured. Make sure it is set to run on all new branches. If you don’t want it to run on absolutely every branch, you can whitelist branches starting with greenkeeper/.

Once you have installed and configured CI on this repository correctly, you’ll need to re-trigger Greenkeeper’s initial pull request. To do this, please click the 'fix repo' button on account.greenkeeper.io.

All accented characters are corrupted

  • Platform: AWS Lambda

Expected Behavior

When you POST a request with only the URL parameter, the response is UTF-8 friendly.
When I use the html parameter, the response should be UTF-8 friendly too.

The API should return a title like this: "Le démantèlement des réacteurs nucléaires, véritable filière industrielle"
And content like this:
... <p><strong>Dans les prochaines ann&#xE9;es, avec la transition &#xE9;nerg&#xE9;tique et le d&#xE9;mant&#xE8;lement ...

Current Behavior

Title returned: "Le d�mant�lement des r�acteurs nucl�aires, v�ritable fili�re industrielle"
Content returned:
...<p><strong>Dans les prochaines ann**&#xFFFD;**es, avec la transition &#xFFFD;nerg&#xFFFD;tique et le d&#xFFFD;mant&#xFFFD;lement ...

Steps to Reproduce

I send a POST request to the parse-html endpoint with this body:
{ "url": "https://www.europeanscientist.com/fr/energie/demantelement-reacteurs-nucleaires-dechets-pngmdr/", "html" : [copy_paste_of_html_code] }

Possible Solution

I tried forcing the request's Content-Type header to UTF-8 with application/json; charset=utf-8, but that doesn't change the result.
While running this request locally, I got an iconv-lite deprecation warning related to encoding:
Iconv-lite warning: decode()-ing strings is deprecated. Refer to https://github.com/ashtuchkin/iconv-lite/wiki/Use-Buffers-when-decoding
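
For illustration only, here is one common way this exact symptom arises: text whose bytes are in a single-byte charset such as ISO-8859-1 gets decoded as UTF-8. This is an assumption about the mechanism, not a confirmed diagnosis of the parser. The sketch uses only Node's built-in Buffer and reproduces the replacement characters seen in the title above.

```javascript
// A shortened version of the title from this report.
const title = 'Le démantèlement des réacteurs nucléaires';

// Encode as ISO-8859-1: each accented letter becomes one byte (é is 0xE9).
const latin1Bytes = Buffer.from(title, 'latin1');

// Decoding those bytes as UTF-8 is wrong: 0xE9 announces a multi-byte
// sequence, the next byte is not a continuation byte, and the decoder
// substitutes U+FFFD (the replacement character) for the accented letter.
const misdecoded = latin1Bytes.toString('utf8');

// Decoding with the charset the bytes were written in restores the text.
const restored = latin1Bytes.toString('latin1');

console.log(misdecoded);
console.log(restored === title);
```

If something like this is happening, the original letters are already lost by the time the response is serialized, which would be consistent with changing the request's Content-Type header having no effect.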

Failure: tunneling socket could not be established, statusCode=407

Serverless: GET /parser (λ: mercuryParser)
Serverless: Failure: tunneling socket could not be established, statusCode=407
Error: tunneling socket could not be established, statusCode=407
at ClientRequest.onConnect (C:\Users\a662140\Desktop\Karan\mercury-parser-api\node_modules\request\node_modules\tunnel-agent\index.js:165:19)
at Object.onceWrapper (events.js:313:26)
at ClientRequest.emit (events.js:223:5)
at ClientRequest.EventEmitter.emit (domain.js:475:20)
at Socket.socketOnData (_http_client.js:490:11)
at Socket.emit (events.js:223:5)
at Socket.EventEmitter.emit (domain.js:475:20)
at addChunk (_stream_readable.js:309:12)
at readableAddChunk (_stream_readable.js:290:11)
at Socket.Readable.push (_stream_readable.js:224:10)
at TCP.onStreamRead (internal/stream_base_commons.js:181:23)

Might be helpful to link to sample/suggested AWS IAM permissions for deploy

Ran through an AWS lambda install this morning -- flawless, and worked great, right out of the box. Kudos for providing such a smooth path for transition for your users. Highly appreciated.

The one place I did get stuck for a while was setting the specific IAM/CloudFormation permissions in the AWS console; this took several tries to get right. It would be helpful to state (or link to) a sample of the permissions this project requires for an AWS deploy.

<img ....> gets corrupted when using parse-html

  • Mercury Parser API Version: Latest
  • Node Version: 8

Expected Behavior

The parser should not corrupt the <img> content.

Current Behavior

The <img> tag originally is

  <img src=\"https://cdn.example-domain.com/example1.jpg"/>

and after parsing

 <img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/">

Steps to Reproduce

  1. Take the following HTML
<html>
<head>
<body>     
Main content
<br/>
<img src="https://cdn.example-domain.com/example1.jpg"/>
More content
<br/>
More Content to Simulate main content.
<img src="https://cdn.example-domain.com/example2.jpg"/>
</body>
</html>
  2. Call the API with the path /parse-html. The API takes a POST with a JSON object containing a URL and HTML. The HTML is the HTML from step 1, first converted to the following format:
<html>\\n<head>\\n<body>\\nMain content\\n<br/>\\n<img src=\"https://cdn.example-domain.com/example1.jpg\"/>\\nMore content\\n<br/>\\nMore Content to Simulate main content.\\n<img src=\"https://cdn.example-domain.com/example2.jpg\"/>\\n</body>\\n</html>\\n

and the URL value that is passed is https://www.example-domain.com

  3. The JSON result content being returned contains the main content, including the images. The image src values, however, are corrupted:
<img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/">
<img src="https://www.example-domain.com/%22https://cdn.example-domain.com/example2.jpg/%22/">
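
The corrupted values are consistent with the markup having been escaped twice before it reached the parser, although that is an inference from the output rather than something the report confirms. The sketch below uses a hypothetical buildParseHtmlBody helper to show JSON.stringify escaping the HTML exactly once. It then shows that hand-escaping the quotes beforehand leaves literal backslashes in the src value, and that resolving that value against the page URL with Node's WHATWG URL class yields the same string as in step 3. The real parser may resolve URLs differently.

```javascript
// Hypothetical helper: build the JSON body for a POST to /parse-html.
// JSON.stringify escapes quotes and newlines exactly once.
function buildParseHtmlBody(url, html) {
  return JSON.stringify({ url, html });
}

const pageUrl = 'https://www.example-domain.com';
const html = '<img src="https://cdn.example-domain.com/example1.jpg"/>';

// Round trip: after JSON decoding, the server sees the original markup.
const body = buildParseHtmlBody(pageUrl, html);
const roundTripped = JSON.parse(body).html;

// If the quotes are escaped by hand first, the backslashes become part of
// the HTML itself, so the src attribute is effectively unquoted.
const escapedByHand = html.replace(/"/g, '\\"');
const srcSeenByParser = escapedByHand.match(/src=(\S+?)>/)[1];

// For http(s) URLs a backslash is treated like a slash, and the stray
// quotes are percent-encoded as %22.
const resolved = new URL(srcSeenByParser, pageUrl).href;

console.log(roundTripped === html); // true
console.log(resolved);
// https://www.example-domain.com/%22https://cdn.example-domain.com/example1.jpg/%22/
```

Under that assumption, passing the raw HTML string to a JSON serializer and letting it do the only round of escaping would avoid the problem.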

Question/Comment

Am I using the API correctly? I could not find any documentation, so this is a bit of reverse engineering.

The reason for not calling /parser?url=..... directly is that I am trying to work around a problem where a TypeError is returned: the page responds with a 202, which the parser cannot handle. I now download the content myself and pass the HTML to the API as a workaround. Unfortunately, it doesn't behave as I expected.

Does not seem to support contentType

  • Platform: AWS Lambda
  • Mercury Parser API Version: 0.0.1
  • Node Version: v12.11.1

Expected Behavior

API endpoint should allow contentType as text or markdown

Current Behavior

It seems the contentType argument is not handled in the code, which only takes url and html from the request:

https://github.com/postlight/mercury-parser-api/blob/b6a04af54b3d734e96aa72e487659c087ba09295/src/parse-html.js#L6

Steps to Reproduce

  1. Hosted on AWS Lambda as per instructions, can provide link if needed.
  2. I tried making GET and POST calls with contentType set to text, and that doesn't seem to work.

Detailed Description

Possible Solution

Code should be updated to accept opts, as Mercury's parse function does: https://github.com/postlight/mercury-parser/blob/b0e708aac6a4e7e10986448a132d60df57c45b00/src/mercury.js#L13
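
A minimal sketch of what that could look like, assuming the handler receives an already decoded request body and the parser accepts an options object with html and contentType fields. The helper name, the validation, and the list of allowed values are illustrative assumptions, not the project's actual code.

```javascript
// Assumed set of content types: 'text' and 'markdown' come from this
// report; 'html' is assumed to be the parser's default output.
const ALLOWED_CONTENT_TYPES = ['html', 'markdown', 'text'];

// Hypothetical helper: turn a decoded request body into parser options,
// forwarding contentType instead of silently dropping it.
function buildParseOptions({ html, contentType } = {}) {
  const opts = {};
  if (typeof html === 'string' && html.length > 0) {
    opts.html = html;
  }
  if (contentType !== undefined) {
    if (!ALLOWED_CONTENT_TYPES.includes(contentType)) {
      throw new Error(`Unsupported contentType: ${contentType}`);
    }
    opts.contentType = contentType;
  }
  return opts;
}

// In a handler, the result would be passed as the second argument to the
// parser's parse call (not invoked here, to keep the sketch self-contained).
console.log(buildParseOptions({ html: '<p>Hello</p>', contentType: 'markdown' }));
```

Rejecting unknown values up front gives the caller a clear error instead of an unexpected HTML response.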

An in-range update of serverless is breaking the build 🚨

The devDependency serverless was updated from 1.49.0 to 1.50.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

serverless is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • ci/circleci: build: CircleCI is running your tests (Details).
  • ci/circleci: test: Your tests failed on CircleCI (Details).

Release Notes for 1.50.0 (2019-08-14)

Meta

Commits

The new version differs by 123 commits.

  • 210d50c Merge pull request #6540 from serverless/release
  • b4813f1 Release v1.50.0
  • 6effab2 Bump dependencies
  • 312b4f5 Merge pull request #6537 from mthenw/fix-depeloy-individually
  • 896631d Refactor test
  • 6a84748 Merge pull request #6531 from serverless/apigw-logs-setup-fix
  • f0a8b8c Mock npm install in tests
  • 60f36e4 Clear invalid options
  • f2f3c00 Maintain original behavior and add a test
  • 4f43bfd Merge pull request #6534 from onebytegone/fix-remove-extra-lambda-policy-6262
  • ea121e7 Ensure to bail after first fail in CI
  • f781340 typo
  • fe6bab3 Fix prettier issues
  • 3ceb0aa Fix deploy command if package.individually set on a function-level
  • 56d96c4 Only add merged IAM policies for Lambda when they will be used (#6262)

There are 123 commits in total.

See the full diff
