pptraas.com's Introduction

pptraas.com's People

Contributors

c0b41, chennima, danielruf, ebidel, gokaygurcan, inf3cti0n95, mathiasbynens, paulkinlan, wei

pptraas.com's Issues

Launch new tab instead of launch new browser

Hi,

Thanks for the nice repo, it is very helpful. Just one issue:

app.all('*', async (request, response, next) => {
  response.locals.browser = await puppeteer.launch({
    dumpio: true,
    // headless: false,
    // executablePath: 'google-chrome',
    args: ['--no-sandbox', '--disable-setuid-sandbox'], // , '--disable-dev-shm-usage']
  });

  next(); // pass control on to routes.
});

This code launches a new browser for every request, which uses more memory and takes longer to load than opening a new tab in a single shared browser.

I wonder what's the thought behind this?

Thanks,
Vincent
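
A minimal sketch of the tab-per-request alternative described above, not the project's actual implementation. `createBrowserPool` and `withPage` are hypothetical names, and the `launch` function is injected (in the app it would be something like `() => puppeteer.launch({...})`). One trade-off to keep in mind: tabs opened in the same browser share state such as cookies and cache, which may be one reason to prefer a fresh browser per request.

```javascript
// Sketch: launch one browser lazily and give each request its own tab.
// `launch` is a hypothetical injected function that resolves to a browser.
function createBrowserPool(launch) {
  let browserPromise = null; // shared by all requests

  async function getBrowser() {
    if (!browserPromise) {
      browserPromise = launch().catch((err) => {
        browserPromise = null; // allow a retry on the next request
        throw err;
      });
    }
    return browserPromise;
  }

  // Runs `task` with a fresh page and always closes that page afterwards.
  async function withPage(task) {
    const browser = await getBrowser();
    const page = await browser.newPage();
    try {
      return await task(page);
    } finally {
      await page.close();
    }
  }

  return { getBrowser, withPage };
}
```

A route could then call `pool.withPage(async (page) => { ... })` instead of reading `response.locals.browser`, so only the tab is created and destroyed per request.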

feat(Rate limiting)

At some point, someone, somewhere will rinse this so we need to come up with a decent solution.
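
One possible starting point is a fixed-window counter per client IP, written as Express-style middleware. This is only a sketch: `createRateLimiter` and its options are hypothetical names, the counters live in process memory (so each instance would count separately and the map is never pruned), and a shared store would be needed for a real deployment.

```javascript
// Sketch: allow at most `limit` requests per `windowMs` for each client IP.
// `now` is injectable so the window logic can be exercised without waiting.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const hits = new Map(); // ip -> { count, windowStart }

  return function rateLimit(request, response, next) {
    const key = request.ip;
    const t = now();
    let entry = hits.get(key);
    if (!entry || t - entry.windowStart >= windowMs) {
      entry = { count: 0, windowStart: t }; // start a new window
      hits.set(key, entry);
    }
    entry.count += 1;
    if (entry.count > limit) {
      const retryAfterSec = Math.ceil((entry.windowStart + windowMs - t) / 1000);
      response.set("Retry-After", String(retryAfterSec));
      return response.status(429).send("Too many requests");
    }
    next(); // under the limit: pass control on to routes
  };
}
```

It would be registered ahead of the routes, for example `app.use(createRateLimiter({ limit: 10, windowMs: 60000 }))`, where both numbers are placeholders.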

Move to app engine flex

Scale + speed + uptime!

Here's what I'm using in other projects.
Dockerfile

FROM node:9.5-slim

MAINTAINER Eric Bidelman <ebidel@>

# See https://crbug.com/795759
#RUN apt-get update && apt-get install -yq libgconf-2-4

# Install latest chrome dev package and fonts to support major charsets (Chinese, Japanese, Arabic, Hebrew, Thai and a few others)
# Note: this installs the necessary libs to make the bundled version of Chromium that Puppeteer
# installs, work.
RUN apt-get update && apt-get install -y wget --no-install-recommends \
    && wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list' \
    && apt-get update \
    && apt-get install -y google-chrome-unstable fonts-ipafont-gothic fonts-wqy-zenhei fonts-thai-tlwg fonts-kacst ttf-freefont \
      --no-install-recommends \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get purge --auto-remove -y curl \
    && rm -rf /src/*.deb

# It's a good idea to use dumb-init to help prevent zombie chrome processes.
ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init

COPY . /app/

WORKDIR /app

COPY package.json .
RUN yarn --production

COPY server.mjs .
# RUN chmod +x server.mjs

# Add user so we don't need --no-sandbox.
RUN groupadd -r pptruser && useradd -r -g pptruser -G audio,video pptruser \
    && mkdir -p /home/pptruser/Downloads \
    && chown -R pptruser:pptruser /home/pptruser \
    && chown -R pptruser:pptruser /app

# Run everything after as non-privileged user.
USER pptruser

EXPOSE 8080

ENTRYPOINT ["dumb-init", "--"]
CMD ["npm", "run", "start"]

app.yaml

runtime: custom
env: flex

automatic_scaling:
  min_num_instances: 1
  max_num_instances: 4

resources:
  cpu: 4
  memory_gb: 16 # cpu * [0.9 - 6.5] - 0.4
  disk_size_gb: 100

skip_files:
- ^(.*/)?tests
- ^(.*/)?.*\.md$
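
The comment next to `memory_gb` gives the allowed range as `cpu * [0.9 - 6.5] - 0.4`. As a quick check of that arithmetic (the helper names here are hypothetical), `cpu: 4` allows roughly 3.2 to 25.6 GB, so `memory_gb: 16` falls inside it:

```javascript
// Range taken from the app.yaml comment: memory_gb = cpu * [0.9 - 6.5] - 0.4
function memoryRangeGb(cpu) {
  return { min: cpu * 0.9 - 0.4, max: cpu * 6.5 - 0.4 };
}

// True when the requested memory fits the range for the given CPU count.
function isValidMemory(cpu, memoryGb) {
  const { min, max } = memoryRangeGb(cpu);
  return memoryGb >= min && memoryGb <= max;
}

console.log(memoryRangeGb(4), isValidMemory(4, 16));
```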

Add support for POSTing to /pdf

We have a page we want to render with a POST request due to the amount of data required for the page. Can pptraas support proxying a POST request?
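
One way this might work, sketched under assumptions rather than taken from the project: Puppeteer's request interception can rewrite the initial navigation into a POST. `gotoWithPost`, `body` and `contentType` are hypothetical names; `page` is a Puppeteer `Page`.

```javascript
// Sketch: turn the page's main navigation into a POST by rewriting the
// first navigation request; every other request continues unchanged.
async function gotoWithPost(page, url, body, contentType) {
  await page.setRequestInterception(true);
  let rewritten = false;
  page.on("request", (interceptedRequest) => {
    if (!rewritten && interceptedRequest.isNavigationRequest()) {
      rewritten = true; // only the initial navigation becomes a POST
      interceptedRequest.continue({
        method: "POST",
        postData: body,
        headers: {
          ...interceptedRequest.headers(),
          "content-type": contentType,
        },
      });
    } else {
      interceptedRequest.continue(); // subresources load as usual
    }
  });
  return page.goto(url, { waitUntil: "networkidle0" });
}
```

A POST handler for `/pdf` could call this with the incoming request body and then return the result of `page.pdf()`. How the service should accept and validate that body is left open here.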

Unable to deploy

It appears the nodejs8 runtime has been removed:

ERROR: (gcloud.app.deploy) INVALID_ARGUMENT: Invalid runtime 'nodejs8' specified. Accepted runtimes are: [php, php55, python27, java, java7, java8, go111, go112, go113, java11, nodejs10, nodejs12, php72, php73, python37, python38, ruby25]

Error: Chromium revision is not downloaded in Docker image

I'm trying to switch from rendertron, but the container fails at startup with a message that Chromium isn't installed:

> [email protected] start /app
> node server.js

App is listening on port 8080
(node:19) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1):
AssertionError [ERR_ASSERTION]: Chromium revision is not downloaded. Run "npm install" or "yarn install"

Steps used:

  1. Cloned repo
     git clone https://github.com/GoogleChromeLabs/pptraas.com.git
     cd pptraas.com
     npm install
  2. Built docker image
     docker build -t pptraas . --no-cache=true
  3. Run docker
     docker run -it -p 8080:8080 pptraas

Version details:

$ node --version && npm --version && docker --version
v10.1.0
6.0.0
Docker version 18.05.0-ce-rc1, build 33f00ce

Handle Errors more effectively

We should handle errors and promise rejections more effectively - in some cases, we should outright crash the instance.
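
A rough sketch of what this could look like, with all three helper names being hypothetical: async routes forward their rejections to Express, one error-handling middleware turns them into a 500, and an unhandled rejection ends the process so the platform can start a fresh instance.

```javascript
// Wraps an async route so a thrown error or rejected promise reaches next(err).
function asyncRoute(handler) {
  return (request, response, next) =>
    Promise.resolve()
      .then(() => handler(request, response, next))
      .catch(next);
}

// Express error-handling middleware (recognised by its four arguments).
function errorHandler(err, request, response, next) {
  console.error(err);
  if (response.headersSent) {
    return next(err); // too late to send our own response
  }
  response.status(500).send("Request failed: " + err.message);
}

// For failures nothing caught: log and exit instead of limping on.
function crashOnUnhandledRejection(proc = process) {
  proc.on("unhandledRejection", (reason) => {
    console.error("Unhandled rejection, exiting:", reason);
    proc.exit(1);
  });
}
```

Routes would be registered as `app.get("/pdf", asyncRoute(async (request, response) => { ... }))`, with `app.use(errorHandler)` after all routes and `crashOnUnhandledRejection()` called once at startup.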

EventEmitter memory leak detected. 11 exit listeners added

I've hit issue puppeteer/puppeteer#594 using a modified endpoint from this repo. Did I do something wrong?

I added an endpoint to simply return the html content:

app.get("/html", async (request, response) => {
  const url = request.query.url;
  if (!url) {
    return response.status(400).send("Please provide a URL. Example: ?url=https://example.com");
  }

  const browser = response.locals.browser;

  const page = await browser.newPage();
  const res = await page.goto(url, { waitUntil: "networkidle0" });
  const content = await page.content();

  response.status(res.status()).send(content);
  await browser.close();
});

Any idea how to address the "EventEmitter memory leak detected. 11 exit listeners added" warning, which eventually made the server unresponsive?
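
One possible cause, not confirmed in this thread: if `page.goto()` rejects (timeout, bad URL), the `browser.close()` at the end of the handler is never reached, so each failed request leaves a browser and its exit listeners behind. A sketch that closes the browser on both paths, using a hypothetical `fetchHtml` helper:

```javascript
// Sketch: fetch rendered HTML and close the per-request browser in `finally`,
// so it is released whether navigation succeeds or throws.
async function fetchHtml(browser, url) {
  try {
    const page = await browser.newPage();
    const res = await page.goto(url, { waitUntil: "networkidle0" });
    // goto() can resolve to null in some cases, so guard before status().
    const status = res ? res.status() : 200;
    const content = await page.content();
    return { status, content };
  } finally {
    await browser.close(); // runs on success and on failure
  }
}
```

The route would then become `const { status, content } = await fetchHtml(response.locals.browser, url); response.status(status).send(content);`, with errors handled by whatever error middleware the app uses.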

Question about using it with aws serverless

Does anyone know a good strategy for using this with an AWS serverless architecture? Something like running Lambda@Edge for each connection, checking whether it's a crawler, and redirecting it to pptraas.com? I've never done this before and would really like to know how to do it and how much extra the Lambda would cost.
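
A rough sketch of the crawler check, assuming a Lambda@Edge function on the CloudFront viewer-request trigger. The user-agent pattern is illustrative and incomplete, `createHandler` is a hypothetical name, and `renderEndpoint` is a placeholder for whatever rendering URL is used. It does not address the cost question.

```javascript
// Illustrative, incomplete list of crawler user agents (an assumption).
const CRAWLER_PATTERN =
  /googlebot|bingbot|yandex|baiduspider|duckduckbot|twitterbot|facebookexternalhit|slackbot/i;

function isCrawler(userAgent) {
  return CRAWLER_PATTERN.test(userAgent || "");
}

// `renderEndpoint` is a hypothetical prefix ending in "?url=".
function createHandler(renderEndpoint) {
  return async function handler(event) {
    const request = event.Records[0].cf.request;
    const uaHeader = request.headers["user-agent"];
    const userAgent = uaHeader && uaHeader[0] ? uaHeader[0].value : "";
    if (!isCrawler(userAgent)) {
      return request; // regular visitors continue to the origin untouched
    }
    const host = request.headers.host[0].value;
    const query = request.querystring ? "?" + request.querystring : "";
    const pageUrl = "https://" + host + request.uri + query;
    return {
      status: "302",
      statusDescription: "Found",
      headers: {
        location: [
          { key: "Location", value: renderEndpoint + encodeURIComponent(pageUrl) },
        ],
      },
    };
  };
}
```

The Lambda module would export `createHandler("https://renderer.example/render?url=")` as its `handler`, with that URL replaced by the real rendering endpoint.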

List examples on /

Right now a request to / errors out. We should list the examples from the readme there.
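
A small sketch of such a handler. The `EXAMPLES` array is a placeholder with a single illustrative entry; the real list would be copied from the readme, and `indexHandler` is a hypothetical name.

```javascript
// Placeholder list: the real entries would come from the readme.
const EXAMPLES = [
  { path: "/pdf?url=https://example.com", description: "Render a page as a PDF" },
];

// Formats the examples as one plain-text line each.
function renderIndex(examples) {
  return examples
    .map(({ path, description }) => path + "  " + description)
    .join("\n");
}

// Express-style handler for GET /.
function indexHandler(request, response) {
  response.type("text/plain").send("Examples:\n" + renderIndex(EXAMPLES));
}
```

Registering it with `app.get("/", indexHandler)` ahead of the other routes would give visitors a list instead of an error.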

Is there a reason that the service instantiates a browser each request?

I'm building a service modeled after pptraas. We're doing some performance/load testing, and it seems that lack of caching is a big issue. Is there a reason that y'all don't re-use the browser? Wondering if there are any gotchas / threading issues with re-using the browser. Certainly seems like a big performance advantage to do so!
