Code Monkey home page Code Monkey logo

Comments (31)

nurettin avatar nurettin commented on May 14, 2024 22

Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.

from pyppeteer.

byt3bl33d3r avatar byt3bl33d3r commented on May 14, 2024 3

This issue seems to be still a problem on some websites even after applying the proposed patches for the ping_timeout argument and/or downgrading to websockets 6.0.

from pyppeteer.

nurettin avatar nurettin commented on May 14, 2024 2

Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.

@nurettin how do you do this?

You edit your Pipfile or requirements.txt manually and replace websockets==7.0 with 6.0

But I recommend monkey patching for now see #160

from pyppeteer.

ingmferrer avatar ingmferrer commented on May 14, 2024 2

Bump this. It needs to be resolved in master and the solution is to downgrade websockets.

Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.

@nurettin how do you do this?

You edit your Pipfile or requirements.txt manually and replace websockets==7.0 with 6.0

But I recommend monkey patching for now see #160

from pyppeteer.

Mattwmaster58 avatar Mattwmaster58 commented on May 14, 2024 2

No idea what's wrong, I think it has something to do with the websockets dependancy. With 6.0, there are no issues, with 8.1, issues left and right

from pyppeteer.

hartym avatar hartym commented on May 14, 2024 1

Update:

Still strugling in tracing that but it looks like it happens because page.close() is awaited too soon after the page.goto(...) went timeout.

I reduce greatly the amount of errors (like, to about 0 if not 0, will confirm in a few hours) either if:

  • I slow down the machine that runs the script. Giving only 0.2 CPU instead of 4 to 8 like I was trying before, and I don't see the error happen anymore
  • I await an asyncio.sleep(1) before awaiting page.close().

I suspect there is a race condition between some callback that happens on goto timeout and page.close()

[edit] The "slowness fixes the bug" hypothesis is erroneous. There is something related to the websites opened that causes this. The reason why it's so "random" is that I use a message queue (rabbitmq) that will reschedule any cancelled task at same spot they were before. So for long time, no buuggy site, no error. If one buggy site comes to amqp head, then each time it kills a worker another worker will get it, and die too. Loop that. Funny, right?

from pyppeteer.

sjoerd222888 avatar sjoerd222888 commented on May 14, 2024 1

Still problems with newest websocket package. Does somebody has an idea what goes wrong? Would be nice to open an issue for the websocket package.

from pyppeteer.

miyakogi avatar miyakogi commented on May 14, 2024

Great thanks for detail information and suggestion.
I don't have good idea to fix this soon, but I will check page.goto's timeout sequence.

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

@miyakogi Thanks for the reply.

I can confirm for sure that it is not docker related, I just spent some time installing my code on a machine without containers, and the error happens exactly the same way.

Trying to create a minimal code file for reproduction (even if it's happening at random, it's 100% reproductible, if one waits enough time, which can be 10 seconds sometime)

The code to reproduce is not much more than what is the hello world of pyppeteer, btw.

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

Related websocket log :

DEBG:0087:websockets.protocol: client > Frame(fin=True, opcode=8, data=b'\x03\xe8', rsv1=False, rsv2=False, rsv3=False)
DEBG:0087:websockets.protocol: client - eof_received()
DEBG:0087:websockets.protocol: client ! failing WebSocket connection: 1006
DEBG:0087:websockets.protocol: client - connection_lost(None)
DEBG:0087:websockets.protocol: client x closing TCP connection

Looks like it shows the "send" with OP_CLOSE opcode is actually coming from the pyppeteer (but again, I'm not sure I understand everything here).

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

@miyakogi Unlike everything I believed until then, it seems the error happens because of something the remote site has. Can't understand why, but I can reproduce it 100% (whatever the platform is) using this website (probably found in the 90s):

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()
    page = await browser.newPage()
    try:
        await page.goto('http://www.kunstenknipwerk.com/', timeout=10000)
    finally:
        await page.close()

asyncio.get_event_loop().run_until_complete(main())

The randomness of the error for me then comes from the fact I use a queue that never contains the same thing.

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

Note that the equivalent puppeteer code on the same website works as expected and does not close the websocket connection:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  let page = await browser.newPage();
  try {
    await page.goto('http://www.kunstenknipwerk.com/', {timeout: 10000});
  }
  catch (e) {
    console.error(e)
  }
  finally {
    await page.close();
  }
  console.log('done');
})();

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

Here are some other examples of urls that produce this bug:

https://www.iu.edu/
http://www.teletape.info/

One challenge is that not all urls reproduce the bug on all origins. I suspect websites to be slightly different depending on the origin (think of an ad server serving an ad that change on each request ...) and although the urls I pasted here seems to reproduce the bug 100% on my machines, some urls behave differently and only bug on one or a subset of machines ...

from pyppeteer.

miyakogi avatar miyakogi commented on May 14, 2024

Thank you for minimal reproducible code.
This problem is reproduced by the code.
I will try to fix it, but may take some time...

from pyppeteer.

miyakogi avatar miyakogi commented on May 14, 2024

After merging your PR (#64), this problem does not reproduce on my machine.
How about on your environment?

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

@miyakogi That's quite unexpected, should be completely unrelated, but on my local box it looks like the it works for the urls I listed here. Unfortunately, I upgraded a few of my instances to use dev version of pyppeteer, and I still see the error happening on other pages. I'm going to put in place a simple logging infrastructure for the dubious urls, which is a bit tricky as I don't know which tab is the culprit when it happens. But my "stupid" strategy on this is (a.k.a "post somewhere all urls that are currently processing when the error happen and try them one by one in a single tab spider") should help me find urls that still have this problem.

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

Oh and thanks for looking into it, I understand it may take some time as I already spent around 1 full week struggling with this, really hard to trace (but you for sure know better the internals of pyppeteer so it may be easier for you).

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

So, update here.

Indeed it looks like the patches "mostly" fixes the problem.

I still have it on some websites (but not 100% reproductible, looks like the behavior is a bit different from my local network than from the spiders networks).

The thing is, if an error happening in some handler causes the browser ws connection to crash, then there is probably something in pyppeteer that we can do to catch all errors and at least log the original error. It's basically non debuggable if it just says "wops, connection closed (1006)", but it's a matter of seconds to fix it if it says "cannot do X on NoneType". Do you have a good idea on where this code could go? I can work on a patch, but I'm not confident enough with the pyppeteer codebase yet to think of the best place to catch everything.

Thanks.

from pyppeteer.

debugtalk avatar debugtalk commented on May 14, 2024

I have the the same problem. Has this been fixed yet?

from pyppeteer.

hartym avatar hartym commented on May 14, 2024

As far as I know, it is not fixed. I still have the problem from time to time even if using develop makes it happen way less often. @miyakogi any idea on this regard ?

from pyppeteer.

RunningToTheEdgeOfTheWorld avatar RunningToTheEdgeOfTheWorld commented on May 14, 2024

I got this problem too...

from pyppeteer.

caglarsayin avatar caglarsayin commented on May 14, 2024

Same problem

from pyppeteer.

Qianlitp avatar Qianlitp commented on May 14, 2024

I got the exception at random time when I use multiprocess and open many pages.

Task exception was never retrieved
future: <Task finished coro=<Connection._async_send() done, defined at /opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py:69> exception=InvalidStateError('invalid state',)>
Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 528, in transfer_data
    msg = yield from self.read_message()
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 580, in read_message
    frame = yield from self.read_data_frame(max_size=self.max_size)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 645, in read_data_frame
    frame = yield from self.read_frame(max_size)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 710, in read_frame
    extensions=self.extensions,
  File "/opt/python3/lib/python3.6/site-packages/websockets/framing.py", line 100, in read
    data = yield from reader(2)
  File "/opt/python3/lib/python3.6/asyncio/streams.py", line 672, in readexactly
    raise IncompleteReadError(incomplete, n)
asyncio.streams.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 73, in _async_send
    await self.connection.send(msg)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 361, in send
    yield from self.ensure_open()
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 501, in ensure_open
    self.close_code, self.close_reason) from self.transfer_data_exc
websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 79, in _async_send
    await self.dispose()
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 170, in dispose
    await self._on_close()
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 153, in _on_close
    f'Protocol error {cb.method}: Target closed.',  # type: ignore
asyncio.base_futures.InvalidStateError: invalid state```

from pyppeteer.

RunningToTheEdgeOfTheWorld avatar RunningToTheEdgeOfTheWorld commented on May 14, 2024

I found some hints. It may be not websockets problem.
Some web socket server will close your connection if you not send ping message, so check your connect server's rule may help.
And even though send ping message, some server will close your connection after 24 hours

from pyppeteer.

nurettin avatar nurettin commented on May 14, 2024

I have an interesting setup. When I run without pipenv, I get no problems. When I run with pipenv, I get connection closed within a few seconds of opening target website. Any way I can assist in finding the culprit?

from pyppeteer.

nurettin avatar nurettin commented on May 14, 2024

It appears pipenv has websockets 7.0 and the normal userspace site-packages has websockets 6.0, it may be because of this difference.

from pyppeteer.

Qianlitp avatar Qianlitp commented on May 14, 2024

@RunningToTheEdgeOfTheWorld I agree with you. I found that chrome navigated some unstable urls would cost a lot of time and the exception appeared frequently. May be pyppeteer should keep the connection by sending ping message.

from pyppeteer.

taewookim avatar taewookim commented on May 14, 2024

Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.

@nurettin Dumb question, how do you do this?

from pyppeteer.

taewookim avatar taewookim commented on May 14, 2024

duh... I didnt realize websockets was a pip package.
thanks

from pyppeteer.

taewookim avatar taewookim commented on May 14, 2024

Don't mean to hijack this thread, but i wasn't sure if this was related..
Is there a way to force websocket connections to terminate on pyppeteer? (the issue here)

from pyppeteer.

Chestermozhao avatar Chestermozhao commented on May 14, 2024

Does this problem be fixed? I got same problem, and I used uvicorn which need websockets>=8.0...

from pyppeteer.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.