Comments (31)
Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.
from pyppeteer.
This issue still seems to be a problem on some websites, even after applying the proposed patches for the ping_timeout argument and/or downgrading to websockets 6.0.
from pyppeteer.
Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.
@nurettin how do you do this?
You edit your Pipfile or requirements.txt manually and replace websockets==7.0 with 6.0.
But I recommend monkey patching for now; see #160.
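For readers wondering what such a monkey patch might look like, here is a minimal sketch. It assumes the disconnects are tied to the keepalive pings that websockets 7.0 introduced, which are controlled by the `ping_interval` and `ping_timeout` arguments of `connect`. The helper name `without_keepalive` is made up for illustration, and the commented-out usage lines are an assumption about how pyppeteer reaches `connect`; #160 has the patch actually discussed.

```python
import functools


def without_keepalive(connect):
    """Wrap a websockets-style ``connect`` callable so keepalive pings are off."""

    @functools.wraps(connect)
    def wrapper(*args, **kwargs):
        # ping_interval / ping_timeout exist in websockets >= 7.0;
        # passing None disables the automatic ping and its timeout.
        kwargs["ping_interval"] = None
        kwargs["ping_timeout"] = None
        return connect(*args, **kwargs)

    return wrapper


# Assumed usage (hypothetical, check against your installed pyppeteer):
# import pyppeteer.connection as pc
# pc.websockets.client.connect = without_keepalive(pc.websockets.client.connect)
```

Wrapping the callable rather than editing the installed package keeps the change in your own code, so it can be dropped once a fixed release is available. As noted earlier in the thread, this kind of patch did not help on every website.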
from pyppeteer.
Bump this. It needs to be resolved in master and the solution is to downgrade websockets.
from pyppeteer.
No idea what's wrong, I think it has something to do with the websockets dependency. With 6.0, there are no issues; with 8.1, issues left and right.
from pyppeteer.
Update:
Still struggling to trace this, but it looks like it happens because page.close() is awaited too soon after page.goto(...) timed out.
I greatly reduce the number of errors (to about 0 if not 0, will confirm in a few hours) if either:
- I slow down the machine that runs the script. Giving it only 0.2 CPU instead of the 4 to 8 I was trying before, I don't see the error happen anymore.
- I await an asyncio.sleep(1) before awaiting page.close().
I suspect there is a race condition between some callback that happens on goto timeout and page.close()
[edit] The "slowness fixes the bug" hypothesis is wrong. Something about the websites being opened causes this. The reason it looks so "random" is that I use a message queue (RabbitMQ) that reschedules any cancelled task at the same spot it was in before. So for a long time there is no buggy site and no error. If one buggy site comes to the head of the AMQP queue, then each time it kills a worker another worker picks it up and dies too. Loop that. Funny, right?
from pyppeteer.
Still having problems with the newest websockets package. Does somebody have an idea what goes wrong? It would be nice to open an issue for the websockets package.
from pyppeteer.
Thank you very much for the detailed information and suggestion.
I don't have a good idea for fixing this soon, but I will check page.goto's timeout sequence.
from pyppeteer.
@miyakogi Thanks for the reply.
I can confirm for sure that it is not Docker related: I just spent some time installing my code on a machine without containers, and the error happens in exactly the same way.
I'm trying to create a minimal code file for reproduction (even if it happens at random, it's 100% reproducible if one waits long enough, which can sometimes be 10 seconds).
The code to reproduce is not much more than the hello world of pyppeteer, btw.
from pyppeteer.
Related websocket log :
DEBG:0087:websockets.protocol: client > Frame(fin=True, opcode=8, data=b'\x03\xe8', rsv1=False, rsv2=False, rsv3=False)
DEBG:0087:websockets.protocol: client - eof_received()
DEBG:0087:websockets.protocol: client ! failing WebSocket connection: 1006
DEBG:0087:websockets.protocol: client - connection_lost(None)
DEBG:0087:websockets.protocol: client x closing TCP connection
It looks like the "send" with the OP_CLOSE opcode is actually coming from pyppeteer (but again, I'm not sure I understand everything here).
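The payload of the first frame in that log can be decoded by hand. Per RFC 6455, opcode 8 is a Close frame whose first two payload bytes are the status code in network byte order, optionally followed by a UTF-8 reason. The function name `parse_close_payload` below is made up for this sketch.

```python
import struct


def parse_close_payload(data: bytes):
    """Return (code, reason) from a WebSocket Close frame payload (RFC 6455, 5.5.1)."""
    if len(data) < 2:
        # A Close frame may legally carry no payload at all.
        return None, ""
    (code,) = struct.unpack("!H", data[:2])  # unsigned 16-bit, big-endian
    return code, data[2:].decode("utf-8")


print(parse_close_payload(b"\x03\xe8"))  # (1000, '')
```

So the frame carries status 1000 (normal closure). The 1006 on the next log line is a different thing: RFC 6455 reserves it for "closed abnormally" and it is never sent on the wire, only reported locally when the connection drops without a completed closing handshake.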
from pyppeteer.
@miyakogi Contrary to everything I believed until now, it seems the error happens because of something on the remote site. I can't understand why, but I can reproduce it 100% of the time (whatever the platform) using this website (probably built in the 90s):
import asyncio
from pyppeteer import launch


async def main():
    browser = await launch()
    page = await browser.newPage()
    try:
        await page.goto('http://www.kunstenknipwerk.com/', timeout=10000)
    finally:
        await page.close()


asyncio.get_event_loop().run_until_complete(main())
The randomness of the error for me then comes from the fact I use a queue that never contains the same thing.
from pyppeteer.
Note that the equivalent puppeteer code on the same website works as expected and does not close the websocket connection:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  let page = await browser.newPage();
  try {
    await page.goto('http://www.kunstenknipwerk.com/', {timeout: 10000});
  }
  catch (e) {
    console.error(e)
  }
  finally {
    await page.close();
  }
  console.log('done');
})();
from pyppeteer.
Here are some other examples of urls that produce this bug:
https://www.iu.edu/
http://www.teletape.info/
One challenge is that not all URLs reproduce the bug from all origins. I suspect websites differ slightly depending on the origin (think of an ad server serving an ad that changes on each request ...), and although the URLs I pasted here seem to reproduce the bug 100% of the time on my machines, some URLs behave differently and only trigger the bug on one machine or a subset of machines ...
from pyppeteer.
Thank you for the minimal reproducible code.
I can reproduce the problem with it.
I will try to fix it, but it may take some time...
from pyppeteer.
After merging your PR (#64), this problem no longer reproduces on my machine.
How about in your environment?
from pyppeteer.
@miyakogi That's quite unexpected, as it should be completely unrelated, but on my local box it looks like it works for the URLs I listed here. Unfortunately, I upgraded a few of my instances to use the dev version of pyppeteer, and I still see the error happening on other pages. I'm going to put in place a simple logging infrastructure for the dubious URLs, which is a bit tricky as I don't know which tab is the culprit when it happens. But my "stupid" strategy (a.k.a. "post somewhere all the URLs that are being processed when the error happens, then try them one by one in a single-tab spider") should help me find URLs that still have this problem.
from pyppeteer.
Oh, and thanks for looking into it. I understand it may take some time, as I already spent around one full week struggling with this. It is really hard to trace (but you surely know the internals of pyppeteer better, so it may be easier for you).
from pyppeteer.
So, update here.
Indeed, it looks like the patches "mostly" fix the problem.
I still have it on some websites (but not 100% reproducible; it looks like the behavior is a bit different from my local network than from the spiders' networks).
The thing is, if an error in some handler causes the browser ws connection to crash, then there is probably something we can do in pyppeteer to catch all errors and at least log the original error. It's basically undebuggable if it just says "wops, connection closed (1006)", but it's a matter of seconds to fix if it says "cannot do X on NoneType". Do you have a good idea of where this code could go? I can work on a patch, but I'm not confident enough with the pyppeteer codebase yet to think of the best place to catch everything.
Thanks.
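To illustrate the kind of "log the original error" hook being asked for, here is a generic sketch. It is not pyppeteer code and does not answer where in pyppeteer such a hook belongs; `log_errors` and the logger name are hypothetical.

```python
import functools
import logging

logger = logging.getLogger("handlers")  # hypothetical logger name


def log_errors(handler):
    """Log the full traceback of any exception raised by an async handler, then re-raise."""

    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        try:
            return await handler(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback of the active exception,
            # so the root cause is preserved even if the connection dies later.
            logger.exception("unhandled error in %s", handler.__name__)
            raise

    return wrapper
```

Decorating a coroutine with `@log_errors` leaves its behavior unchanged except that the original exception is written to the log before it propagates.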
from pyppeteer.
I have the same problem. Has this been fixed yet?
from pyppeteer.
As far as I know, it is not fixed. I still have the problem from time to time, even if using develop makes it happen far less often. @miyakogi any idea about this?
from pyppeteer.
I got this problem too...
from pyppeteer.
Same problem
from pyppeteer.
I get this exception at random times when I use multiprocess and open many pages.
Task exception was never retrieved
future: <Task finished coro=<Connection._async_send() done, defined at /opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py:69> exception=InvalidStateError('invalid state',)>
Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 528, in transfer_data
    msg = yield from self.read_message()
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 580, in read_message
    frame = yield from self.read_data_frame(max_size=self.max_size)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 645, in read_data_frame
    frame = yield from self.read_frame(max_size)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 710, in read_frame
    extensions=self.extensions,
  File "/opt/python3/lib/python3.6/site-packages/websockets/framing.py", line 100, in read
    data = yield from reader(2)
  File "/opt/python3/lib/python3.6/asyncio/streams.py", line 672, in readexactly
    raise IncompleteReadError(incomplete, n)
asyncio.streams.IncompleteReadError: 0 bytes read on a total of 2 expected bytes

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 73, in _async_send
    await self.connection.send(msg)
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 361, in send
    yield from self.ensure_open()
  File "/opt/python3/lib/python3.6/site-packages/websockets/protocol.py", line 501, in ensure_open
    self.close_code, self.close_reason) from self.transfer_data_exc
websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1006 (connection closed abnormally [internal]), no reason

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 79, in _async_send
    await self.dispose()
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 170, in dispose
    await self._on_close()
  File "/opt/python3/lib/python3.6/site-packages/pyppeteer/connection.py", line 153, in _on_close
    f'Protocol error {cb.method}: Target closed.',  # type: ignore
asyncio.base_futures.InvalidStateError: invalid state
from pyppeteer.
I found some hints. It may not be a websockets problem.
Some WebSocket servers will close your connection if you do not send ping messages, so checking the rules of the server you connect to may help.
And even if you send ping messages, some servers will close your connection after 24 hours.
from pyppeteer.
I have an interesting setup. When I run without pipenv, I get no problems. When I run with pipenv, the connection is closed within a few seconds of opening the target website. Any way I can assist in finding the culprit?
from pyppeteer.
It appears pipenv has websockets 7.0 and the normal userspace site-packages has websockets 6.0; this difference may be the cause.
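A quick way to confirm which websockets version a given environment actually resolves is to ask the interpreter itself, once inside the pipenv environment and once outside. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper name `installed_version` is made up here.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("websockets:", installed_version("websockets"))
```

Running the same script with both interpreters shows whether the two environments really differ.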
from pyppeteer.
@RunningToTheEdgeOfTheWorld I agree with you. I found that when Chrome navigates to some unstable URLs it takes a long time, and the exception appears frequently. Maybe pyppeteer should keep the connection alive by sending ping messages.
from pyppeteer.
Yeah, downgraded to websockets 6.0, it works fine now. websockets 7.0 consistently disconnects after some time for all websites.
@nurettin Dumb question, how do you do this?
from pyppeteer.
Duh... I didn't realize websockets was a pip package.
Thanks.
from pyppeteer.
Don't mean to hijack this thread, but I wasn't sure if this was related.
Is there a way to force websocket connections to terminate on pyppeteer? (the issue here)
from pyppeteer.
Has this problem been fixed? I have the same problem, and I use uvicorn, which needs websockets>=8.0...
from pyppeteer.