hyperiongray / trio-chrome-devtools-protocol

Trio driver for Chrome DevTools Protocol (CDP)

License: MIT License

Python 99.83% Makefile 0.17%

trio-chrome-devtools-protocol's Issues

examples failing with ConnectionClosed exception

Hi -- I'm running the screenshot and get_title examples. Both succeed, but an exception is raised when exiting the context block:

% python screenshot.py ws://127.0.0.1:9000/devtools/browser/91c7b198-5af0-42d4-8e37-cf87ff880dd2 https://google.com
INFO:screenshot:Connecting to browser: ws://127.0.0.1:9000/devtools/browser/91c7b198-5af0-42d4-8e37-cf87ff880dd2
INFO:screenshot:Listing targets
INFO:screenshot:Attaching to target id=82C8C3F0A2F7DF1F6363F23BCEFC0E7D
INFO:screenshot:Setting device emulation
INFO:screenshot:Enabling page events
INFO:screenshot:Navigating to https://google.com
INFO:screenshot:Making a screenshot
INFO:screenshot:Saving to file
Traceback (most recent call last):
  File "screenshot.py", line 73, in <module>
    trio.run(main, restrict_keyboard_interrupt_to_checkpoints=True)
  File "/Users/phil/repos/trio/trio/_core/_run.py", line 1804, in run
    raise runner.main_task_outcome.error
  File "screenshot.py", line 66, in main
    await screenshot_file.write(b64decode(img_data))
  File "/Users/phil/a50037/envs/cdp/lib/python3.7/contextlib.py", line 177, in __aexit__
    await self.gen.__anext__()
  File "/Users/phil/repos/trio-chrome-devtools-protocol/trio_cdp/__init__.py", line 307, in open_cdp_connection
    yield cdp_conn
  File "/Users/phil/repos/trio/trio/_core/_run.py", line 730, in __aexit__
    raise combined_error_from_nursery
  File "/Users/phil/repos/trio-chrome-devtools-protocol/trio_cdp/__init__.py", line 208, in _reader_task
    message = await self.ws.get_message()
  File "/Users/phil/a50037/envs/cdp/lib/python3.7/site-packages/trio_websocket/_impl.py", line 823, in get_message
    raise ConnectionClosed(self._close_reason) from None
trio_websocket._impl.ConnectionClosed: None

Event example ?

Hi,

Would it be possible to document how to use trio_cdp for event handling, for example with cdp.network.RequestWillBeSent? That would be a great help.

The API feels cumbersome. Each call to the server has to be wrapped in await conn.execute(...) or await session.execute(...) which is quite a bit of boilerplate to type, it makes lines of code longer (and more likely to need wrapping), and it seems to prevent type inference of the result[1].

It might be better to adopt a code generation approach as in PyCDP. Since PyCDP already generates Python code, we don't need to parse the CDP spec in this project. Instead, we can use introspection on PyCDP to find command methods and generate new wrappers for them.

One possible design is to add a connection/session argument to each command, i.e. page.capture_screenshot(format='png') would become await page.capture_screenshot(session, format='png'). The body of the generated function would be really simple:

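async def capture_screenshot(session: CdpBase, format: str) -> str:
    # Build the PyCDP command, then let the session send it and await the reply.
    ret = await session.execute(cdp.page.capture_screenshot(format=format))
    return typing.cast(str, ret)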

(The cast is there to help type inference, but a good type checker might not need it?)

This design would solve all 3 problems described in this issue, but it complicates the documentation a bit. Right now it's easy to explain: you can execute() any command in PyCDP. The new design requires either a separate set of docs for Trio CDP, or an explanation that all PyCDP commands are valid in Trio CDP but are now async and take an additional argument. I don't know which one is better...

[1] I have only tested with JEDI, which doesn't seem to understand that execute() always returns the same type as its argument. Kite may have better results. I'll test with it later.
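The introspection idea can be sketched roughly as follows. This is only an illustration under stated assumptions: PyCDP commands are taken to be generator functions that yield a request dict and receive the response, and `fake_page`, `FakeSession`, and `make_async_wrappers` are hypothetical stand-ins, not part of PyCDP or Trio CDP.

```python
import asyncio
import functools
import inspect
import types


def _wrap(command):
    """Turn one generator-style command into an async function taking a session."""
    @functools.wraps(command)
    async def wrapper(session, *args, **kwargs):
        # Build the command object, then let the session send it.
        return await session.execute(command(*args, **kwargs))
    return wrapper


def make_async_wrappers(module):
    """Return {name: async wrapper} for each public generator function in module."""
    wrappers = {}
    for name, func in inspect.getmembers(module, inspect.isgeneratorfunction):
        if not name.startswith("_"):
            wrappers[name] = _wrap(func)
    return wrappers


# Hypothetical stand-in for a PyCDP domain module.
fake_page = types.ModuleType("fake_page")


def navigate(url):
    response = yield {"method": "Page.navigate", "params": {"url": url}}
    return response["frameId"]


fake_page.navigate = navigate


class FakeSession:
    """Hypothetical session that answers every command with a canned reply."""

    async def execute(self, cmd):
        request = next(cmd)  # the request dict the command wants to send
        try:
            cmd.send({"frameId": "frame-for-" + request["params"]["url"]})
        except StopIteration as stop:
            return stop.value  # the command's parsed return value


wrappers = make_async_wrappers(fake_page)
result = asyncio.run(wrappers["navigate"](FakeSession(), "https://example.com"))
```

The wrappers only use `await`, so the same shape should work under Trio. A real generator would probably also need to copy type hints and skip helpers imported from other modules.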

Doing a heap snapshot

Sorry to abuse the tracker for support. Is there a better forum?

I'm trying to port this code:

const chunks = [];
cdpSession.on('HeapProfiler.addHeapSnapshotChunk', ({chunk}) => {chunks.push(chunk);});
await cdpSession.send('HeapProfiler.takeHeapSnapshot', {reportProgress: false});
rawSnapshotData = chunks.join('');
fs.writeFile(path, Buffer.from(rawSnapshotData), (err) => {});

This is what I have now. Obvious problem: it doesn't gather chunks:

            async with session.wait_for(heap_profiler.AddHeapSnapshotChunk):
                session.execute(heap_profiler.take_heap_snapshot())
            with open(outdir / '%s.heapsnapshot' % datetime.today().isoformat(), 'ab') as outfile:
                outfile.write(chunk)

How do I react to all AddHeapSnapshotChunk events until I have the whole snapshot?

get_title.py fails --- TypeError: object _AsyncGeneratorContextManager can't be used in 'await' expression

I was trying the get_title.py example, but it fails with a TypeError:

~/git/python-chrome-devtools-protocol(master)$ python3 test.py ws://127.0.0.1:9900/devtools/browser/618a25d7-... https://example.com
INFO:screenshot:Connecting to browser: ws://127.0.0.1:9900/devtools/browser/618a25d7-...
INFO:screenshot:Listing targets
INFO:screenshot:Attaching to target id=5587641B1405557....
INFO:screenshot:Navigating to https://example.com
Traceback (most recent call last):
  File "test.py", line 43, in <module>
    trio.run(main, restrict_keyboard_interrupt_to_checkpoints=True)
  File "/data/data/com.termux/files/home/.local/lib/python3.8/site-packages/trio/_core/_run.py", line 1804, in run
    raise runner.main_task_outcome.error
  File "test.py", line 29, in main
    event = await session.wait_for(page.LoadEventFired)
TypeError: object _AsyncGeneratorContextManager can't be used in 'await' expression
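The error message suggests that wait_for() returns an async context manager (the kind produced by contextlib.asynccontextmanager), so it has to be entered with `async with` rather than awaited. Below is a minimal stdlib-only sketch of the difference. The `wait_for` here is a hypothetical stand-in, not trio_cdp's implementation, and asyncio is used only so the sketch runs without extra packages.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def wait_for(event_name):
    # Stand-in: a real implementation would register a listener here...
    log = [f"listening for {event_name}"]
    yield log
    # ...and wait for the event here, when the block exits.
    log.append(f"{event_name} received")


async def awaited_directly():
    try:
        await wait_for("LoadEventFired")  # wrong: a context manager is not awaitable
    except TypeError:
        return "TypeError"
    return "no error"


async def used_as_context_manager():
    async with wait_for("LoadEventFired") as log:
        log.append("navigate")  # run the command that triggers the event
    return log


wrong_result = asyncio.run(awaited_directly())
right_result = asyncio.run(used_as_context_manager())
```

In the failing script, the corresponding change would presumably be to wrap the navigation command in `async with session.wait_for(page.LoadEventFired):` instead of awaiting wait_for() on its own line.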

How to launch trio-chrome fully programmatically ?

Hi,

In the examples it says:
"The URL that Chrome is listening on is displayed in the terminal after Chrome starts up."
This works fine, but how do I handle all this programmatically? Copy-pasting a URL is not possible in production.

Is there a way to get this URL automatically, or to bypass this step? Other libraries, like Pychrome (outdated), simply ask for the HTTP URL: 'pychrome.Browser(url="http://127.0.0.1:9222")'

In Python, Selenium allows driver.execute_cdp_cmd('Target.getTargets', {}). Those libraries are not as complete as trio-chrome, but I do not need to copy-paste a URL by hand.
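One possible approach, sketched under assumptions: when Chrome is started with --remote-debugging-port, its HTTP endpoint /json/version returns a JSON document whose webSocketDebuggerUrl field holds the browser WebSocket URL. The helper names, the default host and port, and the sample payload below are illustrative only and are not part of trio-chrome-devtools-protocol.

```python
import json
from urllib.request import urlopen


def parse_browser_ws_url(version_json: str) -> str:
    """Extract the browser WebSocket URL from a /json/version response body."""
    return json.loads(version_json)["webSocketDebuggerUrl"]


def fetch_browser_ws_url(host: str = "127.0.0.1", port: int = 9222,
                         timeout: float = 5.0) -> str:
    """Ask a running Chrome (started with --remote-debugging-port) for its URL."""
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return parse_browser_ws_url(resp.read().decode("utf-8"))


# Made-up sample payload, only to show the parsing step without a running browser.
sample_body = json.dumps({
    "Browser": "Chrome/0.0.0.0",
    "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/browser/example-id",
})
sample_url = parse_browser_ws_url(sample_body)
```

The value returned by fetch_browser_ws_url() could then be passed wherever the examples expect the ws:// URL on the command line.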

how to get response data while scrolling

I am trying to enable the network domain to start receiving response events with:

await session.execute(network.enable())
requests =  session.listen(network.RequestWillBeSent)
async for r in requests:
    print(r, flush=True)

but I don't make it past the point of enabling the network domain:

blocked_reason=CookieBlockedReason.from_json(json['blockedReason']),
KeyError: 'blockedReason'

Can you supply an example of how to get response data while scrolling?

Extended more abstract Functions and Classes using API

@mehaase I have been extensively using the #6 API and will soon move to #9. I have a number of functional classes that implement more abstract functions, such as keyboard and mouse actions, plus utility functions for finding, focusing, reading, or manipulating elements. As a result, the end application is very clean and reasonably elegant.

By and large I have been incorporating functionality from the likes of https://github.com/miyakogi/pyppeteer, an unofficial port of https://github.com/puppeteer/puppeteer.

Some key benefits are:

The async functions of https://github.com/python-trio/trio, provided through https://github.com/HyperionGray/trio-chrome-devtools-protocol.

The application stays pure CDP, with no JavaScript injection or manipulation.

It is lightweight.

I feel many of the classes I have created are worth putting up on GitHub, but under what structure? Do we add on to trio-chrome-devtools-protocol or create another module, trio-puppeteer?
There are many pros and cons, but I have only implemented a small (but to me very useful) subset of puppeteer.

I am interested in perspectives from @mehaase or other users or contributors who may have ideas and relevant comments on structure and naming.

Call `detachFromTarget` when closing session

The session cleanup when leaving an open_session() context block is pretty minimal. At the very least, it should run the CDP detachFromTarget command.

Event listeners are not closed correctly during connection close

In #5 I added aclose() to the connection class so that the underlying WebSocket can be closed, and future calls to conn.execute() will raise ConnectionClosed. This is not quite enough cleanup, however, because the sessions belonging to that connection are not torn down. For example, if a caller is inside an async for session.listen(...): loop, I believe that loop will currently hang forever, because it does not receive a signal that the connection is closed. The proper behavior would be to close all open channels, which will cause all async for session.listen(...) loops to exit gracefully.

There may be other, similar issues regarding session tear-down.

Publish 0.7.0 to PyPI

Currently, 0.6.0 is on PyPI.

Without manually building from master, I am getting KeyError (e.g., KeyError: 'Page.documentOpened') when using 0.6.0 from PyPI.

https://travis-ci.com/github/HyperionGray/trio-chrome-devtools-protocol/builds/160400374

I'm not very familiar with poetry, but I suspect you need to rerun poetry update when a dependency changes versions (e.g., chrome-devtools-protocol = "^0.4.0"). This will update the poetry.lock file to a coherent state.

Fixing the CI issues and publishing trio-chrome-devtools-protocol with ^0.4.0 would be helpful for installing from PyPI.

AttributeError in get_title example

There's an AttributeError in the get_title.py example.

AttributeError: 'TargetInfo' object has no attribute 'type'

I believe this should be type_.

It might be useful to consider leveraging mypy to help proactively catch these errors.

CmEventProxy is returning an empty value

Hi,

Sorry for raising an issue, happy to move this to a chat if there is one.

I am trying, via a very roundabout way, to use parts of this project and integrate them into Selenium to add new evented APIs from CDP. We can't use this project directly because we need to support multiple Chromium versions at once, so we generate the code and integrate it into the wheel.

It all works in that I can see that an event happens, but I can't get the exact event that happened.

I have used an async context manager, but what is returned is an empty CmEventProxy object.

Not knowing the code or the eventing model in trio well enough, I can't see how https://github.com/HyperionGray/trio-chrome-devtools-protocol/blob/master/trio_cdp/__init__.py#L129-L132 works. I can see that the event happens, but how does the event get into proxy.value?

Any help would be greatly appreciated, happy to help with documentation and examples for this repo in return.
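A possible explanation, offered as an assumption about the linked lines rather than a quote of them: if wait_for() follows the usual asynccontextmanager pattern, the proxy is handed out at the `yield` and only filled in after the body of the `async with` block has finished, so it looks empty when read inside the block. The sketch below uses hypothetical stand-ins (`EventProxy`, an asyncio.Queue in place of a Trio memory channel) to show that timing.

```python
import asyncio
from contextlib import asynccontextmanager


class EventProxy:
    """Placeholder that is filled in once the event has arrived."""
    value = None


@asynccontextmanager
async def wait_for(queue):
    proxy = EventProxy()
    yield proxy                      # the caller's block runs here; value is still None
    proxy.value = await queue.get()  # filled in only when the block exits


async def main():
    queue = asyncio.Queue()
    async with wait_for(queue) as proxy:
        inside = proxy.value                # read too early: still empty
        queue.put_nowait("LoadEventFired")  # stands in for the browser emitting the event
    return inside, proxy.value              # after the block, the event is available


inside_value, after_value = asyncio.run(main())
```

Under that assumption, reading proxy.value after the `async with` block, rather than inside it, should give the event.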