
gdom's Introduction

GDOM

GDOM is the next generation of web-parsing, powered by GraphQL syntax and the Graphene framework.

Install it by typing in your console:

pip install gdom

DEMO: Try GDOM online

Usage

You can either run gdom --test to start a test server for trying out queries, or run

gdom QUERY_FILE

This command writes the resulting JSON to standard output (or to another destination if specified via --output).

Your QUERY_FILE could look similar to this:

{
  page(url:"http://news.ycombinator.com") {
    items: query(selector:"tr.athing") {
      rank: text(selector:"td span.rank")
      title: text(selector:"td.title a")
      sitebit: text(selector:"span.comhead a")
      url: attr(selector:"td.title a", name:"href")
      attrs: next {
         score: text(selector:"span.score")
         user: text(selector:"a:eq(0)")
         comments: text(selector:"a:eq(2)")
      }
    }
  }
}
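
Because the aliases in the query (items, rank, title, ...) become the keys of the result, the JSON can be consumed with a few lines of Python. The sketch below is a minimal example under stated assumptions: the exact envelope of gdom's output is not shown here, so the hypothetical helper extract_items accepts both a bare page object and one wrapped in a GraphQL-style data key, and the sample payload is made-up placeholder data, not real gdom output.

```python
import json


def extract_items(raw_json):
    """Return the list found under page -> items in gdom's JSON output.

    Assumption: the output mirrors the aliases used in the query. Whether
    gdom wraps the result in a top-level "data" key is not confirmed here,
    so both shapes are accepted.
    """
    payload = json.loads(raw_json)
    if isinstance(payload.get("data"), dict):
        payload = payload["data"]
    page = payload.get("page") or {}
    return page.get("items") or []


# Placeholder payload for illustration only (NOT real gdom output).
SAMPLE = json.dumps({
    "page": {
        "items": [
            {
                "rank": "1.",
                "title": "Example title",
                "sitebit": "example.com",
                "url": "http://example.com/",
                "attrs": {
                    "score": "10 points",
                    "user": "someone",
                    "comments": "3 comments",
                },
            }
        ]
    }
})

for item in extract_items(SAMPLE):
    print(item["rank"], item["title"], item["url"])
```

Returning an empty list when page or items is missing keeps the calling loop simple for pages where the selector matches nothing.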

Advanced usage

To generalize your gdom query to any page, rewrite your query file to use a $page variable. It should look something like this:

query ($page: String) {
  page(url:$page) {
    # ...
  }
}

Then run it like this:

gdom QUERY_FILE http://news.ycombinator.com
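
To run the same parametrized query against several pages, the command can be driven from a script. This is a minimal sketch, assuming gdom is installed and on the PATH, prints JSON to standard output as described above, and exits with status 0 on success. build_command and run_query are hypothetical helper names, not part of gdom.

```python
import json
import subprocess


def build_command(query_file, page_url):
    """Build the argument list for `gdom QUERY_FILE URL`."""
    return ["gdom", query_file, page_url]


def run_query(query_file, page_url):
    """Run gdom and parse the JSON it writes to standard output.

    Assumption: gdom is installed and signals failure with a non-zero
    exit status.
    """
    result = subprocess.run(
        build_command(query_file, page_url),
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout.decode("utf-8"))
```

Calling run_query("query.graphql", "http://news.ycombinator.com"), where query.graphql is a placeholder name for your own query file, would then return the parsed result as a Python dictionary.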

gdom's People

Contributors

agrison, syrusakbary


gdom's Issues

crawling support?

I love this idea! I was wondering if there's crawling support (basically following all the links on the page), or if there are any plans to add it?

Miss the Online Demo

For non-coders, the online demo was unique and very useful. There's nothing else like it online.

ModuleNotFoundError

When I run gdom I get this output:

Traceback (most recent call last):
  File "/home/kfm/.local/bin/gdom", line 7, in <module>
    from gdom.cmd import main
  File "/home/kfm/.local/lib/python3.6/site-packages/gdom/cmd.py", line 9, in <module>
    from schema import schema
ModuleNotFoundError: No module named 'schema'

It seems to be a problem with gdom itself, somehow not finding its own schema.py. Any idea what's going wrong there and how to fix this?

auth

It would be nice if we could perform a simple login before crawling.

ImportError: No module named core.execution.executor

I was testing a fresh install of the module and ran gdom --test, which resulted in the following traceback:


Traceback (most recent call last):
  File "/usr/local/bin/gdom", line 7, in <module>
    from gdom.cmd import main
  File "/usr/local/lib/python2.7/site-packages/gdom/__init__.py", line 1, in <module>
    from .schema import schema, Node, Element, Document, Query
  File "/usr/local/lib/python2.7/site-packages/gdom/schema.py", line 1, in <module>
    import graphene
  File "/usr/local/lib/python2.7/site-packages/graphene/__init__.py", line 3, in <module>
    from .core import (
  File "/usr/local/lib/python2.7/site-packages/graphene/core/__init__.py", line 1, in <module>
    from .schema import (
  File "/usr/local/lib/python2.7/site-packages/graphene/core/schema.py", line 4, in <module>
    from graphql.core.execution.executor import Executor
ImportError: No module named core.execution.executor

Below is my Python installation log:

https://gist.github.com/dca2f5c358b1aa7de3093565b8fad874

Demo only scrapes hackernews?

Just tried exploring the queries with Tesco as per https://news.ycombinator.com/item?id=11180732 and the results are empty.
Also tried other sites besides hackernews, and could not get results.

It's understandable if it's whitelist-only, but there's no documentation about it which leads users (me) to think that either we're doing it wrong or it's broken.

Can't Run

Hi. Figured I would learn something about DOM nodes by using this. I can't get the project to run, though. I installed it with pip3 install gdom on a Mac.

I get the following error. I don't know if this is something a seasoned Python dev would know how to deal with. I don't know any Python... yet.

The error is:

$ gdom --test
Traceback (most recent call last):
  File "/usr/local/bin/gdom", line 7, in <module>
    from gdom.cmd import main
  File "/usr/local/lib/python3.6/site-packages/gdom/__init__.py", line 1, in <module>
    from .schema import schema, Node, Element, Document, Query
  File "/usr/local/lib/python3.6/site-packages/gdom/schema.py", line 1, in <module>
    import graphene
  File "/usr/local/lib/python3.6/site-packages/graphene/__init__.py", line 3, in <module>
    from .core import (
  File "/usr/local/lib/python3.6/site-packages/graphene/core/__init__.py", line 1, in <module>
    from .schema import (
  File "/usr/local/lib/python3.6/site-packages/graphene/core/schema.py", line 4, in <module>
    from graphql.core.execution.executor import Executor
ModuleNotFoundError: No module named 'graphql.core'

Thanks.

GraphiQL opened twice

Cheers for a very nice program!

When gdom is run with --test, it seems the query test page is opened twice. Do you see the same behaviour on your system, and if so, is it intended?
