goose-parser's People

Contributors

jifeon, mazahaca

goose-parser's Issues

What about goose-starter-kit

A project with a configured build for the browser environment and a working basic usage example. Users will be able to clone this repo and start hacking!

Rename "type" to "dataType"

Sometimes we need to specify the data type of a parsed result.
For now this is done with the "type" field. But the "type" keyword is also used to determine the type of actions and transforms. So we need to rename "type" to "dataType" wherever it means data type.

add TOC to readme

Now that we have plenty of docs, it's time to add a table of contents at the top of Readme.md.

Remove useActionsResult

We have stores now and can set data in one action and get it inside another. useActionsResult is used to get the result of previous actions; it is a leftover and should be removed.

Add ability to apply DataTransformation after parse process

Needed:

  • add a DataTransformManager.
  • add several default transformers such as date, split, ...
  • add the ability to register custom transformers.
  • parse nodes should get the ability to specify chains of transformers, which will be applied one by one to the parsed data.

Example:
We have parsed data:

  • date: "13 February 2016, Sat"

We need to get:

  • date: "13 February 2016 13:35" or even "2016-02-13 13:35:00"
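
A minimal sketch of how such a transformer chain could work. The `DataTransformManager` class, its `add` and `apply` methods, and the transformer options shown here are assumptions made for illustration, not goose-parser's actual API.

```javascript
// Hypothetical registry of named transformers (illustration only).
class DataTransformManager {
  constructor() {
    this.transformers = new Map();
  }

  // Register a default or custom transformer under a name.
  add(type, fn) {
    this.transformers.set(type, fn);
    return this;
  }

  // Apply a chain of { type, ...options } steps one by one.
  apply(value, chain) {
    return chain.reduce((current, step) => {
      const fn = this.transformers.get(step.type);
      if (!fn) {
        throw new Error(`Unknown transformer: ${step.type}`);
      }
      return fn(current, step);
    }, value);
  }
}

const MONTHS = [
  'January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December',
];

const manager = new DataTransformManager()
  .add('split', (value, { separator, index }) => value.split(separator)[index])
  .add('trim', (value) => value.trim())
  // Converts "13 February 2016" into "2016-02-13".
  .add('date', (value) => {
    const [day, monthName, year] = value.split(' ');
    const month = MONTHS.indexOf(monthName) + 1;
    if (month === 0) {
      throw new Error(`Unknown month: ${monthName}`);
    }
    return `${year}-${String(month).padStart(2, '0')}-${day.padStart(2, '0')}`;
  });

const transformed = manager.apply('13 February 2016, Sat', [
  { type: 'split', separator: ',', index: 0 },
  { type: 'trim' },
  { type: 'date' },
]);
console.log(transformed);
```

Each step receives the output of the previous one, so a parse node would only need to carry the array of step descriptions.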

Implement captcha handler

It should:

  • know how to detect that a captcha has appeared (an unexpected landing on a captcha URL, or a captcha scope present on the page)
  • know the scopes of the confirm button and of the captcha input field.
  • take a picture of the captcha (encode the picture to base64 inside evaluate)
  • be able to accept a promise-returning function that solves the captcha and returns the value.

Implement:

  • go back (if needed) and continue parsing
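
The requirements above could be sketched roughly as follows. Everything here is hypothetical: `CaptchaHandler`, the `page` stand-in object and its fields are invented for illustration and do not correspond to an existing goose-parser or PhantomJS interface.

```javascript
// Hypothetical handler; `page` is a stand-in object, not a real environment API.
class CaptchaHandler {
  constructor({ captchaUrlPattern, solve }) {
    this.captchaUrlPattern = captchaUrlPattern; // RegExp matching captcha URLs
    this.solve = solve; // promise-returning function: base64 image -> answer
  }

  // Detect a captcha by URL or by a captcha scope found on the page.
  isCaptcha(page) {
    return this.captchaUrlPattern.test(page.url) || page.hasCaptchaScope;
  }

  // Solve the captcha if present; resolves to true when one was handled.
  async handle(page) {
    if (!this.isCaptcha(page)) {
      return false;
    }
    const answer = await this.solve(page.captchaImageBase64);
    page.submitCaptcha(answer);
    return true;
  }
}

// Simulated page that landed on a captcha URL.
const fakePage = {
  url: 'http://example.com/captcha?back=/list',
  hasCaptchaScope: true,
  captchaImageBase64: 'aGVsbG8=',
  submitted: null,
  submitCaptcha(answer) {
    this.submitted = answer;
  },
};

const handler = new CaptchaHandler({
  captchaUrlPattern: /\/captcha/,
  // A real solver might call an external recognition service here.
  solve: async (imageBase64) => `solved:${imageBase64.length}`,
});

const handled = handler.handle(fakePage);
handled.then((wasCaptcha) => console.log(wasCaptcha, fakePage.submitted));
```

Passing the solver in as a promise-returning function keeps the handler independent of how the captcha is actually recognized.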

Prepare docs for version 0.5.0

Prepare basic documentation about:

  • actions and a way to add custom actions
  • transforms and a way to add custom transforms
  • cookie storage
  • langoose

Continue parsing after failure

For now, on a PhantomError we get only the error, but it can happen after several rows have already been parsed.

  • #54.
  • we need to provide the Parser scope to restore it.
  • we need the ability to continue the parsing process after an error, from the place where it stopped.
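
One possible shape for resumable parsing, as a sketch only: the `parseRows` function, the `state` object, and the simulated error are assumptions made for illustration and are not part of goose-parser.

```javascript
// Hypothetical resumable loop: progress is kept in `state`, which is attached
// to the thrown error so the caller can continue from the failed row.
function parseRows(rows, parseRow, state = { nextIndex: 0, results: [] }) {
  for (let i = state.nextIndex; i < rows.length; i += 1) {
    try {
      state.results.push(parseRow(rows[i]));
      state.nextIndex = i + 1;
    } catch (error) {
      error.state = state; // carries everything parsed so far
      throw error;
    }
  }
  return state.results;
}

// Simulated row parser that fails once on the third row.
let failOnce = true;
const flakyParse = (row) => {
  if (row === 'row-3' && failOnce) {
    failOnce = false;
    throw new Error('PhantomError (simulated)');
  }
  return row.toUpperCase();
};

const rows = ['row-1', 'row-2', 'row-3', 'row-4'];
let results;
try {
  results = parseRows(rows, flakyParse);
} catch (error) {
  // Resume from the row that failed instead of starting over.
  results = parseRows(rows, flakyParse, error.state);
}
console.log(results);
```

The rows parsed before the failure are not parsed again; only the failed row and the rows after it are retried.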

Add ability to extend rules and actions

We should be able to extend rules and actions from any rule or action already defined in the system, overriding only particular properties.

This will make it easy to maintain a large number of similar rules.
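
A sketch of how such inheritance could be resolved. The `extends` key, the `resolveRule` function, and the rule fields used below are hypothetical names chosen for illustration.

```javascript
// Hypothetical resolver: merges a rule with the rule named in its `extends`
// key, letting the child's own properties override the parent's.
function resolveRule(name, definitions, seen = new Set()) {
  if (seen.has(name)) {
    throw new Error(`Circular extends: ${name}`);
  }
  const rule = definitions[name];
  if (!rule) {
    throw new Error(`Unknown rule: ${name}`);
  }
  seen.add(name);
  const { extends: parentName, ...own } = rule;
  if (!parentName) {
    return own;
  }
  return { ...resolveRule(parentName, definitions, seen), ...own };
}

// Illustrative definitions; the field names are assumptions.
const definitions = {
  baseText: { scope: '.item', dataType: 'string' },
  title: { extends: 'baseText', scope: '.item .title' },
};

const resolved = resolveRule('title', definitions);
console.log(resolved);
```

Here `title` inherits `dataType` from `baseText` and overrides only `scope`.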

Ability to set proxies list

PhantomEnv should accept a list of proxies and know when to switch between them.
It should remember which proxies were used, and the last time and URL each one was used for.
Probably it should also allow setting a switching strategy:

  • each query,
  • smart query (emulate user action).
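
The "each query" strategy could look roughly like this. `ProxyRotator` and its methods are hypothetical and shown only to make the bookkeeping concrete; the proxy addresses are placeholders.

```javascript
// Hypothetical round-robin rotator that switches proxy on every query and
// remembers when and for which URL each proxy was last used.
class ProxyRotator {
  constructor(proxies) {
    if (proxies.length === 0) {
      throw new Error('At least one proxy is required');
    }
    this.proxies = proxies;
    this.index = -1;
    this.history = new Map(); // proxy -> { lastUsedAt, lastUrl }
  }

  // Pick the next proxy for a query and record its usage.
  next(url, now = Date.now()) {
    this.index = (this.index + 1) % this.proxies.length;
    const proxy = this.proxies[this.index];
    this.history.set(proxy, { lastUsedAt: now, lastUrl: url });
    return proxy;
  }

  // Return the last recorded usage of a proxy, or undefined if never used.
  lastUse(proxy) {
    return this.history.get(proxy);
  }
}

const rotator = new ProxyRotator(['10.0.0.1:8080', '10.0.0.2:8080']);
const first = rotator.next('http://example.com/page/1', 1000);
const second = rotator.next('http://example.com/page/2', 2000);
const third = rotator.next('http://example.com/page/3', 3000);
console.log(first, second, third, rotator.lastUse(first));
```

A "smart" strategy could reuse the same history to decide how long to stay on one proxy before switching.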

Extend documentation

Add to docs:

  • Describe Environments.
  • Describe the dependency on PhantomJS 2.0.
  • Describe the testing process for different environments.
  • Add badges from shields.io.
  • Describe how to debug the parser.

Schema transformers

Needed:

  • Ability to specify:
    • schema from
    • schema to
    • DataTransformers (#2) which need to be applied during the transformation

Example:
We have parsed data:

  • time: "13:35"
  • date: "13 February 2016, Sat"

We need to get:

  • date: "13 February 2016 13:35" or even "2016-02-13 13:35:00"
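
A sketch of the example above, where two parsed fields are merged into one. The `transformSchema` function and the `from` / `transform` keys are assumptions for illustration, not an existing goose-parser feature.

```javascript
// Hypothetical schema transform: each target field names the source fields it
// reads ("schema from") and a function producing its value ("schema to").
function transformSchema(parsed, schema) {
  const result = {};
  for (const [targetField, rule] of Object.entries(schema)) {
    const inputs = rule.from.map((field) => parsed[field]);
    result[targetField] = rule.transform(...inputs);
  }
  return result;
}

const parsed = { time: '13:35', date: '13 February 2016, Sat' };

const schema = {
  date: {
    from: ['date', 'time'],
    // Drop the weekday suffix and append the time.
    transform: (date, time) => `${date.split(',')[0].trim()} ${time}`,
  },
};

const output = transformSchema(parsed, schema);
console.log(output);
```

Source fields that no target refers to, such as `time` on its own, are simply left out of the result.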

Add ability to paginate via click to load more

This kind is very close to scroll pagination, but instead of scrolling you need to click a block to load the next page (which extends the current page list).

We need to think about a way to build a custom pagination event that can cover any pagination case.
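
One way to think about a generic pagination event is a loop driven by two callbacks, sketched below. The `paginate` function and its options are hypothetical, and the page is simulated with a plain object instead of a real browser environment.

```javascript
// Hypothetical generic pagination: `parsePage` reads the rows that appeared
// since the given offset, `loadNext` triggers the next page (a click, a
// scroll, ...) and resolves to false when nothing more can be loaded.
async function paginate({ parsePage, loadNext, maxPages = 10 }) {
  const rows = [];
  for (let page = 0; page < maxPages; page += 1) {
    rows.push(...(await parsePage(rows.length)));
    const hasMore = await loadNext();
    if (!hasMore) {
      break;
    }
  }
  return rows;
}

// Simulated page: clicking "load more" appends the next batch to the list.
const fakePage = { items: ['a', 'b'], pending: [['c', 'd'], ['e']] };

const clickLoadMore = async () => {
  const next = fakePage.pending.shift();
  if (!next) {
    return false;
  }
  fakePage.items.push(...next);
  return true;
};

const parseNewItems = async (offset) => fakePage.items.slice(offset);

const rowsPromise = paginate({ parsePage: parseNewItems, loadNext: clickLoadMore });
rowsPromise.then((rows) => console.log(rows));
```

Scroll pagination would fit the same loop by swapping `clickLoadMore` for a function that scrolls and waits for new items.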

PhantomEnvironment is undefined

import {
   PhantomEnvironment,
   Parser
} from 'goose-parser';

const env = new PhantomEnvironment({
   url: 'http://www.gooseplanet.ru/'
});

TypeError: _gooseParser.PhantomEnvironment is not a constructor

I looked at the imported entities and both of them are undefined.

If I write:
import Parser from 'goose-parser'

It returns [Function: Parser]
But where can I find PhantomEnvironment?

Add ability to execute custom actions before start parsing

Add the ability to execute custom actions before parsing starts.
For example, we need to:

  • land on the search page (provided URL)
  • insert the search text into the field
  • press the search button
  • land on the search results page and start parsing there

This functionality will completely replace actions with the once flag.

Update tests according to the changes in 0.5.0 and add circle-ci

Move the current test system to the new, more efficient approach (take a look at the new tests in #76).

So, we have:
old tests here: tests/phantom_parser_test.js
new tests here: tests/phantom/lib/

To run tests, just call npm test

When you move a test, remove it from tests/phantom_parser_test.js
At the end, remove the file tests/phantom_parser_test.js and the HTML page for it.
