Code Monkey home page Code Monkey logo

docs's Introduction

Pholcus 使用手册

logo

Pholcus(幽灵蛛)是一款纯Go语言编写的高并发、分布式、重量级爬虫软件,支持单机、服务端、客户端三种运行模式,拥有Web、GUI、命令行三种操作界面;规则简单灵活、批量任务并发、输出方式丰富(mysql、mongodb、csv、excel等)、有大量Demo共享;同时她还支持横纵向两种抓取模式,支持模拟登录和任务暂停、取消等一系列高级功能。

模块构成

框架特点

  1. Pholcus(幽灵蛛)以高效率,高灵活性和人性化设计为开发的指导**;

  2. 支持单机、服务端、客户端三种运行模式,即支持分布式布局,适用于各种业务需要;

  3. 支持Web、GUI、命令行三种操作界面,适用于各种运行环境;

  4. 支持mysql/mongodb/csv/excel等多种输出方式,且可以轻松添加更多输出方式;

  5. 采用surfer高并发下载器,支持 GET/POST/HEAD 方法及 http/https 协议,同时支持固定UserAgent自动保存cookie与随机大量UserAgent禁用cookie两种模式,高度模拟浏览器行为,可实现模拟登录等功能;

  6. 服务器/客户端模式采用teleport高并发socketAPI框架,全双工长连接通信,内部数据传输格式为JSON;

  7. 对采集规则进行了精心设计,支持静态编译与动态JS两种规则,灵活简单且有大量Demo,写规则就是这么轻松;

  8. 支持横纵向两种抓取模式,并且支持任务暂停、取消等操作。

贡献者名单

贡献者 贡献内容
henrylee2cn 软件作者
kas surfer下载器中phantomjs内核
wang898jian 完全手册贡献者

第三方依赖包

go get github.com/pholcus/spider_lib
go get github.com/henrylee2cn/teleport
go get github.com/PuerkitoBio/goquery
go get github.com/robertkrimen/otto
go get github.com/andybalholm/cascadia
go get github.com/lxn/walk
go get github.com/lxn/win
go get github.com/go-sql-driver/mysql
go get github.com/jteeuwen/go-bindata/...
go get github.com/elazarl/go-bindata-assetfs/...
go get gopkg.in/mgo.v2
<以下需翻墙下载>
go get golang.org/x/net/html
go get golang.org/x/text/encoding
go get golang.org/x/text/transform

(在此感谢以上开源项目的支持!)

开源协议

Pholcus(幽灵蛛)项目采用商业应用友好的Apache License v2.发布

docs's People

Contributors

andeya avatar wang898jian avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

docs's Issues

请问linux下可以编译好后定时任务执行采集么?

go build main.go 请问是这样执行么?
./main pholcus -a_ui=cmd -c_spider=0,1,2,3,8,16 -c_output=mysql -c_thread=20 -c_docker=5000 -c_pause=300 -c_proxy=0 -c_keyword=pholcus,golang -c_maxpage=10 -c_inherit_y=true -c_inherit_n=true

或者怎么执行呢? 开始以后命令行自动就退出了 请问怎么运行呢?
采集规则web模式下采集无误,但是我希望将编译到linux环境做定时任务定时执行 请问可以么 ? 如果可以请问怎么执行编译文件呢?目前命令行执行自动秒退出了。

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.