crawlab-team / crawlab

Distributed web crawler admin platform for managing spiders regardless of language or framework.

Home Page: https://www.crawlab.cn

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 2.86% Shell 15.63% Go 80.96% Python 0.56%
crawlab crawler crawling-tasks docker go platform scrapy scrapyd-ui spider spiders-management web-crawler webcrawler webspider

crawlab's People

Contributors

0xflotus, appleboy, bestgopher, chncaption, codingendless, cxapython, darrenxyli, dependabot[bot], gemerz, gs80140, haivp3010, hantmac, hyyzzz111, jonnnh, luzihang123, ma-pony, marvzhang, maybgit, seven2nine, tikazyq, wahyd4, wo10378931, wu526, xiaoxiaolvqi, yann0917, zerafachris, zhangweiii, zkqiang

crawlab's Issues

Error when running a spider

[2019-04-01 11:40:33,715: ERROR/MainProcess] Task handler raised error: ValueError('not enough values to unpack (expected 3, got 0)',)
Traceback (most recent call last):
File "C:\Users\xiaojiahao\Envs\jaho\lib\site-packages\billiard\pool.py", line 358, in workloop
result = (True, prepare_result(fun(*args, **kwargs)))
File "C:\Users\xiaojiahao\Envs\jaho\lib\site-packages\celery\app\trace.py", line 537, in _fast_trace_task
tasks, accept, hostname = _loc
ValueError: not enough values to unpack (expected 3, got 0)

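Problem editing spider details

The first time you click the "Website" input, if there is no data, a white modal stays visible and covers the Run and Save buttons.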

Uploaded zip is not synced to worker nodes

docker run -d --name crawlab_w1 -e CRAWLAB_REDIS_ADDRESS=redis -e CRAWLAB_MONGO_HOST=mongo -e CRAWLAB_SERVER_MASTER=N -e CRAWLAB_API_ADDRESS=192.168.2.222:8000 -e CRAWLAB_SPIDER_PATH=/app/spiders -v /var/logs/crawlab:/var/logs/crawlab --link mongo:mongo --link redis:redis --privileged=true tikazyq/crawlab

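This container shows as online.
I uploaded a zip file containing only a single test.js with console.log("test");
Only the master node can run it. I tried uploading several times today and the worker node still does not have the spider. The worker node reports the following error, and inside the container no directory has been created; only the example directories exist.
[image]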

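One-click service startup

Installing and starting the Crawlab service is currently cumbersome and requires running several commands. A one-click startup script is needed.

Running python3 ./bin/run_flower.py reports the following error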

[I 190409 17:22:40 command:147] Registered tasks:
['celery.accumulate',
 'celery.backend_cleanup',
 'celery.chain',
 'celery.chord',
 'celery.chord_unlock',
 'celery.chunks',
 'celery.group',
 'celery.map',
 'celery.starmap']
[I 190409 17:22:40 mixins:229] Connected to redis://127.0.0.1:6379/0
[W 190409 17:22:45 control:44] 'stats' inspect method failed
[W 190409 17:22:45 control:44] 'active_queues' inspect method failed
[W 190409 17:22:45 control:44] 'registered' inspect method failed
[W 190409 17:22:45 control:44] 'scheduled' inspect method failed
[W 190409 17:22:45 control:44] 'active' inspect method failed
[W 190409 17:22:45 control:44] 'reserved' inspect method failed
[W 190409 17:22:45 control:44] 'revoked' inspect method failed
[W 190409 17:22:45 control:44] 'conf' inspect method failed

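Batch node deployment

Deploy nodes in batch, so that all nodes are deployed in one go rather than one by one.

Redis port and password problem

A port cannot be specified with docker run: CRAWLAB_REDIS_ADDRESS=127.0.0.1:6378 has no effect, and the connection still goes to 127.0.0.1:6378:6379. A password cannot be specified either. The same applies to mongo.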
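The symptom above (127.0.0.1:6378:6379) suggests that a default port is appended to the configured address without checking whether it already contains one. A minimal sketch of address parsing in which an explicit port takes precedence is shown below. The function name split_address and the default of 6379 are assumptions for illustration, not Crawlab's actual code, and IPv6 literals are not handled.

```python
def split_address(address, default_port=6379):
    """Split 'host' or 'host:port' into a (host, port) tuple.

    Hypothetical helper: an explicit port in the address wins,
    and default_port is used only when no port is given.
    """
    host, sep, port = address.rpartition(":")
    if not sep:
        # No colon found: the whole string is the host name.
        return address, default_port
    return host, int(port)


if __name__ == "__main__":
    print(split_address("127.0.0.1:6378"))
    print(split_address("redis"))
```

With parsing like this, CRAWLAB_REDIS_ADDRESS=127.0.0.1:6378 would resolve to port 6378, and a bare host such as redis would fall back to the default.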

Node never goes offline and cannot be deleted

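When deploying on Kubernetes (as a Deployment) with each node's IP set to the Service's ClusterIP, updating a node (master or worker) results in two nodes being shown in the web UI, although one of them no longer exists.

Running python3 ./bin/run_worker.py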

python3 ./bin/run_worker.py
Traceback (most recent call last):
File "./bin/run_worker.py", line 9, in <module>
from tasks.celery import celery_app
ModuleNotFoundError: No module named 'tasks.celery'

Spider cannot be deployed

The following error is reported when deploying a spider:

/usr/local/lib/python3.6/dist-packages/pymongo/topology.py:149: UserWarning: MongoClient opened before fork. Create MongoClient only after forking. See PyMongo's documentation for details: http://api.mongodb.org/python/current/faq.html#is-pymongo-fork-safe
  "MongoClient opened before fork. Create MongoClient only "
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/util.py", line 319, in _exit_function
    p.join()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 122, in join
    assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process

I don't really understand this.
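The PyMongo warning in the log says to create MongoClient only after forking. One common way to follow that advice is to create the client lazily inside each process instead of at import time. The sketch below is a generic illustration under that assumption: get_client and the factory argument are hypothetical names, not part of Crawlab or PyMongo.

```python
import os

_client = None
_client_pid = None


def get_client(factory):
    """Return a client created in the current process.

    The client is built on first use and rebuilt if the process ID
    has changed, so a forked child never reuses its parent's client.
    `factory` is any zero-argument callable that builds the client.
    """
    global _client, _client_pid
    pid = os.getpid()
    if _client is None or _client_pid != pid:
        _client = factory()
        _client_pid = pid
    return _client
```

In a worker task, this could be called as get_client(lambda: MongoClient(host=MONGO_HOST, port=MONGO_PORT)), so that the connection is opened inside the worker process rather than in the parent before the fork.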

API cross-origin (CORS) problem: localhost appears

Error
vue.runtime.esm.js?2b0e:619 [Vue warn]: Avoid mutating a prop directly since the value will be overwritten whenever the parent component re-renders. Instead, use a data or computed property based on the prop's value. Prop being mutated: "pageNum"

found in

---> at src/components/TableView/GeneralTableView.vue


at src/views/task/TaskDetail.vue
at src/views/layout/components/AppMain.vue
at src/views/layout/Layout.vue
at src/App.vue

Problem during deployment: API error

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=5000): Max retries exceeded with url: /api/spiders/5c9f2095428f2c217cc474e6/deploy_file?node_id=celery@26GXGQ2 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x00000000096A3A20>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',))

Cannot load celery.commands extension 'flower.command:FlowerCommand': ModuleNotFoundError("No module named 'flower.command'; 'flower' is not a package"

Environment: Alibaba Cloud CentOS 7, Python 3.7
nohup python3 flower.py
Error: 'stats' inspect method failed
'active_queues' inspect method failed
...
nohup python3 worker.py
Error:
/usr/local/python3/lib/python3.7/site-packages/celery/utils/imports.py:167: UserWarning: Cannot load celery.commands extension 'flower.command:FlowerCommand': ModuleNotFoundError("No module named 'flower.command'; 'flower' is not a package")
namespace, class_name, exc))
/usr/local/python3/lib/python3.7/site-packages/celery/platforms.py:801: RuntimeWarning: You're running the worker with superuser privileges: this is
absolutely not recommended!
Please specify a different user using the --uid option.
User information: uid=0 euid=0 gid=0 egid=0

pymongo.errors.OperationFailure: auth failed when connecting to MongoDB with a MongoClient configured with username and password

mongo = MongoClient(host=MONGO_HOST,
                    port=MONGO_PORT,
                    username=MONGO_USERNAME,
                    password=MONGO_PASSWORD,
                    authSource=MONGO_DB,
                    connect=False)
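For the MongoClient initialization above, the "authSource" field must be set to the name of the database on which the username account has permissions; otherwise authentication fails. The reason:
MongoDB permissions are usually granted on a specific database, and the account can only authenticate against that database. If authSource is not given, authentication is attempted against admin and fails. This setting is needed when MongoDB has access control enabled. With a root account and password, or with no credentials at all, the problem does not occur.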
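The same setting can also be expressed in a MongoDB connection string through the authSource option. The sketch below builds such a URI using only the standard library. The helper name build_mongo_uri and the example values are assumptions for illustration; the credentials are percent-encoded because characters such as @ or : would otherwise break the URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, port, username, password, db):
    """Build a MongoDB URI that authenticates against `db`.

    Hypothetical helper: authSource is set to the database on which
    the account was granted permissions, rather than defaulting to admin.
    """
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/{db}?authSource={db}"


if __name__ == "__main__":
    # Example values only; substitute your own host, account, and database.
    print(build_mongo_uri("mongo", 27017, "crawler", "p@ss", "crawlab_test"))
```

The resulting string can be passed to MongoClient in place of the separate keyword arguments.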

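Log management system

Log management currently just displays each task's log file in the frontend. Logs now need to be managed centrally, with filtering, aggregation, and parsing.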
