
webspider's Introduction


Version: 1.0.1
Website: http://119.23.223.90:8000
Source: https://github.com/JustForFunnnn/webspider
Keywords: Python3, Tornado, Celery, Requests

Introduction

This project crawls job and company data from job-seeking websites, cleans it, models and converts it, and stores it in a database. Echarts and Bootstrap are then used to build a front-end page that displays IT job statistics, showing the latest requirements and trends in the IT job market.

Demo

Enter a keyword you are interested in, such as "Python", into the search box and click the search button; the statistics for that keyword will then be displayed.

  • The first chart, Years of Working (工作年限要求), shows the experience requirements for Python jobs. According to the data, "3 ~ 5 years" is the most frequent requirement, followed by "1 ~ 3 years" (Chart Source Code)

  • The second chart, Salary Range (薪水分布), shows the salary distribution for Python jobs. According to the data, "11k ~ 20k" is the most frequently offered range, followed by "21k ~ 35k" (Chart Source Code)

Additional charts are also available.

Python Charts Example:

(chart screenshot)

Quick Start

This tutorial is based on Ubuntu Linux; for other systems, please use the corresponding commands.

  • Clone the project
git clone git@github.com:JustForFunnnn/webspider.git
  • Install MySQL, Redis, Python3
# install Redis
apt-get install redis-server

# run Redis in background
nohup redis-server &

# install Python3
apt-get install python3

# install MySQL
apt-get install mysql-server

# start MySQL
sudo service mysql start
  • Configure the database and tables
# create database
CREATE DATABASE `spider` CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

We still need to create the tables: copy the table definition SQL from tests/schema.sql and run it in MySQL.
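One way to do this from the shell (assuming the root MySQL user and the spider database created above; adjust the credentials to your setup):

# load the table definitions from the repository into the spider database
mysql -u root -p spider < tests/schema.sql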

  • Build project
# after a successful build, executable job scripts will be generated under env/bin
make
  • Run unit-test
make test
  • Run code style check
make flake8
  • Start web service (a quick check that it responds is shown after this list)
env/bin/web
  • Start the crawler
# run task scheduler/dispatcher
env/bin/celery_beat
# run celery worker for job data
env/bin/celery_lg_jobs_data_worker
# run celery worker for job count
env/bin/celery_lg_jobs_count_worker
  • Other jobs
# crawl the job counts immediately
env/bin/crawl_lg_jobs_count
# crawl the job data immediately
env/bin/crawl_lg_data
# start celery monitoring
env/bin/celery_flower
  • Clean
# clean the existing build result
make clean
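Once the web service from the "Start web service" step above is running, you can check that it responds locally. Port 8000 is assumed here based on the demo site address; adjust it if your configuration differs:

# quick check that the web service is up (port 8000 assumed)
curl http://127.0.0.1:8000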

webspider's People

Contributors

justforfunnnn


webspider's Issues

OSError: mysql_config not found

virtualenv -p /usr/bin/python3 env
Running virtualenv with interpreter /usr/bin/python3
Using base prefix '/usr'
New python executable in /home/tom/PycharmProjects/webspider/env/bin/python3
Not overwriting existing python script /home/tom/PycharmProjects/webspider/env/bin/python (you must use /home/tom/PycharmProjects/webspider/env/bin/python3)
Installing setuptools, pip, wheel...done.
echo "\n Use python virtual environment to install required packages......\n"

 Use python virtual environment to install required packages......

env/bin/pip install -e .
Obtaining file:///home/tom/PycharmProjects/webspider
Collecting tornado (from webspider==0.0.2)
Collecting gevent (from webspider==0.0.2)
  Using cached gevent-1.2.2-cp35-cp35m-manylinux1_x86_64.whl
Collecting gunicorn (from webspider==0.0.2)
  Using cached gunicorn-19.7.1-py2.py3-none-any.whl
Collecting ipython (from webspider==0.0.2)
  Using cached ipython-6.2.1-py3-none-any.whl
Collecting lxml (from webspider==0.0.2)
  Using cached lxml-4.1.1-cp35-cp35m-manylinux1_x86_64.whl
Collecting nose (from webspider==0.0.2)
  Using cached nose-1.3.7-py3-none-any.whl
Collecting requests (from webspider==0.0.2)
  Using cached requests-2.18.4-py2.py3-none-any.whl
Collecting coverage==4.0.3 (from webspider==0.0.2)
  Using cached coverage-4.0.3.tar.gz
Collecting flake8 (from webspider==0.0.2)
  Using cached flake8-3.5.0-py2.py3-none-any.whl
Collecting mysqlclient (from webspider==0.0.2)
  Using cached mysqlclient-1.3.12.tar.gz
    Complete output from command python setup.py egg_info:
    /bin/sh: 1: mysql_config: not found
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-iedwam3m/mysqlclient/setup.py", line 17, in <module>
        metadata, options = get_config()
      File "/tmp/pip-build-iedwam3m/mysqlclient/setup_posix.py", line 44, in get_config
        libs = mysql_config("libs_r")
      File "/tmp/pip-build-iedwam3m/mysqlclient/setup_posix.py", line 26, in mysql_config
        raise EnvironmentError("%s not found" % (mysql_config.path,))
    OSError: mysql_config not found

Operating system

tom@tom-VirtualBox:~/PycharmProjects/webspider$ sudo lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.3 LTS
Release:	16.04
Codename:	xenial

Command executed:

make
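This error comes from building the mysqlclient dependency, which needs the MySQL client development headers installed on the system. A possible fix on Ubuntu (a suggestion based on the error message, not an official project instruction):

# install the MySQL client and Python 3 development headers needed to build mysqlclient
sudo apt-get install libmysqlclient-dev python3-dev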

(1049, "Unknown database 'spider'")

I'm not familiar with web development. Following the README, I tried starting bin/web, and got the following error when visiting 127.0.0.1:8000:

Traceback (most recent call last):
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1122, in _do_get
    return self._pool.get(wait, self._timeout)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/util/queue.py", line 145, in get
    raise Empty
sqlalchemy.util.queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 387, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1138, in _do_get
    self._dec_overflow()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1135, in _do_get
    return self._create_connection()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 461, in __init__
    self.__connect(first_connect_check=True)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 651, in __connect
    connection = pool._invoke_creator(self)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 105, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 393, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/MySQLdb/__init__.py", line 86, in Connect
    return Connection(*args, **kwargs)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/MySQLdb/connections.py", line 204, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
_mysql_exceptions.OperationalError: (1049, "Unknown database 'spider'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/tornado/web.py", line 1509, in _execute
    result = method(*self.path_args, **self.path_kwargs)
  File "/home/acgtyrant/Projects/webspider/app/web/handlers/keyword.py", line 14, in get
    keyword = KeywordController.get(name=keyword_name)
  File "/home/acgtyrant/Projects/webspider/app/controllers/keyword.py", line 9, in get
    return KeywordModel.get(name=name)
  File "/home/acgtyrant/Projects/webspider/app/model/keyword.py", line 17, in get
    return query.one_or_none()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2784, in one_or_none
    ret = list(self)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2876, in _execute_and_instances
    close_with_result=True)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2885, in _get_bind_args
    **kw
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/orm/query.py", line 2867, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 998, in connection
    execution_options=execution_options)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 1003, in _connection_for_bind
    engine, execution_options)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/orm/session.py", line 403, in _connection_for_bind
    conn = bind.contextual_connect()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2112, in contextual_connect
    self._wrap_pool_connect(self.pool.connect, None),
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2151, in _wrap_pool_connect
    e, dialect, self)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 1465, in _handle_dbapi_exception_noconnection
    exc_info
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
    raise value.with_traceback(tb)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2147, in _wrap_pool_connect
    return fn()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 387, in connect
    return _ConnectionFairy._checkout(self)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 766, in _checkout
    fairy = _ConnectionRecord.checkout(pool)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 516, in checkout
    rec = pool._do_get()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1138, in _do_get
    self._dec_overflow()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
    compat.reraise(exc_type, exc_value, exc_tb)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1135, in _do_get
    return self._create_connection()
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 333, in _create_connection
    return _ConnectionRecord(self)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 461, in __init__
    self.__connect(first_connect_check=True)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/pool.py", line 651, in __connect
    connection = pool._invoke_creator(self)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 105, in connect
    return dialect.connect(*cargs, **cparams)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 393, in connect
    return self.dbapi.connect(*cargs, **cparams)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/MySQLdb/__init__.py", line 86, in Connect
    return Connection(*args, **kwargs)
  File "/home/acgtyrant/Projects/webspider/.env/lib/python3.6/site-packages/MySQLdb/connections.py", line 204, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1049, "Unknown database 'spider'")

In addition, running bin/test produced many similar errors.

What should I do? Thanks.
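The error indicates that the spider database from the Quick Start section has not been created yet. Creating it (and then loading tests/schema.sql) as described above should resolve this, for example:

# create the database the web service expects (same statement as in Quick Start)
mysql -u root -p -e "CREATE DATABASE spider CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"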

Encountered during make: ImportError: No module named backports.configparser

Hello, my error output is below:

virtualenv -p /usr/bin/python3 env
Traceback (most recent call last):
  File "/usr/bin/virtualenv", line 7, in <module>
    from virtualenv.__main__ import run_with_catch
  File "/usr/lib/python2.7/site-packages/virtualenv/__init__.py", line 3, in <module>
    from .run import cli_run
  File "/usr/lib/python2.7/site-packages/virtualenv/run/__init__.py", line 11, in <module>
    from .plugin.activators import ActivationSelector
  File "/usr/lib/python2.7/site-packages/virtualenv/run/plugin/activators.py", line 6, in <module>
    from .base import ComponentBuilder
  File "/usr/lib/python2.7/site-packages/virtualenv/run/plugin/base.py", line 9, in <module>
    from importlib_metadata import entry_points
  File "/usr/lib/python2.7/site-packages/importlib_metadata/__init__.py", line 16, in <module>
    from ._compat import (
  File "/usr/lib/python2.7/site-packages/importlib_metadata/_compat.py", line 20, in <module>
    from backports.configparser import ConfigParser
ImportError: No module named backports.configparser
make: *** [python] Error 1
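The paths in the traceback show that the virtualenv tool itself is running under the system Python 2.7 and that its dependencies there are broken, so this is not specific to webspider. A possible workaround (an assumption, not an official project fix) is to create the environment with Python 3's built-in venv module and install the project into it, which is what the make build step does:

# create the virtual environment with Python 3's built-in venv module instead of virtualenv
# (on Ubuntu you may first need: apt-get install python3-venv)
python3 -m venv env
# install the project into it (the same step the Makefile runs)
env/bin/pip install -e .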

Runtime problem

After starting the tasks, I don't see any crawled data in the configured database. Could it be because the access URLs in constants.py are not configured? Please advise.

missing lg_cites

Failure: ModuleNotFoundError (No module named 'webspider.crawlers.lg_cites') ... ERROR
