lagoujob's Introduction

Data analysis of Lagou

Introduction

This repository analyzes job data from Lagou. Its main functions are:

  1. Crawl job data from Lagou to get the latest job information
  2. Analyze and visualize the data
  3. Crawl job detail pages and generate a word cloud as a Job Impression
  4. Store interviewees' comments in MongoDB so they can be used to train a machine-learning NLP model
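As an illustration of the analysis step, here is a minimal sketch that averages salaries per city. It assumes each crawled job is a dict with `city` and `salary` fields and that salaries look like `10k-20k`; both the field names and the format are assumptions for illustration, not taken from this repository's code, and the sample records are made up.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed salary format such as "10k-20k"; not confirmed by the repository.
SALARY_RE = re.compile(r"(\d+)\s*k\s*-\s*(\d+)\s*k", re.IGNORECASE)


def salary_midpoint(text):
    """Return the midpoint of a salary range in thousands, or None if unparseable."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return (low + high) / 2


def average_salary_by_city(jobs):
    """Group jobs by the hypothetical 'city' field and average the salary midpoints."""
    buckets = defaultdict(list)
    for job in jobs:
        midpoint = salary_midpoint(job.get("salary", ""))
        if midpoint is not None:  # skip entries without a parseable range
            buckets[job.get("city", "unknown")].append(midpoint)
    return {city: mean(values) for city, values in buckets.items()}


if __name__ == "__main__":
    # Made-up sample records, only to show the call shape.
    sample = [
        {"city": "Beijing", "salary": "10k-20k"},
        {"city": "Beijing", "salary": "20k-30k"},
        {"city": "Shanghai", "salary": "negotiable"},
    ]
    print(average_salary_by_city(sample))
```

Using the midpoint of the range is only one possible choice; the lower bound or the full range may suit other analyses better.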

Prerequisites

  1. Install 3rd party libraries

    pip install -r requirements.txt
    
  2. Install MongoDB and start the MongoDB service

    sudo service mongod start
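
Once the service is running, interviewee comments can be written to a collection. The following is a rough sketch, not the repository's actual storage code: the document fields (`company`, `text`, `crawled_at`) and the database and collection names in the usage comment are hypothetical. The function only relies on the `insert_many` method that a pymongo `Collection` provides.

```python
from datetime import datetime, timezone


def build_comment_docs(company, comments):
    """Turn raw comment strings into documents, skipping empty ones."""
    now = datetime.now(timezone.utc)
    return [
        {"company": company, "text": text.strip(), "crawled_at": now}
        for text in comments
        if text and text.strip()
    ]


def save_comments(collection, company, comments):
    """Insert the documents via insert_many and return how many were sent."""
    docs = build_comment_docs(company, comments)
    if docs:  # pymongo's insert_many rejects an empty list
        collection.insert_many(docs)
    return len(docs)


# Usage with a running mongod (requires pymongo; all names are hypothetical):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   save_comments(client["lagou"]["comments"], "ExampleCo", ["Friendly interviewers"])
```

Passing the collection in as an argument keeps the function easy to try out with a stand-in object before a real database is available.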
    

Basic Usage

  1. Clone this project from GitHub
  2. Change the file paths in the source code
  3. Run lagou_spider.py to fetch job data and write it to an Excel file
  4. Run hot_words.py to segment sentences and return the top 30 hot words
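
The hot-word step can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of hot_words.py: the stop-word list is hypothetical, and the default tokenizer is a plain regex split so the example has no dependencies. Chinese job descriptions need a real word segmenter; one such as `jieba.lcut` could be passed as `tokenize`, though whether this project uses jieba is an assumption.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real run would use a much fuller one.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "with"}


def simple_tokenize(text):
    """Split on non-word characters; a stand-in for a Chinese word segmenter."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def top_hot_words(sentences, n=30, tokenize=simple_tokenize):
    """Count tokens across all sentences and return the n most frequent ones."""
    counts = Counter()
    for sentence in sentences:
        counts.update(
            token
            for token in tokenize(sentence)
            if len(token) > 1 and token not in STOP_WORDS
        )
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up job requirement snippets, only to show the call shape.
    descriptions = ["Python and SQL required", "Experience with Python and Spark"]
    print(top_hot_words(descriptions, n=5))
```

The resulting `(word, count)` pairs are also a convenient input for a word-cloud library.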

Analysis Results

[Figures: Image1 to Image7]

Report

For more information, please see my answer on Zhihu.
In addition, there is another repository that may help you.
The PPT report can be found here.

One more thing

Inspired by Google I/O 2017: we have the data, but how can we analyze it more deeply instead of just computing simple statistics? With the help of machine learning, we can make full use of this data.

Here are several ideas I have had so far.

  • Train a machine learning model to judge which company is worth joining. This article describes basic job data mining with machine learning.
  • More features are being developed.
  • If you are interested in machine learning or data mining, you are welcome to join us!

lagoujob's People

Contributors

lucasxlu, m2shad0w
