weibocrawler's Introduction

Introduction

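The main goal of this project is to crawl Weibo data in a simpler and faster way, for use in sentiment analysis. It is not limited to comments: crawling of user profiles and other related content is planned and will be configurable. Unlike other Weibo crawler projects, this one aims for simple configuration and an easily portable environment, and it supports both Python 2 and Python 3. The project is at an early stage and more code will be pushed over time. If you are a beginner and find this interesting, feel free to fork it and we can complete it together!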

1. Features

Uses proxies to crawl the comment data of specific posts by Sina Weibo users, as well as CCPL data.

2. Flow chart

Thread 1 (read) -> proxy -> request -> parse -> message queue -> thread 2 (write) -> storage
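The flow above is a producer/consumer pipeline. The following is a minimal sketch of that shape using only the standard library, not the project's actual code. The names `run_pipeline`, `reader`, `writer`, `fetch`, `parse` and `store` are hypothetical. In a real crawler, `fetch` would issue the HTTP request through a proxy (for example with `requests.get(url, proxies=...)`), and `store` would write to a database or file.

```python
import queue
import threading

# Unique marker that tells the writer thread there is nothing left to read.
_DONE = object()


def reader(page_ids, fetch, parse, out_queue):
    """Thread 1: fetch each page, parse it, and enqueue the parsed records."""
    try:
        for page_id in page_ids:
            raw = fetch(page_id)          # assumed to go through a proxy
            for record in parse(raw):
                out_queue.put(record)
    finally:
        # Always signal completion, even if fetch or parse raised.
        out_queue.put(_DONE)


def writer(in_queue, store):
    """Thread 2: take records off the queue and persist them."""
    while True:
        record = in_queue.get()
        if record is _DONE:
            break
        store(record)


def run_pipeline(page_ids, fetch, parse, store, max_queue_size=100):
    """Run one reader thread and one writer thread joined by a bounded queue."""
    q = queue.Queue(maxsize=max_queue_size)
    threads = [
        threading.Thread(target=reader, args=(page_ids, fetch, parse, q)),
        threading.Thread(target=writer, args=(q, store)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    saved = []
    run_pipeline(
        page_ids=[1, 2],
        fetch=lambda page: "comment-a%d,comment-b%d" % (page, page),  # stand-in for an HTTP request
        parse=lambda raw: raw.split(","),
        store=saved.append,
    )
    print(saved)
```

The bounded queue keeps the reader from running far ahead of the writer, so memory use stays limited when storage is slower than crawling.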

3. Saving files

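First, convert the relative time shown on each comment (for example "20分钟前", 20 minutes ago, or "40秒前", 40 seconds ago) into the actual time. Then split the large file into smaller files of 500 records each.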
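The two steps can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the output file naming, and the one-record-per-line text format are all hypothetical, and only the two relative formats named above (seconds and minutes ago) are handled.

```python
import os
import re
from datetime import timedelta

# Matches strings such as "40秒前" (seconds ago) and "20分钟前" (minutes ago).
_RELATIVE = re.compile(r"^(\d+)\s*(秒|分钟)前$")
_UNIT_SECONDS = {"秒": 1, "分钟": 60}


def to_absolute_time(text, crawled_at):
    """Convert a relative time string to a datetime, given the crawl time.

    Returns None for any format this sketch does not recognise.
    """
    match = _RELATIVE.match(text.strip())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return crawled_at - timedelta(seconds=amount * _UNIT_SECONDS[unit])


def chunked(records, size=500):
    """Yield consecutive slices of `records`, each at most `size` items long."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def save_in_chunks(records, directory, prefix="comments", size=500):
    """Write `records` (strings) into numbered files of `size` lines each.

    The file naming scheme here is an assumption made for illustration.
    """
    paths = []
    for index, chunk in enumerate(chunked(records, size)):
        path = os.path.join(directory, "%s_%04d.txt" % (prefix, index))
        with open(path, "w", encoding="utf-8") as handle:
            for line in chunk:
                handle.write(line + "\n")
        paths.append(path)
    return paths
```

Passing the crawl time in explicitly, rather than calling `datetime.now()` inside the function, keeps the conversion reproducible when the data is processed some time after it was fetched.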
