The public-no-crawler from hongliangbest

What is weixin_crawler?

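weixin_crawler is a WeChat official account article crawler built with Scrapy, Flask, Echarts, Elasticsearch, and related tools. It ships with analysis reports (see the sample report) and full-text search that can query several million documents almost instantly. weixin_crawler was designed to crawl as many of a WeChat official account's historical posts as possible, as quickly as possible.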

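If you want to see whether this project is interesting before going further, this introduction video of under 3 minutes is what you need: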

Language		Python 3.6
Frontend	Web framework	Flask / Flask-SocketIO / gevent
	JS/CSS libraries	Vue / jQuery / W3.CSS / ECharts / Font Awesome
Backend	Crawler	Scrapy
	Storage	MongoDB / Redis
	Index	Elasticsearch

Friendly reminder: weixin_crawler has not yet been tested on Mac or Linux. If you want to see results quickly, try it on Windows first.

1. Download MongoDB, Redis, and Elasticsearch from their official sites and install them.
2. Run all three at the same time with their default configuration and default ports. In that case MongoDB is at localhost:27017 and Redis is at localhost:6379 (otherwise you have to configure them in weixin_crawler/project/configs/auth.py).
3. To tokenize Chinese, elasticsearch-analysis-ik has to be installed for Elasticsearch.
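
Before moving on, it can help to confirm that the three services are actually listening. The following is a minimal sketch, not part of weixin_crawler: the MongoDB and Redis ports come from step 2, while 9200 for Elasticsearch is an assumption based on its stock HTTP port, and the function names are hypothetical.

```python
import socket

# MongoDB and Redis ports are taken from step 2 above.
# 9200 is assumed here as Elasticsearch's default HTTP port.
DEFAULT_SERVICES = {
    "mongodb": ("localhost", 27017),
    "redis": ("localhost", 6379),
    "elasticsearch": ("localhost", 9200),
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=DEFAULT_SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {
        name: is_port_open(host, port)
        for name, (host, port) in services.items()
    }
```

Calling check_services() returns a dict such as {"mongodb": True, ...}. An open port only shows that something is listening there, not that it is the expected service.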

1. Install Node.js, then use npm to install anyproxy and redis in weixin_crawler/proxy.
2. Start the proxy:

cd proxy
node proxy.js

3. Install the anyproxy HTTPS CA certificate on both the computer and the phone.
- Note: if you are not sure how to use anyproxy, see the anyproxy documentation.

NOTE: you may not be able to install every package simply by running pip install -r requirements.txt. Twisted, which Scrapy depends on, is one package that can fail this way. If you have trouble installing a Python package (Twisted, for instance), there is always a fallback: download the right version of the package to your drive and run $ pip install package_name
I am not sure whether your Python environment will raise other package-not-found errors; if it does, install whichever package is missing.
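
To find out which packages still need manual installation, a small check like the one below may be useful. It is only a sketch: the module list is illustrative (import names of libraries this README mentions), not a copy of requirements.txt, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Import names of libraries mentioned in this README. Illustrative only;
# consult requirements.txt for the authoritative list.
MODULES = ["scrapy", "twisted", "flask", "flask_socketio", "gevent", "pyecharts"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running missing_modules(MODULES) after pip install lists whatever still has to be installed by hand.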

scrapy: Python36\Lib\site-packages\scrapy\http\request\__init__.py -- weixin_crawler\source_code\request\__init__.py
scrapy: Python36\Lib\site-packages\scrapy\http\response\__init__.py -- weixin_crawler\source_code\response\__init__.py
pyecharts: Python36\Lib\site-packages\pyecharts\base.py -- weixin_crawler\source_code\base.py. In this case the function get_echarts_options is added at line 106.

Install adb and add it to your PATH (on Windows, for example).
Install an Android emulator (NOX suggested) or plug in your phone, and make sure you can operate it with adb from the command line.
If multiple phones are connected to your computer, you have to find out their adb ports, which will be used when adding a crawler.
adb does not support Chinese input, which is bad news for searching WeChat official accounts. To input Chinese, ADBKeyBoard has to be installed on your Android phone and set as the default input method; see the ADBKeyBoard documentation for details.
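
The adb steps above can be sketched in Python as follows. This is an illustration under assumptions, not weixin_crawler's own code: the helper names are hypothetical, the parsing assumes the plain two-column output of adb devices, and the broadcast action ADB_INPUT_TEXT with the msg extra is the one documented by ADBKeyBoard, so it only works once that keyboard is installed and active.

```python
import shlex
import subprocess


def parse_adb_devices(output):
    """Return serials that `adb devices` reports in the ready state."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device rows look like "<serial>\tdevice"; the header line and
        # offline/unauthorized rows are skipped.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_devices():
    """Run `adb devices` (adb must be on PATH) and return ready serials."""
    result = subprocess.run(
        ["adb", "devices"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return parse_adb_devices(result.stdout)


def chinese_input_command(serial, text):
    """Build the adb command asking ADBKeyBoard to type `text`."""
    # The text is quoted because adb hands the arguments to a shell
    # on the device.
    return ["adb", "-s", serial, "shell", "am", "broadcast",
            "-a", "ADB_INPUT_TEXT", "--es", "msg", shlex.quote(text)]
```

With several devices attached, list_devices() gives the serials (for an emulator, typically a host:port pair) to pass as the serial argument; the resulting command list can be executed with subprocess.run.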

Why can weixin_crawler work automatically? Here is the reason:

If you want to crawl a WeChat official account, you have to search for the account on your phone and tap its "全部消息" (All Messages) entry. You then get a message list, and scrolling down loads more of it. Any message in the list can be tapped if you want to crawl the account's reading data.
Given the nickname of a WeChat official account, weixin_crawler operates the WeChat app installed on a phone while anyproxy listens in the background. Either way, weixin_crawler gets all the request data sent by the WeChat app, and then it is show time for Scrapy.
As you might expect, to let weixin_crawler operate the WeChat app we have to tell adb where to tap, swipe, and input text; most of these positions are defined in weixin_crawler/project/phone_operate/config.py. BTW, phone_operate is responsible for operating WeChat just like a human being: its eyes are the Baidu OCR API and predefined tap areas, and its fingers are adb.
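
As a rough illustration of adb acting as the "fingers", the sketch below builds the standard adb shell input tap and input swipe commands. The coordinates and names are hypothetical placeholders; the project's real values live in weixin_crawler/project/phone_operate/config.py and depend on the screen resolution.

```python
import subprocess

# Hypothetical screen positions in pixels, for illustration only.
POSITIONS = {
    "search_box": (360, 90),
    "list_bottom": (360, 1000),
    "list_top": (360, 300),
}


def tap_command(serial, x, y):
    """Build an adb command that taps the screen at (x, y)."""
    return ["adb", "-s", serial, "shell", "input", "tap", str(x), str(y)]


def swipe_command(serial, start, end, duration_ms=300):
    """Build an adb command that swipes from `start` to `end`."""
    (x1, y1), (x2, y2) = start, end
    return ["adb", "-s", serial, "shell", "input", "swipe",
            str(x1), str(y1), str(x2), str(y2), str(duration_ms)]


def run(command):
    """Execute a built command; requires adb on PATH and a connected device."""
    subprocess.run(command, check=True)
```

For example, run(swipe_command(serial, POSITIONS["list_bottom"], POSITIONS["list_top"])) would scroll the message list so that more entries load.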