oa414 / appcrawler
Spider for extracting Android app information from app markets
Home Page: https://github.com/oa414/AppCrawler
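Hello. This project is very useful, and I have the same need.
I am a Python beginner and have some questions about the code.
I have Python 3.5 installed. I haven't yet figured out how to start MongoDB, so for now I am saving the results to a CSV file: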
scrapy crawl google -o test.csv JOBDIR=app/jobs
But I get the following error:
ImportError: No module named 'sgmllib'
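Searching online, I read that SgmlLinkExtractor and LinkExtractor both depend on sgmllib, and that Python 3 no longer provides sgmllib. Do I need to install a Python 2.7 environment instead, or is there an alternative?
I am also curious: on the Google Play "Viber" page, once the spider has the app id and download count, how does it move on to the next app? How is this loop implemented?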
rules = [
    Rule(LinkExtractor(allow=("https://play.google.com/store/apps/details", )), callback='parse_app', follow=True),
]  # CrawlSpider crawls pages according to these rules and calls the callback to process each one
I don't understand this piece of code.
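The loop asked about above can be pictured with a small standard-library sketch. It is not Scrapy's implementation, only an illustration of the idea behind a rule with `follow=True`: extract links from each fetched page, keep those whose URL matches the `allow` pattern, call the callback on each matching page, and keep extracting links from those pages too. The `FAKE_PAGES` dictionary, `crawl`, `extract_links`, and `LinkCollector` are hypothetical names introduced for this example, and no real HTTP request is made.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "site" standing in for real HTTP responses.
BASE = "https://play.google.com/store/apps"
FAKE_PAGES = {
    BASE: '<a href="/store/apps/details?id=a">A</a><a href="/store/help">Help</a>',
    BASE + "/details?id=a": '<a href="/store/apps/details?id=b">B</a>'
                            '<a href="/store/apps/details?id=a">self</a>',
    BASE + "/details?id=b": '<a href="/store/apps/details?id=a">A</a>'
                            '<a href="/store/apps/details?id=c">C</a>',
    BASE + "/details?id=c": "",
}

# Same idea as the allow=(...) argument: a regex searched in each absolute URL.
ALLOW = re.compile(r"https://play\.google\.com/store/apps/details")


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(base_url, html):
    parser = LinkCollector()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(start_url, fetch, callback):
    # Each queue entry is (url, reached_via_rule).
    queue = deque([(start_url, False)])
    seen = {start_url}  # duplicate filter: each URL is scheduled once
    while queue:
        url, reached_via_rule = queue.popleft()
        html = fetch(url)
        if reached_via_rule:
            callback(url, html)  # plays the role of parse_app
        # follow=True: keep extracting links from the pages we visit.
        for link in extract_links(url, html):
            if ALLOW.search(link) and link not in seen:
                seen.add(link)
                queue.append((link, True))


visited = []
crawl(BASE,
      fetch=lambda url: FAKE_PAGES.get(url, ""),
      callback=lambda url, html: visited.append(url))
```

In this toy site the callback runs once each for apps `a`, `b`, and `c`; the help page is never scheduled because its URL does not match the pattern, and the links back to `a` are dropped by the `seen` set.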
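On my own, I can use BeautifulSoup + Requests to scrape the name and download count of a single app, but I cannot build a loop that crawls all apps, and I have no way to resume an interrupted job from where it stopped instead of starting over.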
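For the resume problem, Scrapy documents a job directory setting, passed as `scrapy crawl somespider -s JOBDIR=crawls/somespider-1`, which persists the crawl state between runs. The underlying idea can be sketched with the standard library alone: save the pending queue and the set of seen URLs after every page, and reload them on startup. The `CrawlState` class, the `run` function, the `LINKS` toy data, and the JSON file layout below are assumptions made for this example, not Scrapy's on-disk format.

```python
import json
import os
import tempfile
from collections import deque


class CrawlState:
    """Pending queue plus seen set, persisted to a JSON file (hypothetical format)."""

    def __init__(self, path):
        self.path = path
        self.pending = deque()
        self.seen = set()
        if os.path.exists(path):  # resume from an earlier run
            with open(path, encoding="utf-8") as f:
                data = json.load(f)
            self.pending = deque(data["pending"])
            self.seen = set(data["seen"])

    def add(self, url):
        if url not in self.seen:
            self.seen.add(url)
            self.pending.append(url)

    def save(self):
        # Write to a temporary file first so a crash cannot leave a half-written state.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump({"pending": list(self.pending), "seen": sorted(self.seen)}, f)
        os.replace(tmp, self.path)


def run(state, process, max_pages=None):
    """Process pending URLs; `process(url)` returns the links found on that page."""
    done = 0
    while state.pending and (max_pages is None or done < max_pages):
        url = state.pending[0]   # peek: the URL stays queued until it is fully processed
        for link in process(url):
            state.add(link)
        state.pending.popleft()
        state.save()
        done += 1
    return done


# Toy link graph standing in for real pages.
LINKS = {"app/a": ["app/b", "app/c"], "app/b": ["app/c"], "app/c": []}
state_file = os.path.join(tempfile.mkdtemp(), "crawl_state.json")

first = CrawlState(state_file)
first.add("app/a")
run(first, lambda url: LINKS[url], max_pages=1)  # simulate an interruption after one page

resumed = CrawlState(state_file)                 # a new process would start here
remaining = list(resumed.pending)
run(resumed, lambda url: LINKS[url])
```

After the simulated interruption, the second `CrawlState` picks up with only `app/b` and `app/c` pending, so `app/a` is not fetched again.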
Doesn't crawling Google Play require a proxy to get past the firewall? Google Play content is loaded dynamically; can Scrapy alone handle that?