cartoon-cat's Introduction

cartoon-cat

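I found that tazhe.com is no longer reachable, and I don't know whether it will come back. Until it does, this project cannot work against that site. Other comic sites are structured much like tazhe, though, so with the blog post as a reference, a few small changes are enough to crawl a different site.

Update: I recently came across another site, 36mh.com, which happens to host One-Punch Man (一拳超人), a series I wanted to read. I adapted the tool to it and it still works. Only the CSS selectors and the end-of-chapter check needed to change.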
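
The two site-specific pieces mentioned above (the CSS selectors and the end-of-chapter check) could be gathered in one place so that switching sites means adding one entry. The following is only a sketch: `SiteProfile`, `PROFILES`, `profile_for`, the host name, and the selector value are all hypothetical placeholders and are not part of cartoon_cat; the real values have to be read off the target page.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import urlparse


@dataclass(frozen=True)
class SiteProfile:
    """Hypothetical per-site settings; not part of cartoon_cat itself."""
    image_selector: str                    # CSS selector that finds the page image
    is_chapter_end: Callable[[str], bool]  # decides whether a chapter has ended


# Placeholder entry for illustration only; inspect the real site to fill it in.
PROFILES = {
    "m.example-comics.test": SiteProfile(
        image_selector="#images img",
        is_chapter_end=lambda text: "end of chapter" in text,
    ),
}


def profile_for(site_url):
    """Return the profile whose key matches the host of the comic's home page."""
    host = urlparse(site_url).netloc
    try:
        return PROFILES[host]
    except KeyError:
        raise ValueError("no profile defined for " + host)
```

With this layout, the crawler body would read its selector and end check from the profile instead of hard-coding them.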

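Cartoon Cat (漫画喵)

A simple comic crawler built with selenium + PhantomJs.

Blog post: https://www.miaoerduo.com/2017/02/19/cartoon-cat-client/

It can download comics from https://m.36mh.com. (This is the mobile version of the site, which is easier to analyze.)

It requires selenium and a browser. If you want to try it, see the blog post above for the exact environment requirements.

Usage:

Search for a comic on https://m.36mh.com, for example 一拳超人 (One-Punch Man).

Open the comic's page and note its home page URL, here: https://m.36mh.com/manhua/yiquanchaoren/

Set the parameters, using demo.py as a reference: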

# -*- coding: utf-8 -*-

import cartoon_cat as cc

if __name__ == '__main__':

    site = 'https://m.36mh.com/manhua/yiquanchaoren/'

    crawler = cc.CartoonCat(
        site=site,                                  # comic home page
        begin=0,                                    # first chapter
        end=-1,                                     # last chapter; a negative value means no end chapter
        save_folder='/path/to/download',            # save path; created automatically if missing
        browser=cc.BrowserType.PHANTOMJS,           # browser type: FIREFOX, CHROME, SAFARI, IE, PHANTOMJS
        driver='path/to/phantomjs')                 # driver path; not needed for firefox
                                                    #   others can be downloaded from https://pypi.python.org/pypi/selenium
    crawler.start()

cartoon-cat's People

Contributors

miaoerduo, spacedouble7, xuehebinglan, yarving

cartoon-cat's Issues

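Some modifications I made

Running demo.py raised an error, so (nothing fancy) I appended the following directly to the end of cartoon_cat.py:
#######################################################
if __name__ == '__main__':
    site = 'https://m.36mh.com/manhua/congqianyouzuolingjianshan/'
    crawler = CartoonCat(
        site=site,                          # comic home page
        begin=0,                            # first chapter
        end=99,                             # last chapter
        save_folder='./download0-99',       # save path; created automatically if missing
        # browser=BrowserType.CHROME,       # browser type: FIREFOX, CHROME, SAFARI, IE, PHANTOMJS
        browser=BrowserType.PHANTOMJS,      # browser type: FIREFOX, CHROME, SAFARI, IE, PHANTOMJS
        # driver='./chromedriver.exe'       # driver path; not needed for firefox
        driver='./phantomjs.exe'            # driver path; not needed for firefox
    )
    crawler.start()
##########################################################
Then I ran cartoon_cat.py.
It raised another error:
the fix was to copy phantomjs.exe or chromedriver.exe into the current folder.
PS: downloading chromedriver.exe turned out to require a proxy.
Yet another error:
the fix was to change import urllib2 to import urllib.request as urllib2.
After that I ran cartoon_cat.py again and it happily started downloading.
Those are the problems I ran into and how I solved them.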
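
The urllib2 change described above can also be written so that the same file imports under both Python 2 and Python 3. This is a minimal sketch that assumes the code only needs `urlopen` and `Request` through the alias; `fetch_bytes` is a hypothetical helper added for illustration and does not exist in cartoon_cat.

```python
try:
    import urllib2                       # Python 2
except ImportError:
    import urllib.request as urllib2     # Python 3: urlopen and Request live here


def fetch_bytes(url, timeout=10):
    """Hypothetical helper: download a URL and return the raw response bytes."""
    response = urllib2.urlopen(url, timeout=timeout)
    try:
        return response.read()
    finally:
        response.close()
```

The rest of the module can keep calling `urllib2.urlopen(...)` unchanged, which keeps the edit to a single import statement.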

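How do I adapt this to a different site?

https://m.gufengmh8.com/manhua/congqianyouzuolingjianshan
On this site, the end of a chapter shows neither an "end of chapter" prompt nor a return to the table of contents; it goes straight on to the next page. What should I change to detect that a chapter has ended?
I wrote the following (on the last page, the "next" button points somewhere different than on the other pages):
##################################################
action_list = self.__browser.find_elements_by_css_selector('.action_list ul li a')
print(len(action_list))
next_bottom_url = action_list[2].get_attribute('href')
if not next_bottom_url.endswith("nextChapter();"):
    image_div.click()
else:
    self.__browser.get(self.__site)
#####################################################
But it raises an error. I modeled it on your code, and I am not sure how find_elements_by_css_selector is supposed to be used.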
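
One way to make the check above easier to debug is to separate the element lookup from the decision. The sketch below assumes, as in the snippet above, that the third link under `.action_list` is the "next" button and that its href ends with `nextChapter();` on the last page of a chapter. The helper names are hypothetical. It uses the generic `find_elements("css selector", ...)` call, since the `find_elements_by_css_selector` shortcut was removed in Selenium 4.3; whether that removal is the cause of the reported error is not known.

```python
NEXT_CHAPTER_SUFFIX = "nextChapter();"


def next_button_href(browser, selector=".action_list ul li a", index=2):
    """Return the href of the assumed 'next' link, or None if it is missing."""
    # "css selector" is the locator strategy string behind By.CSS_SELECTOR.
    links = browser.find_elements("css selector", selector)
    if len(links) <= index:
        return None  # avoids an IndexError when the page has fewer links
    return links[index].get_attribute("href")


def is_chapter_end(href):
    """True when the 'next' link would jump to the next chapter."""
    return bool(href) and href.strip().endswith(NEXT_CHAPTER_SUFFIX)
```

Inside the crawler, the original branch would then become `if is_chapter_end(next_button_href(self.__browser)): ...`, with the page click in the `else` branch.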
