zhihuhelp_archived's People

Contributors

daitingting, knarfeh, yaozeyuan

zhihuhelp_archived's Issues

Unusable after updating

Cause of the exception: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

It is basically unusable; every run fails with this error.

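An image format is overlooked when scraping answer content, causing the program to crash

For example, the question "What cool Mathematica code have you written in ten lines or fewer?" uses a formula that is written as (n,m,l)=(4,0,3)

According to this code in dict2Html.py: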

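def imgFix(self, content):
    # Find every <img> tag in the answer body
    for imgTag in re.findall(r'<img.*?>', content):
        # Pull out the value of the src attribute
        src = re.search(r'(?<=src=").*?(?=")', imgTag)
...

def getFileName(self, imgHref=''):
    # The last path segment becomes the file name, query string included
    return imgHref.split('/')[-1]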

the resulting fileName is equation?tex=%28n%2Cm%2Cl%29%3D%284%2C0%2C3%29, which cannot be used to create the image file, as shown here:
[image: imgpool]
The program's output is:
[image: result]
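
One possible way around this, offered as a sketch under assumptions rather than as the project's actual fix: build the file name from a hash of the URL whenever the last path segment carries a query string or is too long. The name safe_image_name, the .png default extension, the 64-character limit, and the example.com URLs are all hypothetical. The sketch uses Python 3 syntax, while the project itself runs on Python 2.7.

```python
import hashlib
import posixpath
from urllib.parse import urlparse


def safe_image_name(img_href, default_ext='.png', max_length=64):
    """Return a file name that is safe to create on disk for an image URL."""
    parsed = urlparse(img_href)
    # Last segment of the URL path, without the query string
    base = posixpath.basename(parsed.path)
    ext = posixpath.splitext(base)[1]
    # Fall back to a hash when the URL has a query string (formula images),
    # has no usable last segment, or the segment is too long for a file name
    if parsed.query or not base or len(base) > max_length:
        digest = hashlib.md5(img_href.encode('utf-8')).hexdigest()
        # default_ext is an assumption; the real image type may differ
        return digest + (ext or default_ext)
    return base


formula_name = safe_image_name(
    'http://example.com/equation?tex=%28n%2Cm%2Cl%29%3D%284%2C0%2C3%29')
plain_name = safe_image_name('http://example.com/images/abc_b.jpg')
```

Here plain_name keeps the original abc_b.jpg, while formula_name becomes a 32-character hex digest plus the assumed extension, so it contains no '?' and stays short regardless of how long the formula is.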

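Write code documentation

Write code documentation, mainly covering how to use the epub library and the zhihu_parser library, as well as the architectural thinking behind 知乎助手 (Zhihu Helper)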

Display problem with code blocks in answers

self.content = content.replace('\r', '').replace('\n', '')  

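Does this line of code strip out all the indentation and line structure? If so, indented code inside an answer ends up looking ugly, for example:
[image: default]
Ideally it would look like this:
[image: default]

I am also looking for a way to fix this. Are there any pitfalls I should watch out for?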
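
One direction that might work, sketched under the assumption that code in an answer is wrapped in <pre> ... </pre> tags (this is not confirmed by the source): remove line breaks only outside those blocks. The function name strip_newlines_outside_pre is hypothetical.

```python
import re

# Captures whole <pre>...</pre> blocks so re.split keeps them in the result
PRE_BLOCK = re.compile(r'(<pre\b.*?</pre>)', re.DOTALL | re.IGNORECASE)


def strip_newlines_outside_pre(content):
    """Drop CR/LF everywhere except inside <pre> blocks."""
    parts = PRE_BLOCK.split(content)
    cleaned = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd indexes are the captured <pre> blocks: keep their newlines
            cleaned.append(part.replace('\r', ''))
        else:
            # Even indexes are ordinary markup: flatten as before
            cleaned.append(part.replace('\r', '').replace('\n', ''))
    return ''.join(cleaned)


sample = '<p>a\nb</p><pre>if x:\n    y()</pre>\n<p>c</p>'
flattened = strip_newlines_outside_pre(sample)
```

A regular expression is a blunt tool for HTML; nested or malformed <pre> tags would need a real parser, which is one pitfall to keep in mind.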

UnicodeEncodeError: 'charmap'

It raises an error on a freshly installed Python 2.7.8.
The operating system is Windows 7 64-bit.

C:\Users\Administrator\Desktop\1.7.3.7>python --version
Python 2.7.8

C:\Users\Administrator\Desktop\1.7.3.7>python zhihuHelp.py
Traceback (most recent call last):
  File "zhihuHelp.py", line 11, in <module>
    helper.start()
  File "C:\Users\Administrator\Desktop\1.7.3.7\src\main.py", line 47, in start
    self.check_update()
  File "C:\Users\Administrator\Desktop\1.7.3.7\src\main.py", line 99, in check_update
    print   u"检查更新。。。"
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character maps to <undefined>
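
The traceback shows the console using cp437, which has no Chinese characters. A minimal sketch of one workaround, assuming it is acceptable to show '?' in place of characters the console cannot display: the helper names console_safe and safe_print are hypothetical, and the code uses Python 3 syntax rather than the Python 2.7 of the report.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Fall back to UTF-8 when the stream does not report an encoding
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding, errors='replace').decode(encoding)


def safe_print(text):
    """Print text without raising UnicodeEncodeError on limited consoles."""
    print(console_safe(text))


safe_print(u'检查更新。。。')
```

Switching the console code page to one that supports Chinese would keep the messages readable; the sketch above only prevents the crash.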

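Add the break-word property to a tags

While reading the answer collection for @湖玛 Humar, one line contained an a link so long that it stretched the page.
The CSS needs word-wrap: break-word to constrain it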

The latest source code fails to run after download

python zhihuHelp.py
Traceback (most recent call last):
  File "zhihuHelp.py", line 6, in <module>
    import bs4
ImportError: No module named bs4

From the above, the bs4 module is missing. I am on OS X; how do I install it?

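Fix Config.py

For example, iterate over the attributes with dict.keys() instead of dir, and then standardize the method names; the current method names are still not formal enough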
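
A rough illustration of the idea, not the real Config.py: settings live in a dict and are enumerated with keys(), so methods and built-in attributes never leak into the list the way they do with dir. The class layout, method names, and setting names here are all hypothetical.

```python
class Config(object):
    """Hypothetical settings holder that stores values in a plain dict."""

    def __init__(self, **settings):
        self._settings = dict(settings)

    def setting_names(self):
        # Only real settings appear here, unlike dir(self)
        return sorted(self._settings.keys())

    def get_setting(self, name, default=None):
        return self._settings.get(name, default)


config = Config(max_thread=20, picture_quality=1)
names = config.setting_names()
```

With dir(config) the result would also include get_setting, setting_names, and every dunder attribute, which then have to be filtered out by hand.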

Clean up RawBook

That name, though -_-
When there is time, rework this class, even if it ends up fully imperative. As it stands, it is uncomfortable to read.

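Image width problem for div.content img

The value is currently set to 100%, which stretches every image to the full screen width, so smaller images display incorrectly. It should be changed to max-width:100% to avoid this problem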

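Generation fails

Failed to match the user's question count/answer count/column count/collection count/public edit count
Error content:
need more than 0 values to unpack
Timed-out page: http://www.zhihu.com/people/qiao-er-53/answers?order_by=vote_num&page=49
Reading answer pages, 3/67 pages still waiting to be read
Reading answer pages, 3/67 pages still waiting to be read
Timed out opening the page
Timed-out page: http://www.zhihu.com/people/qiao-er-53/answers?order_by=vote_num&page=25
Answers saved to the database successfully
Failed to match the user's question count/answer count/column count/collection count/public edit count
Error content:
need more than 0 values to unpack
Failed to match the user's following count/follower count
Error content:
need more than 0 values to unpack
Failed to match the user's upvote count/thanks count/favorited count/shared count
Error content:
need more than 0 values to unpack
Reading answer pages, 2/67 pages still waiting to be read
Reading answer pages, 1/67 pages still waiting to be read
Answers saved to the database successfully
The specified question was not collected
Error message:
'NoneType' object has no attribute '__getitem__'

This is the link: http://www.zhihu.com/people/qiao-er-53/answers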
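
"need more than 0 values to unpack" is what Python 2 raises when an empty sequence is unpacked into several names, which fits a pattern that found nothing on a timed-out or changed page. A defensive sketch follows, assuming the counts are extracted with a regular expression; COUNT_PATTERN, its markup, and parse_counts are hypothetical and do not reflect the project's real parser.

```python
import re

# Hypothetical markup; the real page structure is not shown in the report
COUNT_PATTERN = re.compile(r'<span class="num">(\d+)</span>')


def parse_counts(html, expected=5):
    """Return a tuple of counts, or None when the page did not match."""
    found = COUNT_PATTERN.findall(html)
    if len(found) < expected:
        # Timed-out or redesigned page: report failure instead of unpacking
        return None
    return tuple(int(value) for value in found[:expected])


page = ''.join('<span class="num">%d</span>' % n for n in (1, 2, 3, 4, 5))
counts = parse_counts(page)
missing = parse_counts('<html>timeout</html>')
```

Callers can then check for None and retry or skip the page, rather than letting the later code fail on a missing value.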

Exception

Exception in thread Thread-83:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/oyc/Desktop/zhihuhelp1.7.1.5/codes/epubBuilder/imgDownloader.py", line 76, in worker
    imgFile = open(self.targetDir + fileName, 'wb')
IOError: [Errno 63] File name too long: u'./\u77e5\u4e4e\u56fe\u7247\u6c60/equation?tex=%5Clim_%7Bn+%5Crightarrow+%5Cinfty+%7D%7BS_%7Bn%7D+%7D+%3D%5Clim_%7Bn+%5Crightarrow+%5Cinfty+%7D%7B%5Cfrac%7Bb_%7Bn%2B1%7D+-b_%7Bn%7D+%7D%7Ba_%7Bn%2B1%7D+-a_%7Bn%7D+%7D+%7D+%3D%5Clim_%7Bn+%5Crightarrow+%5Cinfty+%7D%5Cfrac%7B%5Cln%5Cfrac%7B%28%28n%2B1%29%21%29%5E%7Bn%2B2%7D+%7D%7B%28%5Cprod_%7Bi%3D0%7D%5E%7Bn%2B1%7D%28i%21%29+%29%5E%7B2%7D+%7D+-%5Cln%5Cfrac%7B%28n%21%29%5E%7Bn%2B1%7D+%7D%7B%28%5Cprod_%7Bi%3D0%7D%5E%7Bn%7D%28i%21%29+%29%5E%7B2%7D+%7D+%7D%7B%28n%2B1%29%5E%7B2%7D-n%5E%7B2%7D++%7D+'

Problem appeared after updating to today's version (1.7.1.5); the previous version (1.7.1.4) worked

Building section 1 of e-book 1
Traceback (most recent call last):
  File "zhihuHelp.py", line 8, in <module>
    mainClass.helperStart()
  File "/Users/oyc/Desktop/zhihuhelp1.7.1.5/codes/main.py", line 83, in helperStart
    urlInfo = self.getUrlInfo(rawUrl)
  File "/Users/oyc/Desktop/zhihuhelp1.7.1.5/codes/main.py", line 202, in getUrlInfo
    urlInfo['worker'] = AuthorWorker(conn = self.conn, urlInfo = urlInfo)
  File "/Users/oyc/Desktop/zhihuhelp1.7.1.5/codes/worker.py", line 30, in __init__
    self.setCookie()
  File "/Users/oyc/Desktop/zhihuhelp1.7.1.5/codes/worker.py", line 83, in setCookie
    cookieStr = Var[0]
TypeError: 'NoneType' object has no attribute '__getitem__'

The URL analyzer raises an error and exits when downloading a private collection

Traceback (most recent call last):
  File "D:/MyDocument/Documents/GitHub/ZhihuHelp__Python/zhihuhelp1.7.0/zhihuHelp.py", line 15, in <module>
    mainClass.helperStart()
  File "D:\MyDocument\Documents\GitHub\ZhihuHelp__Python\zhihuhelp1.7.0\codes\main.py", line 103, in helperStart
    collectionWorker.start()
  File "D:\MyDocument\Documents\GitHub\ZhihuHelp__Python\zhihuhelp1.7.0\codes\worker.py", line 282, in start
    self.leader()
  File "D:\MyDocument\Documents\GitHub\ZhihuHelp__Python\zhihuhelp1.7.0\codes\worker.py", line 309, in leader
    self.catchFrontInfo()
  File "D:\MyDocument\Documents\GitHub\ZhihuHelp__Python\zhihuhelp1.7.0\codes\worker.py", line 463, in catchFrontInfo
    infoDict = parse.getInfoDict()
  File "D:\MyDocument\Documents\GitHub\ZhihuHelp__Python\zhihuhelp1.7.0\codes\contentParse.py", line 376, in getInfoDict
    1].a.get_text())
IndexError: list index out of range

This will be fixed in the next version

The e-book is generated successfully but its content has errors

The scraped address is: http://zhuanlan.zhihu.com/qinchao

“This page contains the following errors:
error on line 67 at column 7: Opening and ending tag mismatch: img line 0 and div
Below is a rendering of the page up to the first error.”

Excerpt from: ZhihuHelp1.7.0. “专栏_覃超帝国兴亡史 - 在希望的田野上(qinchao)_知乎回答集锦”. iBooks.

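Add settings

Add extended settings that filter answers by conditions such as upvote count, word count, or keeping only the 10 most-upvoted answers under each question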
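
How such filters might fit together is sketched below under stated assumptions: each answer is a dict with question_id, votes, and content keys, and length is measured in characters. The function name, parameter names, and data layout are hypothetical and not taken from the project.

```python
from collections import defaultdict


def filter_answers(answers, min_votes=0, min_length=0, top_per_question=None):
    """Filter answers by votes and length, optionally keeping the top N per question."""
    kept = [
        answer for answer in answers
        if answer['votes'] >= min_votes and len(answer['content']) >= min_length
    ]
    if top_per_question is None:
        return kept
    # Group the surviving answers by the question they belong to
    by_question = defaultdict(list)
    for answer in kept:
        by_question[answer['question_id']].append(answer)
    result = []
    for question_answers in by_question.values():
        # Highest-voted answers first, then cut to the requested count
        question_answers.sort(key=lambda answer: answer['votes'], reverse=True)
        result.extend(question_answers[:top_per_question])
    return result


answers = [
    {'question_id': 1, 'votes': 50, 'content': 'long answer text'},
    {'question_id': 1, 'votes': 5, 'content': 'short'},
    {'question_id': 2, 'votes': 30, 'content': 'another long answer'},
    {'question_id': 1, 'votes': 80, 'content': 'top answer for question one'},
]
best = filter_answers(answers, min_votes=10, top_per_question=1)
```

Setting top_per_question=10 would correspond to the "top 10 answers per question" condition mentioned above.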

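Hack the Duokan e-book styling

Duokan says the Zhihu Weekly styling was produced with Duokan's own proprietary book-authoring software, using its proprietary technology, and that there are no sample books.

So it can only be hacked by hand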
