zhihu-download's Introduction

Hi there, I'm Luda Chen 👋

Welcome to my GitHub. I am currently working on my Master's degree in Computer Application Technology at Kunming University of Science and Technology. My research focuses on medical image analysis, and I like to build interesting tools in my spare time. If you have any questions about my projects, please feel free to send me an email.

zhihu-download's People

Contributors

chenluda, forfudan

zhihu-download's Issues

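A few problems and suggestions

Problems:

  • Today's (20231017) version of the code has an error: line 46 of main.py appears to have been pasted twice.
  • In some articles the images are "png?" and cannot be downloaded; an if check could be added to handle them.

Suggestions:

  • Could you consider prefixing file names with a date? Some columns have several hundred articles, and sorting by title alone is messy.
  • The source code restricts file names (they may not start with a digit) and their length. In my testing I ran into no problems, so I am not sure why the author added these restrictions.
  • Could you consider adding a sleep interval to avoid being blocked by anti-scraping measures, and recording how far the transfer has progressed, i.e. resumable downloads?
  • Could you consider cleaning up redirect links? All outbound links currently start with "https://link.zhihu.com/?target=", which could be removed entirely.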
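
The redirect-link suggestion could be handled with a small helper along the following lines. This is a minimal sketch, not code from the project: the function name `unwrap_zhihu_link` is hypothetical, and it assumes the real destination is carried, URL-encoded, in the `target` query parameter shown in the prefix above.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_zhihu_link(url: str) -> str:
    """Return the destination of a link.zhihu.com redirect, or the URL unchanged.

    Hypothetical helper: assumes the destination is stored in the
    `target` query parameter of the redirect URL.
    """
    parts = urlsplit(url)
    if parts.netloc != "link.zhihu.com":
        # Not a redirect wrapper, leave it alone.
        return url
    # parse_qs also percent-decodes the value for us.
    targets = parse_qs(parts.query).get("target")
    return targets[0] if targets else url
```

If the destination itself contains an unencoded `&`, the value would be cut short, so the result is worth spot-checking against a few real links.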

Prob

It no longer seems to work. It shows: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Error message

 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
127.0.0.1 - - [06/Mar/2024 11:00:27] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [06/Mar/2024 11:00:27] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [06/Mar/2024 11:00:27] "GET /favicon.ico HTTP/1.1" 404 -
[2024-03-06 11:00:41,260] ERROR in app: Exception on / [POST]
Traceback (most recent call last):
  File "D:\Software\Anaconda\envs\auto\lib\site-packages\flask\app.py", line 2190, in wsgi_app
    response = self.full_dispatch_request()
  File "D:\Software\Anaconda\envs\auto\lib\site-packages\flask\app.py", line 1486, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "D:\Software\Anaconda\envs\auto\lib\site-packages\flask\app.py", line 1484, in full_dispatch_request
    rv = self.dispatch_request()
  File "D:\Software\Anaconda\envs\auto\lib\site-packages\flask\app.py", line 1469, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\Users\Winter\Desktop\Auto\blog2md\zhihu2md\new\zhihu-download-main\app.py", line 21, in index
    markdown_title = judge_zhihu_type(url)
  File "C:\Users\Winter\Desktop\Auto\blog2md\zhihu2md\new\zhihu-download-main\main.py", line 81, in judge_zhihu_type
    title = parse_zhihu_article(url, hexo_uploader)
  File "C:\Users\Winter\Desktop\Auto\blog2md\zhihu2md\new\zhihu-download-main\main.py", line 260, in parse_zhihu_article
    author = soup.select_one('div.AuthorInfo').find(
AttributeError: 'NoneType' object has no attribute 'find'
127.0.0.1 - - [06/Mar/2024 11:00:41] "POST / HTTP/1.1" 500 -
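
The traceback shows `soup.select_one('div.AuthorInfo')` returning `None`, which means the selector matched nothing in the page that was fetched. The log does not say why (a changed page layout or a login or verification page are both plausible, but that is a guess). A guard like the one below would at least replace the bare `AttributeError` with a message that names the missing element. It is only a sketch: `select_required` and `PageStructureError` are hypothetical names, and `soup` is assumed to be any object with a BeautifulSoup-style `select_one` method.

```python
class PageStructureError(RuntimeError):
    """Raised when an element the parser relies on is missing (hypothetical)."""


def select_required(soup, selector):
    """Return the first match for `selector`, or raise a descriptive error.

    `soup` is assumed to expose `select_one(selector)`, which returns
    None when nothing matches (as BeautifulSoup does).
    """
    node = soup.select_one(selector)
    if node is None:
        raise PageStructureError(
            f"no element matches {selector!r}; the page layout may have "
            "changed or the request may not have returned the article"
        )
    return node
```

With this in place, the failing line would start from `select_required(soup, 'div.AuthorInfo')`, and the Flask view could catch `PageStructureError` to show a readable message instead of a 500 page.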

internal server Error

Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

Status code 500

It does not seem to work. The link I tried to download is: https://zhuanlan.zhihu.com/p/636270877

KeyError: 'href'

The following error occurs when saving https://www.zhihu.com/question/362131975/answer/2182682685:
Traceback (most recent call last):
  File "D:\anaconda3\Lib\site-packages\flask\app.py", line 2529, in wsgi_app
    response = self.full_dispatch_request()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wyt\Downloads\zhihu-download-main\app.py", line 21, in index
    markdown_title = judge_zhihu_type(url)
                     ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wyt\Downloads\zhihu-download-main\main.py", line 77, in judge_zhihu_type
    title = parse_zhihu_answer(url, hexo_uploader)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wyt\Downloads\zhihu-download-main\main.py", line 286, in parse_zhihu_answer
    markdown_title = save_and_transform(
                     ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\wyt\Downloads\zhihu-download-main\main.py", line 157, in save_and_transform
    original_url = link['href']
                   ~~~~^^^^^^^^
  File "D:\anaconda3\Lib\site-packages\bs4\element.py", line 1573, in __getitem__
    return self.attrs[key]
           ~~~~~~~~~~^^^^^
KeyError: 'href'
127.0.0.1 - - [11/Mar/2024 16:18:49] "POST / HTTP/1.1" 500 -
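
The `KeyError` comes from indexing a tag that has no `href` attribute: in BeautifulSoup, `link['href']` raises when the attribute is absent, while `link.get('href')` returns `None`. One possible fix is to skip such links, as in this sketch. The helper name `collect_hrefs` is hypothetical; it relies only on each link having a `.get` method, so plain dicts stand in for tags in the example.

```python
def collect_hrefs(links):
    """Return (link, href) pairs, skipping links without a usable href.

    Each `link` only needs a `.get(name)` method, which bs4 tags and
    dicts both provide. Anchors such as <a name="..."> are skipped
    instead of raising KeyError.
    """
    pairs = []
    for link in links:
        href = link.get("href")
        if href:  # skips both a missing attribute and an empty string
            pairs.append((link, href))
    return pairs
```

Inside `save_and_transform`, the same idea would amount to replacing `link['href']` with `link.get('href')` and continuing to the next link when the result is empty.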

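Support Zhihu math formulas

After conversion, math formulas are not wrapped in dollar signs ($), so what you see is the raw LaTeX source.

Math formulas appear to be represented by elements like this: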

<span class="ztext-math" ...>...</span>
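
As a rough illustration of the request, the sketch below wraps the text of each such span in dollar signs before the HTML is converted to Markdown. It makes several assumptions: the span's text content is the LaTeX source (as the description above suggests), the spans are not nested, and a regular expression is good enough for this narrow case. The name `wrap_math_spans` is hypothetical. In the project itself the same replacement could presumably be done on the parsed tree instead.

```python
import html
import re

# Matches <span ... class="... ztext-math ..." ...>TEXT</span> (no nesting).
_MATH_SPAN = re.compile(
    r'<span\b[^>]*\bclass="[^"]*\bztext-math\b[^"]*"[^>]*>(.*?)</span>',
    re.DOTALL,
)


def wrap_math_spans(fragment: str) -> str:
    """Replace each ztext-math span with its text wrapped in $...$."""

    def to_inline_math(match):
        # Undo HTML escaping such as &lt; so the LaTeX reads correctly.
        latex = html.unescape(match.group(1)).strip()
        return f"${latex}$"

    return _MATH_SPAN.sub(to_inline_math, fragment)
```

Display-style formulas might need `$$...$$` instead, which this sketch does not attempt to detect.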

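Only the first 10 articles of a column can be downloaded

When a column has more than 10 articles in total, only the first 10 download successfully. The problem is in the "parse_zhihu_column()" function: from the second loop iteration onward, when offset=10 or larger, the response comes back empty. I am not sure whether this is Zhihu's anti-scraping protection.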
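
Whatever the cause turns out to be, a paging loop that pauses between requests, remembers its offset, and treats an early empty page as an error would make this failure visible instead of silent. The following is a sketch under stated assumptions, not the project's `parse_zhihu_column()`: `download_in_pages`, `fetch_page`, `handle_item`, and the JSON state file are all hypothetical, and `fetch_page(offset, limit)` stands in for whatever request the real function makes.

```python
import json
import time
from pathlib import Path


def download_in_pages(fetch_page, handle_item, state_file, page_size=10,
                      delay_seconds=2.0, total=None, sleep=time.sleep):
    """Walk a paged listing, pausing between pages and saving progress.

    fetch_page(offset, limit) must return a list of items (empty when
    there is nothing more). If `total` is known, an empty page before
    reaching it raises instead of silently stopping.
    """
    state_path = Path(state_file)
    offset = 0
    if state_path.exists():
        # Resume from the offset saved by a previous run.
        offset = json.loads(state_path.read_text())["offset"]

    while True:
        items = fetch_page(offset, page_size)
        if not items:
            if total is not None and offset < total:
                raise RuntimeError(
                    f"empty page at offset {offset}, expected {total} items"
                )
            break
        for item in items:
            handle_item(item)
        offset += len(items)
        state_path.write_text(json.dumps({"offset": offset}))
        sleep(delay_seconds)  # be polite between requests
    return offset
```

Passing the column's article count as `total` is what distinguishes "finished" from "blocked after the first page". Without it, the loop behaves like the current one and simply stops.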

BUG FIX

When I ran app.py, I got the following errors:

NameError: name 'Flask' is not defined
NameError: name 'request' is not defined
NameError: name 'render_template' is not defined
NameError: name 'send_file' is not defined

This bug can be fixed by adding the following line to app.py (line 5):

from flask import Flask, request, render_template, send_file
