
python_scripts's Introduction

python_scripts's People

Contributors

lzjun, lzjun567


python_scripts's Issues

(unicode error) 'utf-8'

File "crawler.py", line 35
"""
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xc5 in position 5: invalid continuation byte
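
This SyntaxError means crawler.py itself is not saved as UTF-8 (byte 0xc5 suggests a GBK-encoded save on Windows). Re-saving the file as UTF-8 is the clean fix; declaring the file's real encoding also works. A minimal sketch, assuming the file really is GBK-encoded:

    # -*- coding: gbk -*-
    # First line of crawler.py: tells the interpreter how to decode the source bytes.
    # The cleaner fix is to re-save the file as UTF-8 and keep the default utf-8 encoding.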

The Sina Weibo API does not return JSON

Running heart.py, response.json()[0] raises:

raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

The Sina Weibo API returns HTML rather than JSON, hence the error.
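
A minimal defensive sketch before calling .json(), assuming a requests Response named response; the URL below is a placeholder, not the real Weibo endpoint:

    import requests

    response = requests.get("https://example.com/api")  # placeholder URL
    try:
        data = response.json()[0]
    except ValueError:
        # The server answered with HTML (e.g. a login or error page) instead of JSON.
        print("Not JSON; first 200 characters of the body:")
        print(response.text[:200])
        data = None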

Images cannot be downloaded

When crawling the pages I want, every image on the page shows "failed to load".

list index out of range

Traceback (most recent call last):
File "crawler.py", line 163, in <module>
crawler.run()
File "crawler.py", line 90, in run
for index, url in enumerate(self.parse_menu(self.request(self.start_url))):
File "crawler.py", line 116, in parse_menu
menu_tag = soup.find_all(class_="uk-nav uk-nav-side")[1]

A couple of small issues

On Windows 10 you need to pip install beautifulsoup4; without the "4", pip installs the old BeautifulSoup 3 by default.

On both Windows and Ubuntu it reports character-encoding errors.

windows
ERROR:root:parse error
Traceback (most recent call last):
File "crawler.py", line 56, in parse_url_to_html
html = html.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 134: ordinal not in range(128)

ubuntu
ERROR:root:parse error
Traceback (most recent call last):
File "crawler.py", line 56, in parse_url_to_html
html = html.encode("utf-8")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 134: ordinal not in range(128)
ERROR:root:parse error
Traceback (most recent call last):
File "crawler.py", line 56, in parse_url_to_html
html = html.encode("utf-8")
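
This is the classic Python 2 pattern: html is a byte string at that point, so html.encode("utf-8") first implicitly decodes it as ASCII and fails on the first non-ASCII byte. A minimal sketch of the usual fix, assuming the page was fetched with requests (the URL is a placeholder):

    import requests

    response = requests.get("https://example.com/page")  # placeholder URL
    html = response.content.decode("utf-8")              # decode the raw bytes explicitly
    data = html.encode("utf-8")                          # encoding the decoded text is now safe
    # Or simply use response.text, which requests has already decoded.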

OSError: wkhtmltopdf exited with non-zero code -6. error:

Traceback (most recent call last):
File "crawler.py", line 165, in
crawler.run()
File "crawler.py", line 99, in run
pdfkit.from_file(htmls, self.name + ".pdf", options=options)
File "/usr/local/lib/python3.4/dist-packages/pdfkit/api.py", line 49, in from_file
return r.to_pdf(output_path)
File "/usr/local/lib/python3.4/dist-packages/pdfkit/pdfkit.py", line 159, in to_pdf
raise IOError("wkhtmltopdf exited with non-zero code {0}. error:\n{1}".format(exit_code, stderr))
OSError: wkhtmltopdf exited with non-zero code -6. error:
The switch --outline-depth, is not support using unpatched qt, and will be ignored.
QXcbConnection: Could not connect to display

list index out of range

python3 crawler.py
Traceback (most recent call last):
File "crawler.py", line 163, in
crawler.run()
File "crawler.py", line 90, in run
for index, url in enumerate(self.parse_menu(self.request(self.start_url))):
File "crawler.py", line 116, in parse_menu
menu_tag = soup.find_all(class_="uk-nav uk-nav-side")[1]
IndexError: list index out of range
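
The IndexError means soup.find_all(class_="uk-nav uk-nav-side") returned fewer than two elements, usually because the page failed to download or its markup changed. A minimal guard sketch with BeautifulSoup; the inline html string is a placeholder for the real response body:

    from bs4 import BeautifulSoup

    html = "<html><body></body></html>"  # placeholder for the downloaded page
    soup = BeautifulSoup(html, "html.parser")
    menus = soup.find_all(class_="uk-nav uk-nav-side")
    if len(menus) < 2:
        # Layout changed or an error page came back; fail with a clearer message.
        raise RuntimeError("expected 2 'uk-nav uk-nav-side' blocks, found %d" % len(menus))
    menu_tag = menus[1]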

Converting Liao Xuefeng's blog to PDF fails at runtime

Error message: OSError: wkhtmltopdf exited with non-zero code 1. error:
You need to specify at least one input file, and exactly one output file

What does pdfkit use to auto-generate the table of contents? After I modified the code, the generated PDF no longer has one.
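
The "at least one input file" message comes from wkhtmltopdf and usually means the list passed to pdfkit.from_file ended up empty. As for the table of contents: wkhtmltopdf builds the PDF outline from the <h1>-<h6> headings in the saved HTML, so stripping those tags removes the outline. A minimal sketch with placeholder file names:

    import pdfkit

    htmls = ["0.html", "1.html"]  # pages written earlier by the crawler
    if not htmls:
        raise SystemExit("no HTML files were generated; nothing to convert")
    # outline-depth controls how many heading levels end up in the PDF outline.
    pdfkit.from_file(htmls, "liaoxuefeng.pdf", options={"outline-depth": 10})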

Error while creating the PDF

Traceback (most recent call last):
File "crawler.py", line 165, in
crawler.run()
File "crawler.py", line 99, in run
pdfkit.from_file(htmls, self.name + ".pdf", options=options)
File "/usr/local/lib/python3.5/dist-packages/pdfkit/api.py", line 49, in from_file
return r.to_pdf(output_path)
File "/usr/local/lib/python3.5/dist-packages/pdfkit/pdfkit.py", line 156, in to_pdf
raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
The switch --outline-depth, is not support using unpatched qt, and will be ignored.
Error: This version of wkhtmltopdf is build against an unpatched version of QT, and does not support more then one input document.
Exit with code 1, due to unknown error.
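
The distro-packaged wkhtmltopdf is built against unpatched Qt, which accepts only a single input document (and ignores --outline-depth). Installing the patched-Qt build from wkhtmltopdf.org is the clean fix; a rough workaround is to merge the saved pages into one HTML file first. A sketch of that workaround, with placeholder file names:

    import pdfkit

    htmls = ["0.html", "1.html", "2.html"]  # pages saved by the crawler (placeholders)
    with open("merged.html", "w", encoding="utf-8") as out:
        for name in htmls:
            with open(name, encoding="utf-8") as f:
                out.write(f.read())
    # A single input file works even with the unpatched-Qt build.
    pdfkit.from_file("merged.html", "liaoxuefeng.pdf")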

Many images in the second half fail to download; is the cache wkhtmltopdf allocates too small?

Many images in the second half fail to download; is the cache wkhtmltopdf allocates too small? The failed images are always from the second half of the pages, and the error output does not say anything useful:
Traceback (most recent call last):
File "crawler.py", line 165, in
crawler.run()
File "crawler.py", line 99, in run
pdfkit.from_file(htmls, self.name + ".pdf", options=options)
File "D:\Program Files\Python36\lib\site-packages\pdfkit\api.py", line 49, in from_file
return r.to_pdf(output_path)
File "D:\Program Files\Python36\lib\site-packages\pdfkit\pdfkit.py", line 156, in to_pdf
raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Loading pages (1/6)
Warning: Failed to load file:///static/img/404.png (ignore)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
Exit with code 1 due to network error: ProtocolUnknownError
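
The warning shows a site-relative image path that ended up as a file:// URL instead of an absolute http URL, and the final ProtocolUnknownError points at a similarly malformed resource rather than a cache limit. Besides fixing the URL rewrite, wkhtmltopdf can be told not to abort when a resource fails; a sketch of that option via pdfkit, with placeholder file names:

    import pdfkit

    options = {
        # Keep converting even if an individual image or resource fails to load.
        "load-error-handling": "ignore",
        "load-media-error-handling": "ignore",
    }
    pdfkit.from_file(["0.html", "1.html"], "liaoxuefeng.pdf", options=options)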

A correction to the image regular-expression bug

        def func(m):
            if not m.group(3).startswith("http"):
                rtn = m.group(1) + get_domain(url) + "/" + m.group(2) + m.group(3)
                #rtn = m.group(1) + domain + m.group(2) + m.group(3)
                return rtn
            else:
                return m.group(1) + m.group(2) + m.group(3)
        
        html = re.compile(pattern).sub(func, html)

I found a problem in it, so I modified it as follows:

[screenshot: the modified code]

You can check the test result at https://regex101.com/
It is m.group(2) that actually matches the URL.
[screenshot: the regex101 match groups]
So it is not that m.group(3) is wrong as such; it just matches the part shown in the screenshot.

As for that regex substitution, I could not follow it at first; the official reference gives
re.sub(pattern, repl, string, count=0, flags=0)

where repl is either a string or a function.
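
A minimal, self-contained illustration of re.sub with a function as repl, which is what func above relies on; the pattern and attribute name here are hypothetical, not the crawler's actual regex:

    import re

    html = '<img data-src="/files/pic.png">'

    def func(m):
        # group(1) is the attribute prefix, group(2) the captured URL.
        src = m.group(2)
        if not src.startswith("http"):
            src = "https://www.liaoxuefeng.com" + src
        return m.group(1) + src

    # repl may be a plain string or, as here, a function called once per match.
    html = re.sub(r'(data-src=")([^"]+)', func, html)
    print(html)  # <img data-src="https://www.liaoxuefeng.com/files/pic.png">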

Traceback (most recent call last):
File "crawler.py", line 165, in <module>
crawler.run()
File "crawler.py", line 99, in run
pdfkit.from_file(htmls, self.name + ".pdf", options=options)
File "/home/kong/.virtualenvs/Py3/lib/python3.5/site-packages/pdfkit/api.py", line 49, in from_file
return r.to_pdf(output_path)
File "/home/kong/.virtualenvs/Py3/lib/python3.5/site-packages/pdfkit/pdfkit.py", line 159, in to_pdf
raise IOError("wkhtmltopdf exited with non-zero code {0}. error:\n{1}".format(exit_code, stderr))
OSError: wkhtmltopdf exited with non-zero code 1. error:
Loading pages (1/6)
[========> ] 14%
(wkhtmltopdf:13716): Gtk-WARNING **: cannot open display:

ImportError: No module named 'pdfkit'

root@raspberrypi:/home/pi/python/crawler_html2pdf/pdf# python3 crawler.py
Traceback (most recent call last):
File "crawler.py", line 14, in
import pdfkit
ImportError: No module named 'pdfkit'

Why does this happen?
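
pdfkit is simply not installed for the Python 3 interpreter being used (pip and pip3 can point at different environments). A quick check sketch using only the standard library:

    import importlib.util

    if importlib.util.find_spec("pdfkit") is None:
        raise SystemExit("pdfkit is not installed for this interpreter; "
                         "install it with: python3 -m pip install pdfkit")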

Error in runoob2pdf: OSError: No wkhtmltopdf executable found: "b''"

OSError: No wkhtmltopdf executable found: "b''"

Regarding that error:
What I added to the PATH environment variable is D:\Program Files\wkhtmltopdf\bin\
I thought the problem was Python mis-parsing the \b, so I changed it to D:\\Program Files\\wkhtmltopdf\\bin\\, but that still did not work.
I then followed this Stack Overflow answer:
http://stackoverflow.com/questions/27673870/cant-create-pdf-using-python-pdfkit-error-no-wkhtmltopdf-executable-found
and changed the code to

config = pdfkit.configuration(wkhtmltopdf=r"D:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe")
pdfkit.from_file(htmls, file_name, options=options, configuration=config)

and it runs normally now.

Is this the only way to handle it?
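
Passing the full wkhtmltopdf.exe path through pdfkit.configuration is one fix; the PATH route also works, but only if the changed PATH is actually visible to the Python process (a shell or IDE started before the edit will not see it). A quick check sketch using the standard library:

    import shutil

    # None means this Python process cannot find wkhtmltopdf on its PATH,
    # in which case the explicit configuration(wkhtmltopdf=...) approach is needed.
    print(shutil.which("wkhtmltopdf"))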

Error when running with Python 3

Traceback (most recent call last):
File "crawler.py", line 56, in parse_url_to_html
f.write(html)
TypeError: a bytes-like object is required, not 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "crawler.py", line 119, in
main()
File "crawler.py", line 108, in main
htmls = [parse_url_to_html(url, str(index) + ".html") for index, url in enumerate(urls)]
File "crawler.py", line 108, in
htmls = [parse_url_to_html(url, str(index) + ".html") for index, url in enumerate(urls)]
File "crawler.py", line 60, in parse_url_to_html
print(e.message)
AttributeError: 'TypeError' object has no attribute 'message'
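
Two Python 3 issues are stacked here: writing a str to a file opened in binary mode, and using e.message, which Python 3 exceptions no longer have. A minimal sketch of both fixes; the html value is a placeholder for the decoded page text:

    html = "<html><body>placeholder page</body></html>"
    try:
        # Open in text mode with an explicit encoding so f.write() accepts a str.
        with open("0.html", "w", encoding="utf-8") as f:
            f.write(html)
    except OSError as e:
        # Python 3 exceptions have no .message attribute; str(e) gives the text.
        print(str(e))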

Converting to PDF on Windows 7 reports "wkhtmltopdf reported an error"

  File "crawler.py", line 163, in <module>
    crawler.run()
  File "crawler.py", line 97, in run
    pdfkit.from_file(htmls, self.name + ".pdf", options=options)
  File "C:\Anaconda3\envs\py3-dj\lib\site-packages\pdfkit\api.py", line 49, in from_file
    return r.to_pdf(output_path)
  File "C:\Anaconda3\envs\py3-dj\lib\site-packages\pdfkit\pdfkit.py", line 156, in to_pdf
    raise IOError('wkhtmltopdf reported an error:\n' + stderr)
OSError: wkhtmltopdf reported an error:
Loading pages (1/6)
Warning: Failed to load http://www.liaoxuefeng.comhttp//service.t.sina.com.cn/widget/qmd/1658384301/078cedea/2.png (ignore)
Counting pages (2/6)
Resolving links (4/6)
Loading headers and footers (5/6)
Printing pages (6/6)
Done
Exit with code 1 due to network error: ProtocolUnknownError
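
The failing URL http://www.liaoxuefeng.comhttp//service.t.sina.com.cn/... shows the site domain being prepended to a link that was already absolute, which is the likely source of the ProtocolUnknownError. A minimal sketch of URL rewriting with urljoin, which leaves absolute URLs untouched; the example paths are hypothetical:

    from urllib.parse import urljoin

    base = "http://www.liaoxuefeng.com"
    print(urljoin(base, "/files/pic.png"))
    # -> http://www.liaoxuefeng.com/files/pic.png
    print(urljoin(base, "http://service.t.sina.com.cn/widget/qmd/2.png"))
    # -> http://service.t.sina.com.cn/widget/qmd/2.png (already absolute, left unchanged)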
