Comments (23)

cdhigh commented on August 20, 2024

Testing on Oracle returns a 403 (Forbidden) error, so I cannot debug it there.

You can help narrow it down: add the following to the Calibre options, then post the error log here.

{
  "log_level": "debug"
}

from kindleear.

Steven630 commented on August 20, 2024

I have added this parameter, but the log does not seem to be any more specific:

[recipe_input.py:94] Failed to execute recipe "Laos RFA": list index out of range

[plumber.py:394] Failed to execute input plugin: All feeds are empty, aborting.

[worker.py:149] There are no new feeds available: admin: [Laos RFA]

cdhigh commented on August 20, 2024

That means your version is too old. Please upgrade and try again. I have updated the deployment script, so repeated upgrades will no longer incur charges.

Steven630 commented on August 20, 2024

OK. Does that mean the code for filtering recipes by language can no longer be used?

cdhigh commented on August 20, 2024

Correct, mainly because I do not want the parameters to get too complicated.

Steven630 commented on August 20, 2024

"[news.py:1960] Failed feed: Laos RFA: Exception Cannot fetch https://www.rfa.org/english/news/laos_news/rss2.xml:403 [news.py:1957->...->news.py:1957->news.py:1957]"

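cdhigh commented on August 20, 2024

I remember now: the index-out-of-range problem was already fixed earlier.
RFA cannot be fetched because its site has strong anti-crawler protection. KindleEar has only fairly simple anti-crawler spoofing measures and cannot get past the block for now.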
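
For context, simple anti-crawler spoofing of this kind usually amounts to sending browser-like request headers. The following is a minimal sketch using only the Python standard library. The header values and the helper name `build_request` are assumptions for illustration and are not taken from KindleEar's source; a site with stronger protection can still answer 403.

```python
from urllib.request import Request

# Hypothetical browser-like headers; the values are illustrative assumptions,
# not the headers KindleEar actually sends.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/126.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def build_request(url, extra_headers=None):
    """Return a urllib Request for `url` carrying browser-like headers."""
    headers = dict(BROWSER_HEADERS)
    if extra_headers:
        # Caller-supplied headers override the defaults.
        headers.update(extra_headers)
    return Request(url, headers=headers)


req = build_request("https://www.rfa.org/english/news/laos_news/rss2.xml")
```

The request object would then be passed to `urllib.request.urlopen(req)`. No network call is made in the sketch itself.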

Steven630 commented on August 20, 2024

Does even the officially provided full-text RSS have anti-crawler protection? Oh well...

cdhigh commented on August 20, 2024

The anti-crawler protection applies to the whole site; perhaps their engineers forgot to whitelist the RSS address.

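Steven630 commented on August 20, 2024

Well, nothing to be done then. How can the principle behind the old Zhihu forwarder be applied to the new version? Could you make a sample recipe? I think a forwarder could still solve many cases of blocked IP addresses.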

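cdhigh commented on August 20, 2024
  1. I have updated the code to add one more line to the HTTP headers so that requests look more like a browser; RFA can now be fetched.

  2. If you have a forwarder set up on another server, it also works with the new version of KindleEar. For example,
    if your forwarder address is http://example.com, change the feed address in the recipe to the following form: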

feeds = [
    ("Laos RFA", "http://example.com/?k=xzSlE&t=60&u=https://www.rfa.org/english/news/laos_news/rss2.xml"),
]

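Just note that a forwarder only addresses the "IP ban" kind of anti-crawler measure; there are many other, more varied measures.
The RFA anti-crawler protection in this issue is not an IP ban.

  3. Technology moves fast, and serverless is now a better fit for "forwarding". Put simply, it is a single JavaScript script with no server to configure; Cloudflare automatically deploys the script to many CDN servers worldwide. I have also updated the forwarder repository with a Cloudflare Worker implementation.

  4. KindleEar can be used together with RSSHUB. Before writing your own RSS fetching code, search RSSHUB first to see whether the content you want is already available.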
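
The forwarder feed address from point 2 can be assembled programmatically. This is a minimal sketch: the helper name `forwarder_feed_url` is hypothetical, and the parameter names `k`, `t`, and `u` are taken from the example URL above. Unlike that example, the sketch percent-encodes the target URL, which assumes the forwarder decodes its query string in the usual way.

```python
from urllib.parse import urlencode


def forwarder_feed_url(forwarder, key, timeout, target):
    """Build a feed URL that routes `target` through a forwarder.

    `forwarder` is the forwarder's base address, `key` its auth key (k),
    `timeout` the timeout value (t), and `target` the real feed URL (u).
    """
    query = urlencode({"k": key, "t": timeout, "u": target})
    return f"{forwarder.rstrip('/')}/?{query}"


feeds = [
    (
        "Laos RFA",
        forwarder_feed_url(
            "http://example.com",
            "xzSlE",
            60,
            "https://www.rfa.org/english/news/laos_news/rss2.xml",
        ),
    ),
]
```

Encoding matters mainly when the target URL has its own query string, since an unencoded `&` in the target would otherwise be read as a separate forwarder parameter.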

Steven630 commented on August 20, 2024

Thank you so much! I will try it in the next couple of days. The latest commit shows a red cross (failure) on the right; will that have any effect?

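cdhigh commented on August 20, 2024

That is the result of a GitHub Action. It currently means the automatic update of the project documentation failed (the automatic build from *.md to *.html); it has nothing to do with the code.

As for why the documentation update failed, perhaps a build dependency changed or the system environment had an error; it may well succeed next time.

PS: The new deployment script fetches the latest Calibre recipes every time it runs, so whenever a Calibre recipe is updated in the future, re-running the deployment script will sync it to your project.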

Steven630 commented on August 20, 2024

Understood. I just deployed to Cloudflare. Is the URL then:

https://my-worker.subdomain.workers.dev/?k=xzSlE&t=timeout&u=URL

How can I tell whether the deployment succeeded?

cdhigh commented on August 20, 2024

The dashboard has a "Visit" button, which is the link. If something goes wrong, check the logs in the dashboard.

Visiting the link directly should at least return

Auth Key is invalid!

Steven630 commented on August 20, 2024

I just tried again, and it seems to work now.

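Steven630 commented on August 20, 2024

The RFA source uses full-text RSS. When pushed to the Kindle, the image caption text runs past the edge of the Kindle screen, and I do not know why.

Also, opening a New York Times link through the forwarder gives a blank page, for example: https://www.nytimes.com/2024/08/12/us/politics/us-china-working-group-trade.html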

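cdhigh commented on August 20, 2024
  1. The RFA images overflow because width/height are specified in the HTML. I have updated the code to remove these attributes from img tags.
  2. Forwarding nytimes actually succeeded; it is just that the nytimes anti-crawler mechanism kicked in, and the returned source contains this notice:
Please enable JS and disable any ad blocker

After all, this forwarder is a lightweight, purpose-built tool, not a full proxy server, so its use cases are fairly limited.
For example, if the image files in the returned HTML use relative paths, the images cannot be fetched. In that case you may need to override the BasicNewsRecipe function image_url_processor() to return the correct image URL.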
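
One way such an override could look is sketched below. It assumes calibre's documented classmethod form `image_url_processor(cls, baseurl, url)`; KindleEar's bundled version may differ. `ORIGINAL_SITE` and the mixin class name are hypothetical, and the method is placed on a plain class so the sketch runs without calibre installed. In a real recipe it would live on the `BasicNewsRecipe` subclass.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical address of the page on the real site; when fetching through a
# forwarder, `baseurl` would point at the forwarder instead.
ORIGINAL_SITE = "https://www.example.org/news/"


class ForwardedImagesMixin:
    """Holds the override; in a recipe, define it on the recipe class."""

    @classmethod
    def image_url_processor(cls, baseurl, url):
        # Absolute URLs already point at the real host; leave them alone.
        if urlsplit(url).scheme:
            return url
        # Resolve relative paths against the original site, not the forwarder.
        return urljoin(ORIGINAL_SITE, url)


resolved = ForwardedImagesMixin.image_url_processor(
    "http://example.com/?k=xzSlE&t=60&u=https://www.example.org/news/",
    "/img/photo.jpg",
)
```

If the images themselves are also blocked by IP, the returned URL would additionally need to be routed through the forwarder; the sketch does not do that.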

Either way, a few dozen lines of code that solve the specific problems we face still make for a good tool.

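Steven630 commented on August 20, 2024

Calibre seems to have updated its fetching code in the last couple of days. In a mobileread discussion of the Science recipe, the author mentioned an upcoming update: https://www.mobileread.com/forums/showthread.php?t=362642

There was another update yesterday; I am not sure whether it helps KE: kovidgoyal/calibre@e8453ed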

Screenshot_20240815_223334.jpg

cdhigh commented on August 20, 2024

Qt is a desktop technology and cannot be used in a server environment.

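cdhigh commented on August 20, 2024

After some searching, I found that niquests, a library compatible with requests, already supports HTTP/1.1, HTTP/2, and HTTP/3, so if there is a real need later, it only takes changing one line of code.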

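Steven630 commented on August 20, 2024

The Science recipe probably needs that support in order to display images. The Calibre author uploaded this recipe just in the last couple of days, so the new version has presumably solved the problem. I also wonder whether NYT could be solved the same way.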

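cdhigh commented on August 20, 2024

OK, let's wait for him to release a stable version. After that, if there is a recipe you need that KindleEar cannot fetch, open an issue and I will see whether I can fix it.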
