renyunkang / yuque-exporter
A tool for exporting Yuque documents as markdown.
License: GNU General Public License v2.0
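My knowledge bases contain too many documents, so downloading everything at once hangs or times out.
In the getAllBooks function, once the bookData has been fetched, you can add a check so that only the books (knowledge bases) you need are downloaded.
code: if (object.books[i].id === id) {
The https://www.yuque.com/api/mine/book_stacks endpoint returns the full bookData, from which you can read each book_id.
The https://www.yuque.com/api/catalog_nodes?book_id=xxx endpoint (with xxx replaced by that id) returns the full article list of that knowledge base, i.e. the getBookDetail content.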
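The filtering idea above could look roughly like the sketch below. It is not the project's code: `selectBooks` and `catalogUrl` are hypothetical helper names, and the shape of bookData (an array of stacks, each holding a `books` array whose items have an `id`) is an assumption inferred from the `object.books[i].id` check quoted above.

```javascript
// Assumption: bookData is an array of stacks, each with a `books` array
// whose entries carry at least an `id` field.
function selectBooks(bookData, wantedIds) {
  const wanted = new Set(wantedIds);
  const selected = [];
  for (const stack of bookData) {
    // Tolerate stacks without a `books` array.
    for (const book of stack.books ?? []) {
      if (wanted.has(book.id)) {
        selected.push(book);
      }
    }
  }
  return selected;
}

// Builds the article-list URL mentioned above for one knowledge base.
function catalogUrl(bookId) {
  return `https://www.yuque.com/api/catalog_nodes?book_id=${encodeURIComponent(bookId)}`;
}
```

With a helper like this, getAllBooks would only hand the selected books to the download step, so the remaining knowledge bases are never requested.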
I keep getting this error. Does anyone know what is going on?
Login use user + password...
file:///E:/AIproject/yuque-exporter/node_modules/puppeteer-core/lib/esm/puppeteer/common/LifecycleWatcher.js:158
return new TimeoutError(errorMessage);
^
TimeoutError: Navigation timeout of 30000 ms exceeded
at LifecycleWatcher._LifecycleWatcher_createTimeoutPromise (file:///E:/AIproject/yuque-exporter/node_modules/puppeteer-core/lib/esm/puppeteer/common/LifecycleWatcher.js:158:12)
at async Frame.waitForNavigation (file:///E:/AIproject/yuque-exporter/node_modules/puppeteer-core/lib/esm/puppeteer/common/Frame.js:252:23)
at async CDPPage.waitForNavigation (file:///E:/AIproject/yuque-exporter/node_modules/puppeteer-core/lib/esm/puppeteer/common/Page.js:445:16)
at async autoLogin (file:///E:/AIproject/yuque-exporter/src/login.js:45:7)
at async run (file:///E:/AIproject/yuque-exporter/main.js:44:5)
E:\AIproject\yuque-exporter>
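When Yuque exports Markdown, it uses the article title as the file name. Titles inevitably contain special characters that are not allowed in file names, so Yuque replaces them with underscores ('_'). In this project's export.js, however, the waitForDownload function receives mdname, a file-name string in which special characters have been replaced with spaces, while filename is the name Yuque actually used for the download, with underscores. The comparison between filename and mdname therefore fails, waitForDownload never finds the file it is waiting for, and after three retries the result is a timeout.
A simple workaround is to strip every symbol character from both names and compare only the Chinese characters, English letters, and digits, which is much less error-prone. (If you say "but I like using punctuation in my file names", then this approach will not work for you and I am out of ideas; you would have to work out Yuque's file-name substitution rules and replace the characters one by one.)
The patched waitForDownload function looks like this: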
// Assumes fs is already imported in export.js.
async function waitForDownload(rootPath, book, mdname, started = false) {
    const timeout = 10000; // 10s timeout
    // Keep only digits, ASCII letters and CJK characters, so that the space-substituted
    // mdname and Yuque's underscore-substituted file name compare equal.
    const strip = (name) => name.replace(/[^0-9a-zA-Z\u4e00-\u9fa5]/g, "");
    const filteredMdname = strip(mdname);
    return new Promise((resolve, reject) => {
        const watcher = fs.watch(rootPath, (eventType, filename) => {
            // console.log(`watch ${rootPath} ${eventType} ${filename}, want ${mdname}.md, filtered ${filteredMdname}`)
            // fs.watch may report no file name on some platforms.
            if (eventType !== 'rename' || !filename) {
                return;
            }
            const filteredFilename = strip(filename);
            // "<name>.md.crdownload" appears while the browser is still downloading.
            if (filteredFilename === `${filteredMdname}mdcrdownload` && !started) {
                console.log("Downloading document " + book + "/" + mdname)
                started = true
            }
            // "<name>.md" appears once the download has finished.
            if (filteredFilename === `${filteredMdname}md` && started) {
                clearTimeout(timer); // do not leave the timeout pending after success
                watcher.close();
                resolve(filename);
            }
        });
        const timer = setTimeout(() => {
            watcher.close();
            reject(new Error('Download timed out'));
        }, timeout);
    });
}
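The comparison used in the patch can be isolated into a small helper, which makes it easier to check against sample names. This is only a sketch: `normalizeName` and `isSameDocument` are hypothetical names, and the example titles in the usage note are made up rather than taken from a real Yuque export.

```javascript
// Same character class as in the patch: digits, ASCII letters, CJK characters.
const NON_NAME_CHARS = /[^0-9a-zA-Z\u4e00-\u9fa5]/g;

// Drops every character outside that class (spaces, underscores, dots, punctuation).
function normalizeName(name) {
  return name.replace(NON_NAME_CHARS, "");
}

// True when a downloaded file name matches the expected title plus ".md",
// ignoring whatever substitution was applied to special characters.
function isSameDocument(expectedTitle, downloadedFilename) {
  return normalizeName(downloadedFilename) === normalizeName(`${expectedTitle}.md`);
}
```

For example, `isSameDocument("部署 指南 v2", "部署_指南_v2.md")` is true even though one side uses spaces and the other underscores. The trade-off is that two titles differing only in punctuation, such as "a-b" and "ab", normalize to the same string and cannot be told apart.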
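Some notes are in table (sheet) form and can only be exported as Excel, so the URL that the downloadFile function uses for them is invalid.
Finally, I suggest recording the files that fail to download in downloadFile and printing them at the end of exportMarkDownFiles. That way the failed files can be downloaded by hand from the list.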
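One possible shape for that suggestion is sketched below. `FailureLog` is a hypothetical class, not part of the project; the assumption is that downloadFile would call `record` in its error path and exportMarkDownFiles would print `report()` once at the end.

```javascript
// Collects failed downloads so they can be listed once at the end of the export.
class FailureLog {
  constructor() {
    this.entries = [];
  }

  // Remember one failed document together with the reason it failed.
  record(book, title, error) {
    const reason = error instanceof Error ? error.message : String(error);
    this.entries.push({ book, title, reason });
  }

  // Human-readable summary, one line per failed document.
  report() {
    if (this.entries.length === 0) {
      return "All documents downloaded.";
    }
    const lines = this.entries.map(
      (entry) => `- ${entry.book}/${entry.title}: ${entry.reason}`
    );
    return [`${this.entries.length} document(s) failed to download:`, ...lines].join("\n");
  }
}
```

A single shared instance would be enough: `failures.record(book, title, err)` inside the catch block, then `console.log(failures.report())` after the last book has been processed.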
This is similar to what another user reported earlier: I also get a download timeout error. The output is as follows:
Error: Download timed out
at Timeout._onTimeout (file:///C:/Users/Administrator/main.js:256:16)
at listOnTimeout (node:internal/timers:569:17)
at process.processTimers (node:internal/timers:512:7)
Waiting download document to files\运维开发学习记录\笔试面试\面试经验
Error: Download timed out
at Timeout._onTimeout (file:///C:/Users/fuyuanyuan/Downloads/yuque-exporter-master/src/export.js:123:20)
at listOnTimeout (node:internal/timers:573:17)
at process.processTimers (node:internal/timers:514:7)
Retrying download... (attempt 3)
Every file reports a download error, and the document titles contain no special characters.
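The "Retrying download... (attempt 3)" line in the log above suggests that a failed download is retried a few times before the exporter gives up. The pattern can be sketched as follows; `withRetry` and its parameters are hypothetical names, and this is not the project's actual retry code.

```javascript
// Runs an async task up to `attempts` times and returns its first successful result.
// If every attempt fails, the error from the last attempt is rethrown.
async function withRetry(task, attempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      console.log(`Attempt ${attempt} of ${attempts} failed: ${error.message}`);
    }
  }
  throw lastError;
}
```

Note that retrying only helps with transient failures. When every file times out, as reported here, each retry fails the same way, so the cause has to be found elsewhere (for example in the file-name comparison or the download directory being watched).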
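First of all, many thanks to the author for this project; it solved my big problem of exporting my Yuque notes.
I consider my notes a very important personal asset, but exporting them one by one is not only tedious, it is also easy to forget some. This project was exactly what I urgently needed.
This issue does not report a problem. It summarizes my experience using the project, as a reference for first-time users.
System: Ubuntu 22.04
First, read the project's README.md.
The steps for installing Node.js with nvm, pointing npm at the Taobao mirror, and installing yarn on Ubuntu 22.04 are:
1. **Install nvm**:
- First install nvm. It can be installed from its GitHub repository with curl or wget; curl is used here: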
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
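Note:
After nvm is installed, the related environment variables are added to ~/.bashrc; you can open that file to check.
2. **Close and reopen the terminal**:
- After installing nvm, close and reopen the terminal so that the nvm script takes effect.
3. **Verify the nvm installation**:
- Check that nvm was installed successfully: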
```bash
nvm --version
```
4. **Install Node.js**:
- Use nvm to install the latest version of Node.js (or choose another version):
```bash
nvm install node
```
5. **Set the default Node.js version**:
- Make the installed Node.js version the default:
```bash
nvm alias default node
```
6. **Configure the npm Taobao mirror**:
- Run the following command to make npm use the Taobao mirror:
```bash
npm config set registry https://registry.npmmirror.com
```
7. **Install yarn**:
- Install yarn globally with npm through the Taobao mirror:
```bash
npm install -g yarn --registry=https://registry.npmmirror.com
```
8. **Verify the yarn installation**:
- Check that yarn was installed successfully:
```bash
yarn --version
```
9. **Use Node.js and yarn**:
- With these dependencies installed, you can use Node.js and yarn to set up and run the Yuque export project.
git clone https://github.com/renyunkang/yuque-exporter.git
cd yuque-exporter
After entering the yuque-exporter directory,
run the command that downloads the knowledge bases:
# On the first run, log in with USER + PASSWORD
# USER=xxx PASSWORD=xxx node main.js
USER=xxx PASSWORD=xxx node main.js
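Note on the export path
I did not set an export path; running the command above creates an output folder under yuque-exporter.
Running the command above failed with an error:
the puppeteer package could not be found.
Install the package with the following command: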
yarn add puppeteer
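After running the command, just wait for the program to finish; it downloads all knowledge bases into the output folder.
Note
After running main.js as above, neither the images in the documents nor the image links in the notes are downloaded into the local notes.
See the extra features the author provides. I wanted to download the images locally and rewrite the image links in the documents to local relative paths.
# Flag: download images locally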
export DOWNLOAD_IMAGE="true"
# Also update the image paths in the documents
export UPDATE_MDIMG_URL="true"
python3 export-image.py
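That is my record of using this project. I still need to download a single knowledge base on its own; someone has already proposed a solution in another issue, and I will report back here once I have tried it.
On Windows, the node main.js step always fails with a timeout error.
I don't know why.
My user name is a mobile phone number; does the number need a prefix such as 86?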
Login use user + password...
file:///C:/Windows/System32/yuque-exporter/node_modules/puppeteer-core/lib/esm/puppeteer/common/LifecycleWatcher.js:158
return new TimeoutError(errorMessage);
^
TimeoutError: Navigation timeout of 30000 ms exceeded
at LifecycleWatcher._LifecycleWatcher_createTimeoutPromise (file:///C:/Windows/System32/yuque-exporter/node_modules/puppeteer-core/lib/esm/puppeteer/common/LifecycleWatcher.js:158:12)
at async Frame.waitForNavigation (file:///C:/Windows/System32/yuque-exporter/node_modules/puppeteer-core/lib/esm/puppeteer/common/Frame.js:252:23)
at async CDPPage.waitForNavigation (file:///C:/Windows/System32/yuque-exporter/node_modules/puppeteer-core/lib/esm/puppeteer/common/Page.js:445:16)
at async file:///C:/Windows/System32/yuque-exporter/main.js:84:5
For README.md, I recommend adding macOS environment variable configuration. I can provide it.
Would this be feasible?
Without a document's creation time or last-modified time, changes are hard to track.
Error: Download timed out
at Timeout._onTimeout (file:///D:/git_project/yuque-exporter/main.js:256:16)
at listOnTimeout (node:internal/timers:559:17)
at processTimers (node:internal/timers:502:7)
Download error: Error: Download timed out
I logged in successfully and all the book stacks were read, so all articles are listed without error.
But when it comes to downloading and exporting all the books, it reports that the download timed out, so nothing is downloaded. I have no idea why this happens.
It is not a network problem, since I have tested both with and without a proxy. I hope you can fix this bug; I am also looking for a solution and will update this issue if I make any progress. Thanks.
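The software I bought exists in many versions, old and new, but the vendor's documentation only ever covers the latest one. Some old versions are already deployed, so I want to save a backup of the documentation for them.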
puppeteer postinstall$ node install.js
│ ERROR: Failed to set up Chromium r1108766! Set "PUPPETEER_SKIP_DOWNLOAD" env variable to skip download.
│ Error: read ECONNRESET
│ at TLSWrap.onStreamRead (node:internal/stream_base_commons:217:20) {
│ errno: -54,
│ code: 'ECONNRESET',
│ syscall: 'read'
│ }
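Login now seems to require a slider captcha. After the account and password are filled in, a step that automatically clicks and drags the slider can be added; reference code: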
// Slider captcha: drag the handle across the track one pixel at a time
const start = await page.$('span[id="nc_2_n1z"]');
const startinfo = await start.boundingBox();
console.log(startinfo.x)
const end = await page.waitForSelector('.nc-lang-cnt');
const endinfo = await end.boundingBox();
await page.mouse.move(startinfo.x, endinfo.y);
await page.mouse.down();
for (let i = 0; i < endinfo.width; i++) {
    await page.mouse.move(startinfo.x + i, endinfo.y);
}
await page.mouse.up();
It works in my own testing.