Comments (9)
Hi @speedAmaster , this is more an issue for the eikek/docspell repository (no problem, though). What system do you have here (the host system)? And what kind of file did you import - maybe scanned documents, if so which dpi? The logs show that the ocrmypdf process timed out after 5min. That could mean that your system is too slow so you would need to increase the timeout or disable the command. It then tries to go without the conversion and the unpaper command seems to have similar problems.
Since you run it in docker, you need to ask docker for more logs. Try running `docker-compose logs -f` to get more logs.
The host system is affected if you give the containers all resources (I think this is the default, but I'm not sure). The job is tried several times until it finally fails. If you reboot, it might try again. I think the default is 2 retries. But if it really crashes due to out-of-memory errors, then it's not certain that it can still update the counter.
For configuring docspell to better fit the system it is running on, reading this can be useful: https://docspell.org/docs/configure/#file-processing. TL;DR: you might need to increase the memory for the job executor, or configure it to use less, depending on your machine.
PS: fyi, I edited your post to only add 3 backticks around the logs, you can use 3 backticks instead of one for multiline verbatim stuff
from dsc.
Hi @eikek,
sorry for misplacing it in the docspell/dsc - my fault and thanks for the backticks hint! Shall we continue here?
I am using Ubuntu Linux 20.04.3 - Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz, 2 cores - 8GB - 6TB system.
I scanned with Brother ADS2800 (pdf), 300dpi-color, 3, 4, 5 pages.
Thanks for the `docker-compose logs -f` command! I will investigate!
I extended the timeout, but the running job just continues after I restart the host, docker and docspell... it has been running for 1:46 hours meanwhile ;-) Is there a way to "delete that job"? A cancel/delete in the GUI doesn't work :-(
We can continue here, no problem and no worries :)
Ok, so this system is not particularly strong, but should be enough in my opinion. And 300dpi scans are quite ok. Of course, >1h is not a normal running time. Really big documents may take that long, but a few pages surely not. I suspect an out-of-memory error which lets the job executor go down - and then the cancel requests don't work, that's bad :(
You can delete the job by connecting to the database:
$ docker exec -it postgres_db psql -U dbuser dbname
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.
dbname=# delete from joblog; delete from job;
DELETE 3313
DELETE 56
dbname=#
(the numbers vary on your system) This deletes all jobs - might be easier for now. Jobs that finished are deleted eventually anyways.
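Before deleting everything, it may be worth checking how many rows are affected. The following is only a sketch: it reuses the container, user and database names from the session above, and `count_rows` is a hypothetical helper name, not something docspell provides.

```shell
#!/bin/sh
# Print the number of rows in the given table.
# postgres_db, dbuser and dbname are taken from the psql session above;
# count_rows is a made-up helper name for this sketch.
count_rows() {
  table="$1"
  # -t prints tuples only, -A turns off aligned output, -c runs one command.
  docker exec -i postgres_db psql -U dbuser dbname -t -A -c "select count(*) from $table;"
}
```

Calling `count_rows job` and `count_rows joblog` should print the same numbers that the `DELETE` statements would report.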
When you then start docspell again, look at http://localhost:7878/api/info/system at the sections "heapUsage" and "inputArgs" - I'd be interested in what it says. If you could scan a file for me without any sensitive stuff in it that I could try here on my system, that would also help. Thanks!
Hi,
I could delete the job by your hint.
heapUsage is
init 132120576
used 80462224
comitted 218103808
max 3221225472
free 3140763248
description "init=126.00M, used=76.73M, comitted=208.00M, max=3.00G, free=2.93G"
inputArgs is
0 "-Xmx1536M"
1 "-Dconfig.file=/opt/docspell-joex-0.28.0/bin/../conf/docspell-joex.conf"
2 "-XX:+UseG1GC"
3 "-Xmx3G"
Unfortunately docspell collapsed my host again...... maybe I just purge everything and restart from scratch....
Hi, I'm really sorry for this experience. I don't know the cause of this. If you could post the logs from `docker-compose logs`, that could help. Or if you could scan me such a file, I could take a look on my machine.
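One way to collect the logs for posting is to write them to a file instead of following them live. This is a minimal sketch, assuming it is run from the directory that contains `docker-compose.yml`; the helper name `save_logs` and the default of 500 lines are arbitrary choices for illustration.

```shell
#!/bin/sh
# Save the most recent container logs to a file, without color codes,
# so the file can be attached to an issue.
# save_logs is a made-up helper name; 500 lines is an arbitrary default.
save_logs() {
  out_file="$1"
  lines="${2:-500}"
  docker-compose logs --no-color --tail="$lines" > "$out_file"
}
```

For example, `save_logs docspell-logs.txt 1000` keeps the last 1000 lines per container.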
The memory consumption looks like it's idling; there is almost 3G of free memory.
Fwiw, it seems there is a duplicate `-Xmx` setting; you can try to remove this from the `docker-compose.yml` file (the `-Xmx3G`; this is probably left over from some experiments). Docspell then has less memory, but maybe other stuff is running and taking 3G is too much for the host. I can't tell of course…
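As a quick cross-check of the posted numbers, the byte values from "heapUsage" can be converted to MiB. The sketch below only does integer arithmetic on the values quoted in this thread; `to_mib` is a hypothetical helper name. The reported maximum corresponds to 3072 MiB, which matches `-Xmx3G` rather than `-Xmx1536M`, so the later of the two flags appears to be the one in effect.

```shell
#!/bin/sh
# Values copied from the heapUsage output posted in this thread.
max_bytes=3221225472
free_bytes=3140763248

# Convert a byte count to MiB using integer division (to_mib is a made-up name).
to_mib() {
  echo $(( $1 / 1024 / 1024 ))
}

max_mib=$(to_mib "$max_bytes")
free_mib=$(to_mib "$free_bytes")

# Prints: max=3072M free=2995M
echo "max=${max_mib}M free=${free_mib}M"
```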
maybe - I give up and wait for my new server to be configured, ordered, delivered, built :-)
I have quite a lot of docs to be scanned and do not want to redo all jobs again.....
brief closing question: can I make the postgres DB accessible from the host system (to allow backups) via MYLOCALHOSTFOLDER in docker-compose.yml?
db:
  image: postgres:14.1
  container_name: postgres_db
  restart: unless-stopped
  volumes:
    - MYLOCALHOSTFOLDER:/var/lib/postgresql/data/
correct?
Ok, hope it then works better. It's definitely very strange to see this.
Regarding backups: sure! You can do whatever fits your demands. I do it by periodically creating a database dump. When using docker, this might be interesting.
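A periodic database dump could look roughly like the sketch below. It assumes the container, user and database names from the `psql` session earlier in this thread (`postgres_db`, `dbuser`, `dbname`); the function names and the file name pattern are made up for illustration.

```shell
#!/bin/sh
# Build a dated file name, e.g. docspell-2022-01-31.sql.
dump_file_name() {
  echo "docspell-$(date +%Y-%m-%d).sql"
}

# Write a plain-text SQL dump of the database into the given directory.
# postgres_db, dbuser and dbname are the names used earlier in this thread;
# adjust them to match your docker-compose.yml.
dump_database() {
  target_dir="$1"
  docker exec postgres_db pg_dump -U dbuser dbname > "$target_dir/$(dump_file_name)"
}
```

Calling `dump_database /path/to/backups` from a cron job would produce one dump file per day. Compared to mounting the data directory, a dump is taken through the running server and is therefore not affected by files changing while they are copied.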
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. This only applies to 'question' issues. Always feel free to reopen or create new issues. Thank you!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. This only applies to 'question' issues. Always feel free to reopen or create new issues. Thank you!