Comments (9)

eikek commented on September 28, 2024

Hi @speedAmaster, this is more an issue for the eikek/docspell repository (no problem, though). What system do you have here (the host system)? And what kind of file did you import? If they are scanned documents, at which dpi? The logs show that the ocrmypdf process timed out after 5 minutes. That could mean that your system is too slow, in which case you would need to increase the timeout or disable the command. It then tries to continue without the conversion, and the unpaper command seems to have similar problems.

Since you run it in docker, you need to ask docker for more logs. Try running docker-compose logs -f to get more logs.

The host system is affected if you give the containers all resources (I think this is the default, but I'm not sure). The job is tried several times until it finally fails. If you reboot, it might try again. I think the default is 2 retries. But if it really crashes due to out-of-memory errors, then it's not certain that it can still update the counter.

For configuring docspell to better fit the system it is running on, reading this can be useful: https://docspell.org/docs/configure/#file-processing. TL;DR: you might need to increase the memory for the job executor, or configure it to use less, depending on your machine.

PS: fyi, I edited your post to only add 3 backticks around the logs, you can use 3 backticks instead of one for multiline verbatim stuff

from dsc.

speedAmaster commented on September 28, 2024

Hi @eikek,
sorry for misplacing it in the docspell/dsc - my fault and thanks for the backticks hint! Shall we continue here?
I am using Ubuntu Linux 20.04.3 - Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz, 2 cores - 8GB - 6TB system.
I scanned with Brother ADS2800 (pdf), 300dpi-color, 3, 4, 5 pages.
Thanks for the docker-compose logs -f command! I will investigate!
I extended the timeout, but the running job just continues after I restart host, docker and docspell... it has been running for 1:46 hours meanwhile ;-) Is there a way to "delete that job"? A cancel/delete in the GUI doesn't work :-(

eikek commented on September 28, 2024

We can continue here, no problem and no worries :)

Ok, so this system is not particularly strong, but should be enough in my opinion. And 300dpi scans are quite ok. Of course, >1h is not a normal running time. Really big documents may take that long, but a few pages surely not. I suspect an out-of-memory error which lets the job executor go down - and then the cancel requests don't work, that's bad :(

You can delete the job by connecting to the database:

$ docker exec -it postgres_db psql -U dbuser dbname
psql (14.1 (Debian 14.1-1.pgdg110+1))
Type "help" for help.

dbname=# delete from joblog; delete from job;
DELETE 3313
DELETE 56
dbname=#

(the numbers vary on your system) This deletes all jobs - might be easier for now. Jobs that finished are deleted eventually anyways.
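
If you prefer to check how many rows would be affected before deleting, the same can be done non-interactively. This is only a sketch: it assumes the container, user and database names from the session above (`postgres_db`, `dbuser`, `dbname`), and `run_sql` is a hypothetical helper, not something docspell provides.

```shell
# Hypothetical helper: run one SQL string inside the postgres container.
# Container, user and database names are assumed from the session above.
run_sql() {
  docker exec -i postgres_db psql -U dbuser dbname -c "$1"
}

# Example calls (uncomment to run against your own database):
# run_sql 'select count(*) from job;'
# run_sql 'begin; delete from joblog; delete from job; commit;'
```

The second call wraps both deletes in one transaction, so either both tables are emptied or neither is.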

When you then start docspell again, look at http://localhost:7878/api/info/system at the sections "heapUsage" and "inputArgs" - I'd be interested in what it says. If you could scan a file for me without any sensitive stuff in it that I could try here on my system, that would also help. Thanks!

speedAmaster commented on September 28, 2024

Hi,
I was able to delete the job using your hint.
heapUsage is

init	132120576
used	80462224
comitted	218103808
max	3221225472
free	3140763248
description	"init=126.00M, used=76.73M, comitted=208.00M, max=3.00G, free=2.93G"

inputArgs is

0	"-Xmx1536M"
1	"-Dconfig.file=/opt/docspell-joex-0.28.0/bin/../conf/docspell-joex.conf"
2	"-XX:+UseG1GC"
3	"-Xmx3G"

Unfortunately docspell collapsed my host again...... maybe I just purge everything and restart from scratch....

eikek commented on September 28, 2024

Hi, I'm really sorry for this experience. I don't know the cause of this. If you could post the logs from docker-compose logs, that could help. Or if you could scan such a file for me, I could take a look on my machine.

The memory consumption looks like it's idling, there is almost 3G free memory.
Fwiw, it seems there is a duplicate -Xmx setting, you can try to remove this from the docker-compose.yml file (the -Xmx3G; this is probably left over from some experiments). Docspell then has less memory, but maybe other stuff is running and taking 3G is too much for the host. I can't tell of course…
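
As a quick sanity check of the numbers in the heapUsage output posted above, here is a small sketch using plain shell arithmetic. The reported max of 3.00G matches the -Xmx3G flag rather than -Xmx1536M, which is consistent with the later of the two flags being the one in effect; I'm reading that off the output, not from the JVM documentation.

```shell
# Values copied from the heapUsage output posted above (bytes).
max=3221225472
used=80462224

# The reported "free" value is simply max minus used.
free=$(( max - used ))
echo "free bytes: $free"

# Convert with integer arithmetic (1 MiB = 1024^2 bytes, 1 GiB = 1024^3 bytes).
max_gib=$(( max / 1024 / 1024 / 1024 ))
used_mib=$(( used / 1024 / 1024 ))
echo "max GiB: $max_gib"
echo "used MiB (rounded down): $used_mib"
```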

speedAmaster commented on September 28, 2024

Maybe - I give up and wait for my new server to be configured, ordered, delivered, built :-)
I have quite a lot of docs to scan and do not want to redo all jobs again.....

Brief closing question: can I make the Postgres DB accessible from the host system (to allow backups) by mounting MYLOCALHOSTFOLDER in docker-compose.yml?

db:
    image: postgres:14.1
    container_name: postgres_db
    restart: unless-stopped
    volumes:
      - MYLOCALHOSTFOLDER:/var/lib/postgresql/data/

correct?

eikek commented on September 28, 2024

Ok, hope it then works better. It's definitely very strange to see this.

Regarding backups: sure! You can do whatever fits your demands. I do it by periodically creating a database dump. When using docker, this might be interesting.
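
A rough sketch of what such a periodic dump could look like with docker. The container, user and database names are assumed from the psql session earlier in this thread; `BACKUP_DIR` and `dump_db` are made-up names for illustration, so adjust everything to your setup.

```shell
# Hypothetical host folder for the dumps; override via the environment.
BACKUP_DIR="${BACKUP_DIR:-./backups}"

# Write a timestamped SQL dump to BACKUP_DIR and print the file name.
# Container, user and database names are assumed from this thread.
dump_db() {
  out="$BACKUP_DIR/docspell-$(date +%Y%m%d-%H%M%S).sql"
  mkdir -p "$BACKUP_DIR" || return 1
  # pg_dump runs inside the container and writes SQL to stdout,
  # which is redirected to a file on the host.
  docker exec postgres_db pg_dump -U dbuser dbname > "$out" || return 1
  echo "$out"
}

# Example call (uncomment, or run it from cron):
# dump_db
```

Such a dump can be loaded back into an empty database by piping the file into psql, for example `docker exec -i postgres_db psql -U dbuser dbname < FILE`.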

stale commented on September 28, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. This only applies to 'question' issues. Always feel free to reopen or create new issues. Thank you!
