eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.

License: GNU Affero General Public License v3.0

Scala 44.20% Elm 47.18% CSS 0.41% JavaScript 0.33% HTML 0.56% Nix 2.36% Shell 0.71% Python 0.03% Java 2.16% Rich Text Format 1.99% Dockerfile 0.07% PLpgSQL 0.01%

ocr stanford-corenlp personal-document-system scala elm pdf document-management dms docspell edms

docspell's Introduction

Docspell - Document Management System (DMS)

Docspell is a personal document organizer, sometimes called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess 😉. It can unify your files from scanners, emails and other sources. It is targeted at home use, i.e. families and households, and also at smaller groups/companies.

You can associate tags, set correspondents and add lots of other predefined and custom metadata. If your documents are associated with such metadata, you can quickly find them later using the search feature. But adding this manually is a tedious task. Docspell can help by suggesting correspondents, guessing tags or finding dates using machine learning. It can learn metadata from existing documents and find things using NLP. This makes adding metadata to your documents a lot easier. For machine learning, it relies on the free (GPL) Stanford Core NLP library.

Docspell also runs OCR (if needed) on your documents, can provide fulltext search and has great e-mail integration. Everything is accessible via a REST/HTTP API. A mobile-friendly SPA web application is the default user interface. An Android app exists for conveniently uploading files from your phone/tablet, and there is also a CLI. The feature overview lists some more points.

Impressions

Check out the short demo videos (<1min) on the project page.

Try it

Docspell consists of several components. Probably the quickest way to get started is to use the docker setup as described in the get started page. This is only three commands away:

git clone https://github.com/eikek/docspell
cd docspell/docker/docker-compose
docker-compose up -d

Then go to http://localhost:7880, sign up and log in. Use the same name for collective and user for now. More on that can be found here.

Documentation

The project page has lots of information on how to use and set up docspell.

Contributions/Support

Feedback and other contributions are very welcome! There is a gitter room for chatting and questions. You can also open an issue for questions, problems and other feedback, or send a mail to info [at] docspell.org. I'll try to answer in time. You may want to check Contributing.md for some starting points, too.

If you find this project useful and want to support it via other means, giving a ⭐ is always encouraging and much appreciated. You can also support activity by donating via liberapay or paypal.

Thank you so much!

Tech Stack

Backend

The servers are written in Scala in a pure functional style, based on libraries from the typelevel stack: Cats, FS2, Doobie, Http4s, Circe and Pureconfig.

There are more libraries and technologies used, of course. Docspell is only an orchestration of great tools and libs. An important one is Stanford-NLP, which provides the ML features. Furthermore, file processing relies on external tools like tesseract, unoconv and ocrmypdf. All dependencies can be looked up in project/Dependencies.scala.

Frontend

The web frontend is a SPA written in Elm. The UI framework in use is tailwind.

License

Docspell is free software, distributed under the AGPLv3 or later.


docspell's Issues

Search for "name or notes" only works with lower-case entries

While experimenting, I discovered that the moment I enter a capital letter in the "name or notes" field, my search result is empty. With the corresponding lowercase letter, everything works as expected.
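
The report does not say where the mismatch happens. As a general illustration only (not docspell's actual search code), case-insensitive matching usually means normalising both the query and the stored text the same way. The function name `matches` below is hypothetical.

```python
def matches(query: str, *fields: str) -> bool:
    """Return True if the query occurs in any field, ignoring letter case.

    Both sides are case-folded, so "Invoice" and "invoice" behave the same.
    """
    needle = query.casefold()
    return any(needle in field.casefold() for field in fields if field)
```

If only one side is normalised (for example the stored text is lower-cased but the query is not), a query containing a capital letter can never match, which would look like the behaviour described above.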

Adding file again via Web-Upload creates duplicate

When I upload a file again, that's already been uploaded, docspell happily creates a duplicate document. I think it should check if the file already exists and do nothing in that case.
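
A minimal sketch of the check suggested here, assuming files are compared by a checksum of their content. The `UploadStore` class and its in-memory dictionary are hypothetical stand-ins, not docspell's storage layer.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest that identifies the file by its content, not its name."""
    return hashlib.sha256(data).hexdigest()


class UploadStore:
    """Hypothetical in-memory store keyed by content checksum."""

    def __init__(self) -> None:
        self._by_checksum = {}

    def add(self, name: str, data: bytes) -> bool:
        """Store the file unless identical content is already present.

        Returns True if the file was stored, False if it was a duplicate.
        """
        checksum = sha256_of(data)
        if checksum in self._by_checksum:
            return False
        self._by_checksum[checksum] = name
        return True
```

Comparing content rather than file names means a re-scan saved under a different name is still recognised, while two different files that share a name are both kept.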

Save Name on confirm

Hi,

I really like the new update :) Nice work!

Another little thing that has bothered me more and more often: all fields except the one for the name are saved directly. Something that feels like it has happened to me 100 times already just happened again: I changed the name, forgot to press save, pressed confirm, and the changes were lost.

An idea could be to save the name field when pressing the confirm button?

Cheers
Eresturo

Searchable documentation

From #289

"Try it docker setup" uses wrong path

The instructions to start docker compose are run from the wrong folder if you follow the mini tutorial:

git clone https://github.com/eikek/docspell
cd docspell
DOCSPELL_HEADER_VALUE="my-secret-123" docker-compose up

should be:

git clone https://github.com/eikek/docspell
cd docspell/docker
DOCSPELL_HEADER_VALUE="my-secret-123" docker-compose up

because the compose file is in the docker folder and not the root folder.

consumedir alternative for mounted NFS drives

Hello,

I am stunned by your solution. It works really well. Thank you for providing this.

The last puzzle piece missing is consumedir. I have an NFS drive on a NAS where the PDFs from my scanner arrive. This drive is mounted via NFS on the host (which is a VM running under proxmox). The directory where the NFS drive is mounted is a volume inside the container which runs consumedir.

inotify is not working with files on an NFS drive. What are my options to get the PDFs arriving on the NAS to automatically show up inside docspell?

I am really looking forward to solving this last puzzle.

Regards
Ralf
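
One option for the NFS case above, where inotify events do not arrive, is to poll the directory at a fixed interval instead of waiting for notifications. The sketch below is an illustration under assumptions: `scan_once`, `poll` and the `handle` callback are hypothetical names, and the actual upload to docspell is left to whatever the callback does.

```python
import os
import time
from typing import Callable, List, Optional, Set, Tuple

FileKey = Tuple[str, int, int]  # (path, size in bytes, mtime in nanoseconds)


def scan_once(directory: str, seen: Set[FileKey]) -> List[str]:
    """Return PDF paths that were not present (or have changed) since the last scan."""
    new_files: List[str] = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(root, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            key = (path, info.st_size, info.st_mtime_ns)
            if key not in seen:
                seen.add(key)
                new_files.append(path)
    return new_files


def poll(directory: str, handle: Callable[[str], None],
         interval_seconds: float = 30.0,
         max_rounds: Optional[int] = None) -> None:
    """Scan repeatedly and pass each new file to `handle` (e.g. an uploader)."""
    seen: Set[FileKey] = set()
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for path in scan_once(directory, seen):
            handle(path)
        rounds += 1
        time.sleep(interval_seconds)
```

Because the key includes size and modification time, a file that is still being written by the scanner is reported again once it changes. A real setup would probably wait until the size has been stable for one round before uploading.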

Can't create an account

Installing docspell with the given dockerfile, I end up in the login-screen. Creating an account and pressing "ok", seems not to trigger anything. Browser does no further loading and the docker log remains silent about the sign-up procedure.

Auto reset the upload input

From #289

Fatal error?

Docspell seems to be exactly what I have been searching for for a while. Thank you.

I see this error when starting:
postgres_db | 2020-09-26 07:12:41.162 UTC [39] FATAL: unsupported frontend protocol 1234.5680: server supports 2.0 to 3.0

The next problem: the processing queue hangs every time at "Starting text analysis".

Scrolling in document-list also affects document-view

After scrolling down in the list of documents and opening a document, I have to scroll up again to edit the metadata. When returning to the list I have to scroll down again, to reach the previous document.

Improve handling of dropdown fields and make search feature configurable

Is it possible to add a search to the person and organizations dropdowns? Similar to the tags input search.

docspell and docker doesn't work

Hello,

I wanted to test docspell in the docker today but unfortunately it always ends with an exit code, can anyone help me?

Log:

https://pastebin.com/5wPqFx8R

Document Classifier - Remove category restriction

Hi,
I'm about to test the really cool feature of automatically suggesting tags. What I ask myself during configuration is: why do I have to specify a category? Do the suggestions only work for tags from this category?
Cheers
Eresturo

Content/OCR Search of all documents

It's great to have the tagging features, but what if I am looking for every document containing a string of characters in all documents? For example, what if I'm looking for all documents with an SSN on them?

This can be implemented in multiple ways. Indexing document contents when processing would be best but that would likely add a lot of extra data to the database. And I imagine it would certainly be complex to implement. A slower, perhaps easier, approach might be to search each document on demand. This could be sped up by filtering by certain tags or other criteria first.
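
The first option mentioned above, indexing contents at processing time, can be sketched as a small inverted index that maps each word to the documents containing it. This is only an illustration of the idea; `ContentIndex` and `tokenize` are hypothetical names and say nothing about how docspell stores or searches text.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> List[str]:
    """Split text into lower-cased word tokens."""
    return [token.lower() for token in TOKEN.findall(text)]


class ContentIndex:
    """Hypothetical inverted index: token -> ids of documents containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        """Index the extracted text of one document."""
        for token in tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents that contain every token of the query."""
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```

The index grows with the number of distinct words rather than with the raw text size, which is the trade-off against scanning every document on demand.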

Position of the mask to create a new person / organization.

I hope I'm not being annoying; I just started using the software and keep noticing things :)

When you create a new organization / person, the popup opens over the document, so you can no longer copy name / address / e-mail / etc. from the document.
One possibility would be to have the popup only in the left area, i.e. to cover only the input mask for metadata.
It would also be cool if you could mark and copy text in the document to replace annoying typing with copy and paste.

Furthermore, it would be great if this mask could also be called up for editing, for example to add information that is available for the first time in the current document.

Re-triggering the processing

Docspell grows in functionality in frequent release cycles, which I really enjoy.
With 0.9 the feature was introduced to get the text extracted by OCR as an overlay into the PDF, which works great.
Unfortunately, I already have some documents that were processed in an older version. Is it possible to trigger a new processing?
An API call would be sufficient for me, a UI entry would not be necessary.

Docker error on starting

Hi, I've installed docker and docker-compose, can run docker-compose up but am running into this error. I also deleted the entire docspell folder and pulled it from github again. Thanks.

osboxes@osboxes:~/Downloads/docspell/docker$ docker-compose up
Starting docspell-solr ... done
Starting postgres_db   ... done
Starting 48412a6dfdab_docspell-joex       ... done
Starting 0d84a8053338_docspell-restserver ... done
Starting docspell-consumedir              ... done
Attaching to docspell-solr, postgres_db, 0d84a8053338_docspell-restserver, docspell-consumedir, 48412a6dfdab_docspell-joex
0d84a8053338_docspell-restserver | bash: /opt/docspell-restserver-0.9.0/bin/docspell-restserver: Permission denied
docspell-consumedir | bash: /opt/docspell-tools-0.9.0/consumedir.sh: Permission denied
0d84a8053338_docspell-restserver exited with code 126
48412a6dfdab_docspell-joex | /bin/sh: can't open '/opt/entrypoint.sh': Permission denied
docspell-consumedir exited with code 126
docspell-solr | /bin/bash: /opt/docker-solr/scripts/docker-entrypoint.sh: Permission denied
postgres_db   | bash: /usr/local/bin/docker-entrypoint.sh: Permission denied
postgres_db exited with code 126
docspell-solr exited with code 126
48412a6dfdab_docspell-joex exited with code 2

Edit multiple documents in parallel

It would be nice to be able to edit multiple documents in parallel (e.g. set organization & person or edit tags).

/opt/docspell.conf isn't used

It still works on the last release - just not on the latest/self-compiled version!

Mapping for the local one is done in docker-compose.yml and seems to work properly:

volumes:
  - ./docspell.conf:/opt/docspell.conf

execing ls in the container also shows that all rights are there:

-rwxrwxrwx    1 root     root           871 Oct  3 17:35 /opt/docspell.conf

But still, the default one is used :-/

[main] INFO  d.r.Main - Using config file from system properties: /opt/docspell-restserver/bin/../conf/docspell-server.conf

Unraid support

There have been requests for unraid templates. I'm not familiar with unraid, it seems to me that it is possible to convert the docker-compose.yml file into some xml or other format, that is supported on unraid.

Match via regex

Hi,
another suggestion: how about defining one (or more) regular expressions to automatically match correspondent / concerning or even tags? It would reduce the manual effort, at least in my workflow.
Cheers
eresturo
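
A rough sketch of what such user-defined rules could look like, assuming each rule pairs a regular expression with the metadata field and value to propose. The `Rule` type, the field names and the example patterns are hypothetical and not part of docspell.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    pattern: str  # regular expression searched in the extracted text
    field: str    # hypothetical field name, e.g. "correspondent" or "tag"
    value: str    # value to propose when the pattern matches


def apply_rules(text: str, rules: List[Rule]) -> Dict[str, List[str]]:
    """Collect proposed values per field for every rule whose pattern matches."""
    proposals: Dict[str, List[str]] = {}
    for rule in rules:
        if re.search(rule.pattern, text, flags=re.IGNORECASE):
            values = proposals.setdefault(rule.field, [])
            if rule.value not in values:
                values.append(rule.value)
    return proposals
```

For example, a rule with the pattern `\bacme\s+gmbh\b` could propose "ACME GmbH" as correspondent for every document mentioning that company.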

Unable to process/store PDF due to incorrect datetime value for suggested due date

I have uploaded a PDF file that is since stuck in the processing queue.

Error message:
2020-06-25T9:06:37: Job 8RYNuoXTR.../.../process-item/High execution failed. Retrying later.: (conn=6796) Incorrect datetime value: '2090-12-03 00:00:00' for column docspell.item.duedate at row 1

I wonder, where the due date comes from and how I could get the document into the system without it.

Keyboard Shortcuts

Hi,
I just had another idea what could increase the efficiency of using this great program. How about assigning keyboard shortcuts to important functions?

For example, I'm currently going through a lot of old documents and have to confirm them. I would be faster if there were shortcuts for Confirm and next document buttons.

Cheers
eresturo

Help with migrating from paperless

Hey, this piece of software looks really really promising!

I still have some problems understanding how I would use it - hope someone who is used to docspell can help:

Where are the documents stored? I'm afraid they are stored in the database or so? What I really, really would need is storage in the filesystem :-)
There is no way to see thumbnails in the overview, is there? That really helps a lot in identifying documents.
If I mount /opt/docs of the consumer to a directory, should I still set up a watch folder using consumedir.sh?
Am I right that all documents are being re-OCRed? It is just that I already have a process in place that takes the PDFs from the scanner and sends them to ocrmypdf, and I found the parameter set that suits my needs best, I think. So I would like new documents to be OCRed only if they're not OCRed yet.
Could you add the number of pages to the tiles in the overview? That would also help to identify documents quicker.

thanks for the great work!!

Proposal: Make fields hideable

First of all, many thanks for this really great software! 👍
I have tried many different DMS for home use and this one, although still in beta status, is by far the best 🥇

Now for my proposal. I don't need any due dates or directions in my workflow. How about making it configurable which fields you hide?
This would significantly increase simplicity and clarity, especially if more functions are added in the future.

Access original documents

First, I'd like to say that I was amazed by first impression!

I tested uploading several documents, including an xlsx file. Upload and parsing succeeded, but when I try to download the file, I only get the generated PDF, but not the original file. Is this not (yet?) possible, or did I simply miss something in the GUI?

make datepicker popup optional

First I'd like to congratulate you on this piece of software! It looks really refreshingly simple yet powerful. I've been searching for a simple document management system for home use for a long time.

Anyway. I have a few smaller nitpicks that could be converted into separate issues at your discretion:

The "Notes" editor is a little weird. It's nice that it supports Markdown but once created, the note doesn't feel well-integrated yet. Why is it highlighted with a greenish colour? And why isn't it used -- at least in an abbreviated form -- for the item card in the "gallery view" when searching for documents?
In the main view I would like to be able to browse by category or tag. I don't want a full "folder"-like implementation. Actually this is probably one of the reasons why it feels so nice and simple, because it doesn't attempt to mimic a file explorer window .. but sometimes I would like to be able to browse by category without a specific search term in mind. (I am aware I can search for a tag without any other search terms and that works but it's not quite the same thing.)

Now what this issue is actually about ... the datepicker in the metadata editor is really getting on my nerves. 😬

I would really prefer if it were optional and hidden behind a button in the input field or at least would not pop up immediately when there is a manually entered date in the field already.
Or at least don't cover the input field when it does pop up. 😋

I am used to entering dates in ISO format like 2020-07-21. The parser for this field apparently understands this, so that's nice. That would have been more obvious though if I could enter dates manually more comfortably. 😃
(I haven't seen any suggestions yet, possibly because the dates in my documents are formatted in German like 21.07.2020?)

Solr Error

Hi, I have installed everything but Solr is not working. The Solr logs show me this error. What am I doing wrong?

Thanks for the help.

org.apache.solr.common.SolrException: undefined field content
at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1319)
at org.apache.solr.schema.IndexSchema$SolrQueryAnalyzer.getWrappedAnalyzer(IndexSchema.java:462)
at org.apache.lucene.analysis.DelegatingAnalyzerWrapper$DelegatingReuseStrategy.getReusableComponents(DelegatingAnalyzerWrapper.java:86)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:192)
at org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:262)
at org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:527)
at org.apache.solr.parser.QueryParser.newFieldQuery(QueryParser.java:62)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:1102)
at org.apache.solr.parser.SolrQueryParserBase.handleBareTokenQuery(SolrQueryParserBase.java:825)
at org.apache.solr.parser.QueryParser.Term(QueryParser.java:421)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:278)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)
at org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)
at org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)
at org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:131)
at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:263)
at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:49)
at org.apache.solr.search.QParser.getQuery(QParser.java:174)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:161)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:579)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:420)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:352)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:500)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
at java.lang.Thread.run(Thread.java:748)

QuickSearch Content disappear

Hi,
again a little thing that is sometimes annoying. If you switch from "Names" to "Contents Only" in the Quick Search, the search term is deleted. Couldn't it remain there?
Cheers
Eresturo

Invalid date 'FEBRUARY 30'

I've uploaded a whole folder and one document failed. The document does not contain any date in February.

execution failed. Retrying later.: Invalid date 'FEBRUARY 30'
execution failed. Retrying later.: (conn=3795) Cannot delete or update a parent row: a foreign key constraint fails (`docspell`.`attachment_source`, CONSTRAINT `attachment_source_ibfk_1` FOREIGN KEY (`file_id`) REFERENCES `filemeta` (`id`))
execution failed. Retries exceeded.: (conn=3815) Cannot delete or update a parent row: a foreign key constraint fails (`docspell`.`attachment_source`, CONSTRAINT `attachment_source_ibfk_1` FOREIGN KEY (`file_id`) REFERENCES `filemeta` (`id`))

Although the document failed and isn't displayed anywhere in the UI it's blocking the reupload with the option "Skip files already present in docspell" set.

Unsetting the skip option allowed me to upload it and see it, but there isn't any suggested metadata:

execution failed. Retrying later.: Invalid date 'FEBRUARY 30'
Found existing item with these files.
execution failed. Retrying later.: 0
Found existing item with these files.
Processing failed on last retry. Creating item but without proposals.
execution failed. Retries exceeded.: 0
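
The log above suggests that a date candidate which is not on the calendar aborts the whole job. One defensive approach, sketched here under assumptions (the function names and the tuple layout are hypothetical, not docspell's date detection), is to validate each candidate and silently drop the impossible ones.

```python
from datetime import date
from typing import List, Optional, Tuple

MONTHS = {
    name: number
    for number, name in enumerate(
        ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
         "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"],
        start=1,
    )
}


def to_date(year: int, month_name: str, day: int) -> Optional[date]:
    """Return a date, or None if the combination does not exist on the calendar."""
    month = MONTHS.get(month_name.upper())
    if month is None:
        return None
    try:
        return date(year, month, day)
    except ValueError:
        return None  # e.g. FEBRUARY 30


def valid_dates(candidates: List[Tuple[int, str, int]]) -> List[date]:
    """Keep only the candidates that form a real date."""
    parsed = (to_date(*candidate) for candidate in candidates)
    return [d for d in parsed if d is not None]
```

With this shape a bad candidate costs one missing proposal instead of a failed processing job.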

No suggestions for hidden fields

Hi,
if you have hidden a field, e.g. due date, you still get suggestions, but you can't delete them, because the field is hidden. My suggestion is not to make suggestions for hidden fields.
Cheers
Eresturo

Broken Syntax Highlighting on Configuration Help

Hi,
on the following page the syntax highlighting is broken: https://docspell.org/docs/configure/
I think all ' should be replaced by `
Cheers
Eresturo

Multi user capabilities

Hi,
from the documentation it is not clear to me whether I can organize documents for more than one person independently of each other.

Example: I don't want my girlfriend to see the bills for her birthday presents.

Is that possible?

UI Localization

From #289

FR: Proper docker files

So far they just download binaries from github and are fixed to a version.

Would be great if they would actually run the installation, so one can easily deploy the latest committed version..

I might give it a try in the next days myself :-)

Add notes to item list

It might be useful to see partial notes in the card-view. The server could return the first sentence or first N words. The webui could render it inside the card, or if it is too much, maybe only show a hint that there are some notes.

More context in #186 .
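
The "first N words" variant mentioned above could look roughly like this. The function name, the default of ten words and the " ..." marker are assumptions made for illustration.

```python
def notes_preview(notes: str, max_words: int = 10) -> str:
    """Return at most `max_words` words of the notes, marking truncation."""
    words = notes.split()  # also collapses newlines and repeated spaces
    if len(words) <= max_words:
        return " ".join(words)
    return " ".join(words[:max_words]) + " ..."
```

Splitting on whitespace keeps the preview on a single line, which suits a card layout, at the cost of dropping any Markdown line structure in the notes.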

Add convenience enhancements

Localization
- Render time in local time (it's currently UTC)
- Use the correct local date format - for me:
  - DD.MM.YYYY instead of MM.DD.YYYY
  - DD MMMM YYYY instead of MMMM DD, YYYY
- (Other languages, probably a bigger task)
Dark theme
Searchable documentation - everything is in categories and it's a bit difficult to find a specific part again.
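
For the localization items above, a small sketch of the two steps involved: converting a UTC timestamp to local time and rendering it day-first. A fixed UTC+2 offset stands in for the user's time zone here; that offset and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_local(utc_time: datetime, zone: tzinfo) -> datetime:
    """Convert a timezone-aware UTC timestamp to the given zone."""
    return utc_time.astimezone(zone)


def format_day_first(moment: datetime) -> str:
    """Render as DD.MM.YYYY instead of MM.DD.YYYY."""
    return moment.strftime("%d.%m.%Y")


# Assumed local zone for the example: a fixed offset of two hours.
LOCAL_ZONE = timezone(timedelta(hours=2))
```

The conversion matters for more than the clock time: a document stored at 23:30 UTC on 21 July belongs to 22 July for a user two hours ahead, so the displayed date itself changes.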

improve search

First of all you made a really good piece of software, thank you for this :)

I have put a few documents in it and am testing it out right now. So I have created some tags and I really like the possibility to group tags together with categories. In the overview it helps to have these in the same color; good idea to make this possible. What I'm missing is the possibility to search for categories or to search for more than one tag (with an "or" condition, not with an "and"). I think an option to switch between "and" and "or" in the search view would be the most flexible but also the most complicated way (from a user perspective). So the possibility to search for a category would solve this without over-complicating the interface.

Example:
tag1: apples
tag2: oranges
category: fruits

Right now I can search for apples or oranges, but I can't filter for fruits.
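
Searching by category amounts to an "or" over all tags in that category. A minimal sketch using the example above, with plain dictionaries as hypothetical stand-ins for the real data model:

```python
from typing import Dict, List


def items_in_category(items: Dict[str, List[str]],
                      tag_categories: Dict[str, str],
                      category: str) -> List[str]:
    """Return ids of items carrying at least one tag of the given category."""
    wanted = {tag for tag, cat in tag_categories.items() if cat == category}
    return [item_id for item_id, tags in items.items() if wanted & set(tags)]


tag_categories = {"apples": "fruits", "oranges": "fruits"}
items = {"doc1": ["apples"], "doc2": ["oranges"], "doc3": []}
fruit_docs = items_in_category(items, tag_categories, "fruits")
```

Here `fruit_docs` contains doc1 and doc2, so the user gets the "or" behaviour without a separate and/or switch in the interface.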

Unable to login from another computer

I'm able to log into the application via the browser running locally on the machine docspell is installed on. When I try to log in from another computer, the login page shows up, but I can't log in.

I saw in the terminal: docspell-restserver | [blaze-selector-2] INFO o.h.b.c.n.NIO1SocketServerGroup - Accepted connection from /192.168.0.177:57118

But in Chrome inspector (on another computer), I see:

Failed to load resource: net::ERR_CONNECTION_REFUSED localhost:7880/api/v1/sec/folder?q=:1 
Failed to load resource: net::ERR_CONNECTION_REFUSED localhost:7880/api/v1/sec/tag?q=:1 
Failed to load resource: net::ERR_CONNECTION_REFUSED docspell.js:15 
Remove account from local storage
Failed to load resource: net::ERR_CONNECTION_REFUSED localhost:7880/api/v1/open/auth/login:1

Is something wrong with my configuration? Thanks.

Feature request - add new while assigning tags

Hello,

I would love to see an “add new” button for tags/companies/persons in the edit metadata form.

Could be the last option in the drop down. It could show a dialogue above the form which allows entering a new one.

Pressing the save button in that dialogue adds the new tag/company/person and applies it to the document.

Regards
Ralf

Improve Notes View

For context see #186

The item notes could be presented below the files and be editable like the readme on github/gitsomething. If there are no notes, the form could be shown to be able to create them.

consumedir.sh: Whitespaces in file names lead to error

realpath: Auto/2019-09-20: Datei oder Verzeichnis nicht gefunden
realpath: Car/2019-09-20: File or directory not found

command:

docspell/tools/consumedir.sh -p ./docs -mi --dry http://home:7880/api/v1/open/upload/item/test --once

docs/collective_name/2019 Folder/2019-09-20 filename.pdf
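
The script itself is not shown in the report, so the cause is a guess: errors like the ones above typically appear when a path is passed on unquoted and the shell splits it at each space. Python's `shlex` module follows POSIX-style splitting closely enough to illustrate the effect on the example path.

```python
import shlex

path = "docs/collective_name/2019 Folder/2019-09-20 filename.pdf"

# Unquoted, the path is split into three separate words at the spaces.
unquoted_words = shlex.split(path)

# Quoted, the same path survives as a single word.
quoted_words = shlex.split(shlex.quote(path))
```

`unquoted_words` holds three fragments, none of which names an existing file, while `quoted_words` holds the original path. In a shell script the equivalent fix is usually to wrap variable expansions in double quotes.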

UI dark theme

From #289

Can't sign-up / sign-in

Hi,
I will give docspell a try. It sounds great.
But after Docker Installation I am not able to sign-up.
After filling in the sign-up form and submitting it, nothing happens.

Custom metadata fields

Joex crash java.lang.OutOfMemoryError: Java heap space

Since speed isn't important and your documentation already has the rPi mentioned I gave this a try on a similar device.
When the job executor is at e.s.n.p.StanfordCoreNLP - Adding annotator pos it crashes after some time because it's running out of memory:

Partial log

[ioapp-compute-1] INFO  d.j.s.LogSink - >>> 2020-09-24T22:31:59.571227Z Info GyJVT47C3.../process-item/High: Text extraction finished in 1794 ms
[ioapp-compute-3] INFO  d.j.s.LogSink - >>> 2020-09-24T22:31:59.604144Z Info GyJVT47C3.../process-item/High: Starting text analysis
[ioapp-compute-3] INFO  d.j.s.LogSink - >>> 2020-09-24T22:31:59.654125Z Info GyJVT47C3.../process-item/High: The text to analyse is larger than limit (1176 > 1). Analysing only first 1 characters.
[ioapp-compute-2] INFO  d.a.n.PipelineCache - Creating german Stanford NLP NER classifier...
[ioapp-compute-2] INFO  e.s.n.p.StanfordCoreNLP - Adding annotator tokenize
[ioapp-compute-2] INFO  e.s.n.p.StanfordCoreNLP - Adding annotator ssplit
[ioapp-compute-2] INFO  e.s.n.p.StanfordCoreNLP - Adding annotator mwt
[ioapp-compute-2] INFO  e.s.n.p.StanfordCoreNLP - Adding annotator pos
java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOfRange(Arrays.java:3821)
        at java.base/java.lang.StringLatin1.newString(StringLatin1.java:767)
        at java.base/java.lang.StringBuilder.toString(StringBuilder.java:447)
        at java.base/java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3694)
        at java.base/java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3477)
        at java.base/java.io.ObjectInputStream.readString(ObjectInputStream.java:2041)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1689)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:517)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:475)
        at java.base/java.util.HashMap.readObject(HashMap.java:1551)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
        at java.base/java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1216)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2381)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2215)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1707)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2520)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2414)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2215)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1707)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2520)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2414)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2215)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1707)
        at java.base/java.io.ObjectInputStream.readArray(ObjectInputStream.java:2111)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1695)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2520)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2414)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2215)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1707)
[blaze-selector-0] INFO  o.h.b.c.ServerChannel - Closing NIO1 channel /127.0.0.1:7878
[shutdownHook1] INFO  o.h.b.c.n.SelectorLoop - Shutting down SelectorLoop blaze-selector-0
[shutdownHook1] INFO  o.h.b.c.n.SelectorLoop - Shutting down SelectorLoop blaze-selector-1
[shutdownHook1] INFO  o.h.b.c.n.SelectorLoop - Shutting down SelectorLoop blaze-selector-2
[shutdownHook1] INFO  o.h.b.c.n.SelectorLoop - Shutting down SelectorLoop blaze-selector-3
[shutdownHook1] INFO  o.h.b.c.n.SelectorLoop - Shutting down SelectorLoop blaze-selector-4
[shutdownHook1] INFO  d.b.ops.ONode - Unregister app joex1
[ioapp-compute-1] INFO  c.z.h.HikariDataSource - HikariPool-1 - Shutdown initiated...
[ioapp-compute-1] INFO  c.z.h.HikariDataSource - HikariPool-1 - Shutdown completed.
[ioapp-compute-1] INFO  o.h.c.PoolManager - Shutting down connection pool: curAllocated=0 idleQueues.size=0 waitQueue.size=0 maxWaitQueueLimit=256 closed=false
docspell-joex.service: Main process exited, code=exited, status=1/FAILURE
docspell-joex.service: Failed with result 'exit-code'.

Because it was crashing from running out of memory, even though there should be enough swap available, I've tried the following things:

As advised, I've made sure that

-Xmx500M is set and
the pool size is set to 1.

Because that part seemingly belongs to text analysis, I've also

set regex-ner.enabled = false,
set classification.enabled = false and
set text-analysis.max-length = 1.

Well, it's still crashing, and even in a part that should be disabled. (I don't want to disable it; I just tried to work around the crash.)
Can you think of a reason why it's crashing even on a device that should be a bit more powerful than a Raspberry Pi?
Why is it running out of memory when trying to analyze 1(!) character?

Tag / Folder / Category Selection

Hello again :)
In the outdated example screenshot from the README, tags, folders and categories are listed according to the number of documents and can be clicked quickly. In the latest version, however, this has been replaced by a dropdown. May I ask why? Couldn't one have both? Or make it configurable? Or combine them?
Cheers
Eresturo

Long documents fail

I've uploaded a whole folder and one document failed. It's a document with 45 pages about updated terms and conditions.

Copied errors from the processing page:

execution failed. Retrying later.: (conn=3836) Data too long for column 'content' at row 1
execution failed. Retrying later.: (conn=3775) Cannot delete or update a parent row: a foreign key constraint fails (`docspell`.`attachment_source`, CONSTRAINT `attachment_source_ibfk_1` FOREIGN KEY (`file_id`) REFERENCES `filemeta` (`id`))
execution failed. Retries exceeded.: (conn=3796) Cannot delete or update a parent row: a foreign key constraint fails (`docspell`.`attachment_source`, CONSTRAINT `attachment_source_ibfk_1` FOREIGN KEY (`file_id`) REFERENCES `filemeta` (`id`))
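
The first error suggests that the extracted text was larger than the `content` column can hold. As a rough check, and assuming the database is MySQL or MariaDB with a plain `TEXT` column (the schema is not shown here, so this is an assumption), the sketch below compares the UTF-8 byte length of a text with the maximum sizes of the text column types. The function name `smallest_text_type` is hypothetical and not part of Docspell.

```python
# Maximum sizes in bytes of the MySQL/MariaDB text column types.
TEXT_TYPE_LIMITS = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]


def smallest_text_type(content):
    """Return the smallest text type that can hold `content`, or None."""
    # The limits are counted in bytes, so measure the encoded length,
    # not the number of characters.
    size = len(content.encode("utf-8"))
    for name, limit in TEXT_TYPE_LIMITS:
        if size <= limit:
            return name
    return None
```

If this returns anything other than `TINYTEXT` or `TEXT` for the text extracted from the failing document, a `TEXT` column would be too small for it, which would match the "Data too long" message.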

Although the document failed and isn't displayed anywhere in the UI, it blocks a re-upload when the option "Skip files already present in docspell" is set.

Unsetting the skip option allowed me to upload the document and see it, but nothing related to its content was available:

execution failed. Retrying later.: (conn=3836) Data too long for column 'content' at row 1
Found existing item with these files.
execution failed. Retrying later.: 0
Found existing item with these files.
execution failed. Retries exceeded.: 0
Processing failed on last retry. Creating item but without proposals.

Import metadata from PDFs (especially keywords)

I'm thinking about a workflow for getting rid of paper at home; it will most likely involve some smartphone scanning app. Most of these apps allow adding some tags which end up as keywords in the PDF metadata. It would be great to have these imported and - if the strings match - automatically assigned as tags to the imported item.
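
A minimal sketch of the matching step, assuming the keywords arrive as the single string stored in the PDF's `Keywords` metadata field and that existing tag names are available as a list. The function name `match_keywords_to_tags` and the choice of separators (commas and semicolons) are assumptions for illustration, not Docspell behaviour.

```python
import re


def match_keywords_to_tags(keywords, existing_tags):
    """Return the existing tags whose names equal a PDF keyword, ignoring case."""
    # Scanning apps separate keywords inconsistently; commas and
    # semicolons are assumed here.
    candidates = {
        part.strip().casefold()
        for part in re.split(r"[,;]", keywords)
        if part.strip()
    }
    # Keep the tags' original spelling and order.
    return [tag for tag in existing_tags if tag.casefold() in candidates]
```

Keywords without a matching tag are simply ignored in this sketch; whether they should instead create new tags is a separate design decision.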

Cookies not set when using traefik (because of X-FORWARDED-Header)

When it is present, Docspell uses the X-Forwarded-For header as the domain for the cookies. But this header contains IPs or, at best, local host names (e.g. laptop.lan).

The correct header would be X-Forwarded-Host or Host.

This is a major bug when using traefik as one cannot disable the forwarded-headers there, i.e. all functions relying on cookies don't work with traefik right now.
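
The proposed lookup order can be sketched as follows. This is an illustration only, not Docspell's actual code: the function name `cookie_domain` is hypothetical, headers are modelled as a plain dictionary, and it is assumed that the first entry of a comma-separated `X-Forwarded-Host` value is the host the client used.

```python
def cookie_domain(headers):
    """Pick the cookie domain from X-Forwarded-Host, falling back to Host."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # X-Forwarded-For is deliberately ignored: it carries client addresses.
    raw = lowered.get("x-forwarded-host") or lowered.get("host")
    if not raw:
        return None
    # Assumption: with several comma-separated hosts, take the first one.
    host = raw.split(",")[0].strip()
    # Bracketed IPv6 literals are kept whole, without the port.
    if host.startswith("["):
        return host[: host.index("]") + 1] if "]" in host else host
    # Otherwise strip a trailing ":port".
    return host.split(":")[0]
```

With headers such as `X-Forwarded-For: 192.168.1.20` and `X-Forwarded-Host: docs.example.com`, this yields `docs.example.com` rather than the client IP.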
