Comments (5)
Hi Andrej,
I ran into this issue as well, since I am using multi-page PDFs, so I've addressed it in my local branch.
If you would be receptive to a PR with one change, which would save images as:
- [pdf_name_page_n.jpg] for n in 1..page-length (if page-length > 1), or
- [pdf_name.jpg (current behavior)] if page-length==1
...I'd be happy to submit one. Obviously your call though. I'm a big fan of this repo and find your code very easy to work with, so I definitely appreciate your open-sourcing this!
from sparrow.
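The naming scheme proposed above can be sketched roughly as follows. This is only an illustration of the rule, not the code from the actual PR: the function name `page_image_names` and its parameters are hypothetical, and it assumes the page count is already known.

```python
from pathlib import Path
from typing import List


def page_image_names(pdf_path: str, page_count: int) -> List[str]:
    """Return the image file names for a PDF under the proposed scheme.

    Hypothetical helper, for illustration only:
    - one page   -> "<pdf_name>.jpg" (the current behavior)
    - many pages -> "<pdf_name>_page_<n>.jpg" for n in 1..page_count
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")

    # Strip the directory and the ".pdf" extension to get the base name.
    stem = Path(pdf_path).stem

    if page_count == 1:
        return [f"{stem}.jpg"]

    # Pages are numbered from 1, matching the proposal.
    return [f"{stem}_page_{n}.jpg" for n in range(1, page_count + 1)]
```

For example, `page_image_names("invoice.pdf", 1)` gives a single unsuffixed name, so single-page documents keep their existing file names, while a three-page document gets one suffixed name per page.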
Hey Max, sure, please submit the change, I will merge it. Thanks a lot :)
PR submitted here: #18.
Sorry it took a while to get this submitted after proposing it last week.
Thanks, the changes are merged.
Thanks, @rajans, good catch. At the moment I test the code only with single-page docs, so I will keep it as is for now. I added a comment in the code so that I don't forget it in the future.