mindee-api-python's Issues

reconstructed_total division by zero

Prerequisites

  • Reproduced the problem or exposed a new need
  • Checked the github existing issues

Description

Running the prediction.py script provided here, I ran into a division-by-zero error on a specific document.

Steps to Reproduce

  1. Copy the first example code from the performances-benchmark tutorial (see the link above)
  2. Run the code against specific files
  3. See the division-by-zero error

Expected behavior:

Parse the document and return JSON

Actual behavior:

The document is parsed and a prediction is returned, but the code then crashes at the reconstruction step: reconstructed_total += tax.value + 100 * tax.value / tax.rate

Reproduces how often:

100% of the time on that specific file

Versions

latest mindee-api-python code available

Additional Information

Stack trace:

[REDACTED] factures TVA novembre 2006-page15.pdf float division by zero
Traceback (most recent call last):
  File "prediction.py", line 13, in <module>
    mindee_response = mindee_client.parse_invoice(test_file_path)
  File "/home/pafer/.local/share/virtualenvs/mindee-wRBzjoc1/lib/python3.8/site-packages/mindee/__init__.py", line 180, in parse_invoice
    return self._wrap_response(input_file, response, "invoice")
  File "/home/pafer/.local/share/virtualenvs/mindee-wRBzjoc1/lib/python3.8/site-packages/mindee/__init__.py", line 93, in _wrap_response
    return Response.format_response(dict_response, document_type, input_file)
  File "/home/pafer/.local/share/virtualenvs/mindee-wRBzjoc1/lib/python3.8/site-packages/mindee/__init__.py", line 302, in format_response
    Invoice(
  File "/home/pafer/.local/share/virtualenvs/mindee-wRBzjoc1/lib/python3.8/site-packages/mindee/documents/invoice.py", line 91, in __init__
    self._checklist()
  File "/home/pafer/.local/share/virtualenvs/mindee-wRBzjoc1/lib/python3.8/site-packages/mindee/documents/invoice.py", line 192, in _checklist
    "taxes_match_total_incl": self.__taxes_match_total_incl(),
  File "/home/pafer/.local/share/virtualenvs/mindee-wRBzjoc1/lib/python3.8/site-packages/mindee/documents/invoice.py", line 212, in __taxes_match_total_incl
    reconstructed_total += tax.value + 100 * tax.value / tax.rate
ZeroDivisionError: float division by zero
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "prediction.py", line 18, in <module>
    print(e.with_traceback())
TypeError: with_traceback() takes exactly one argument (0 given)
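
A minimal sketch of how the reconstruction could avoid dividing by a zero rate. The `Tax` dataclass and `reconstruct_total` function are hypothetical stand-ins, not the SDK's actual classes; only the formula is taken from the traceback above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Tax:
    """Hypothetical stand-in for a parsed tax line (not the SDK's class)."""

    value: Optional[float]
    rate: Optional[float]


def reconstruct_total(taxes: Iterable[Tax]) -> Optional[float]:
    """Rebuild the tax-inclusive total from tax amounts and rates.

    Returns None when a tax has a missing value or a missing/zero rate,
    because the pre-tax base cannot be derived in that case.
    """
    total = 0.0
    for tax in taxes:
        if tax.value is None or not tax.rate:
            return None
        # Same formula as in the traceback: tax amount plus its pre-tax base.
        total += tax.value + 100 * tax.value / tax.rate
    return total
```

The second error in the trace is unrelated to the SDK: `with_traceback()` requires a traceback argument, so `traceback.print_exc()` from the standard library is the usual way to print the active exception.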

Cannot install requirements on M1

Prerequisites

Put an X between the brackets on this line if you have done all of the following:

  • Reproduced the problem or exposed a new need
  • Checked the github existing issues

Description

Steps to Reproduce

  1. run pip install -r requirements.txt

Expected behavior:

Install without any errors

Actual behavior:

I get this error: fitz/fitz_wrap.c:2754:10: fatal error: 'fitz.h' file not found

See https://asciinema.org/a/0bq80oBoozUsqBd7aAUQwbj6o

Reproduces how often:

All the time

Versions

macOS: 11.5.2 on M1 architecture
Python: 3.9.7
Pip: 21.3

SDK not working on Windows - URL bad parsing

Description

The SDK cannot handle HTTP URLs correctly when used on Windows. URL generation apparently relies on Python's os.path helpers, which join paths with "/" on Unix systems but with "\" on Windows.

Versions

Tested on version v1.2.3
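
The difference can be reproduced on any platform, since `ntpath` is the module `os.path` resolves to on Windows and `posixpath` the one it resolves to on Unix. The sketch below is an illustration only: the base URL, the path segments, and the `build_url` helper are hypothetical, not taken from the SDK.

```python
import ntpath
import posixpath

# Hypothetical base URL and segments, used only for illustration.
BASE_URL = "https://api.example.com/v1"


def build_url(base: str, *segments: str) -> str:
    """Join URL segments with '/' regardless of the host operating system."""
    parts = [base.rstrip("/")] + [segment.strip("/") for segment in segments]
    return "/".join(parts)


# What os.path.join would produce on Windows and on Unix respectively.
windows_style = ntpath.join(BASE_URL, "products", "invoices")
unix_style = posixpath.join(BASE_URL, "products", "invoices")

# A plain string join gives the same result everywhere.
portable = build_url(BASE_URL, "products", "invoices")
```

Here `windows_style` contains backslashes, while `unix_style` and `portable` are identical valid URLs.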

Typo in readme: expense receipt token

Just a small typo in the readme, I found the real value by looking through the code base.

This is in the readme:

from mindee import Client

mindee_client = Client(
    expense_receipts_token="your_expense_receipts_api_token_here",

but the named parameter should be expense_receipt_token.

receipts vs receipt

Sometimes we lose the mimetype

When I prepare a path input and print self.file_extension, I get application/pdf, but the API returns "Invalid mimetype application/octet-stream", meaning my original mimetype seems to be overridden somewhere.

Error in http response

Hi,

I just got an error like the one below for an api call that I did not see before:

  File "/usr/local/lib/python3.12/site-packages/mindee/mindee_http/response_validation.py", line 67, in clean_request_json
    and response_json["api_request"]["status_code"].isdigit()
AttributeError: 'int' object has no attribute 'isdigit'

This happened after calling a custom API built with the new API builder. Previous calls did not produce this error. Please let me know if I can provide more information.
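
A minimal sketch of a status-code check that tolerates both representations, assuming the API may return `status_code` either as an integer or as a digit-only string. The `is_valid_status_code` helper is hypothetical and is not the SDK's actual validation function.

```python
from typing import Any


def is_valid_status_code(status_code: Any) -> bool:
    """Accept HTTP status codes given as an int or as a digit-only string."""
    if isinstance(status_code, bool):
        # bool is a subclass of int, so reject it explicitly.
        return False
    if isinstance(status_code, int):
        return 100 <= status_code <= 599
    if isinstance(status_code, str) and status_code.isdigit():
        return 100 <= int(status_code) <= 599
    return False
```

Checking the type first avoids calling `str` methods such as `isdigit()` on an `int`, which is what raises the `AttributeError` shown above.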

Some PDF files are recognized as blank (zero page)

Since v1.2.2, a few PDF files are recognized as blank (zero page).

The function check_if_document_is_empty in mindee/inputs.py does not check for PDF "paths" (only images and text) when deciding whether a page is blank. Some rare scanned PDFs contain no "image" and are therefore considered empty, making inference impossible.

Error on financial document

Prerequisites

Put an X between the brackets on this line if you have done all of the following:

  • Reproduced the problem or exposed a new need
  • Checked the github existing issues

Description

Got a non-graceful error message when trying to use the CLI for the financial document type.

Steps to Reproduce

run ./mindee-cli.sh financial -i path file.pdf

Expected behavior:

Return information about the financial document I used.

Actual behavior:

Traceback (most recent call last):
  File "/Users/fharper/.pyenv/versions/3.9.7/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/fharper/.pyenv/versions/3.9.7/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/sdk-python/mindee/__main__.py", line 185, in <module>
    call_endpoint(parse_args())
  File "/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/sdk-python/mindee/__main__.py", line 68, in call_endpoint
    client = _ots_client(args, info)
  File "/Users/fharper/Dropbox/Mac (3)/Documents/code/mindee/sdk-python/mindee/__main__.py", line 44, in _ots_client
    func = getattr(client, f"config_{args.product_name}")
AttributeError: 'Client' object has no attribute 'config_financial'

Reproduces how often:

Always

Versions

2.0.1
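
One way to fail gracefully is to give `getattr` a default and report the supported products instead of letting the `AttributeError` propagate. This is only a sketch: the `Client` class below is a hypothetical stand-in with a single `config_invoice` method, and `configure` is not the CLI's actual function.

```python
class Client:
    """Hypothetical stand-in exposing only some config_* methods."""

    def config_invoice(self) -> str:
        return "invoice configured"


def configure(client: Client, product_name: str) -> str:
    """Look up config_<product_name> and fail with a readable message."""
    func = getattr(client, f"config_{product_name}", None)
    if func is None:
        supported = sorted(
            name[len("config_"):]
            for name in dir(client)
            if name.startswith("config_")
        )
        # SystemExit with a string prints the message and exits with status 1.
        raise SystemExit(
            f"Unsupported product '{product_name}'. "
            f"Supported: {', '.join(supported)}"
        )
    return func()
```

With this stand-in, `configure(Client(), "financial")` exits with a one-line message listing `invoice` rather than a traceback.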

Adapt help display for no-raise-errors

Prerequisites

Put an X between the brackets on this line if you have done all of the following:

  • Reproduced the problem or exposed a new need
  • Checked the github existing issues

Description

The --no-raise-errors description is displayed after a line break, probably because the option name is too long.

Steps to Reproduce

Run ./mindee-cli.sh -h

Expected behavior:

No line break

Actual behavior:

[Screenshot: CleanShot 2022-02-17 at 17 44 19@2x]

Reproduces how often:

Always

Versions

2.0.1

Additional Information
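
A possible fix, assuming the CLI is built with `argparse` and the break comes from its default help column (`max_help_position`, 24 by default): widen the column through a custom formatter. The `-E` short flag and the help text are hypothetical, and the `HelpFormatter` constructor arguments are an implementation detail rather than a documented API.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose help column is wide enough for long flags."""
    parser = argparse.ArgumentParser(
        prog="mindee",
        formatter_class=lambda prog: argparse.HelpFormatter(
            prog, max_help_position=40, width=100
        ),
    )
    # "-E" is a hypothetical short flag added for illustration.
    parser.add_argument(
        "-E",
        "--no-raise-errors",
        action="store_true",
        help="do not raise errors",
    )
    return parser


help_text = build_parser().format_help()
```

With the wider column, the option and its description fit on one line of `help_text`.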

KeyError 'page_id' When Parsing Documents with Custom Endpoints on Mindee API

Summary:
My API project is facing a KeyError 'page_id' when using new custom endpoints with the Mindee API. This issue is not present with older endpoints, suggesting a potential problem with the newer custom endpoint configuration or the library's response handling.

Background:
Our application leverages Mindee's OCR capabilities to extract data from PDFs. While older endpoints operate as expected with library version 3.13, the integration with newly created custom endpoints leads to a KeyError, even after updating to version 4.0.1.

Issue Description:
Upon invoking client.parse with product.CustomV1 for new custom endpoints, a KeyError is thrown, indicating the absence of 'page_id' in the parsed results. This exception is traced back to the construction of ListFieldV1 within the Mindee library.

Logs:
Traceback (most recent call last):
  File "C:\Users\PC\PycharmProjects\multiCotizadorAPI\utils\mindee\mindee.py", line 30, in analyze_pdf_with_mindee
    result = mindee_client.parse(product.CustomV1, input_doc, endpoint=custom_endpoint)
  File "C:\Users\PC\PycharmProjects\multiCotizadorAPI\venv\Lib\site-packages\mindee\client.py", line 114, in parse
    return self._make_request(
  File "C:\Users\PC\PycharmProjects\multiCotizadorAPI\venv\Lib\site-packages\mindee\client.py", line 339, in _make_request
    return PredictResponse(product_class, dict_response)
  File "C:\Users\PC\PycharmProjects\multiCotizadorAPI\venv\Lib\site-packages\mindee\parsing\common\predict_response.py", line 28, in __init__
    self.document = Document(prediction_type, raw_response["document"])
  File "C:\Users\PC\PycharmProjects\multiCotizadorAPI\venv\Lib\site-packages\mindee\parsing\common\document.py", line 47, in __init__
    self.inference = prediction_type(raw_response["inference"])
  File "C:\Users\PC\PycharmProjects\multiCotizadorAPI\venv\Lib\site-packages\mindee\product\custom\custom_v1.py", line 27, in __init__
    self.prediction = CustomV1Document(raw_prediction["prediction"])
  File "C:\Users\PC\PycharmProjects\multiCotizadorAPI\venv\Lib\site-packages\mindee\product\custom\custom_v1_document.py", line 29, in __init__
    self.fields[field_name] = ListFieldV1(field_contents)
  File "C:\Users\PC\PycharmProjects\multiCotizadorAPI\venv\Lib\site-packages\mindee\parsing\custom\list.py", line 45, in __init__
    self.page_id = raw_prediction["page_id"]
KeyError: 'page_id'

Troubleshooting Steps:

  • Confirmed operational status of older endpoints.
  • Encountered the error upon integrating new custom endpoints.
  • Library updated from 3.13 to 4.0.1, with no resolution.
  • Adherence to the updated Mindee documentation confirmed.
  • Codebase reviewed for discrepancies with custom endpoint usage.

Current Findings:

  • The issue is isolated to new custom endpoints.
  • There may be a misalignment between the library's expectations and the custom endpoint's response.
  • The problem persists after the library update, which suggests the issue might lie beyond the client-side code.

We request the Mindee team's input on this matter. Could you advise on any known issues with custom endpoint integration or suggest further steps we might take to debug this problem? Any assistance would be greatly appreciated to facilitate a swift resolution.

Thank you for your attention to this matter.
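
For illustration, a tolerant constructor could read the key with `dict.get` so that a response without `page_id` does not raise. The `ListField` class below is a hypothetical stand-in, and the `values` key is an assumption about the response shape, not a confirmed part of the API.

```python
from typing import Any, Dict, Optional


class ListField:
    """Hypothetical stand-in for a list field built from a raw prediction."""

    def __init__(self, raw_prediction: Dict[str, Any]) -> None:
        # dict.get returns None instead of raising KeyError when the key is absent.
        self.page_id: Optional[int] = raw_prediction.get("page_id")
        # "values" is an assumed key name, used only for this sketch.
        self.values = list(raw_prediction.get("values", []))
```

Whether a missing `page_id` should be accepted silently or reported is a design decision for the library; this only shows the non-raising lookup.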

Could not determine MIME type of '{self.filename}' with tempfile.NamedTemporaryFile

Prerequisites

The Mindee library raises MimeTypeError(f"Could not determine MIME type of '{self.filename}'") when used with tempfile.NamedTemporaryFile.

import tempfile

from mindee import Client

# The temporary file's name has no extension
f = tempfile.NamedTemporaryFile()

# Some TMP file magic happens here

mindee_client = Client(api_key=self.config.MINDEE_API_KEY.get_secret_value())
input_doc = mindee_client.doc_from_bytes(f.read(), f.name)

Description

The problem is caused by the mimetypes library, which relies on the file extension to guess a file's MIME type. The temporary name provided by NamedTemporaryFile has no file extension, so mimetypes cannot guess the MIME type.

Solution

There are two solutions:

  • Add an optional argument to doc_from_bytes() that accepts a manually supplied MIME type.
  • Switch to the magic library, based on libmagic. It guesses MIME types well from the binary signatures of different file types, but it adds a dependency for developers, since libmagic must be installed on the machine.

In any case, guessing the MIME type from the file extension is not a reliable approach: users sometimes upload, for example, PNG files with a .jpg extension, which causes errors.
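
Both proposals can be sketched with the standard library alone. The example below is an assumption-laden illustration, not the SDK's code: `sniff_mime` and `resolve_mime` are hypothetical helpers, and the signature table covers only three formats instead of everything libmagic recognises.

```python
import mimetypes
from typing import Optional

# Leading bytes ("magic numbers") of a few common formats.
SIGNATURES = {
    b"%PDF-": "application/pdf",
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
}


def sniff_mime(data: bytes) -> Optional[str]:
    """Guess a MIME type from the first bytes of the content."""
    for signature, mime in SIGNATURES.items():
        if data.startswith(signature):
            return mime
    return None


def resolve_mime(
    data: bytes, filename: str, mime_type: Optional[str] = None
) -> Optional[str]:
    """Prefer an explicit MIME type, then the content, then the extension."""
    return mime_type or sniff_mime(data) or mimetypes.guess_type(filename)[0]
```

As a workaround on the caller's side, `tempfile.NamedTemporaryFile(suffix=".pdf")` gives the temporary name an extension that `mimetypes` can recognise.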

Add a consistent User-Agent header in Python SDK requests

The Node SDK sends a User-Agent : mindee-node/${sdkVersion} node/${process.version} header when making requests to the Mindee API (api.mindee.net/v1).

We should add the same header in the Python SDK so that the Mindee API backend has this information.

Example:

User-Agent : mindee-python/v1.3.0 python/3.8.12
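
A minimal sketch of building that value with the standard library. The `build_user_agent` helper is hypothetical, and the SDK version is passed in by the caller rather than read from the package.

```python
import platform


def build_user_agent(sdk_version: str) -> str:
    """Format the header value like the example above."""
    return f"mindee-python/v{sdk_version} python/{platform.python_version()}"


# The resulting dict can be passed as the headers of an HTTP request.
headers = {"User-Agent": build_user_agent("1.3.0")}
```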

Custom 500

Hi,

it's me again. Thank you for fixing the http code error so quickly. Now I can see the actual error from the server but it's not very informative:

  File "/usr/local/lib/python3.12/site-packages/mindee/client.py", line 397, in _get_queued_document
    raise handle_error(
mindee.error.mindee_http_error.MindeeHTTPServerError: custom 500 HTTP error: None - None

My code and the API work fine with other documents, but not with this particular one. Is there anything I can do to debug this?

Can't install because of compiling failure of dependencies

Prerequisites

Put an X between the brackets on this line if you have done all of the following:

  • Reproduced the problem or exposed a new need
  • Checked the github existing issues

Description

Cannot install due to failed cascading dependencies on Big Sur

Steps to Reproduce

  1. pip install mindee on MacOS Big Sur

Expected behavior:

Install mindee and dependencies, especially matplotlib

Actual behavior:

Failed to compile numpy, a dependency of matplotlib on Mac Intel

Reproduces how often:

100%

Versions

  • MacOS Big Sur
  • Python 3.9.0

Additional Information

I checked the issues on the matplotlib project: the maintainer rejected the issue reported 3 days ago, saying basically "not my problem if you're using arm", except ... it's not an ARM issue :/

passport full_name is really just first/last

Prerequisites

Put an X between the brackets on this line if you have done all of the following:

  • [ x] Reproduced the problem or exposed a new need
  • [x ] Checked the github existing issues

Description

In the passport parsing the passport.full_name.value prints first & last, but no middle name.

Steps to Reproduce

  1. Test the passport API with Python (I have a Jupyter notebook)
  2. call passport.full_name.value
  3. note: just first last are extracted

Expected behavior:

full_name should contain all parts of the name:
First [middle] last (where there can be multiple middle names).

Actual behavior:

First Last

Reproduces how often:

Tested with 2 passports.

Versions

Additional Information

Use right header format when doing API calls

Prerequisites

Put an X between the brackets on this line if you have done all of the following:

  • Reproduced the problem or exposed a new need
  • Checked the github existing issues

Description

Steps to Reproduce

  • None

Expected behavior:

I expect the headers to be forwarded to Mindee's API using the right format, e.g.: Authorization: "Token <my_token>"

Actual behavior:

The header is missing the Token part.

Reproduces how often:

All API calls

Versions

Additional Information

The currently used format is deprecated
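
The expected format can be sketched as follows. The `build_auth_headers` helper is hypothetical; only the `Token <my_token>` scheme comes from the issue above.

```python
from typing import Dict


def build_auth_headers(api_key: str) -> Dict[str, str]:
    """Return request headers using the 'Token <key>' authorization scheme."""
    return {"Authorization": f"Token {api_key}"}
```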
