Running on Synology NAS with version DSM 7.1.1-42962 Update 6. Worked fine with ve

Processing fails since update to 1.4.0: "ModuleNotFoundError: No module named 'pypdf'" (geimist/synocr)

Comments (5)

kobeegh commented on June 2, 2024

Sure:

    -----------------------------------
    |    ==> installation info <==    |
    -----------------------------------

synOCR-user:              synOCR
synOCR-user is admin:     yes
synOCR-version:           1.4.0
Architecture:             x86_64
DSM-build:                42962
Device:                   918plus (2053505210)
current Profil:           default
monitor is running?:      no
DB-version:               9
used image (created):     jbarlow83/ocrmypdf:latest (2023-06-20T10:34:03)
document author:          
used ocr-parameter (raw): -srd -l deu
OCR-arg 1:                -srd
OCR-arg 2:                -l
OCR-arg 3:                deu
ocropt_array:             -srd -l deu
search prefix:            
replace search prefix:    no
renaming syntax:          
Symbol for tag marking:   #
target file handling:     no
Document split pattern:   
split page handling:      discard
delete blank pages:       
threshold black/white:    
threshold black pixels:   
clean up spaces:          false
Date search method:       use standard search via RegEx
date found order:         firstfound
source for filedate:      now
ignored dates by search:  
date range in past:       0 [absolute: 0]
date range in future:     0 [absolute: 0]
PATH-Variable:            /sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/syno/bin:/usr/syno/sbin:/usr/local/bin:/opt/usr/bin:/usr/syno/synoman/webman/3rdparty/synOCR/bin:/usr/local/bin:/opt/usr/bin
Docker test:              OK
DSM notify to user:       @administrators
apprise notify service:   
apprise attachment:       false
notify language:          enu
Loglevel:                 debug
max. count of logfiles:   10
rotate backupfiles after: (purge backup deactivated)
Source directory:         /volume1/OCR/_INPUT/
Target directory:         /volume1/OCR/_OUTPUT/
BackUp directory:         /volume1/OCR/_BACKUP/



  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● ---------------------------------- ●
  ● |    ==> RUN THE FUNCTIONS <==   | ●
  ● ---------------------------------- ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

  -----------------------------------------------------------------------------------
  | check the python3 installation and the necessary modules:                       |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:01]


                  Check Python:

                  module list:
                  Package               Version
                  --------------------- -----------
                  apprise               1.4.0
                  argcomplete           3.0.8
                  backports.zoneinfo    0.2.1
                  certifi               2023.5.7
                  charset-normalizer    3.1.0
                  click                 8.1.3
                  dateparser            1.1.8
                  DateTime              5.1
                  deprecation           2.1.0
                  idna                  3.4
                  importlib-metadata    6.7.0
                  lxml                  4.9.2
                  Markdown              3.4.3
                  oauthlib              3.2.2
                  packaging             23.1
                  pikepdf               7.1.2
                  Pillow                9.5.0
                  pip                   23.1.2
                  PyPDF2                2.3.1
                  python-dateutil       2.8.2
                  pytz                  2023.3
                  pytz-deprecation-shim 0.1.0.post0
                  PyYAML                6.0
                  regex                 2023.5.5
                  requests              2.31.0
                  requests-oauthlib     1.3.1
                  setuptools            56.0.0
                  six                   1.16.0
                  tomlkit               0.11.8
                  typing_extensions     4.5.0
                  tzdata                2023.3
                  tzlocal               4.3
                  urllib3               2.0.3
                  xmltodict             0.13.0
                  yq                    3.2.2
                  zipp                  3.15.0
                  zope.interface        6.0
                prepare_python: OK
Target temp directory:    /tmp/tmp.rivsr98dQA


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED:                                      ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE:   ➜ 2023.07.01 - testfile.pdf
                  temp. target file: /tmp/tmp.rivsr98dQA/step1_tmp_1688575421/2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | processing PDF @ OCRmyPDF:                                                      |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:00]

                ➜ OCRmyPDF-LOG:
                  WARNING: Error loading config file: .dockercfg: $HOME is not defined
                    DEBUG ocrmypdf - ocrmypdf 14.2.2.dev31+g7c38c717.d20230620
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.subprocess - Found tesseract 5.3.1-22-g24da
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Found gs 9.55.0
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
                    DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
                  chi_sim
                  deu
                  eng
                  fra
                  osd
                  por
                  spa
                  
                     INFO ocrmypdf._validation - reading file from standard input
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/stdin, /tmp/ocrmypdf.io.uuaw1x_6/origin.pdf)
                    DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
                     INFO ocrmypdf._pipeline -    1  skipping all processing on this page
                    DEBUG ocrmypdf._graft -    1  Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
                    DEBUG ocrmypdf._graft -    1  Page rotation: (content, auto) -> page = (0, 0) -> 0
                     INFO ocrmypdf._sync - Postprocessing...
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/graft_layers.pdf, /tmp/ocrmypdf.io.uuaw1x_6/fix_docinfo.pdf)
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.uuaw1x_6/fix_docinfo.pdf', '/tmp/ocrmypdf.io.uuaw1x_6/pdfa.ps']
                    DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0 (2021-09-27)
                    DEBUG ocrmypdf.subprocess.gs - Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
                    DEBUG ocrmypdf.subprocess.gs - This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
                    DEBUG ocrmypdf.subprocess.gs - see the file COPYING for details.
                    DEBUG ocrmypdf.subprocess.gs - Processing pages 1 through 1.
                    DEBUG ocrmypdf.subprocess.gs - Page 1
                    DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0: Setting Overprint Mode to 1
                    DEBUG ocrmypdf.subprocess.gs - not permitted in PDF/A-2, overprint mode not set
                    DEBUG ocrmypdf.subprocess.gs - 
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - XrefExt(xref=224, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=225, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=218, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=219, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=220, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=221, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=222, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=223, ext='.png')
                    DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 8
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 225: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 218: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 219: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 221: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 223: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 225: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 218: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 219: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 223: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/optimize.opt.pdf, /tmp/ocrmypdf.io.uuaw1x_6/optimize.pdf)
                    DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
                     INFO ocrmypdf._pipeline - Image optimization ratio: 1.00 savings: 0.4%
                     INFO ocrmypdf._pipeline - Total file size ratio: 1.01 savings: 0.6%
                    DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.uuaw1x_6/optimize.pdf -> -
                     INFO ocrmypdf._sync - Output sent to stdout
                ← OCRmyPDF-LOG-END


[runtime up to now:    00:00:18]

                target file (OK): /tmp/tmp.rivsr98dQA/step1_tmp_1688575421/2023.07.01 - testfile.pdf

                no split pattern defined or splitting not possible

  -----------------------------------------------------------------------------------
  | handle source file:                                                             |
  -----------------------------------------------------------------------------------

                ➜ backup source file to: /volume1/OCR/_BACKUP/2023.07.01 - testfile.pdf
                removed directory '/tmp/tmp.rivsr98dQA/step1_tmp_1688575421/'

Stats:
  runtime last file:              ➜ 00:00:18
  runtime 1st step (all files):   ➜ 00:00:25


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● STEP 2 - SEARCH TAGS / RENAME / SORT:                                           ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


                list files in INPUT with transcoded special characters:
                ➜ 2023.07.01 - testfile.pdf$

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pypdf'
ERROR at line 2284: pagecount_latest=$( py_page_count "${input}" )
                (pages counted with python module pypdf)
./synOCR.sh: line 2299: 1182+ERROR at line 1739: python3
ERROR at line 2284: python3: syntax error in expression (error token is "at line 1739: python3
ERROR at line 2284: python3")

  purge log files ...
  delete 1 log files ( > 10 files)
  delete -10 search files ( > 10 files)

  purge backup deactivated!
rmdir: failed to remove '/tmp/tmp.rivsr98dQA': Directory not empty
  rmdir: removing directory, '/tmp/tmp.rivsr98dQA'

  runtime all files:              ➜ 00:00:25


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● ---------------------------------- ●
  ● |    ==> END OF FUNCTIONS <==    | ●
  ● ---------------------------------- ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

from synocr.

geimist commented on June 2, 2024

For some reason, the Python environment for synOCR has not been updated.
Are you able to run the command below in a terminal or via the DSM Task Scheduler? It deletes the Python environment so that it is recreated on the next run.
rm -rf /usr/syno/synoman/webman/3rdparty/synOCR/python3_env

Alternatively, you can create a backup of synOCR with Hyper Backup, uninstall synOCR, reinstall it, and then restore the backup.
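
Before deleting anything, it can help to confirm which modules the existing environment is actually missing. The following is only a minimal sketch, not part of synOCR: the function name `missing_modules` is hypothetical, and the import names in the list are assumptions (for example, Pillow is imported as `PIL` and PyYAML as `yaml`).

```python
import importlib.util


def missing_modules(import_names):
    """Return the top-level names from import_names that cannot be found for import."""
    missing = []
    for name in import_names:
        # find_spec() returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for some of the packages that the synOCR log checks.
    wanted = ["pypdf", "pikepdf", "PIL", "yaml", "dateparser", "apprise"]
    print("missing:", missing_modules(wanted))
```

To test the synOCR environment rather than the system Python, the script would have to be run with the interpreter shown in the log, `/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/bin/python3`. If `pypdf` is reported as missing, that matches the traceback above.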

kobeegh commented on June 2, 2024

I deleted the directory and started a new run. It works now, thank you very much!
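
The difference between the failed and the successful run is visible in the module lists of the two debug logs: the old environment still contained PyPDF2 2.3.1, while the recreated one contains pypdf 3.5.1. A rough sketch of how two such `pip list` tables could be compared is shown here; the helper names `parse_pip_list` and `diff_packages` are made up for illustration, and the sample tables are only excerpts of the rows in the logs.

```python
def parse_pip_list(text):
    """Parse `pip list` table output into a {package: version} dict."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0]] = parts[1]
    return packages


def diff_packages(old, new):
    """Return (removed, added, changed) between two {package: version} dicts."""
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return removed, added, changed


if __name__ == "__main__":
    # Excerpts from the module lists of the failed and the successful run.
    before = """
    Package  Version
    -------- -------
    pikepdf  7.1.2
    Pillow   9.5.0
    PyPDF2   2.3.1
    """
    after = """
    Package  Version
    -------- -------
    pikepdf  7.1.2
    Pillow   10.0.0
    pypdf    3.5.1
    """
    removed, added, changed = diff_packages(parse_pip_list(before), parse_pip_list(after))
    print("removed:", removed)
    print("added:", added)
    print("changed:", changed)
```

Package names are compared case-sensitively, so `PyPDF2` and `pypdf` are reported as one removed and one added package.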

If it is of interest to you, here is the debug log from the successful run, in which it first sets up the Python environment and installs the modules:


    -----------------------------------
    |    ==> installation info <==    |
    -----------------------------------

synOCR-user:              synOCR
synOCR-user is admin:     yes
synOCR-version:           1.4.0
Architecture:             x86_64
DSM-build:                42962
Device:                   918plus (2053505210)
current Profil:           default
monitor is running?:      no
DB-version:               9
used image (created):     jbarlow83/ocrmypdf:latest (2023-06-20T10:34:03)
document author:          
used ocr-parameter (raw): -srd -l deu
OCR-arg 1:                -srd
OCR-arg 2:                -l
OCR-arg 3:                deu
ocropt_array:             -srd -l deu
search prefix:            
replace search prefix:    no
renaming syntax:          
Symbol for tag marking:   #
target file handling:     no
Document split pattern:   
split page handling:      discard
delete blank pages:       
threshold black/white:    
threshold black pixels:   
clean up spaces:          false
Date search method:       use standard search via RegEx
date found order:         firstfound
source for filedate:      now
ignored dates by search:  
date range in past:       0 [absolute: 0]
date range in future:     0 [absolute: 0]
PATH-Variable:            /sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/syno/bin:/usr/syno/sbin:/usr/local/bin:/opt/usr/bin:/usr/syno/synoman/webman/3rdparty/synOCR/bin:/usr/local/bin:/opt/usr/bin
Docker test:              OK
DSM notify to user:       @administrators
apprise notify service:   
apprise attachment:       false
notify language:          enu
Loglevel:                 debug
max. count of logfiles:   10
rotate backupfiles after: (purge backup deactivated)
Source directory:         /volume1/OCR/_INPUT/
Target directory:         /volume1/OCR/_OUTPUT/
BackUp directory:         /volume1/OCR/_BACKUP/



  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● ---------------------------------- ●
  ● |    ==> RUN THE FUNCTIONS <==   | ●
  ● ---------------------------------- ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

  -----------------------------------------------------------------------------------
  | check the python3 installation and the necessary modules:                       |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:00]


                  Check Python:
                  python3 already installed (/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/bin/python3)

                  Check pip:
                  pip already installed (pip 21.1.1 from /usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/pip (python 3.8)) / upgrade available ...
                  Requirement already satisfied: pip in ./python3_env/lib/python3.8/site-packages (21.1.1)
                  Collecting pip
                    Using cached pip-23.1.2-py3-none-any.whl (2.1 MB)
                  Installing collected packages: pip
                    Attempting uninstall: pip
                      Found existing installation: pip 21.1.1
                      Uninstalling pip-21.1.1:
                        Successfully uninstalled pip-21.1.1
                  Successfully installed pip-23.1.2

                  read installed python modules:
                  Package    Version
                  ---------- -------
                  pip        23.1.2
                  setuptools 56.0.0

                ➜ check python module "DateTime": ➜ DateTime was not found and will be installed ➜ ok
                ➜ check python module "dateparser": ➜ dateparser was not found and will be installed ➜ ok
                ➜ check python module "pypdf==3.5.1": ➜ pypdf==3.5.1 was not found and will be installed ➜ ok
                ➜ check python module "pikepdf==7.1.2": ➜ pikepdf==7.1.2 was not found and will be installed ➜ ok
                ➜ check python module "Pillow": ➜ Pillow was not found and will be installed ➜ ok
                ➜ check python module "yq": ➜ yq was not found and will be installed ➜ ok
                ➜ check python module "PyYAML": ➜ PyYAML was not found and will be installed ➜ ok
                ➜ check python module "apprise": ➜ apprise was not found and will be installed ➜ ok


                  module list:
                  Package            Version
                  ------------------ --------
                  apprise            1.4.0
                  argcomplete        3.1.1
                  backports.zoneinfo 0.2.1
                  certifi            2023.5.7
                  charset-normalizer 3.1.0
                  click              8.1.3
                  dateparser         1.1.8
                  DateTime           5.1
                  deprecation        2.1.0
                  idna               3.4
                  importlib-metadata 6.7.0
                  lxml               4.9.3
                  Markdown           3.4.3
                  oauthlib           3.2.2
                  packaging          23.1
                  pikepdf            7.1.2
                  Pillow             10.0.0
                  pip                23.1.2
                  pypdf              3.5.1
                  python-dateutil    2.8.2
                  pytz               2023.3
                  PyYAML             6.0
                  regex              2023.6.3
                  requests           2.31.0
                  requests-oauthlib  1.3.1
                  setuptools         56.0.0
                  six                1.16.0
                  tomlkit            0.11.8
                  typing_extensions  4.7.1
                  tzlocal            5.0.1
                  urllib3            2.0.3
                  xmltodict          0.13.0
                  yq                 3.2.2
                  zipp               3.15.0
                  zope.interface     6.0
                prepare_python: OK
Target temp directory:    /tmp/tmp.oMs0RqPMX7


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED:                                      ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE:   ➜ 2023.07.01 - testfile.pdf
                  temp. target file: /tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | processing PDF @ OCRmyPDF:                                                      |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:00]

                ➜ OCRmyPDF-LOG:
                  WARNING: Error loading config file: .dockercfg: $HOME is not defined
                    DEBUG ocrmypdf - ocrmypdf 14.2.2.dev31+g7c38c717.d20230620
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.subprocess - Found tesseract 5.3.1-22-g24da
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Found gs 9.55.0
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
                    DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
                  chi_sim
                  deu
                  eng
                  fra
                  osd
                  por
                  spa
                  
                     INFO ocrmypdf._validation - reading file from standard input
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/stdin, /tmp/ocrmypdf.io.0889pg_j/origin.pdf)
                    DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
                     INFO ocrmypdf._pipeline -    1  skipping all processing on this page
                    DEBUG ocrmypdf._graft -    1  Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
                    DEBUG ocrmypdf._graft -    1  Page rotation: (content, auto) -> page = (0, 0) -> 0
                     INFO ocrmypdf._sync - Postprocessing...
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/graft_layers.pdf, /tmp/ocrmypdf.io.0889pg_j/fix_docinfo.pdf)
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.0889pg_j/fix_docinfo.pdf', '/tmp/ocrmypdf.io.0889pg_j/pdfa.ps']
                    DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0 (2021-09-27)
                    DEBUG ocrmypdf.subprocess.gs - Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
                    DEBUG ocrmypdf.subprocess.gs - This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
                    DEBUG ocrmypdf.subprocess.gs - see the file COPYING for details.
                    DEBUG ocrmypdf.subprocess.gs - Processing pages 1 through 1.
                    DEBUG ocrmypdf.subprocess.gs - Page 1
                    DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0: Setting Overprint Mode to 1
                    DEBUG ocrmypdf.subprocess.gs - not permitted in PDF/A-2, overprint mode not set
                    DEBUG ocrmypdf.subprocess.gs - 
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - XrefExt(xref=224, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=225, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=218, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=219, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=220, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=221, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=222, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=223, ext='.png')
                    DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 8
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 225: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 218: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 219: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 221: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 223: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 225: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 218: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 219: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 223: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/optimize.opt.pdf, /tmp/ocrmypdf.io.0889pg_j/optimize.pdf)
                    DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
                     INFO ocrmypdf._pipeline - Image optimization ratio: 1.00 savings: 0.4%
                     INFO ocrmypdf._pipeline - Total file size ratio: 1.01 savings: 0.6%
                    DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.0889pg_j/optimize.pdf -> -
                     INFO ocrmypdf._sync - Output sent to stdout
                ← OCRmyPDF-LOG-END


[runtime up to now:    00:00:16]

                target file (OK): /tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/2023.07.01 - testfile.pdf

                no split pattern defined or splitting not possible

  -----------------------------------------------------------------------------------
  | handle source file:                                                             |
  -----------------------------------------------------------------------------------

                ➜ backup source file to: /volume1/OCR/_BACKUP/2023.07.01 - testfile.pdf
                removed directory '/tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/'

Stats:
  runtime last file:              ➜ 00:00:16
  runtime 1st step (all files):   ➜ 00:01:49


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● STEP 2 - SEARCH TAGS / RENAME / SORT:                                           ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


                list files in INPUT with transcoded special characters:
                ➜ 2023.07.01 - testfile.pdf$

                (pages counted with python module pypdf)

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE:   ➜ 2023.07.01 - testfile.pdf
                ➜ File permissions source file:
                  -rw-rw-r-- 1 synOCR synOCR 219590 Jul  5 18:02 /tmp/tmp.oMs0RqPMX7/2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | search tags in ocr text:                                                        |
  -----------------------------------------------------------------------------------

                no tags defined

  -----------------------------------------------------------------------------------
  | search for a valid date in ocr text:                                            |
  -----------------------------------------------------------------------------------

                run RegEx date search - search for date format: 1 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
                run RegEx date search - search for date format: 2 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
                run RegEx date search - search for date format: 3 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
                  Date not found in OCR text - use file date:
                  day:  05
                  month:07
                  year: 2023

  -----------------------------------------------------------------------------------
  | rename and sort to target folder:                                               |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:01]

                ➜ renaming:
                  apply renaming syntax ➜ ! WARNING ! – No variables were found for renaming. A fallback is used to prevent an empty file name: 2023.07.01 - testfile

[runtime up to now:    00:00:01]

                ➜ insert metadata (use python pikepdf)
                used metadata:
                ➜ '/Author': '',
                ➜ '/Keywords': '',
                ➜ '/CreationDate': 'D:20230705',
                ➜ '/CreatorTool': 'synOCR 1.4.0'

                call handlePdf.py -dbg_lvl "2" -dbg_file "/volume1/OCR/_LOG/synOCR_2023-07-05_19-01-33.log" -task metadata -inputFile "/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf" -metaData "{'/Author': '',
'/Keywords': '',
'/CreationDate': 'D:20230705',
'/CreatorTool': 'synOCR 1.4.0'}" -outputFile "/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf"

2023-07-05 19:03:23,958 - INFO - HandlePdf started
2023-07-05 19:03:23,958 - INFO - Version: 0.2
2023-07-05 19:03:23,958 - INFO - Task=metadata
2023-07-05 19:03:23,959 - DEBUG - set_task_metadata_parameter(input_file=/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf, output_file=/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf,                    meta_data_str={'/Author': '',
'/Keywords': '',
'/CreationDate': 'D:20230705',
'/CreatorTool': 'synOCR 1.4.0'})
2023-07-05 19:03:23,959 - DEBUG - <<<<<< set_task_meta_data_parameter ended
2023-07-05 19:03:23,959 - DEBUG - >>>>>> open_pdf started
2023-07-05 19:03:23,965 - DEBUG - <<<<<< open_pdf ended
2023-07-05 19:03:23,966 - INFO - >>>>> write meta_data started
2023-07-05 19:03:23,966 - DEBUG - old meta_data....
2023-07-05 19:03:23,966 - DEBUG - >>>>> log metadata >>>>>)
2023-07-05 19:03:23,967 - DEBUG - <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP toolkit 2.9.1-13, framework 1.6">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:iX="http://ns.adobe.com/iX/1.0/">
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="" pdf:Producer="pikepdf 7.2.0"/>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about=""><xmp:ModifyDate>2023-07-05T17:03:17+00:00</xmp:ModifyDate>
<xmp:CreateDate>2023-06-27T09:46:39+02:00</xmp:CreateDate>
<xmp:CreatorTool>ocrmypdf 14.2.2.dev31+g7c38c717.d20230620 / Tesseract OCR-PDF 5.3.1-22-g24da</xmp:CreatorTool></rdf:Description>
<rdf:Description xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" rdf:about="" xapMM:DocumentID="uuid:62258a05-5372-11f9-0000-5664f20a76f7"/>
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" dc:format="application/pdf"><dc:title><rdf:Alt><rdf:li xml:lang="x-default">Deg. 27. KW</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>KOBILIN</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="" pdfaid:part="2" pdfaid:conformance="B"/><rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:MetadataDate="2023-07-05T17:03:17.438999+00:00"/></rdf:RDF>
</x:xmpmeta>

2023-07-05 19:03:23,967 - DEBUG - <<<<< log metadata <<<<<)
2023-07-05 19:03:24,020 - DEBUG - new meta_data....
2023-07-05 19:03:24,020 - DEBUG - <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP toolkit 2.9.1-13, framework 1.6">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:iX="http://ns.adobe.com/iX/1.0/">
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="" pdf:Producer="pikepdf 7.1.2"/>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about=""><xmp:ModifyDate>2023-07-05T00:00:00</xmp:ModifyDate>
<xmp:CreateDate>2023-07-05T00:00:00</xmp:CreateDate>
<xmp:CreatorTool>synOCR 1.4.0</xmp:CreatorTool></rdf:Description>
<rdf:Description xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" rdf:about="" xapMM:DocumentID="uuid:62258a05-5372-11f9-0000-5664f20a76f7"/>
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" dc:format="application/pdf"><dc:title><rdf:Alt><rdf:li xml:lang="x-default">Deg. 27. KW</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>KOBILIN</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="" pdfaid:part="2" pdfaid:conformance="B"/><rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:MetadataDate="2023-07-05T19:03:23.974217+02:00"/></rdf:RDF>
</x:xmpmeta>

2023-07-05 19:03:24,021 - INFO - save pdf to file (/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf)
2023-07-05 19:03:24,051 - DEBUG - <<<<<< write meta_data ended
empty
0

[runtime up to now:    00:00:02]

                  target file: 2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | adjusts the attributes of the target file:                                      |
  -----------------------------------------------------------------------------------

                ➜ Adapt file date (Source: NOW)
                ➜ File permissions target file:
                  -rwxrwxrwx+ 1 synOCR synOCR 219453 Jul  5 19:03 /volume1/OCR/_OUTPUT/2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | final tasks:                                                                    |
  -----------------------------------------------------------------------------------

                  INFO: Notify for apprise not defined ...

run user defined post scripts:

Stats:
  runtime last file:    ➜ 00:00:05
  pagecount last file:  ➜ 1
  file count profile :  ➜ (profile default) - 191 PDF's / 634 Pages processed up to now
  file count total:     ➜ 350 PDF's / 1183 Pages processed up to now since 2019-06-04

cleanup:
  delete tmp-files ...
                removed '/tmp/tmp.oMs0RqPMX7/2023.07.01 - testfile.pdf'
                removed '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/synOCR.txt'
                removed '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/synOCR_filename.txt'
                removed directory '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/'
                removed directory '/tmp/tmp.oMs0RqPMX7'

  purge log files ...
  delete 1 log files ( > 10 files)
  delete -9 search files ( > 10 files)

  purge backup deactivated!

  runtime all files:              ➜ 00:01:54


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● ---------------------------------- ●
  ● |    ==> END OF FUNCTIONS <==    | ●
  ● ---------------------------------- ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

geimist commented on June 2, 2024 1

Nice 😃

synOCR writes the current version to …/python3_env/synOCR_python_env_version.
Every time synOCR is started, the saved version is compared with the installed version, and if they differ, the Python environment is updated. For some reason this check does not seem to work reliably, but I have not found the error yet.
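
To make the described check concrete, here is a minimal sketch in Python. It is not synOCR's actual code: the function name `env_needs_update` and the `required_modules` parameter are hypothetical and used only for illustration. The version comparison follows the description above. The import probe is an additional assumption, added because a matching version marker alone does not show that a module such as `pypdf` can be imported.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Iterable


def env_needs_update(version_file: Path, installed_version: str,
                     required_modules: Iterable[str] = ()) -> bool:
    """Return True if the Python environment should be rebuilt.

    Hypothetical helper, not taken from synOCR.
    """
    try:
        # Version recorded when the environment was last set up.
        saved_version = version_file.read_text(encoding="utf-8").strip()
    except OSError:
        # Missing or unreadable marker file: treat the environment as stale.
        return True

    if saved_version != installed_version.strip():
        # Discrepancy between saved and installed version.
        return True

    # Extra safeguard (assumption, not part of the described check):
    # probe top-level module names without importing them.
    return any(find_spec(name) is None for name in required_modules)
```

Called with the marker file, the installed version and `["pypdf"]`, this would report a stale environment even when the marker already says 1.4.0 but the module is missing.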

geimist commented on June 2, 2024

Can you process a file in debug mode (log level 2), please?
