Comments (5)

kobeegh commented on June 2, 2024

Sure:

    -----------------------------------
    |    ==> installation info <==    |
    -----------------------------------

synOCR-user:              synOCR
synOCR-user is admin:     yes
synOCR-version:           1.4.0
Architecture:             x86_64
DSM-build:                42962
Device:                   918plus (2053505210)
current Profil:           default
monitor is running?:      no
DB-version:               9
used image (created):     jbarlow83/ocrmypdf:latest (2023-06-20T10:34:03)
document author:          
used ocr-parameter (raw): -srd -l deu
OCR-arg 1:                -srd
OCR-arg 2:                -l
OCR-arg 3:                deu
ocropt_array:             -srd -l deu
search prefix:            
replace search prefix:    no
renaming syntax:          
Symbol for tag marking:   #
target file handling:     no
Document split pattern:   
split page handling:      discard
delete blank pages:       
threshold black/white:    
threshold black pixels:   
clean up spaces:          false
Date search method:       use standard search via RegEx
date found order:         firstfound
source for filedate:      now
ignored dates by search:  
date range in past:       0 [absolute: 0]
date range in future:     0 [absolute: 0]
PATH-Variable:            /sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/syno/bin:/usr/syno/sbin:/usr/local/bin:/opt/usr/bin:/usr/syno/synoman/webman/3rdparty/synOCR/bin:/usr/local/bin:/opt/usr/bin
Docker test:              OK
DSM notify to user:       @administrators
apprise notify service:   
apprise attachment:       false
notify language:          enu
Loglevel:                 debug
max. count of logfiles:   10
rotate backupfiles after: (purge backup deactivated)
Source directory:         /volume1/OCR/_INPUT/
Target directory:         /volume1/OCR/_OUTPUT/
BackUp directory:         /volume1/OCR/_BACKUP/



  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● ---------------------------------- ●
  ● |    ==> RUN THE FUNCTIONS <==   | ●
  ● ---------------------------------- ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

  -----------------------------------------------------------------------------------
  | check the python3 installation and the necessary modules:                       |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:01]


                  Check Python:

                  module list:
                  Package               Version
                  --------------------- -----------
                  apprise               1.4.0
                  argcomplete           3.0.8
                  backports.zoneinfo    0.2.1
                  certifi               2023.5.7
                  charset-normalizer    3.1.0
                  click                 8.1.3
                  dateparser            1.1.8
                  DateTime              5.1
                  deprecation           2.1.0
                  idna                  3.4
                  importlib-metadata    6.7.0
                  lxml                  4.9.2
                  Markdown              3.4.3
                  oauthlib              3.2.2
                  packaging             23.1
                  pikepdf               7.1.2
                  Pillow                9.5.0
                  pip                   23.1.2
                  PyPDF2                2.3.1
                  python-dateutil       2.8.2
                  pytz                  2023.3
                  pytz-deprecation-shim 0.1.0.post0
                  PyYAML                6.0
                  regex                 2023.5.5
                  requests              2.31.0
                  requests-oauthlib     1.3.1
                  setuptools            56.0.0
                  six                   1.16.0
                  tomlkit               0.11.8
                  typing_extensions     4.5.0
                  tzdata                2023.3
                  tzlocal               4.3
                  urllib3               2.0.3
                  xmltodict             0.13.0
                  yq                    3.2.2
                  zipp                  3.15.0
                  zope.interface        6.0
                prepare_python: OK
Target temp directory:    /tmp/tmp.rivsr98dQA


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED:                                      ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE:   ➜ 2023.07.01 - testfile.pdf
                  temp. target file: /tmp/tmp.rivsr98dQA/step1_tmp_1688575421/2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | processing PDF @ OCRmyPDF:                                                      |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:00]

                ➜ OCRmyPDF-LOG:
                  WARNING: Error loading config file: .dockercfg: $HOME is not defined
                    DEBUG ocrmypdf - ocrmypdf 14.2.2.dev31+g7c38c717.d20230620
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.subprocess - Found tesseract 5.3.1-22-g24da
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Found gs 9.55.0
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
                    DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
                  chi_sim
                  deu
                  eng
                  fra
                  osd
                  por
                  spa
                  
                     INFO ocrmypdf._validation - reading file from standard input
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/stdin, /tmp/ocrmypdf.io.uuaw1x_6/origin.pdf)
                    DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
                     INFO ocrmypdf._pipeline -    1  skipping all processing on this page
                    DEBUG ocrmypdf._graft -    1  Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
                    DEBUG ocrmypdf._graft -    1  Page rotation: (content, auto) -> page = (0, 0) -> 0
                     INFO ocrmypdf._sync - Postprocessing...
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/graft_layers.pdf, /tmp/ocrmypdf.io.uuaw1x_6/fix_docinfo.pdf)
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.uuaw1x_6/fix_docinfo.pdf', '/tmp/ocrmypdf.io.uuaw1x_6/pdfa.ps']
                    DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0 (2021-09-27)
                    DEBUG ocrmypdf.subprocess.gs - Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
                    DEBUG ocrmypdf.subprocess.gs - This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
                    DEBUG ocrmypdf.subprocess.gs - see the file COPYING for details.
                    DEBUG ocrmypdf.subprocess.gs - Processing pages 1 through 1.
                    DEBUG ocrmypdf.subprocess.gs - Page 1
                    DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0: Setting Overprint Mode to 1
                    DEBUG ocrmypdf.subprocess.gs - not permitted in PDF/A-2, overprint mode not set
                    DEBUG ocrmypdf.subprocess.gs - 
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - XrefExt(xref=224, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=225, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=218, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=219, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=220, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=221, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=222, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=223, ext='.png')
                    DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 8
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 225: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 218: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 219: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 221: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 223: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 225: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 218: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 219: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 223: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/optimize.opt.pdf, /tmp/ocrmypdf.io.uuaw1x_6/optimize.pdf)
                    DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
                     INFO ocrmypdf._pipeline - Image optimization ratio: 1.00 savings: 0.4%
                     INFO ocrmypdf._pipeline - Total file size ratio: 1.01 savings: 0.6%
                    DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.uuaw1x_6/optimize.pdf -> -
                     INFO ocrmypdf._sync - Output sent to stdout
                ← OCRmyPDF-LOG-END


[runtime up to now:    00:00:18]

                target file (OK): /tmp/tmp.rivsr98dQA/step1_tmp_1688575421/2023.07.01 - testfile.pdf

                no split pattern defined or splitting not possible

  -----------------------------------------------------------------------------------
  | handle source file:                                                             |
  -----------------------------------------------------------------------------------

                ➜ backup source file to: /volume1/OCR/_BACKUP/2023.07.01 - testfile.pdf
                removed directory '/tmp/tmp.rivsr98dQA/step1_tmp_1688575421/'

Stats:
  runtime last file:              ➜ 00:00:18
  runtime 1st step (all files):   ➜ 00:00:25


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● STEP 2 - SEARCH TAGS / RENAME / SORT:                                           ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


                list files in INPUT with transcoded special characters:
                ➜ 2023.07.01 - testfile.pdf$

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pypdf'
ERROR at line 2284: pagecount_latest=$( py_page_count "${input}" )
                (pages counted with python module pypdf)
./synOCR.sh: line 2299: 1182+ERROR at line 1739: python3
ERROR at line 2284: python3: syntax error in expression (error token is "at line 1739: python3
ERROR at line 2284: python3")

  purge log files ...
  delete 1 log files ( > 10 files)
  delete -10 search files ( > 10 files)

  purge backup deactivated!
rmdir: failed to remove '/tmp/tmp.rivsr98dQA': Directory not empty
  rmdir: removing directory, '/tmp/tmp.rivsr98dQA'

  runtime all files:              ➜ 00:00:25


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● ---------------------------------- ●
  ● |    ==> END OF FUNCTIONS <==    | ●
  ● ---------------------------------- ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

from synocr.

geimist commented on June 2, 2024

For some reason, the Python environment for synOCR was not updated.
Can you run the command below in a terminal or via the DSM Task Scheduler? It deletes the Python environment so that it is recreated on the next run.
rm -rf /usr/syno/synoman/webman/3rdparty/synOCR/python3_env

Alternatively, create a backup of synOCR with Hyper Backup, uninstall synOCR, reinstall it, and restore the backup.
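
Before deleting anything, it may help to confirm which modules the environment is actually missing. The following is only a minimal sketch, not part of synOCR: the helper name `missing_modules` is hypothetical, and the list of import names is an assumption derived from the package names in the log (for example, Pillow is imported as `PIL` and PyYAML as `yaml`).

```python
import importlib.util

# Import names assumed to correspond to the packages synOCR checks for
# (DateTime, dateparser, pypdf, pikepdf, Pillow, yq, PyYAML, apprise).
# This mapping is an assumption, not taken from synOCR itself.
REQUIRED = ["DateTime", "dateparser", "pypdf", "pikepdf", "PIL", "yq", "yaml", "apprise"]


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    # find_spec() only locates a top-level module; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Prints an empty list when everything is importable.
print(missing_modules(REQUIRED))
```

To test the synOCR environment rather than the system Python, the script would have to be run with that environment's interpreter, which the log reports as /usr/syno/synoman/webman/3rdparty/synOCR/python3_env/bin/python3. In the failed run above, `pypdf` would be expected to show up in the result, matching the ModuleNotFoundError.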

kobeegh commented on June 2, 2024

I deleted the directory and started a new run. It works now, thank you very much!

In case it is of interest, here is the debug log from the successful run, in which it first sets up the Python environment and installs the modules:
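
The module lists of the two runs differ in exactly the way the error suggests: the failed run lists PyPDF2 2.3.1 and no pypdf, while the run after recreating the environment lists pypdf 3.5.1. A rough sketch for comparing two such `pip list` tables follows; the function names are hypothetical, and the sample text is a three-package excerpt of the two logs, not the full lists.

```python
def parse_pip_list(text):
    """Parse a `pip list` table into {lowercased package name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header and the dashed separator.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def diff_envs(before, after):
    """Return (removed, added, changed) between two parsed module lists."""
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = {
        name: (before[name], after[name])
        for name in sorted(set(before) & set(after))
        if before[name] != after[name]
    }
    return removed, added, changed


# Excerpts copied from the two module lists in this thread.
failed_run = """
Package   Version
--------- -------
PyPDF2    2.3.1
Pillow    9.5.0
pikepdf   7.1.2
"""
fixed_run = """
Package   Version
--------- -------
pypdf     3.5.1
Pillow    10.0.0
pikepdf   7.1.2
"""

removed, added, changed = diff_envs(parse_pip_list(failed_run), parse_pip_list(fixed_run))
print("removed:", removed)
print("added:  ", added)
print("changed:", changed)
```

Names are lowercased only so that differently capitalized entries compare equal; the comparison says nothing about why the old environment was not refreshed.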


    -----------------------------------
    |    ==> installation info <==    |
    -----------------------------------

synOCR-user:              synOCR
synOCR-user is admin:     yes
synOCR-version:           1.4.0
Architecture:             x86_64
DSM-build:                42962
Device:                   918plus (2053505210)
current Profil:           default
monitor is running?:      no
DB-version:               9
used image (created):     jbarlow83/ocrmypdf:latest (2023-06-20T10:34:03)
document author:          
used ocr-parameter (raw): -srd -l deu
OCR-arg 1:                -srd
OCR-arg 2:                -l
OCR-arg 3:                deu
ocropt_array:             -srd -l deu
search prefix:            
replace search prefix:    no
renaming syntax:          
Symbol for tag marking:   #
target file handling:     no
Document split pattern:   
split page handling:      discard
delete blank pages:       
threshold black/white:    
threshold black pixels:   
clean up spaces:          false
Date search method:       use standard search via RegEx
date found order:         firstfound
source for filedate:      now
ignored dates by search:  
date range in past:       0 [absolute: 0]
date range in future:     0 [absolute: 0]
PATH-Variable:            /sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/syno/bin:/usr/syno/sbin:/usr/local/bin:/opt/usr/bin:/usr/syno/synoman/webman/3rdparty/synOCR/bin:/usr/local/bin:/opt/usr/bin
Docker test:              OK
DSM notify to user:       @administrators
apprise notify service:   
apprise attachment:       false
notify language:          enu
Loglevel:                 debug
max. count of logfiles:   10
rotate backupfiles after: (purge backup deactivated)
Source directory:         /volume1/OCR/_INPUT/
Target directory:         /volume1/OCR/_OUTPUT/
BackUp directory:         /volume1/OCR/_BACKUP/



  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● ---------------------------------- ●
  ● |    ==> RUN THE FUNCTIONS <==   | ●
  ● ---------------------------------- ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

  -----------------------------------------------------------------------------------
  | check the python3 installation and the necessary modules:                       |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:00]


                  Check Python:
                  python3 already installed (/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/bin/python3)

                  Check pip:
                  pip already installed (pip 21.1.1 from /usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/pip (python 3.8)) / upgrade available ...
                  Requirement already satisfied: pip in ./python3_env/lib/python3.8/site-packages (21.1.1)
                  Collecting pip
                    Using cached pip-23.1.2-py3-none-any.whl (2.1 MB)
                  Installing collected packages: pip
                    Attempting uninstall: pip
                      Found existing installation: pip 21.1.1
                      Uninstalling pip-21.1.1:
                        Successfully uninstalled pip-21.1.1
                  Successfully installed pip-23.1.2

                  read installed python modules:
                  Package    Version
                  ---------- -------
                  pip        23.1.2
                  setuptools 56.0.0

                ➜ check python module "DateTime": ➜ DateTime was not found and will be installed ➜ ok
                ➜ check python module "dateparser": ➜ dateparser was not found and will be installed ➜ ok
                ➜ check python module "pypdf==3.5.1": ➜ pypdf==3.5.1 was not found and will be installed ➜ ok
                ➜ check python module "pikepdf==7.1.2": ➜ pikepdf==7.1.2 was not found and will be installed ➜ ok
                ➜ check python module "Pillow": ➜ Pillow was not found and will be installed ➜ ok
                ➜ check python module "yq": ➜ yq was not found and will be installed ➜ ok
                ➜ check python module "PyYAML": ➜ PyYAML was not found and will be installed ➜ ok
                ➜ check python module "apprise": ➜ apprise was not found and will be installed ➜ ok


                  module list:
                  Package            Version
                  ------------------ --------
                  apprise            1.4.0
                  argcomplete        3.1.1
                  backports.zoneinfo 0.2.1
                  certifi            2023.5.7
                  charset-normalizer 3.1.0
                  click              8.1.3
                  dateparser         1.1.8
                  DateTime           5.1
                  deprecation        2.1.0
                  idna               3.4
                  importlib-metadata 6.7.0
                  lxml               4.9.3
                  Markdown           3.4.3
                  oauthlib           3.2.2
                  packaging          23.1
                  pikepdf            7.1.2
                  Pillow             10.0.0
                  pip                23.1.2
                  pypdf              3.5.1
                  python-dateutil    2.8.2
                  pytz               2023.3
                  PyYAML             6.0
                  regex              2023.6.3
                  requests           2.31.0
                  requests-oauthlib  1.3.1
                  setuptools         56.0.0
                  six                1.16.0
                  tomlkit            0.11.8
                  typing_extensions  4.7.1
                  tzlocal            5.0.1
                  urllib3            2.0.3
                  xmltodict          0.13.0
                  yq                 3.2.2
                  zipp               3.15.0
                  zope.interface     6.0
                prepare_python: OK
Target temp directory:    /tmp/tmp.oMs0RqPMX7


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED:                                      ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE:   ➜ 2023.07.01 - testfile.pdf
                  temp. target file: /tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | processing PDF @ OCRmyPDF:                                                      |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:00]

                ➜ OCRmyPDF-LOG:
                  WARNING: Error loading config file: .dockercfg: $HOME is not defined
                    DEBUG ocrmypdf - ocrmypdf 14.2.2.dev31+g7c38c717.d20230620
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.subprocess - Found tesseract 5.3.1-22-g24da
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Found gs 9.55.0
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
                    DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
                  chi_sim
                  deu
                  eng
                  fra
                  osd
                  por
                  spa
                  
                     INFO ocrmypdf._validation - reading file from standard input
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/stdin, /tmp/ocrmypdf.io.0889pg_j/origin.pdf)
                    DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
                     INFO ocrmypdf._pipeline -    1  skipping all processing on this page
                    DEBUG ocrmypdf._graft -    1  Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
                    DEBUG ocrmypdf._graft -    1  Page rotation: (content, auto) -> page = (0, 0) -> 0
                     INFO ocrmypdf._sync - Postprocessing...
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/graft_layers.pdf, /tmp/ocrmypdf.io.0889pg_j/fix_docinfo.pdf)
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.0889pg_j/fix_docinfo.pdf', '/tmp/ocrmypdf.io.0889pg_j/pdfa.ps']
                    DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0 (2021-09-27)
                    DEBUG ocrmypdf.subprocess.gs - Copyright (C) 2021 Artifex Software, Inc.  All rights reserved.
                    DEBUG ocrmypdf.subprocess.gs - This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
                    DEBUG ocrmypdf.subprocess.gs - see the file COPYING for details.
                    DEBUG ocrmypdf.subprocess.gs - Processing pages 1 through 1.
                    DEBUG ocrmypdf.subprocess.gs - Page 1
                    DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0: Setting Overprint Mode to 1
                    DEBUG ocrmypdf.subprocess.gs - not permitted in PDF/A-2, overprint mode not set
                    DEBUG ocrmypdf.subprocess.gs - 
                    DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - XrefExt(xref=224, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=225, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=218, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=219, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=220, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=221, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=222, ext='.png')
                    DEBUG ocrmypdf.optimize - XrefExt(xref=223, ext='.png')
                    DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 8
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 225: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 218: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 219: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 221: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 223: marking this JPEG as deflatable
                    DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
                    DEBUG ocrmypdf.optimize - xref 224: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 225: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 218: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 219: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - xref 223: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
                    DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
                    DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/optimize.opt.pdf, /tmp/ocrmypdf.io.0889pg_j/optimize.pdf)
                    DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
                    DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
                     INFO ocrmypdf._pipeline - Image optimization ratio: 1.00 savings: 0.4%
                     INFO ocrmypdf._pipeline - Total file size ratio: 1.01 savings: 0.6%
                    DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.0889pg_j/optimize.pdf -> -
                     INFO ocrmypdf._sync - Output sent to stdout
                ← OCRmyPDF-LOG-END


[runtime up to now:    00:00:16]

                target file (OK): /tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/2023.07.01 - testfile.pdf

                no split pattern defined or splitting not possible

  -----------------------------------------------------------------------------------
  | handle source file:                                                             |
  -----------------------------------------------------------------------------------

                ➜ backup source file to: /volume1/OCR/_BACKUP/2023.07.01 - testfile.pdf
                removed directory '/tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/'

Stats:
  runtime last file:              ➜ 00:00:16
  runtime 1st step (all files):   ➜ 00:01:49


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● STEP 2 - SEARCH TAGS / RENAME / SORT:                                           ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


                list files in INPUT with transcoded special characters:
                ➜ 2023.07.01 - testfile.pdf$

                (pages counted with python module pypdf)

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE:   ➜ 2023.07.01 - testfile.pdf
                ➜ File permissions source file:
                  -rw-rw-r-- 1 synOCR synOCR 219590 Jul  5 18:02 /tmp/tmp.oMs0RqPMX7/2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | search tags in ocr text:                                                        |
  -----------------------------------------------------------------------------------

                no tags defined

  -----------------------------------------------------------------------------------
  | search for a valid date in ocr text:                                            |
  -----------------------------------------------------------------------------------

                run RegEx date search - search for date format: 1 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
                run RegEx date search - search for date format: 2 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
                run RegEx date search - search for date format: 3 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
                  Date not found in OCR text - use file date:
                  day:  05
                  month:07
                  year: 2023

  -----------------------------------------------------------------------------------
  | rename and sort to target folder:                                               |
  -----------------------------------------------------------------------------------


[runtime up to now:    00:00:01]

                ➜ renaming:
                  apply renaming syntax ➜ ! WARNING ! – No variables were found for renaming. A fallback is used to prevent an empty file name: 2023.07.01 - testfile

[runtime up to now:    00:00:01]

                ➜ insert metadata (use python pikepdf)
                used metadata:
                ➜ '/Author': '',
                ➜ '/Keywords': '',
                ➜ '/CreationDate': 'D:20230705',
                ➜ '/CreatorTool': 'synOCR 1.4.0'

                call handlePdf.py -dbg_lvl "2" -dbg_file "/volume1/OCR/_LOG/synOCR_2023-07-05_19-01-33.log" -task metadata -inputFile "/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf" -metaData "{'/Author': '',
'/Keywords': '',
'/CreationDate': 'D:20230705',
'/CreatorTool': 'synOCR 1.4.0'}" -outputFile "/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf"

2023-07-05 19:03:23,958 - INFO - HandlePdf started
2023-07-05 19:03:23,958 - INFO - Version: 0.2
2023-07-05 19:03:23,958 - INFO - Task=metadata
2023-07-05 19:03:23,959 - DEBUG - set_task_metadata_parameter(input_file=/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf, output_file=/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf,                    meta_data_str={'/Author': '',
'/Keywords': '',
'/CreationDate': 'D:20230705',
'/CreatorTool': 'synOCR 1.4.0'})
2023-07-05 19:03:23,959 - DEBUG - <<<<<< set_task_meta_data_parameter ended
2023-07-05 19:03:23,959 - DEBUG - >>>>>> open_pdf started
2023-07-05 19:03:23,965 - DEBUG - <<<<<< open_pdf ended
2023-07-05 19:03:23,966 - INFO - >>>>> write meta_data started
2023-07-05 19:03:23,966 - DEBUG - old meta_data....
2023-07-05 19:03:23,966 - DEBUG - >>>>> log metadata >>>>>)
2023-07-05 19:03:23,967 - DEBUG - <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP toolkit 2.9.1-13, framework 1.6">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:iX="http://ns.adobe.com/iX/1.0/">
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="" pdf:Producer="pikepdf 7.2.0"/>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about=""><xmp:ModifyDate>2023-07-05T17:03:17+00:00</xmp:ModifyDate>
<xmp:CreateDate>2023-06-27T09:46:39+02:00</xmp:CreateDate>
<xmp:CreatorTool>ocrmypdf 14.2.2.dev31+g7c38c717.d20230620 / Tesseract OCR-PDF 5.3.1-22-g24da</xmp:CreatorTool></rdf:Description>
<rdf:Description xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" rdf:about="" xapMM:DocumentID="uuid:62258a05-5372-11f9-0000-5664f20a76f7"/>
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" dc:format="application/pdf"><dc:title><rdf:Alt><rdf:li xml:lang="x-default">Deg. 27. KW</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>KOBILIN</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="" pdfaid:part="2" pdfaid:conformance="B"/><rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:MetadataDate="2023-07-05T17:03:17.438999+00:00"/></rdf:RDF>
</x:xmpmeta>

2023-07-05 19:03:23,967 - DEBUG - <<<<< log metadata <<<<<)
2023-07-05 19:03:24,020 - DEBUG - new meta_data....
2023-07-05 19:03:24,020 - DEBUG - <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP toolkit 2.9.1-13, framework 1.6">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:iX="http://ns.adobe.com/iX/1.0/">
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="" pdf:Producer="pikepdf 7.1.2"/>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about=""><xmp:ModifyDate>2023-07-05T00:00:00</xmp:ModifyDate>
<xmp:CreateDate>2023-07-05T00:00:00</xmp:CreateDate>
<xmp:CreatorTool>synOCR 1.4.0</xmp:CreatorTool></rdf:Description>
<rdf:Description xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" rdf:about="" xapMM:DocumentID="uuid:62258a05-5372-11f9-0000-5664f20a76f7"/>
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" dc:format="application/pdf"><dc:title><rdf:Alt><rdf:li xml:lang="x-default">Deg. 27. KW</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>KOBILIN</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="" pdfaid:part="2" pdfaid:conformance="B"/><rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:MetadataDate="2023-07-05T19:03:23.974217+02:00"/></rdf:RDF>
</x:xmpmeta>

2023-07-05 19:03:24,021 - INFO - save pdf to file (/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf)
2023-07-05 19:03:24,051 - DEBUG - <<<<<< write meta_data ended
empty
0

[runtime up to now:    00:00:02]

                  target file: 2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | adjusts the attributes of the target file:                                      |
  -----------------------------------------------------------------------------------

                ➜ Adapt file date (Source: NOW)
                ➜ File permissions target file:
                  -rwxrwxrwx+ 1 synOCR synOCR 219453 Jul  5 19:03 /volume1/OCR/_OUTPUT/2023.07.01 - testfile.pdf

  -----------------------------------------------------------------------------------
  | final tasks:                                                                    |
  -----------------------------------------------------------------------------------

                  INFO: Notify for apprise not defined ...

run user defined post scripts:

Stats:
  runtime last file:    ➜ 00:00:05
  pagecount last file:  ➜ 1
  file count profile :  ➜ (profile default) - 191 PDF's / 634 Pages processed up to now
  file count total:     ➜ 350 PDF's / 1183 Pages processed up to now since 2019-06-04

cleanup:
  delete tmp-files ...
                removed '/tmp/tmp.oMs0RqPMX7/2023.07.01 - testfile.pdf'
                removed '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/synOCR.txt'
                removed '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/synOCR_filename.txt'
                removed directory '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/'
                removed directory '/tmp/tmp.oMs0RqPMX7'

  purge log files ...
  delete 1 log files ( > 10 files)
  delete -9 search files ( > 10 files)

  purge backup deactivated!

  runtime all files:              ➜ 00:01:54


  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
  ● ---------------------------------- ●
  ● |    ==> END OF FUNCTIONS <==    | ●
  ● ---------------------------------- ●
  ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●


geimist commented on June 2, 2024

Nice 😃

synOCR writes the current version to …/python3_env/synOCR_python_env_version
Every time synOCR starts, the saved version is compared with the installed version, and if they differ, the Python environment is updated. For some reason this check does not seem to work reliably, but I have not found the error yet.
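
For illustration only, the check described above could be sketched roughly as follows. This is not synOCR's actual code (which is not shown in this thread); the function names `env_needs_update` and `record_env_version` are hypothetical, and the marker path in the demo is a temporary stand-in for the truncated path mentioned above.

```python
from pathlib import Path
import tempfile


def env_needs_update(version_file: Path, installed_version: str) -> bool:
    """Return True if the saved marker is missing or differs from the installed version."""
    try:
        saved = version_file.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        # No marker yet: treat the environment as outdated.
        return True
    # Compare stripped strings so a trailing newline does not count as a mismatch.
    return saved != installed_version.strip()


def record_env_version(version_file: Path, installed_version: str) -> None:
    """Write the marker after the environment has been rebuilt (hypothetical helper)."""
    version_file.parent.mkdir(parents=True, exist_ok=True)
    version_file.write_text(installed_version.strip() + "\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "python3_env" / "synOCR_python_env_version"
        print(env_needs_update(marker, "1.4.0"))  # marker missing
        record_env_version(marker, "1.4.0")
        print(env_needs_update(marker, "1.4.0"))  # marker matches
        print(env_needs_update(marker, "1.4.1"))  # marker is older
```

A check of this shape only detects a changed synOCR version. It would not notice a package inside the environment drifting on its own, so comparing the installed package versions directly may be worth considering as an additional test. Whether that is related to the unreliable behaviour mentioned above is not known.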


geimist commented on June 2, 2024

Can you run a file with debug mode (loglevel 2), please?
