Comments (5)
Sure:
-----------------------------------
| ==> installation info <== |
-----------------------------------
synOCR-user: synOCR
synOCR-user is admin: yes
synOCR-version: 1.4.0
Architecture: x86_64
DSM-build: 42962
Device: 918plus (2053505210)
current Profil: default
monitor is running?: no
DB-version: 9
used image (created): jbarlow83/ocrmypdf:latest (2023-06-20T10:34:03)
document author:
used ocr-parameter (raw): -srd -l deu
OCR-arg 1: -srd
OCR-arg 2: -l
OCR-arg 3: deu
ocropt_array: -srd -l deu
search prefix:
replace search prefix: no
renaming syntax:
Symbol for tag marking: #
target file handling: no
Document split pattern:
split page handling: discard
delete blank pages:
threshold black/white:
threshold black pixels:
clean up spaces: false
Date search method: use standard search via RegEx
date found order: firstfound
source for filedate: now
ignored dates by search:
date range in past: 0 [absolute: 0]
date range in future: 0 [absolute: 0]
PATH-Variable: /sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/syno/bin:/usr/syno/sbin:/usr/local/bin:/opt/usr/bin:/usr/syno/synoman/webman/3rdparty/synOCR/bin:/usr/local/bin:/opt/usr/bin
Docker test: OK
DSM notify to user: @administrators
apprise notify service:
apprise attachment: false
notify language: enu
Loglevel: debug
max. count of logfiles: 10
rotate backupfiles after: (purge backup deactivated)
Source directory: /volume1/OCR/_INPUT/
Target directory: /volume1/OCR/_OUTPUT/
BackUp directory: /volume1/OCR/_BACKUP/
----------------------------------
| ==> RUN THE FUNCTIONS <== |
----------------------------------
-----------------------------------------------------------------------------------
| check the python3 installation and the necessary modules: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:01]
Check Python:
module list:
Package Version
--------------------- -----------
apprise 1.4.0
argcomplete 3.0.8
backports.zoneinfo 0.2.1
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
dateparser 1.1.8
DateTime 5.1
deprecation 2.1.0
idna 3.4
importlib-metadata 6.7.0
lxml 4.9.2
Markdown 3.4.3
oauthlib 3.2.2
packaging 23.1
pikepdf 7.1.2
Pillow 9.5.0
pip 23.1.2
PyPDF2 2.3.1
python-dateutil 2.8.2
pytz 2023.3
pytz-deprecation-shim 0.1.0.post0
PyYAML 6.0
regex 2023.5.5
requests 2.31.0
requests-oauthlib 1.3.1
setuptools 56.0.0
six 1.16.0
tomlkit 0.11.8
typing_extensions 4.5.0
tzdata 2023.3
tzlocal 4.3
urllib3 2.0.3
xmltodict 0.13.0
yq 3.2.2
zipp 3.15.0
zope.interface 6.0
prepare_python: OK
Target temp directory: /tmp/tmp.rivsr98dQA
-----------------------------------------------------------------------------------
| STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED: |
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
CURRENT FILE: ➜ 2023.07.01 - testfile.pdf
temp. target file: /tmp/tmp.rivsr98dQA/step1_tmp_1688575421/2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| processing PDF @ OCRmyPDF: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:00]
➜ OCRmyPDF-LOG:
WARNING: Error loading config file: .dockercfg: $HOME is not defined
DEBUG ocrmypdf - ocrmypdf 14.2.2.dev31+g7c38c717.d20230620
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Found tesseract 5.3.1-22-g24da
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Found gs 9.55.0
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
chi_sim
deu
eng
fra
osd
por
spa
INFO ocrmypdf._validation - reading file from standard input
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/stdin, /tmp/ocrmypdf.io.uuaw1x_6/origin.pdf)
DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
INFO ocrmypdf._pipeline - 1 skipping all processing on this page
DEBUG ocrmypdf._graft - 1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
DEBUG ocrmypdf._graft - 1 Page rotation: (content, auto) -> page = (0, 0) -> 0
INFO ocrmypdf._sync - Postprocessing...
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/graft_layers.pdf, /tmp/ocrmypdf.io.uuaw1x_6/fix_docinfo.pdf)
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.uuaw1x_6/fix_docinfo.pdf', '/tmp/ocrmypdf.io.uuaw1x_6/pdfa.ps']
DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0 (2021-09-27)
DEBUG ocrmypdf.subprocess.gs - Copyright (C) 2021 Artifex Software, Inc. All rights reserved.
DEBUG ocrmypdf.subprocess.gs - This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
DEBUG ocrmypdf.subprocess.gs - see the file COPYING for details.
DEBUG ocrmypdf.subprocess.gs - Processing pages 1 through 1.
DEBUG ocrmypdf.subprocess.gs - Page 1
DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0: Setting Overprint Mode to 1
DEBUG ocrmypdf.subprocess.gs - not permitted in PDF/A-2, overprint mode not set
DEBUG ocrmypdf.subprocess.gs -
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - XrefExt(xref=224, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=225, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=218, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=219, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=220, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=221, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=222, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=223, ext='.png')
DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 8
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 225: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 218: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 219: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 221: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 223: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 225: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 218: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 219: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 223: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/optimize.opt.pdf, /tmp/ocrmypdf.io.uuaw1x_6/optimize.pdf)
DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
INFO ocrmypdf._pipeline - Image optimization ratio: 1.00 savings: 0.4%
INFO ocrmypdf._pipeline - Total file size ratio: 1.01 savings: 0.6%
DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.uuaw1x_6/optimize.pdf -> -
INFO ocrmypdf._sync - Output sent to stdout
← OCRmyPDF-LOG-END
[runtime up to now: 00:00:18]
target file (OK): /tmp/tmp.rivsr98dQA/step1_tmp_1688575421/2023.07.01 - testfile.pdf
no split pattern defined or splitting not possible
-----------------------------------------------------------------------------------
| handle source file: |
-----------------------------------------------------------------------------------
➜ backup source file to: /volume1/OCR/_BACKUP/2023.07.01 - testfile.pdf
removed directory '/tmp/tmp.rivsr98dQA/step1_tmp_1688575421/'
Stats:
runtime last file: ➜ 00:00:18
runtime 1st step (all files): ➜ 00:00:25
-----------------------------------------------------------------------------------
| STEP 2 - SEARCH TAGS / RENAME / SORT: |
-----------------------------------------------------------------------------------
list files in INPUT with transcoded special characters:
➜ 2023.07.01 - testfile.pdf$
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pypdf'
ERROR at line 2284: pagecount_latest=$( py_page_count "${input}" )
(pages counted with python module pypdf)
./synOCR.sh: line 2299: 1182+ERROR at line 1739: python3
ERROR at line 2284: python3: syntax error in expression (error token is "at line 1739: python3
ERROR at line 2284: python3")
purge log files ...
delete 1 log files ( > 10 files)
delete -10 search files ( > 10 files)
purge backup deactivated!
rmdir: failed to remove '/tmp/tmp.rivsr98dQA': Directory not empty
rmdir: removing directory, '/tmp/tmp.rivsr98dQA'
runtime all files: ➜ 00:00:25
----------------------------------
| ==> END OF FUNCTIONS <== |
----------------------------------
from synocr.
For some reason, the Python environment for synOCR has not been updated.
Can you run the command below in a terminal or via the DSM Task Scheduler? It deletes the Python environment so that it is recreated on the next run.
rm -rf /usr/syno/synoman/webman/3rdparty/synOCR/python3_env
Alternatively, you can create a backup of synOCR with Hyper Backup, uninstall synOCR, reinstall it, and restore the backup.
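Before deleting anything, it can help to confirm which modules the existing environment is actually missing (the log above fails on `pypdf`). The following is only a minimal sketch, not synOCR's own check: `missing_modules` is a hypothetical helper, the list of import names is an assumption for illustration, and it would have to be run with the environment's own `python3` to be meaningful.

```python
import importlib.util


def missing_modules(names):
    """Return the module names the current interpreter cannot locate."""
    missing = []
    for name in names:
        # find_spec() returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names assumed for illustration; they can differ from the pip
    # package names (the package "Pillow" is imported as "PIL", "PyYAML" as "yaml").
    required = ["pypdf", "pikepdf", "PIL", "dateparser", "yaml", "apprise"]
    result = missing_modules(required)
    print("missing:", ", ".join(result) if result else "none")
```

If `pypdf` shows up as missing, the environment predates the current module list and recreating it as described above is the simplest fix.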
from synocr.
I deleted the directory and started a new run. It works now, thank you very much!
If it is of interest to you, here is the debug log from the successful run, in which it sets up the Python environment and installs the modules from scratch:
-----------------------------------
| ==> installation info <== |
-----------------------------------
synOCR-user: synOCR
synOCR-user is admin: yes
synOCR-version: 1.4.0
Architecture: x86_64
DSM-build: 42962
Device: 918plus (2053505210)
current Profil: default
monitor is running?: no
DB-version: 9
used image (created): jbarlow83/ocrmypdf:latest (2023-06-20T10:34:03)
document author:
used ocr-parameter (raw): -srd -l deu
OCR-arg 1: -srd
OCR-arg 2: -l
OCR-arg 3: deu
ocropt_array: -srd -l deu
search prefix:
replace search prefix: no
renaming syntax:
Symbol for tag marking: #
target file handling: no
Document split pattern:
split page handling: discard
delete blank pages:
threshold black/white:
threshold black pixels:
clean up spaces: false
Date search method: use standard search via RegEx
date found order: firstfound
source for filedate: now
ignored dates by search:
date range in past: 0 [absolute: 0]
date range in future: 0 [absolute: 0]
PATH-Variable: /sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/syno/bin:/usr/syno/sbin:/usr/local/bin:/opt/usr/bin:/usr/syno/synoman/webman/3rdparty/synOCR/bin:/usr/local/bin:/opt/usr/bin
Docker test: OK
DSM notify to user: @administrators
apprise notify service:
apprise attachment: false
notify language: enu
Loglevel: debug
max. count of logfiles: 10
rotate backupfiles after: (purge backup deactivated)
Source directory: /volume1/OCR/_INPUT/
Target directory: /volume1/OCR/_OUTPUT/
BackUp directory: /volume1/OCR/_BACKUP/
----------------------------------
| ==> RUN THE FUNCTIONS <== |
----------------------------------
-----------------------------------------------------------------------------------
| check the python3 installation and the necessary modules: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:00]
Check Python:
python3 already installed (/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/bin/python3)
Check pip:
pip already installed (pip 21.1.1 from /usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/pip (python 3.8)) / upgrade available ...
Requirement already satisfied: pip in ./python3_env/lib/python3.8/site-packages (21.1.1)
Collecting pip
Using cached pip-23.1.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.1.1
Uninstalling pip-21.1.1:
Successfully uninstalled pip-21.1.1
Successfully installed pip-23.1.2
read installed python modules:
Package Version
---------- -------
pip 23.1.2
setuptools 56.0.0
➜ check python module "DateTime": ➜ DateTime was not found and will be installed ➜ ok
➜ check python module "dateparser": ➜ dateparser was not found and will be installed ➜ ok
➜ check python module "pypdf==3.5.1": ➜ pypdf==3.5.1 was not found and will be installed ➜ ok
➜ check python module "pikepdf==7.1.2": ➜ pikepdf==7.1.2 was not found and will be installed ➜ ok
➜ check python module "Pillow": ➜ Pillow was not found and will be installed ➜ ok
➜ check python module "yq": ➜ yq was not found and will be installed ➜ ok
➜ check python module "PyYAML": ➜ PyYAML was not found and will be installed ➜ ok
➜ check python module "apprise": ➜ apprise was not found and will be installed ➜ ok
module list:
Package Version
------------------ --------
apprise 1.4.0
argcomplete 3.1.1
backports.zoneinfo 0.2.1
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
dateparser 1.1.8
DateTime 5.1
deprecation 2.1.0
idna 3.4
importlib-metadata 6.7.0
lxml 4.9.3
Markdown 3.4.3
oauthlib 3.2.2
packaging 23.1
pikepdf 7.1.2
Pillow 10.0.0
pip 23.1.2
pypdf 3.5.1
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
regex 2023.6.3
requests 2.31.0
requests-oauthlib 1.3.1
setuptools 56.0.0
six 1.16.0
tomlkit 0.11.8
typing_extensions 4.7.1
tzlocal 5.0.1
urllib3 2.0.3
xmltodict 0.13.0
yq 3.2.2
zipp 3.15.0
zope.interface 6.0
prepare_python: OK
Target temp directory: /tmp/tmp.oMs0RqPMX7
-----------------------------------------------------------------------------------
| STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED: |
-----------------------------------------------------------------------------------
-----------------------------------------------------------------------------------
CURRENT FILE: ➜ 2023.07.01 - testfile.pdf
temp. target file: /tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| processing PDF @ OCRmyPDF: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:00]
➜ OCRmyPDF-LOG:
WARNING: Error loading config file: .dockercfg: $HOME is not defined
DEBUG ocrmypdf - ocrmypdf 14.2.2.dev31+g7c38c717.d20230620
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Found tesseract 5.3.1-22-g24da
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Found gs 9.55.0
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
chi_sim
deu
eng
fra
osd
por
spa
INFO ocrmypdf._validation - reading file from standard input
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/stdin, /tmp/ocrmypdf.io.0889pg_j/origin.pdf)
DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
INFO ocrmypdf._pipeline - 1 skipping all processing on this page
DEBUG ocrmypdf._graft - 1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
DEBUG ocrmypdf._graft - 1 Page rotation: (content, auto) -> page = (0, 0) -> 0
INFO ocrmypdf._sync - Postprocessing...
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/graft_layers.pdf, /tmp/ocrmypdf.io.0889pg_j/fix_docinfo.pdf)
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.0889pg_j/fix_docinfo.pdf', '/tmp/ocrmypdf.io.0889pg_j/pdfa.ps']
DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0 (2021-09-27)
DEBUG ocrmypdf.subprocess.gs - Copyright (C) 2021 Artifex Software, Inc. All rights reserved.
DEBUG ocrmypdf.subprocess.gs - This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
DEBUG ocrmypdf.subprocess.gs - see the file COPYING for details.
DEBUG ocrmypdf.subprocess.gs - Processing pages 1 through 1.
DEBUG ocrmypdf.subprocess.gs - Page 1
DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0: Setting Overprint Mode to 1
DEBUG ocrmypdf.subprocess.gs - not permitted in PDF/A-2, overprint mode not set
DEBUG ocrmypdf.subprocess.gs -
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - XrefExt(xref=224, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=225, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=218, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=219, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=220, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=221, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=222, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=223, ext='.png')
DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 8
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 225: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 218: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 219: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 221: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 223: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 225: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 218: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 219: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 223: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/optimize.opt.pdf, /tmp/ocrmypdf.io.0889pg_j/optimize.pdf)
DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
INFO ocrmypdf._pipeline - Image optimization ratio: 1.00 savings: 0.4%
INFO ocrmypdf._pipeline - Total file size ratio: 1.01 savings: 0.6%
DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.0889pg_j/optimize.pdf -> -
INFO ocrmypdf._sync - Output sent to stdout
← OCRmyPDF-LOG-END
[runtime up to now: 00:00:16]
target file (OK): /tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/2023.07.01 - testfile.pdf
no split pattern defined or splitting not possible
-----------------------------------------------------------------------------------
| handle source file: |
-----------------------------------------------------------------------------------
➜ backup source file to: /volume1/OCR/_BACKUP/2023.07.01 - testfile.pdf
removed directory '/tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/'
Stats:
runtime last file: ➜ 00:00:16
runtime 1st step (all files): ➜ 00:01:49
-----------------------------------------------------------------------------------
| STEP 2 - SEARCH TAGS / RENAME / SORT: |
-----------------------------------------------------------------------------------
list files in INPUT with transcoded special characters:
➜ 2023.07.01 - testfile.pdf$
(pages counted with python module pypdf)
-----------------------------------------------------------------------------------
CURRENT FILE: ➜ 2023.07.01 - testfile.pdf
➜ File permissions source file:
-rw-rw-r-- 1 synOCR synOCR 219590 Jul 5 18:02 /tmp/tmp.oMs0RqPMX7/2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| search tags in ocr text: |
-----------------------------------------------------------------------------------
no tags defined
-----------------------------------------------------------------------------------
| search for a valid date in ocr text: |
-----------------------------------------------------------------------------------
run RegEx date search - search for date format: 1 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
run RegEx date search - search for date format: 2 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
run RegEx date search - search for date format: 3 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
Date not found in OCR text - use file date:
day: 05
month:07
year: 2023
-----------------------------------------------------------------------------------
| rename and sort to target folder: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:01]
➜ renaming:
apply renaming syntax ➜ ! WARNING ! ➜ No variables were found for renaming. A fallback is used to prevent an empty file name: 2023.07.01 - testfile
[runtime up to now: 00:00:01]
➜ insert metadata (use python pikepdf)
used metadata:
➜ '/Author': '',
➜ '/Keywords': '',
➜ '/CreationDate': 'D:20230705',
➜ '/CreatorTool': 'synOCR 1.4.0'
call handlePdf.py -dbg_lvl "2" -dbg_file "/volume1/OCR/_LOG/synOCR_2023-07-05_19-01-33.log" -task metadata -inputFile "/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf" -metaData "{'/Author': '',
'/Keywords': '',
'/CreationDate': 'D:20230705',
'/CreatorTool': 'synOCR 1.4.0'}" -outputFile "/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf"
2023-07-05 19:03:23,958 - INFO - HandlePdf started
2023-07-05 19:03:23,958 - INFO - Version: 0.2
2023-07-05 19:03:23,958 - INFO - Task=metadata
2023-07-05 19:03:23,959 - DEBUG - set_task_metadata_parameter(input_file=/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf, output_file=/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf, meta_data_str={'/Author': '',
'/Keywords': '',
'/CreationDate': 'D:20230705',
'/CreatorTool': 'synOCR 1.4.0'})
2023-07-05 19:03:23,959 - DEBUG - <<<<<< set_task_meta_data_parameter ended
2023-07-05 19:03:23,959 - DEBUG - >>>>>> open_pdf started
2023-07-05 19:03:23,965 - DEBUG - <<<<<< open_pdf ended
2023-07-05 19:03:23,966 - INFO - >>>>> write meta_data started
2023-07-05 19:03:23,966 - DEBUG - old meta_data....
2023-07-05 19:03:23,966 - DEBUG - >>>>> log metadata >>>>>)
2023-07-05 19:03:23,967 - DEBUG - <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP toolkit 2.9.1-13, framework 1.6">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:iX="http://ns.adobe.com/iX/1.0/">
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="" pdf:Producer="pikepdf 7.2.0"/>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about=""><xmp:ModifyDate>2023-07-05T17:03:17+00:00</xmp:ModifyDate>
<xmp:CreateDate>2023-06-27T09:46:39+02:00</xmp:CreateDate>
<xmp:CreatorTool>ocrmypdf 14.2.2.dev31+g7c38c717.d20230620 / Tesseract OCR-PDF 5.3.1-22-g24da</xmp:CreatorTool></rdf:Description>
<rdf:Description xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" rdf:about="" xapMM:DocumentID="uuid:62258a05-5372-11f9-0000-5664f20a76f7"/>
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" dc:format="application/pdf"><dc:title><rdf:Alt><rdf:li xml:lang="x-default">Deg. 27. KW</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>KOBILIN</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="" pdfaid:part="2" pdfaid:conformance="B"/><rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:MetadataDate="2023-07-05T17:03:17.438999+00:00"/></rdf:RDF>
</x:xmpmeta>
2023-07-05 19:03:23,967 - DEBUG - <<<<< log metadata <<<<<)
2023-07-05 19:03:24,020 - DEBUG - new meta_data....
2023-07-05 19:03:24,020 - DEBUG - <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP toolkit 2.9.1-13, framework 1.6">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:iX="http://ns.adobe.com/iX/1.0/">
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="" pdf:Producer="pikepdf 7.1.2"/>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about=""><xmp:ModifyDate>2023-07-05T00:00:00</xmp:ModifyDate>
<xmp:CreateDate>2023-07-05T00:00:00</xmp:CreateDate>
<xmp:CreatorTool>synOCR 1.4.0</xmp:CreatorTool></rdf:Description>
<rdf:Description xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" rdf:about="" xapMM:DocumentID="uuid:62258a05-5372-11f9-0000-5664f20a76f7"/>
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" dc:format="application/pdf"><dc:title><rdf:Alt><rdf:li xml:lang="x-default">Deg. 27. KW</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>KOBILIN</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="" pdfaid:part="2" pdfaid:conformance="B"/><rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:MetadataDate="2023-07-05T19:03:23.974217+02:00"/></rdf:RDF>
</x:xmpmeta>
2023-07-05 19:03:24,021 - INFO - save pdf to file (/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf)
2023-07-05 19:03:24,051 - DEBUG - <<<<<< write meta_data ended
empty
0
[runtime up to now: 00:00:02]
target file: 2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| adjusts the attributes of the target file: |
-----------------------------------------------------------------------------------
➜ Adapt file date (Source: NOW)
➜ File permissions target file:
-rwxrwxrwx+ 1 synOCR synOCR 219453 Jul 5 19:03 /volume1/OCR/_OUTPUT/2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| final tasks: |
-----------------------------------------------------------------------------------
INFO: Notify for apprise not defined ...
run user defined post scripts:
Stats:
runtime last file: ➜ 00:00:05
pagecount last file: ➜ 1
file count profile : ➜ (profile default) - 191 PDF's / 634 Pages processed up to now
file count total: ➜ 350 PDF's / 1183 Pages processed up to now since 2019-06-04
cleanup:
delete tmp-files ...
removed '/tmp/tmp.oMs0RqPMX7/2023.07.01 - testfile.pdf'
removed '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/synOCR.txt'
removed '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/synOCR_filename.txt'
removed directory '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/'
removed directory '/tmp/tmp.oMs0RqPMX7'
purge log files ...
delete 1 log files ( > 10 files)
delete -9 search files ( > 10 files)
purge backup deactivated!
runtime all files: ➜ 00:01:54
----------------------------------
| ==> END OF FUNCTIONS <== |
----------------------------------
from synocr.
Nice!
synOCR writes the current version to …/python3_env/synOCR_python_env_version
Every time synOCR starts, it compares the saved version with the installed version and updates the Python environment if they differ. For some reason this check does not seem to work reliably, but I have not found the error yet.
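To make the idea concrete, here is a rough Python sketch of such a version-marker check. It is not the code synOCR actually uses (the real check lives in the shell script), and `env_needs_update` and `write_marker` are hypothetical names. Treating a missing or unreadable marker as "outdated" and stripping surrounding whitespace before comparing are assumptions of this sketch.

```python
from pathlib import Path


def env_needs_update(marker_file, installed_version):
    """Return True when the Python environment should be rebuilt."""
    try:
        saved = Path(marker_file).read_text(encoding="utf-8").strip()
    except OSError:
        # Assumption: a missing or unreadable marker counts as outdated
        return True
    # Compare exact version strings, ignoring surrounding whitespace
    return saved != installed_version.strip()


def write_marker(marker_file, installed_version):
    """Record the version the environment was built for."""
    Path(marker_file).write_text(installed_version.strip() + "\n", encoding="utf-8")
```

A caller would rebuild the environment and then call `write_marker` only after the rebuild has succeeded, so that an interrupted update is retried on the next start.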
from synocr.
Can you run a file with debug mode (loglevel 2), please?
from synocr.