Comments (2)

christian-vigh-phpclasses commented on August 13, 2024

Hello Philipp,

I intentionally left an exception here because the PDF file format is so
tricky regarding page description that I was sure that one day I would
encounter a case like yours.

For your curiosity, there is a triple indirection in the way text
objects are assigned to pages:

  • Object #x contains a keyword that specifies a certain number of
    objects y1, ..., yn1

  • Each object y1, ..., yn1 references objects z1, ..., zn2. These are
    the contents for one page

  • In turn, each object z1, ..., zn2 lists the objects that contain the
    text drawing instructions to draw a part of the page

You can even find PDF files without any page description at all! This
is the case, for example, of the official Adobe PDF Specification document...

I suspect that your PDF samples have a small inconsistency: they say that
the page contents for one page are described by 32 objects, while only 6 are
referenced. This may be due to a bug in the application that generated them,
but if that is the case, PDF readers need to be highly tolerant, so I will
change my class accordingly.

Regarding issue #2 (the repeating error), I suspect that I need to add a
check somewhere.

OK, I'll put that in my bug tracking system.

Christian.


From: phisu [mailto:[email protected]]
Sent: Wednesday, July 27, 2016 11:00
To: christian-vigh-phpclasses/PdfToText
Subject: [christian-vigh-phpclasses/PdfToText] error by the /Count parameter (#8)

hello christian.

i get an error concerning page count. i did:

$pdf = new PdfToText ($filename) ;
echo $pdf->Text;

and i got the following error:

Object #202 : Page count given by the /Count parameter (32) differs from the
actual number of objects referenced by the /Kids parameter (6).
PdfToText.php
545
512

the following files produce similar errors:

* http://www.cleanclothes.at/media/common/uploads/download/cck-label-check/CCK-LabelCheck_screen.pdf
* https://www.uni-muenster.de/imperia/md/content/physikalische_chemie/praktikum/h_p_saetze.pdf
* http://www.cleanclothes.at/media/common/uploads/download/cck-label-check/CCK-LabelCheck_screen.pdf

and the same error on the following file, but a repeating error too:
http://www.umweltberatung.at/downloads/mehrweggetraenke-bezugsquellen-abfall.pdf

Undefined offset: 1
/PdfToText.php
2115

philipp



christian-vigh-phpclasses commented on August 13, 2024

Hello Philipp,

I corrected the repeating "Undefined offset: 1" problem. It was due to
improper parsing of the floating point numbers used for specifying
coordinates. A value such as "0.12" was recognized, while ".12" was
discarded.
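
As an illustration only (this is not the actual PdfToText code), here is a minimal sketch of a number pattern that accepts both forms. The function name extractNumbers and the regular expression are assumptions made for this example.

```php
<?php
// Hypothetical helper, not part of PdfToText: extract the numeric operands
// found in a fragment of a PDF content stream.
function extractNumbers(string $fragment): array
{
    // Optional sign, then either digits with an optional fraction
    // ("12", "12.", "0.12") or a bare fraction without leading digit (".12").
    $pattern = '/[+-]?(?:\d+\.?\d*|\.\d+)/';
    preg_match_all($pattern, $fragment, $matches);

    // $matches[0] holds the full-pattern matches as strings.
    return array_map('floatval', $matches[0]);
}

// Both coordinates of a text positioning operator are found,
// including the one written without a leading zero: 0.12 and 0.5.
var_dump(extractNumbers('.12 0.5 Td'));
```

The second alternative, `\.\d+`, is what allows a value like ".12" to be kept instead of being dropped.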

Regarding the warning ("Page count given by the /Count parameter..."), your
samples made me discover that page maps can be nested, with the top-level page
map listing only objects that describe further page maps and giving their total
count (yet another PDF surprise!).
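
The nesting can be illustrated with a small sketch. It is not taken from PdfToText: the array layout, the object numbers and the function name collectPages are assumptions for this example. In the PDF format, the /Count entry of a page tree node gives the number of leaf pages below it, not the number of its direct /Kids, so the two values legitimately differ as soon as page maps are nested.

```php
<?php
// Simplified, hypothetical model of a page tree: object number => node.
// "Pages" nodes are intermediate page maps, "Page" nodes are real pages.
$objects = [
    1 => ['Type' => 'Pages', 'Count' => 5, 'Kids' => [2, 3]],
    2 => ['Type' => 'Pages', 'Count' => 3, 'Kids' => [4, 5, 6]],
    3 => ['Type' => 'Pages', 'Count' => 2, 'Kids' => [7, 8]],
    4 => ['Type' => 'Page'],
    5 => ['Type' => 'Page'],
    6 => ['Type' => 'Page'],
    7 => ['Type' => 'Page'],
    8 => ['Type' => 'Page'],
];

// Recursively collect the object numbers of all leaf pages under $id,
// in document order. No protection against cyclic references here.
function collectPages(array $objects, int $id): array
{
    $node = $objects[$id];
    if ($node['Type'] === 'Page') {
        return [$id];
    }

    $pages = [];
    foreach ($node['Kids'] as $kid) {
        $pages = array_merge($pages, collectPages($objects, $kid));
    }
    return $pages;
}

$pages = collectPages($objects, 1);

// The root lists only 2 kids, yet its /Count of 5 matches the leaf pages.
printf("kids: %d, count: %d, leaf pages: %d\n",
    count($objects[1]['Kids']), $objects[1]['Count'], count($pages));
```

Comparing /Count with the number of leaf pages, rather than with the number of /Kids, avoids a false warning on files like the ones reported here.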

I disabled this warning in non-debug mode. I am not yet able to evaluate
whether the individual page contents extracted from your samples will be
correct; however, I know that I have to modify the PdfTexterPageMap class
in my source to handle this new crazy situation. I have added this to
my list of open issues.

Regarding the text positioning issues you reported to me in another mail (with
extra spaces and extraneous line breaks), don't worry, I'm handling them in
a separate thread.

Christian.


