
Comments (13)

FredrikBrandt commented on August 13, 2024

Hi,
I have another file that is not working either. It starts like this:
%PDF-1.5
3 0 obj
<< /Length 4 0 R
/Filter /FlateDecode

stream
[compressed binary stream data omitted]
...

from pdftotext.

FredrikBrandt commented on August 13, 2024

Hi,
I tried to send a PDF file to ([email protected]), but it bounced back.
Where can I send it?
Regards,
/Fredrik.


mjblacker commented on August 13, 2024

I'm happy to take a look at it if you can send it over.


FredrikBrandt commented on August 13, 2024

Faktura utan giro_627_Adapac AB.PDF


FredrikBrandt commented on August 13, 2024


mjblacker commented on August 13, 2024

Thanks, the problem is the CID fonts that use the Identity-H encoding.

Using only the Unicode map on the font object, you get around a third of the text out, but the rest isn't mapped to characters properly.

I'm working on a change that will read the CID fonts' CMaps, which will hopefully make reading international PDFs much better.
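
To illustrate the kind of mapping involved, here is a minimal sketch of how the `bfchar` entries of a ToUnicode CMap can be turned into a lookup table and applied to a hex string of 2-byte character codes. The function names `parseBfChars` and `decodeHexString` are hypothetical and are not part of PdfToText; the sketch assumes the `mbstring` extension is available, and it ignores `bfrange` entries, which a real implementation would also need to handle.

```php
<?php
/**
 * Parse the bfchar entries of a ToUnicode CMap into a code => UTF-8 map.
 * Hypothetical helper for illustration only; bfrange entries are ignored.
 */
function parseBfChars(string $cmap): array
{
    $map = [];
    // Each bfchar block holds pairs such as: <0029> <0046>
    if (!preg_match_all('/beginbfchar(.*?)endbfchar/s', $cmap, $blocks)) {
        return $map;
    }
    foreach ($blocks[1] as $block) {
        preg_match_all('/<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>/', $block, $pairs, PREG_SET_ORDER);
        foreach ($pairs as $pair) {
            // Skip malformed destinations with an odd number of hex digits.
            if (strlen($pair[2]) % 2 !== 0) {
                continue;
            }
            // The destination value is UTF-16BE, written as hex digits.
            $map[hexdec($pair[1])] = mb_convert_encoding(hex2bin($pair[2]), 'UTF-8', 'UTF-16BE');
        }
    }
    return $map;
}

/** Decode a hex string of 2-byte character codes using the map. */
function decodeHexString(string $hex, array $map): string
{
    $out = '';
    foreach (str_split($hex, 4) as $chunk) {
        // Unmapped codes become '?' so gaps in the map stay visible.
        $out .= $map[hexdec($chunk)] ?? '?';
    }
    return $out;
}

// A made-up two-entry CMap fragment, not taken from any of the attached files.
$cmap = <<<CMAP
2 beginbfchar
<0029> <0046>
<0044> <0061>
endbfchar
CMAP;

echo decodeHexString('00290044', parseBfChars($cmap)), "\n"; // prints "Fa"
```

Codes that have no entry in the map come out as `?`, which is roughly what "not mapped to characters properly" looks like in practice.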


FredrikBrandt commented on August 13, 2024

Hi,
Faktura-1587.pdf

This is not working either.
When do you think the change will be done?
Regards,
/Fredrik.


FredrikBrandt commented on August 13, 2024

Hi,
How is it going?
When do you think a solution can be available?

This file is not possible to read at all.
Faktura20541.pdf

I use the syntax below.
The first part is printed out, but if I do another printout after the function call,
it will not show.

    // Check the PDF file for information
    $uri = $dir.'/'.$fileName;
    $pdf = new Pdf();
    // Log the inputs before parsing; this output does appear
    error_log(print_r(array(
        'uri' => $uri,
        'pdf' => $pdf,
        '' => ''
    ), true));
    // Nothing printed after this call shows up
    $pdfdata = $pdf->getPdfInfo($uri);

Otherwise the tool is great.

Regards,
/Fredrik
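
When output after a call never appears, one common cause is an uncaught exception or error inside that call. The following is a minimal sketch of a wrapper that catches any `Throwable` and logs it, so the failure at least becomes visible. `runExtraction` and its callback are hypothetical names, and the stand-in callback simply throws; in real use it would wrap whatever parser call is in place.

```php
<?php
/**
 * Run an extraction callback and report failures instead of dying silently.
 * $extract is a hypothetical callable wrapping the parser call in use.
 */
function runExtraction(callable $extract, string $uri): array
{
    try {
        return ['ok' => true, 'data' => $extract($uri), 'error' => null];
    } catch (Throwable $e) {
        // Log the reason so it can be found in the server error log.
        error_log(sprintf('PDF extraction failed for %s: %s', $uri, $e->getMessage()));
        return ['ok' => false, 'data' => null, 'error' => $e->getMessage()];
    }
}

// Usage with a stand-in callback that always fails.
$result = runExtraction(function (string $uri) {
    throw new RuntimeException('unsupported font encoding');
}, '/tmp/example.pdf');

var_dump($result['ok']); // bool(false)
```

This does not catch everything: a script that hits the execution time limit or the memory limit ends with a fatal error that `try`/`catch` cannot intercept, so the server error log is still worth checking.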


FredrikBrandt commented on August 13, 2024

Hi,
Please, I need this urgently.
Can I at least get an answer as to when it is expected to be changed?
It is much appreciated :).
Regards,
/Fredrik.


FredrikBrandt commented on August 13, 2024

Hi,
Maybe I am not using the complete files?
I am using:
class/PdfToText.phpclass
class/Maps/adobe-charsets.map
class/Maps/unicode-to-ansi.map

Do I also need the CIDTables directory, i.e. class/CIDTables/?

By the way, I tried adding these directories to the class library, without any effect:
class/CIDTables
class/contributions
class/FontMetrics
class/FormTemplates


mjblacker commented on August 13, 2024

Hey Fredrik,

We are still working out the best way to resolve the issues with CID fonts.

We've made a few changes to the fork on our GitHub. If you check that out, you should get some information out of the PDF from the Unicode map we process, even with CID fonts.

There is no time frame currently, as this is very much a side project.


FredrikBrandt commented on August 13, 2024

Hi and thanks a lot,
It almost suits my purpose.
Can this be adjusted a little more?
I seem to get part of the invoice, but not the part that I want.
Great otherwise.

I actually only need two parameters from the PDF files.
One is the number of pages, and the second is whether the text in the PDF contains
Invoice (Faktura) or Credit invoice (Kreditfaktura).
Can this be achieved somehow?

Yes, this is solving my problems for now.
Thank you very much :).
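
For reference, the two values described above can be derived from extracted text with very little code. This is a minimal sketch that assumes the text has already been extracted per page into an array of strings; `classifyInvoiceText` and `summarizePages` are hypothetical helpers and are not part of PdfToText.

```php
<?php
/** Classify extracted text as a credit invoice, an invoice, or unknown. */
function classifyInvoiceText(string $text): string
{
    // Check the credit invoice first: "Kreditfaktura" also contains "faktura".
    if (stripos($text, 'kreditfaktura') !== false) {
        return 'credit_invoice';
    }
    if (stripos($text, 'faktura') !== false) {
        return 'invoice';
    }
    return 'unknown';
}

/**
 * Summarize a document given its per-page text.
 * $pageTexts is assumed to hold one string per page, in page order.
 */
function summarizePages(array $pageTexts): array
{
    return [
        'pages' => count($pageTexts),
        'type'  => classifyInvoiceText(implode("\n", $pageTexts)),
    ];
}

// Made-up sample text, not taken from the attached invoices.
$summary = summarizePages(['Kreditfaktura nr 1001', 'Summa att betala']);
print_r($summary);
```

The order of the two checks matters, because a plain search for "faktura" would also match every credit invoice. The match is case-insensitive, so "FAKTURA" in a heading is found as well.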


FredrikBrandt commented on August 13, 2024

Hi again,
I am having a problem with this type of invoice. Is it because of the QR code?
It doesn't even load anything.
This line of code will not run correctly:
$pdf = new PdfToText($uri);

The dropzone will respond with:
Server responded with 0 code.

Can this be fixed?

Here is the invoice.
Faktura20541.pdf
