
rtf-html-php's People

Contributors

bnd170, bretto36, cyrosy, datashaman, elvis-epx, felixkiss, hc0503, henck, juliuspc, lbm-services, leshana, lm-cmxkonzepte, rafaelapuka, rsaalund, sipryan, vmrfriz, zonuexe


rtf-html-php's Issues

Cyrillic character support

Hi!
After converting RTF to HTML, I get:
...<span style='font-size: 16px;'>&icirc;</span><span style='font-size: 16px;'> </span><span style='font-size: 16px;'>&aring;</span><span style='font-size: 16px;'>&ecirc;</span><span style='font-size: 16px;'>&icirc;</span><span style='font-size: 16px;'>&iacute;</span><span style='font-size: 16px;'>&icirc;</span><span style='font-size: 16px;'>&igrave;</span><span style='font-size: 16px;'>&sup3;</span><span style='font-size: 16px;'>&divide;</span><span style='font-size: 16px;'>&iacute;</span>....

and this is displayed with the wrong characters, like:
â¿õäè âñ³èì ð³âí³â. åðíó
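The entities look like Windows-1251 (Cyrillic) bytes that were written out as if they were Latin-1: `&icirc;` is byte 0xEE, which is "о" in Windows-1251. The proper fix would be for the converter to honour the document's code page (for Cyrillic text, typically `\ansicpg1251` or a font declared with `\fcharset204`). Until then, one possible workaround is to reinterpret the output. This is a sketch under two assumptions: the whole document is Cyrillic, and PHP's iconv extension is available; `fixCyrillicEntities()` is a hypothetical helper, not part of rtf-html-php.

```php
<?php
// Hypothetical workaround, not part of rtf-html-php: assumes every accented
// Latin-1 entity in the HTML really stands for a Windows-1251 (Cyrillic) byte.
function fixCyrillicEntities(string $html): string
{
    // Decode entities into a single-byte charset, so &icirc; becomes the byte 0xEE.
    $bytes = html_entity_decode($html, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');
    // Re-read those bytes as Windows-1251 and return UTF-8.
    $utf8 = iconv('CP1251', 'UTF-8', $bytes);
    // iconv returns false if a byte has no Windows-1251 mapping; keep the input then.
    return $utf8 === false ? $html : $utf8;
}
```

On the entities from the report, `&aring;&ecirc;&icirc;&iacute;&icirc;&igrave;&sup3;&divide;&iacute;` becomes "економічн". Because the reinterpretation is applied to the whole output, genuine Latin-1 accented characters in the same document would be turned into Cyrillic as well.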

If the text starts with bold, it fails

When the text starts with bold, it is shown in a normal (non-bold) font.

This is the raw text:

{\rtf1\ansi\ansicpg1252\deff0\deflang3082{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
\viewkind4\uc1\pard\b\f0\fs17 VIT.O.BEST WHEY PROTEIN 100% 4LB\b0 . dentro del mundo de los suplementos deportivos es posiblemente es la mejor prote\'edna en cuanto a calidad/precio del re\'f1ido mercado de la nutrici\'f3n deportiva.\par
\par
\par
\b VIT.O.BEST WHEY PROTEIN 100% 24B\b0 . es una prote\'edna perfecta para el uso diario, que adem\'e1s tiene una buena capacidad de absorci\'f3n y una velocidad de absorci\'f3n media.\par
\par
\par
\b VIT.O.BEST WHEY PROTEIN 100%\b0 tiene una gran variedad de sabores deliciosos y muy suaves perfectos para el uso diario sin que terminen siendo empalagosos o \b aburridos. WHEY PROTEIN 100%\b0 est\'e1 disponible en sabor a chocolate, yogur de lim\'f3n , fresa, crema de caf\'e9 y vainilla. \b VIT.O.BEST WHEY PROTEIN 100% \b0 es una prote\'edna muy pura que no contiene az\'facar y est\'e1 totalemente libre de aspartamo.\par
}

This is the dump:

{
WORD rtf (1)
WORD ansi\ansicpg (1252)
WORD deff (0)
WORD deflang (3082)
WORD viewkind (4)
WORD uc (1)
WORD pard\b\f (0)
WORD fs (17)
TEXT VIT.O.BEST WHEY PROTEIN 100% 4LB
WORD b (0)
TEXT . dentro del mundo de los suplementos deportivos es posiblemente es la mejor prote
SYMBOL ' (237)
TEXT na en cuanto a calidad/precio del re
SYMBOL ' (241)
TEXT ido mercado de la nutrici
SYMBOL ' (243)
TEXT n deportiva.
WORD par (1)
WORD par (1)
WORD par (1)
WORD b (1)
TEXT VIT.O.BEST WHEY PROTEIN 100% 24B
WORD b (0)
TEXT . es una prote
SYMBOL ' (237)
TEXT na perfecta para el uso diario, que adem
SYMBOL ' (225)
TEXT s tiene una buena capacidad de absorci
SYMBOL ' (243)
TEXT n y una velocidad de absorci
SYMBOL ' (243)
TEXT n media.
WORD par (1)
WORD par (1)
WORD par (1)
WORD b (1)
TEXT VIT.O.BEST WHEY PROTEIN 100%
WORD b (0)
TEXT tiene una gran variedad de sabores deliciosos y muy suaves perfectos para el uso diario sin que terminen siendo empalagosos o
WORD b (1)
TEXT aburridos. WHEY PROTEIN 100%
WORD b (0)
TEXT est
SYMBOL ' (225)
TEXT disponible en sabor a chocolate, yogur de lim
SYMBOL ' (243)
TEXT n , fresa, crema de caf
SYMBOL ' (233)
TEXT y vainilla.
WORD b (1)
TEXT VIT.O.BEST WHEY PROTEIN 100%
WORD b (0)
TEXT es una prote
SYMBOL ' (237)
TEXT na muy pura que no contiene az
SYMBOL ' (250)
TEXT car y est
SYMBOL ' (225)
TEXT totalemente libre de aspartamo.
WORD par (1)
}
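The dump hints at the cause: `\pard\b\f0` arrives as a single word `pard\b\f` with parameter 0 (and `\ansi\ansicpg1252` as `ansi\ansicpg`), so the `\b` that switches bold on is apparently never seen, which would explain why only the leading bold is lost. Under the RTF specification a control word is a backslash followed by 1 to 32 ASCII letters, an optional signed integer parameter, and an optional single space delimiter, so the name has to end at the next backslash. A minimal sketch of that rule; `tokenizeControlWords()` is a hypothetical name, and the sketch deliberately ignores control symbols and escaped backslashes:

```php
<?php
// Hypothetical tokenizer sketch: returns [name, parameter|null] pairs for each
// control word. A control word is "\" + 1-32 ASCII letters, an optional signed
// integer parameter, and an optional single space that only acts as a delimiter.
// Control symbols such as \'ed are skipped; escaped backslashes (\\) are not handled.
function tokenizeControlWords(string $rtf): array
{
    preg_match_all('/\\\\([a-z]{1,32})(-?\d+)? ?/i', $rtf, $matches, PREG_SET_ORDER);
    $tokens = [];
    foreach ($matches as $m) {
        // The letters stop at the first non-letter, so "\pard\b\f0" yields pard, b, f(0).
        $param = (isset($m[2]) && $m[2] !== '') ? (int) $m[2] : null;
        $tokens[] = [$m[1], $param];
    }
    return $tokens;
}
```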

RTF Truncated

Hi,
I'm having a problem when I call $document = new Document($evolution);
This exception is thrown:
Notice: Parse error: Tried to read past end of input; RTF is probably truncated. in C:\xampp\htdocs\App_prontuario\RtfHtmlPhp\Document.php on line 29
I run a database query, take the returned value and put it in a variable, as you can see below:
`<?php

session_start();
require '../../vendor/autoload.php';

use Classes\Pacient\PacientEvolution\PacientEvolution;
use RtfHtmlPhp\Document;

$pacientRegistry = intval($_GET['prontuario']);

$pacientEvolution = new PacientEvolution();
$pacientEvo = $pacientEvolution->findPacientEvolution($pacientRegistry);

// Keep the last non-empty EVOLUCAO value returned by the query
$evolution = null;
foreach ($pacientEvo as $row) {
	if ($row['EVOLUCAO']) {
		$evolution = $row['EVOLUCAO'];
	}
}

$document = new Document($evolution);`

But I don't know what is happening; can someone help me? I've spent 3 days looking for a way to show the RTF content, but I haven't managed it yet.
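The message suggests the parser reached the end of the input while groups were still open, so a first check is whether the string coming from the database is complete. A rough diagnostic sketch (a hypothetical helper, not part of the library) that counts unescaped braces:

```php
<?php
// Hypothetical diagnostic helper: returns how many groups are still open at the
// end of $rtf. 0 means every "{" has a matching "}"; a positive number suggests
// the RTF was cut off before it reached PHP. Binary data after \bin is not handled.
function rtfOpenGroups(string $rtf): int
{
    $depth = 0;
    $len = strlen($rtf);
    for ($i = 0; $i < $len; $i++) {
        $c = $rtf[$i];
        if ($c === '\\') {
            $i++; // skip the escaped character, so \{ \} \\ do not count
        } elseif ($c === '{') {
            $depth++;
        } elseif ($c === '}') {
            $depth--;
        }
    }
    return $depth;
}
```

Check `$evolution` for `null` first (it stays `null` if no row has an `EVOLUCAO` value), then, for example, log `rtfOpenGroups($evolution)` and `strlen($evolution)` before calling `new Document($evolution)`.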

List color

Hello,

There is a problem with lists: the color is lost in the HTML output.
Here is an example of a list in which all the text is blue in the RTF:

{\rtf1\ansi\ansicpg1252\deff0\deflang1036{\fonttbl{\f0\fnil\fprq2\fcharset0 Calibri;}{$ {\colortbl ;\red0\green112\blue192;} {\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\cf1\f0\fs24 Test list:\par \pard\fi-360\li720 -\tab Item1\par -\tab Item2\par -\tab Item3\par \pard\cf0\b\f1\par }

And here is the HTML output (note that the list is black instead of blue):

<p><span style="font-family:Calibri;font-size:16px;color:#0070c0;">Test list:</span></p>
<p><span style="font-family:Calibri;">-&nbsp;Item1</span></p>
<p><span style="font-family:Calibri;">-&nbsp;Item2</span></p>
<p><span style="font-family:Calibri;">-&nbsp;Item3</span></p><br>

Thank you in advance for your help.
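In the RTF specification, `\pard` resets paragraph properties (such as the `\fi-360\li720` indents) but leaves character properties like the colour set by `\cf1` untouched; only `\plain` (or an explicit `\cf0`) resets those. The output suggests the colour is being dropped at the `\pard` that starts the list. A minimal sketch of keeping the two kinds of state apart; the class and its fields are hypothetical, not the library's internals:

```php
<?php
// Hypothetical formatter state: character and paragraph properties kept apart.
final class FormatState
{
    private const CHAR_DEFAULTS = ['cf' => 0, 'b' => false];
    private const PARA_DEFAULTS = ['fi' => 0, 'li' => 0];

    public array $char = self::CHAR_DEFAULTS;
    public array $para = self::PARA_DEFAULTS;

    public function apply(string $word, ?int $param = null): void
    {
        switch ($word) {
            case 'pard':   // paragraph defaults only; colour and bold survive
                $this->para = self::PARA_DEFAULTS;
                break;
            case 'plain':  // character defaults only
                $this->char = self::CHAR_DEFAULTS;
                break;
            case 'cf':     // colour table index
                $this->char['cf'] = $param ?? 0;
                break;
            case 'b':      // "\b" turns bold on, "\b0" turns it off
                $this->char['b'] = $param !== 0;
                break;
            case 'fi':
            case 'li':
                $this->para[$word] = $param ?? 0;
                break;
        }
    }
}
```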

Only the first underline works

I have an issue with text that contains more than one underlined word or phrase. The first one is displayed fine, but the second (or third) underline does not work.

Example:
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil Microsoft Sans Serif;}{\f1\fnil\fcharset0 Microsoft Sans Serif;}} {\colortbl ;\red0\green0\blue0;} \viewkind4\uc1\pard\lang1031\f0\fs17 Filterdichtung f\f1\'fcr Jupiter Gebl\'e4seeinheit \par \par \cf1\ul\b\'dcberschrift 1 - fett + unterstrichen:\cf0\ulnone\b0\par \par Test text 1\par \par \cf1\ul\b\'dcberschrift 2 - fett + unterstrichen:\cf0\ulnone\b0\par \par Test text 2\par \par \cf1\ul\b\'dcberschrift 3 - fett + unterstrichen:\cf0\ulnone\b0\par \par Test text 3\f0\par }

Überschrift 1 - fett + unterstrichen: (Heading 1 - bold + underlined)

is nicely underlined (and bold).

Überschrift 2 - fett + unterstrichen: (Heading 2 - bold + underlined)

is only bold, not underlined.

Thanks for your help!
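For reference, the RTF specification defines `\ul` as continuous underline on, `\ul0` as off, and `\ulnone` as "stop all underlining", so each later `\ul` in the sample should switch underlining back on. A small sketch (hypothetical functions, not the library's code) that folds the sample's sequence of underline control words into on/off states:

```php
<?php
// Hypothetical helper: new underline state after one control word.
function underlineAfter(bool $current, string $word, ?int $param): bool
{
    switch ($word) {
        case 'ul':     // "\ul" = on, "\ul0" = off
            return $param !== 0;
        case 'ulnone': // always off
            return false;
        default:       // other words leave underlining unchanged
            return $current;
    }
}

// Underline state after each [word, param] pair, starting from "not underlined".
function underlineStates(array $words): array
{
    $state = false;
    $states = [];
    foreach ($words as [$word, $param]) {
        $state = underlineAfter($state, $word, $param);
        $states[] = $state;
    }
    return $states;
}
```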

Issues with special characters appearing

After the changes that added support for Unicode encoding, I have been facing a number of issues using the package.

I have attached an RTF file that I can't get to convert correctly. It adds two bullets; I think this is because we aren't handling the fact that RTF is only partially Unicode-aware, so a document can specify an ANSI character AND a Unicode character. I think the package interprets both of them instead of using only one.

http://www.biblioscape.com/rtf15_spec.htm - search for Control word \uN

An RTF writer, when it encounters a Unicode character with no corresponding ANSI character, should output \uN followed by the best ANSI representation it can manage. Also, if the Unicode character translates into an ANSI character stream with count of bytes differing from the current Unicode Character Byte Count, it should emit the \ucN keyword prior to the \uN keyword to notify the reader of the change.

Any help would be greatly appreciated @mhtamala @henck

test.zip
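The rule quoted above implies that a reader must, after each `\uN`, skip the next N fallback characters, where N comes from the most recent `\ucN` (1 if none has been seen). `\uN` is a signed 16-bit value, so code points above 32767 appear as negative numbers. A simplified sketch on a flat string; `decodeUnicodeEscapes()` is hypothetical, it ignores group scoping of `\uc` and surrogate pairs, and while the specification counts any control word or symbol as one skippable character, only `\'hh` is treated that way here:

```php
<?php
// Simplified sketch (hypothetical helper): decode \uN escapes and skip the
// \ucN fallback characters that follow. Ignores group scoping of \uc and
// surrogate pairs; only \'hh is treated as a single fallback character.
function decodeUnicodeEscapes(string $rtf): string
{
    $out = '';
    $uc = 1;                  // default number of fallback characters per \uN
    $len = strlen($rtf);
    $i = 0;
    while ($i < $len) {
        if (preg_match('/\G\\\\uc(\d+) ?/', $rtf, $m, 0, $i)) {
            $uc = (int) $m[1];
            $i += strlen($m[0]);
        } elseif (preg_match('/\G\\\\u(-?\d+) ?/', $rtf, $m, 0, $i)) {
            $code = (int) $m[1];
            if ($code < 0) {
                $code += 65536;   // signed 16-bit value -> unsigned code point
            }
            $out .= html_entity_decode('&#' . $code . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
            $i += strlen($m[0]);
            for ($k = 0; $k < $uc && $i < $len; $k++) {
                // a \'hh escape is 4 bytes long but counts as one fallback character
                $i += ($rtf[$i] === '\\' && ($rtf[$i + 1] ?? '') === "'") ? 4 : 1;
            }
        } else {
            $out .= $rtf[$i];
            $i++;
        }
    }
    return $out;
}
```

With the default `\uc1`, `r\u233\'e9sistant` yields "résistant" instead of a doubled or stray character.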

Image not working

My document contains an image, but the image isn't converted to HTML. Please help me.
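RTF usually stores a picture in a `{\pict ...}` group as hex-encoded data, tagged with its format (`\pngblip`, `\jpegblip`, `\emfblip`, `\wmetafile8`, ...). PNG and JPEG bytes can be embedded in HTML directly as a base64 data URI. A rough sketch, assuming the `\pict` group has already been isolated and contains no nested groups such as `{\*\blipuid ...}`; the function name is hypothetical:

```php
<?php
// Hypothetical helper: converts an isolated {\pict ...} group that holds PNG or
// JPEG data as hex digits into an <img> tag with a base64 data URI.
// Assumes no nested groups (e.g. {\*\blipuid ...}) and no \bin binary data.
function pictToImgTag(string $pict): ?string
{
    if (preg_match('/\\\\(pngblip|jpegblip)/', $pict, $m) !== 1) {
        return null; // WMF/EMF and other formats would need conversion first
    }
    $mime = $m[1] === 'pngblip' ? 'image/png' : 'image/jpeg';
    // Drop control words such as \picw120 and the braces; what is left is the hex payload.
    $payload = preg_replace('/\\\\[a-z]+-?\d* ?|[{}]/i', '', $pict);
    $hex = preg_replace('/[^0-9a-f]/i', '', $payload);
    if ($hex === '' || strlen($hex) % 2 !== 0) {
        return null;
    }
    return '<img src="data:' . $mime . ';base64,' . base64_encode(hex2bin($hex)) . '">';
}
```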

throw new Exception in Document on line 30

Document.php uses throw new Exception, but because the file declares a namespace, PHP (7.2 or later, in my case) needs the fully qualified \Exception:

$err = "Parse error: Tried to read past end of input; RTF is probably truncated.";
trigger_error($err);
throw new \Exception($err);

Diacritics not displayed correctly

Hello,
I tried the code, but it does not convert diacritics to UTF-8 (or any other character set). Example:

Original: Mnoho věcí nechápeme, mnoha věcí se lekáme.

Translated: Mnoho v◊'3fc◊'3f nech◊'3fpeme, mnoha v◊'3fc◊'3f se lek◊'3fme.

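I have successfully used an inverse function for RTF generation (see below); it would probably be nice to include the opposite function (e.g. rtf_to_utf8) in your project:

function utf8_to_rtf($utf8_text) {
	// Replace each multi-byte UTF-8 character with \uN? (the /e modifier of preg_replace was removed in PHP 7).
	// N is a UTF-16 code unit written as a signed 16-bit number; '?' is the ANSI fallback.
	$pattern = '/[\xC2-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}|[\xF0-\xF4][\x80-\xBF]{3}/';
	return preg_replace_callback($pattern, function ($m) {
		$rtf = '';
		// Characters outside the BMP become two code units (a surrogate pair)
		foreach (unpack('n*', mb_convert_encoding($m[0], 'UTF-16BE', 'UTF-8')) as $unit) {
			$rtf .= '\u' . ($unit > 32767 ? $unit - 65536 : $unit) . '?';
		}
		return $rtf;
	}, $utf8_text);
}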

I am not really familiar with UTF-16, so I am not able to suggest an exact working patch right now :-(

Not all text is converted

I'm trying to convert this text to HTML:

{\rtf1\ansi\ansicpg1252\deff0\deflang1046{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}}
\viewkind4\uc1\pard\f0\fs20 D
\par BEM SEM QUEIXAS , MELHORA DA CEFALEIA, DE F\'c9RIAS
\par COSNCIENTE, ORIENTADA, MOVENDO 4 MEMBROS, SEM PLEGIAS
\par
\par RNM: CISTO PINEAL 1,0X0,7 CM
\par
\par CD: ORIENTA\'c7\'d5ES + RETORNO +/- 3 MESES COM NOVA RNM PROGRAMADA PARA 6 MESES
\par
\par
\par 20/05/16 D
\par PACIENTE COM MELHORA DA CEFALEIA SE ORGAMIZANDO MELHOR NO ESTUDOS
\par CONSCIENTE, ORIENTADA, MOVENDO 4 MEMBROS, SEM PLEGIAS[
\par
\par CD: ORIENTA\'c7\'d5ES + ATIVIDADE F\'cdSICA
\par }

But the result is:

tf1 onttbl 0 nil charset0 MS Sans Serif;�iewkind4 0 s20 D
BEM SEM QUEIXAS , MELHORA DA CEFALEIA, DE FÉRIAS

COSNCIENTE, ORIENTADA, MOVENDO 4 MEMBROS, SEM PLEGIAS

RNM: CISTO PINEAL 1,0X0,7 CM

CD: ORIENTAÇÕES + RETORNO +/- 3 MESES COM NOVA RNM PROGRAMADA PARA 6 MESES

20/05/16 D

PACIENTE COM MELHORA DA CEFALEIA SE ORGAMIZANDO MELHOR NO ESTUDOS

CONSCIENTE, ORIENTADA, MOVENDO 4 MEMBROS, SEM PLEGIAS[

CD: ORIENTAÇÕES + ATIVIDADE FÍSICA

The first line is broken :/

Fatal error: Allowed memory size

I think it's because I have a big file and your array fills up ($this->group->children in the function ParseControlWord).
Do you think you could handle big files?


New release needed

Hi,
Please bump the version number so that a release includes commit e4cff76. Currently, Composer installs the version that causes a fatal error.

Conversion always returns "1"

Hello,
I'm doing some trials with this simple code:

require("rtf2html.php");
$reader = new RtfReader();
$rtf = file_get_contents("temp/cs.rtf");
if($cb = $reader->Parse($rtf))
echo $cb;

But I always get "1" as the Parse($rtf) response.
Curiously, if I add $reader->root->dump();
then I can see the RTF tree on screen...

What do you think could be happening here?

Thank you so much for your help.

Conversion of escaped characters is wrong

The conversion of escaped characters is wrong.

For example I have the following RTF:

{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1031{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 6.3.9600}\viewkind4\uc1 
\pard\sa200\sl276\slmult1\f0\fs22\lang7 Hello \{World\}\par
}

The HTML snippet looks like this:

<span style="font-size: 19px;">Hello {{World}}</span><p>

But the correct result would be:

<span style="font-size: 19px;">Hello {World}</span><p>

And it's the same with other characters, for example the backslash:

{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1031{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 6.3.9600}\viewkind4\uc1 
\pard\sa200\sl276\slmult1\f0\fs22\lang7 C:\\Windows\par
}

Results in a string with double backslash:

<span style="font-size: 19px;">C:\\Windows</span><p>
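In RTF, `\{`, `\}` and `\\` are control symbols for the literal characters `{`, `}` and `\`, so each text run needs exactly one level of unescaping before it is written to HTML. A minimal sketch (a hypothetical helper) using `strtr()` with an array, which scans left to right and never re-scans its own replacements, so `\\{` correctly becomes `\{` rather than `{`:

```php
<?php
// Hypothetical helper: undo RTF escaping in a text run.
// \{ -> {   \} -> }   \\ -> \
// strtr() scans left to right and never re-scans replaced text,
// so "\\{" becomes "\{" and not "{".
function unescapeRtfText(string $text): string
{
    return strtr($text, ['\\\\' => '\\', '\\{' => '{', '\\}' => '}']);
}
```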

Output just plain text

This is a really useful library, thank you for the hard work!

Is it possible to output plain text only, rather than HTML?

Thanks
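One option, assuming you already have the HTML string produced by the formatter, is to convert it to text with PHP built-ins: turn `<br>` and `</p>` into newlines, strip the remaining tags with `strip_tags()`, then decode entities such as `&auml;` with `html_entity_decode()`. A sketch with a hypothetical helper name:

```php
<?php
// Hypothetical helper: turn the formatter's HTML into plain text.
function htmlToPlainText(string $html): string
{
    // Line breaks and paragraph ends become newlines before the tags are removed.
    $text = preg_replace('~<br\s*/?>|</p>~i', "\n", $html);
    $text = strip_tags($text);
    // &amp;, &auml;, &nbsp; ... become characters (&nbsp; becomes U+00A0, not a plain space).
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim($text);
}
```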

line break corruption

There's a line-break issue caused by a missing break statement in HtmlFormatter.php at line 328.
Currently:
case 'line': $this->output .= "<br/>";// character value (line feed = ) (carriage return = )
It should be:
case 'line': $this->output .= "<br/>"; break;// character value (line feed = ) (carriage return = )
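Without `break`, PHP's `switch` falls through and also runs the statements of the following `case`, so whatever comes after `'line'` in the formatter is appended as well. An illustration in which the `'tab'` case is hypothetical, standing in for whatever case actually follows:

```php
<?php
// Illustration only: the 'tab' case is hypothetical, standing in for
// whatever case follows 'line' in the real formatter.
function renderWithoutBreak(string $word): string
{
    $out = '';
    switch ($word) {
        case 'line':
            $out .= '<br/>';    // missing break: execution continues below
        case 'tab':
            $out .= '&nbsp;';
            break;
    }
    return $out;
}

function renderWithBreak(string $word): string
{
    $out = '';
    switch ($word) {
        case 'line':
            $out .= '<br/>';
            break;              // stop here
        case 'tab':
            $out .= '&nbsp;';
            break;
    }
    return $out;
}
```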

Does this support Chinese Unicode?

Hi all,

Thanks! It works perfectly for English text, but not when the RTF contains Chinese characters stored as Unicode.

Here is my code:
$rtf = '{\rtf1\ansi\ansicpg1252\uc0\deff0{\fonttbl {\f0\fswiss\fcharset0\fprq2 Arial;} {\f1\fnil\fcharset0\fprq2 SimSun;} {\f2\froman\fcharset2\fprq2 Symbol;}} {\colortbl;\red0\green0\blue0;\red255\green255\blue255;} {\stylesheet{\s0\itap0\f0\fs24 [Normal];}{\*\cs10\additive Default Paragraph Font;}} {\*\generator TX_RTF32 11.0.401.501;} \deftab1134\paperw11907\paperh16443\margl567\margt567\margr567\margb567\pard\itap0\plain\f1\fs20\loch\f1\hich\f1\u20320\u22909\u21527\par }';

$reader = new RtfReader();
$result = $reader->Parse($rtf);
$formatter = new RtfHtml();
$test = $formatter->Format($reader->root);

and it give me this result:
◊u22909◊par

I was expecting to get \u20320\u22909\u21527, which I could then translate back into Chinese characters.

Has anyone here had a similar issue, and what is the solution?

Cheers,
Jack

Bad conversion in this case

The RTF conversion is wrong in this case:

`<?php
error_reporting(-1);
ini_set('display_errors', 'on');
require __DIR__ . '/vendor/autoload.php';

use RtfHtmlPhp\Document;
use RtfHtmlPhp\Html\HtmlFormatter;

$original = "{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}\viewkind4\uc1\pard\lang1036\f0\fs16 m\'e9lange ma\'efs fran\'e7ais\par}";
$document = new Document($original); // or use a string directly
$formatter = new HtmlFormatter('UTF-8');
$r = $formatter->Format($document);
file_put_contents('rtf.html', $r);
echo $r;
?>`

Result:

tf1 onttbl 0 nil charset0 Microsoft Sans Serif;�iewkind4 0 s16 mélange maïs français

Expected result:
Something like "Mélange maïs français"

Thanks,
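The broken start of the result matches what PHP does to a double-quoted string: `\r`, `\f` and `\v` are escape sequences (carriage return, form feed, vertical tab), so `\rtf1`, `\fonttbl`, `\f0`, `\fs16` and `\viewkind4` reach the parser as control characters followed by "tf1", "onttbl", "0", "s16" and "iewkind4", as in the Result above. `\'` is not an escape inside double quotes, which is why the accented letters survived. Keeping the RTF in a single-quoted string or a nowdoc (or reading it with `file_get_contents()`) avoids this; a minimal demonstration:

```php
<?php
// In a double-quoted PHP string, \r, \f and \v are escape sequences
// (carriage return, form feed, vertical tab), so the RTF control words
// \rtf1, \fonttbl and \viewkind4 lose their backslash and first letter.
$broken = "{\rtf1\ansi{\fonttbl}\viewkind4}";

// A nowdoc (or a single-quoted string) keeps every backslash as written.
$intact = <<<'RTF'
{\rtf1\ansi{\fonttbl}\viewkind4}
RTF;
```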

Spans between special characters

The converter is great and very useful; thanks for sharing.

One issue I've been struggling with is converting special characters from RTF to HTML, e.g.
(RTF text)
Aufbaut\'fcre\par (where \'fc is the special character, in hex)
The converted text looks like this:
<span....>Aufbaut
ü
re

Converting back from HTML to RTF, this results in the following:
Aufbau_'fc_re (underscores stand for whitespace)

Is there any way to get the whole word, including the special characters, into one span instead of having it split across multiple spans?

Parse error: RTF text outside of group

I get this error after one RTF document has been converted and I move on to the next document:

$document = new \RtfHtmlPhp\Document($note_object->rtf);
$formatter = new \RtfHtmlPhp\Html\HtmlFormatter('UTF-8');
$plain_text = (strip_tags($formatter->Format($document)));

After converting my RTF document to plain text, I send it via SFTP, but when I move on to the next document, it crashes with the message 'Parse error: RTF text outside of group'.

I'm not sure what is causing this, as I know the object being passed to the formatter is indeed RTF.

More debug info:

Trace:
	Array
	(
	    [0] => Array
	        (
	            [args] => Array
	                (
	                    [0] => 1024
	                    [1] => Parse error: RTF text outside of group.
	                    [2] => /var/www/html/mysite/vendor/henck/rtf-to-html/src/Document.php
	                    [3] => 286
	                    [4] => Array
	                        (
	                            [text] => RtfHtmlPhp\Text Object
	                                (
	                                    [text] => This was created by the edito
	                                )

	                            [terminate] =>
	                            [err] => Parse error: RTF text outside of group.
	                        )

	                )

	        )

	    [1] => Array
	        (
	            [file] => /var/www/html/mysite/vendor/henck/rtf-to-html/src/Document.php
	            [line] => 286
	            [function] => trigger_error
	            [args] => Array
	                (
	                    [0] => Parse error: RTF text outside of group.
	                )

	        )

	    [2] => Array
	        (
	            [file] => /var/www/html/mysite/vendor/henck/rtf-to-html/src/Document.php
	            [line] => 326
	            [function] => ParseText
	            [class] => RtfHtmlPhp\Document
	            [type] => ->
	            [args] => Array
	                (
	                )

	        )

	    [3] => Array
	        (
	            [file] => /var/www/html/mysite/vendor/henck/rtf-to-html/src/Document.php
	            [line] => 17
	            [function] => Parse
	            [class] => RtfHtmlPhp\Document
	            [type] => ->
	            [args] => Array
	                (
	                    [0] => This was created by the editor
	                )

	        )

	    [4] => Array
	        (
	            [file] => /var/www/html/mysite/private/Controllers/Jobs/Healthelink/Dischargesummary.php
	            [line] => 105
	            [function] => __construct
	            [class] => RtfHtmlPhp\Document
	            [type] => ->
	            [args] => Array
	                (
	                    [0] => This was created by the editor
	                )

	        )
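The trace shows that the value passed to `Document::__construct()` (and on to `Parse()`) was the plain string "This was created by the editor", not RTF, which would explain the "RTF text outside of group" notice for that record. A hedged sketch of a guard (a hypothetical helper) to run before parsing:

```php
<?php
// Hypothetical guard: RTF documents start with "{\rtf" (optionally after whitespace).
function looksLikeRtf(?string $value): bool
{
    return $value !== null && strncmp(ltrim($value), '{\rtf', 5) === 0;
}
```

For example, only build `new \RtfHtmlPhp\Document($note_object->rtf)` when `looksLikeRtf($note_object->rtf)` is true, and otherwise use the stored text as-is.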

Please extend the parser with text alignment, font color and images

Hi,

Let me say first of all that you have done a great job: a nice little tool that works without external services. I think that's really good!

You write that your parser only interprets the bare essentials needed on a website, and I agree with that.
What I still miss, because I use these things frequently on websites, are the following:

  • Text alignment (left-aligned, centered, right-aligned).
  • Background and font color (I use it for example to highlight individual text passages).
  • Images (these could easily be embedded as a binary string directly in the img tag).

Maybe you could reconsider expanding your parser accordingly.
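The alignment and colour requests map fairly directly onto CSS. Paragraph alignment is set by `\ql`, `\qc`, `\qr` and `\qj`, and colours are indices into the `{\colortbl ...}` group, where each entry is `\redN\greenN\blueN;` and an entry with no colour values (usually the first) means the default colour. A sketch of those two lookups; the constant and function names are hypothetical, not the library's API:

```php
<?php
// Hypothetical lookup: RTF paragraph alignment control word -> CSS text-align.
const RTF_ALIGN_TO_CSS = ['ql' => 'left', 'qc' => 'center', 'qr' => 'right', 'qj' => 'justify'];

// Parse "{\colortbl ;\red0\green112\blue192;}" into CSS colours indexed like \cfN.
// An entry without \red/\green/\blue (typically the first one) is the default colour: null.
function parseColorTable(string $group): array
{
    $body = preg_replace('/^\{\\\\colortbl\s*|\}$/', '', trim($group));
    $entries = array_slice(explode(';', $body), 0, -1); // each entry ends with ';'
    $colors = [];
    foreach ($entries as $entry) {
        if (preg_match_all('/\\\\(red|green|blue)(\d+)/', $entry, $m, PREG_SET_ORDER) === 0) {
            $colors[] = null;
            continue;
        }
        $rgb = ['red' => 0, 'green' => 0, 'blue' => 0];
        foreach ($m as [, $channel, $value]) {
            $rgb[$channel] = (int) $value;
        }
        $colors[] = sprintf('#%02x%02x%02x', $rgb['red'], $rgb['green'], $rgb['blue']);
    }
    return $colors;
}
```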

Greetz

Single quotes and runs of dots are not converted

Hello,

Thanks for the great plugin.

I'm facing an issue: single quotes and runs of dots (e.g. .......) are not converted; they show up as special characters. Here are the references:

Original content: (screenshot)

After conversion: (screenshot)

Thank you in advance.

Empty table cells are not output as spans

I have an .rtf file with a table of data. I can read all the cells and build an array after parsing.
According to this example there should be 14 entries in the array, but I get 13.
In other words, the empty table cell is not read. Please help me read the empty cell as well.

The empty cell contains a non-breaking space.

`
Seq. | ID | Key | Scored | Num Options | Domain      | Flags
1    | 1  | C   |        | 4           | ScienceG8V1 | 2
`
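In RTF each cell's content is terminated by `\cell` and each row by `\row`, so an empty cell is just two `\cell` words with nothing but whitespace between them. If cells are collected only from text nodes, an empty cell produces no text and disappears, which would explain getting 13 values instead of 14. A simplified sketch (a hypothetical helper; it strips control words naively and does not handle nested tables) that splits a row on `\cell` itself and keeps empty cells:

```php
<?php
// Hypothetical helper: returns the text of every cell in one RTF table row,
// keeping empty cells. Each cell ends with \cell and the row ends with \row;
// the lookahead stops \cellx (cell boundary definitions) from counting as \cell.
function rtfRowCells(string $row): array
{
    $body = preg_split('/\\\\row(?![a-z])/i', $row)[0];
    $parts = preg_split('/\\\\cell(?![a-z])/i', $body);
    array_pop($parts); // whatever follows the last \cell is not a cell
    return array_map(function (string $cell): string {
        // Naively drop control words and braces, then surrounding whitespace.
        return trim(preg_replace('/\\\\[a-z]+-?\d* ?|[{}]/i', '', $cell));
    }, $parts);
}
```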

Storing the returned object as a string

There doesn't seem to be a way to store the HTML in a string or array, either directly or by returning it from a function. I'm not sure whether it's related, but if I echo inside a foreach loop, I only get two of my six strings.

$html = '';
foreach ($letter_para as $para) {
	$reader = new RtfReader();
	$reader->Parse($para["ParagraphText"]);
	$formatter = new RtfHtml();
	//$reader->root->dump();
	$html .= $formatter->Format($reader->root);
}

The dump does show data, but again only for two of the six.

Thanks for the effort and thank you for sharing!

Currency symbol not parsed correctly

I'm trying to convert some text that seems to contain currency symbols, and they get converted into the classic question-mark symbol usually shown for unrecognised characters.

This is my sample RTF text:

{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}{\f1\fnil\fcharset0 Times New Roman;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\f0\fs16 France - Paris 
\par Trip 1 \'8065 
\par Trip 2  \'8010  
\par Trip 3 \'8026 
\par }

It gets converted to:

<span style="font-size: 14px;">France - Paris 
</span><p>Trip1 �65 
</p>Trip 2  �10 
<p>Trip 3 �26 
</p>

Looking at an RTF guide, it seems \'80 should be the euro currency symbol (https://www.oreilly.com/library/view/rtf-pocket-guide/9781449302047/ch04.html).

I've also noticed that not all lines get converted to paragraphs. I'm not sure whether that's a problem with the RTF text itself or with the conversion; the data is imported from an API, so I don't have much control over it.
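With `\ansicpg1252`, a `\'hh` escape is a byte in the Windows-1252 code page. Outside the range 0x80 to 0x9F the byte value equals the Unicode code point, but inside it does not: 0x80 is the euro sign (U+20AC), and the same range holds the curly quotes and the ellipsis (0x85). Emitting byte 0x80 unchanged into UTF-8 output is likely what produces the "�". A sketch with hypothetical helper names and a deliberately partial table:

```php
<?php
// Hypothetical helpers, assuming the document's code page is 1252 (\ansicpg1252).
// Only the common Windows-1252 entries in 0x80-0x9F are listed; unlisted bytes
// in that range are not translated correctly by this partial table.
function cp1252ByteToUtf8(int $byte): string
{
    static $map = [
        0x80 => 0x20AC,                 // euro sign
        0x85 => 0x2026,                 // horizontal ellipsis
        0x91 => 0x2018, 0x92 => 0x2019, // single curly quotes
        0x93 => 0x201C, 0x94 => 0x201D, // double curly quotes
        0x95 => 0x2022,                 // bullet
        0x96 => 0x2013, 0x97 => 0x2014, // en dash, em dash
    ];
    // Outside 0x80-0x9F the byte value is also the Unicode code point.
    $code = $map[$byte] ?? $byte;
    return html_entity_decode('&#' . $code . ';', ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Replace every \'hh escape in an RTF text run with the UTF-8 character.
function decodeRtfHexEscapes(string $rtf): string
{
    return preg_replace_callback("/\\\\'([0-9a-fA-F]{2})/", function (array $m): string {
        return cp1252ByteToUtf8((int) hexdec($m[1]));
    }, $rtf);
}
```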

Thanks

Parse into variables

Hi,

It is a really good HTML parser, but I have a question: what if, instead of parsing into HTML, I wanted to parse the document into multiple PHP variables so I could reuse them for something else?
How could I do that?

Thanks

Error installing via Composer

Hello,
When I try to install via Composer, I get this message:

Could not find package henck/rtf-to-html at any version for your minimum-stability (stable). Check the package spelling or your minimum-stability

What can I do ?

Error with this RTF char: \u233

Hi,

This is my RTF string:
{\rtf1\deff0{\fonttbl{\f0 Calibri;}{\f1 Microsoft Sans Serif;}}{\colortbl ;\red0\green0\blue255 ;}{\*\defchp \fs22}{\stylesheet {\ql\fs22 Normal;}{\*\cs1\fs22 Default Paragraph Font;}{\*\cs2\sbasedon1\fs22 Line Number;}{\*\cs3\ul\fs22\cf1 Hyperlink;}{\*\ts4\tsrowd\fs22\ql\tscellpaddfl3\tscellpaddl108\tscellpaddfb3\tscellpaddfr3\tscellpaddr108\tscellpaddft3\tsvertalt\cltxlrtb Normal Table;}{\*\ts5\tsrowd\sbasedon4\fs22\ql\trbrdrt\brdrs\brdrw10\trbrdrl\brdrs\brdrw10\trbrdrb\brdrs\brdrw10\trbrdrr\brdrs\brdrw10\trbrdrh\brdrs\brdrw10\trbrdrv\brdrs\brdrw10\tscellpaddfl3\tscellpaddl108\tscellpaddfr3\tscellpaddr108\tsvertalt\cltxlrtb Table Simple 1;}}{\*\listoverridetable}\nouicompat\splytwnine\htmautsp\sectd\pard\plain\ql{\f1\fs17\cf0 100% polyester r\u233\'e9sistant Les coutures sont renforc\u233\'e9es et les bords sont doubles}\f1\fs17\par}

I get this:
100% polyester r◊'e9sistant Les coutures sont renforc◊'e9es et les bords sont doubles

Instead of this:
100% polyester résistant Les coutures sont renforcées et les bords sont doubles

I think it's because this RTF character, "\u233", is not handled properly.

I hope this report will help this amazing tool 👍 !!!

Returning false.

This is returning false.

$reader = new RtfReader();
$rtf = file_get_contents("example.rtf"); // or use a string
$result = $reader->Parse($rtf);
echo "<pre>";
var_dump($result);

Any help?

Table conversion fails

I'm lost with this issue.

I'm trying to convert an RTF document (shown below) that contains very simple tables (4 columns and 3 rows). The converted document only shows the text contained in the table, while the rest of the text is converted perfectly.

I've tried different documents and different tables, and I've even tried changing the document encoding, but the result is always the same.

Any idea what could be happening here?

(By the way, the root dump seems to show that the cell and row elements are present.)

Thank you so much for your code and help.

____ RTF code ___

{\rtf1\ansi\deff0\adeflang1025
{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}{\f1\froman\fprq2\fcharset2 Symbol;}{\f2\fswiss\fprq2\fcharset0 Arial;}{\f3\fnil\fprq0\fcharset128 OpenSymbol{\*\falt Arial Unicode MS};}{\f4\fnil\fprq2\fcharset0 Microsoft YaHei;}{\f5\fnil\fprq2\fcharset0 Lucida Sans;}{\f6\fswiss\fprq0\fcharset0 Lucida Sans;}}
{\colortbl;\red0\green0\blue0;\red255\green255\blue255;\red153\green153\blue153;\red128\green128\blue128;}
{\stylesheet{\s0\snext0\nowidctlpar{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\aspalpha\ltrpar\cf0\kerning1\hich\af7\langfe2052\dbch\af5\afs24\lang1081\loch\f0\fs24\lang3082 Predeterminado;}
{\*\cs15\snext15\hich\af3\dbch\af3\loch\f3 Vi?etas;}
{\s16\sbasedon0\snext17\sb240\sa120\keepn\hich\af4\dbch\af5\afs28\loch\f2\fs28 Encabezado;}
{\s17\sbasedon0\snext17\sb0\sa120 Cuerpo de texto;}
{\s18\sbasedon17\snext18\sb0\sa120\dbch\af6 Lista;}
{\s19\sbasedon0\snext19\sb120\sa120\noline\i\dbch\af6\afs24\ai\fs24 Etiqueta;}
{\s20\sbasedon0\snext20\noline\dbch\af6 Índice;}
{\s21\sbasedon0\snext21\noline Contenido de la tabla;}
{\s22\sbasedon21\snext22\qc\noline\b\ab Encabezado de la tabla;}
}{\*\listtable{\list\listtemplateid1
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u8226 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li720}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9702 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li1080}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9642 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li1440}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u8226 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li1800}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9702 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li2160}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9642 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li2520}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u8226 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li2880}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9702 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li3240}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9642 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li3600}\listid1}
{\list\listtemplateid2
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-432\li432}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-576\li576}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-720\li720}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-864\li864}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1008\li1008}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1152\li1152}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1296\li1296}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1440\li1440}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1584\li1584}\listid2}
}{\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}{\listoverride\listid2\listoverridecount0\ls2}}{\info{\creatim\yr2019\mo6\dy13\hr11\min46}{\revtim\yr2019\mo6\dy13\hr11\min46}{\printim\yr0\mo0\dy0\hr0\min0}{\comment OpenOffice}{\vern4150}}\deftab709

{\*\pgdsctbl
{\pgdsc0\pgdscuse195\pgwsxn11906\pghsxn16838\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\pgdscnxt0 Predeterminado;}}
\formshade\paperh16838\paperw11906\margl1134\margr1134\margt1134\margb1134\sectd\sbknone\sectunlocked1\pgndec\pgwsxn11906\pghsxn16838\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\ftnbj\ftnstart1\ftnrstcont\ftnnar\aenddoc\aftnrstcont\aftnstart1\aftnnrlc
\pgndec\pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse nisi tortor, tristique cursus viverra non, suscipit sed lorem. Ut tincidunt pharetra neque, at dignissim lorem mollis id. Nullam sodales facilisis dolor, et lacinia neque vulputate nec.}
\par \pard\plain \s17\sb0\sa120{\ul\ulc0\b\ab\rtlch \ltrch\loch\lang3082
Aenean nulla augue, finibus id tortor eget, pellentesque scelerisque sapien}{\rtlch \ltrch\loch\lang3082
. Vivamus facilisis bibendum erat, eu fringilla sapien vestibulum sit amet. In purus ante, mollis sit amet leo id, }
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\b\ab\rtlch \ltrch\loch\lang3082
Aliquam quis augue at turpis eleifend consequat sit amet quis sem. Proin sed augue sagittis, gravida ipsum quis, accumsan velit. Sed laoreet hendrerit interdum. Proin blandit interdum orci sed faucibus. Donec in eleifend neque. Etiam auctor orci at eros aliquam ornare vitae sed eros}{\rtlch \ltrch\loch\lang3082
. Aliquam pulvinar, urna sed sollicitudin euismod, velit magna sagittis massa, vel eleifend leo tortor ac nunc. Sed sit amet est felis. Sed mauris enim, auctor et erat finibus, placerat iaculis ex. }
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\listtext\pard\plain \hich\af3\dbch\af3\loch\f3 \'95\tab}\ilvl0\ls1 \li720\ri0\lin720\rin0\fi-360{\rtlch \ltrch\loch\lang3082
A}
\par \pard\plain \s17\sb0\sa120{\listtext\pard\plain \hich\af3\dbch\af3\loch\f3 \'95\tab}\ilvl0\ls1 \li720\ri0\lin720\rin0\fi-360{\rtlch \ltrch\loch\lang3082
B}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
lkjlkjlkjlkjlkj d\'f1lskfj d\'f1lfkjd sf\'f1lkdsj fl\'f1dksjf dsl\'f1kfj ds\'f1lfkjs aflkasdj flksdj fsdl\'f1kfj sdlkf jsdlkf jsdl\'f1kf jsdlkf\'f1 jsdlfk sdjf\'f1lksdjflk}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \trowd\trql\ltrrow\trpaddft3\trpaddt55\trpaddfl3\trpaddl55\trpaddfb3\trpaddb55\trpaddfr3\trpaddr55\clbrdrt\brdrs\brdrw1\brdrcf1\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clcbpat3\cellx2409\clbrdrt\brdrs\brdrw1\brdrcf1\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clcbpat3\cellx4818\clbrdrt\brdrs\brdrw1\brdrcf1\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clcbpat3\cellx7227\clbrdrt\brdrs\brdrw1\brdrcf1\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clbrdrr\brdrs\brdrw1\brdrcf1\clcbpat3\cellx9636\pard\plain \s21\noline\intbl{\cf2\rtlch \ltrch\loch\lang3082
SSS}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
GGG}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
DDD}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
EEEE}\cell\row\trowd\trql\ltrrow\trpaddft3\trpaddt55\trpaddfl3\trpaddl55\trpaddfb3\trpaddb55\trpaddfr3\trpaddr55\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx2409\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx4818\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx7227\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clbrdrr\brdrs\brdrw1\brdrcf1\cellx9636\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
AAA}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
aaaaabbbccc}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
MmmmNNN}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
ddd}\cell\row\trowd\trql\ltrrow\trpaddft3\trpaddt55\trpaddfl3\trpaddl55\trpaddfb3\trpaddb55\trpaddfr3\trpaddr55\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx2409\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx4818\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx7227\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clbrdrr\brdrs\brdrw1\brdrcf1\cellx9636\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
AAA}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
Aaaaayy}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
eyyy}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
eyyyy}\cell\row\trowd\trql\ltrrow\trpaddft3\trpaddt55\trpaddfl3\trpaddl55\trpaddfb3\trpaddb55\trpaddfr3\trpaddr55\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx2409\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx4818\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx7227\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clbrdrr\brdrs\brdrw1\brdrcf1\cellx9636\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
AAA}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
AAyy}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
yyye}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
eyyy}\cell\row\pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120\sb0\sa120{\rtlch \ltrch\loch\lang3082
Maecenas tempor sapien a est scelerisque sagittis. Nam ut congue augue, ut bibendum nisl. Vivamus ullamcorper ex ac faucibus pulvinar. Mauris ante nulla, hendrerit vitae congue sit amet, facilisis a justo. Suspendisse imperdiet lectus enim, ac luctus mauris efficitur eget. Aliquam sagittis }
\par }

Real world example of rtf not being touched by this

I am finding it mostly does not work. Here is an example:

{ tf1ansideff0{fonttbl{f0fnilfcharset0 Arial;}{f1fnil Arial;}} viewkind4uc1pardlang2057fs18 Handbuilt bowl with large crab in blacks and yellowsf1par }

Running that through your routine results in

tf1ansideff0fonttblf0fnilfcharset0 Arial;f1fnil Arial;viewkind4uc1pardlang2057fs18 Handbuilt bowl with large crab in blacks and yellowsf1par

whereas what is needed is

Handbuilt bowl with large crab in blacks and yellows
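The output above looks as if the backslashes never reached the parser. Whatever stripped them in this particular report, one common way for PHP code to lose them is a double-quoted string literal. A small demonstration (the sample string is made up for illustration):

```php
<?php
// In a double-quoted PHP string, "\r" is a carriage return (and "\f", "\n",
// "\t", "\v" are escapes too), so the control word \rtf1 never reaches the parser.
$broken = "{\rtf1\ansi\deff0 Hello\par }";

// A nowdoc (<<<'RTF') keeps every backslash exactly as written. Reading the
// RTF from a file with file_get_contents() also preserves the bytes.
$intact = <<<'RTF'
{\rtf1\ansi\deff0 Hello\par }
RTF;

var_dump(strpos($broken, '\rtf1')); // bool(false)
var_dump(strpos($intact, '\rtf1')); // int(1)
```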

Too many span tags - every(!) umlaut wrapped in span tag

Hi,
The conversion basically works, but it produces ugly HTML with lots of span tags where none are needed: EVERY umlaut (ö/ä/ü) is wrapped in its own span tag!

e.g.

raffte mehr als die H</span><span style='font-size: 14px;'>&auml;</span><span style='font-size: 14px;'>lfte der Bev</span><span style='font-size: 14px;'>&ouml;</span><span style='font-size: 14px;'>lkerung dahin.

Here is my RTF source:

raffte mehr als die H\'e4lfte der Bev\'f6lkerung dahin. 
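Until the converter stops opening a new span per character run, the redundant spans can be collapsed afterwards. A minimal sketch of such a post-processing step; mergeAdjacentSpans is a hypothetical helper, not part of rtf-html-php, and it assumes single-quoted style attributes (as in the output shown above) and spans that contain only text:

```php
<?php
// Merge neighbouring <span> elements whose style attributes are identical.
function mergeAdjacentSpans(string $html): string
{
    // <span style='S'>text</span><span style='S'>  ->  <span style='S'>text
    $pattern = "/<span style='([^']*)'>([^<]*)<\\/span><span style='\\1'>/";
    do {
        // Repeat until nothing changes, since each pass can expose a new pair.
        $html = preg_replace($pattern, "<span style='\$1'>\$2", $html, -1, $count);
    } while ($count > 0);
    return $html;
}

echo mergeAdjacentSpans(
    "<span style='font-size: 14px;'>raffte mehr als die H</span>"
    . "<span style='font-size: 14px;'>&auml;</span>"
    . "<span style='font-size: 14px;'>lfte</span>"
);
// <span style='font-size: 14px;'>raffte mehr als die H&auml;lfte</span>
```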

2. Problem
\u8211 (U+2013, EN DASH) should be converted to &ndash; (–), but it is converted to &loz; (◊)

beklagt sich \u8211 - f\'fcr deutsche Leser
beklagt sich </span>&loz;<span style='font-size: 14px;'>- f</span><span style='font-size: 14px;'>&uuml;</span><span style='font-size: 14px;'>r deutsche Leser
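Both problems come down to decoding RTF character escapes. A minimal sketch, assuming the \'hh bytes are in Windows-1252 (\ansicpg1252) and that the default \uc1 is in effect, so exactly one fallback unit follows each \uN and should be dropped. Under that assumption the "-" after \u8211 in the sample is the ANSI fallback rather than real text. decodeRtfText is a hypothetical helper, not part of rtf-html-php, and needs the mbstring extension:

```php
<?php
function decodeRtfText(string $rtf): string
{
    // \uN, its optional space delimiter, and one fallback unit (a \'hh
    // escape or a single plain character), which is dropped.
    $rtf = preg_replace_callback(
        '/\\\\u(-?\d+) ?(?:\\\\\'[0-9a-fA-F]{2}|[^\\\\{}])?/',
        function (array $m): string {
            $cp = (int) $m[1];
            if ($cp < 0) {
                $cp += 65536; // RTF stores values above 32767 as signed 16-bit
            }
            return mb_chr($cp, 'UTF-8');
        },
        $rtf
    );

    // \'hh: one byte in the (assumed) Windows-1252 code page, converted to UTF-8.
    return preg_replace_callback(
        '/\\\\\'([0-9a-fA-F]{2})/',
        function (array $m): string {
            return mb_convert_encoding(chr(hexdec($m[1])), 'UTF-8', 'Windows-1252');
        },
        $rtf
    );
}

echo decodeRtfText("raffte mehr als die H\\'e4lfte der Bev\\'f6lkerung dahin."), "\n";
// raffte mehr als die Hälfte der Bevölkerung dahin.
echo decodeRtfText("beklagt sich \\u8211 - f\\'fcr deutsche Leser"), "\n";
// beklagt sich – für deutsche Leser
```

Emitting UTF-8 text directly also avoids one HTML entity (and, in the current output, one span) per umlaut.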

Undefined property: RtfHtml::$output

ErrorException : Undefined property: RtfHtml::$output
C:\Users\brett\workspace\feesynergy\vendor\henck\rtf-to-html\rtf-html-php.php:957

{\\rtf1\\deff0{\\fonttbl{\\f0 Times New Roman;}{\\f1 MS Sans Serif;}}{\\colortbl\\red0\\green0\\blue0 ;\\red0\\green0\\blue255 ;}{\\*\\defchp \\f1\\fs20}{\\*\\listoverridetable}{\\stylesheet {\\ql\\f1\\fs20\\cf0 Normal;}{\\*\\cs1\\f1\\fs20\\cf0 Default Paragraph Font;}{\\*\\cs2\\sbasedon1\\f1\\fs20\\cf0 Line Number;}{\\*\\cs3\\ul\\f1\\fs20\\cf1 Hyperlink;}}\\splytwnine\\htmautsp\\sectd\\pard\\plain\\ql{\\b\\f1\\fs20\\cf0 Professional Fees}\\b\\f1\\fs20\\cf0\\par\\pard\\plain\\ql{\\f1\\fs20\\cf0 As per contractual agreement.}\\f1\\fs20\\cf0\\par}

Update Packagist entry

Hello @bretto36

I see that the composer package for this project is maintained by you. I'd like to update it to reflect the latest commits, but you are now managing the vendor name "henck".

Also, the package name on Packagist is "rtf-to-html", while it should be "rtf-html-php", to conform to the name of this Github repository.

What can be done? Can you make me a maintainer of the package, as well?

Fatal error

I just noticed that the exception thrown in Document is missing either a use statement or a leading backslash (\).

My application just blew up with
Uncaught Error: Class 'RtfHtmlPhp\Exception' not found in .../vendor/henck/rtf-to-html/src/Document.php:30

Should be

throw new \Exception($err);
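A minimal illustration of the name-resolution rule behind this error, using a hypothetical ParseErrorDemo class in place of Document:

```php
<?php
namespace RtfHtmlPhp;

// Inside "namespace RtfHtmlPhp", an unqualified "new Exception(...)" resolves
// to RtfHtmlPhp\Exception, which does not exist, hence "Class not found".
// Either prefix the global class with a backslash (as below) or add
// "use Exception;" right after the namespace declaration.
class ParseErrorDemo
{
    public function fail(string $err): void
    {
        throw new \Exception($err);
    }
}
```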

Composer

I couldn't install the project using Composer; it gives me this error:

[InvalidArgumentException] Could not find package henck/rtf-to-html at any version for your minimum-stability (stable). Check the package spelling or your minimum-stability

I'm using this command: 'composer require henck/rtf-html-php'

Intermittent error at line 391

If $group is null (I don't know how that can happen, but it does), this causes a fatal error.

I added the following lines to make it work:

if (!isset($group))
      return;

HYPERLINK support

Would anyone care to add support for hyperlinks?

E.g.
from RTF: {\field{\*\fldinst HYPERLINK "http://www.google.com/"}{\fldrslt search}}
to HTML: <a href="http://www.google.com/">search</a>

Cheers
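A minimal sketch of the requested mapping for the simple, flat field form in the example. rtfHyperlinkToHtml is a hypothetical helper, not part of rtf-html-php; real-world fields often nest extra groups (e.g. {\fldrslt{\ul text}}), which this regular expression deliberately does not match:

```php
<?php
function rtfHyperlinkToHtml(string $rtf): string
{
    // {\field{\*\fldinst HYPERLINK "URL"}{\fldrslt TEXT}}
    $pattern = '/\{\\\\field\{\\\\\*\\\\fldinst\s+HYPERLINK\s+"([^"]*)"\s*\}\{\\\\fldrslt\s+([^{}]*)\}\}/';
    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Escape both the URL and the link text for safe use in HTML.
            return '<a href="' . htmlspecialchars($m[1]) . '">' . htmlspecialchars($m[2]) . '</a>';
        },
        $rtf
    );
}

echo rtfHyperlinkToHtml('{\field{\*\fldinst HYPERLINK "http://www.google.com/"}{\fldrslt search}}');
// <a href="http://www.google.com/">search</a>
```

Inside the converter itself, the same idea would be to recognise the \field group while parsing and emit the anchor from the \fldinst and \fldrslt parts, rather than running a regex over raw RTF.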

ul / li not working

Hi,

This is an RTF sample:

{\rtf1\ansi\ansicpg1252\deff0\deflang1036{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}{\f1\fnil\fcharset2 Symbol;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\f0\fs17 SAUT DE LIGNE\par
SAUT DE LIGNE\par
\b GRAS\b0
\i ITALIQUE\i0 \ul SOULIGNE \cf1\ulnone\strike BARRE\cf0\strike0\par
\pard{\pntext\f1\'B7\tab}{\pn\pnlvlblt\pnf1\pnindent0{\pntxtb\'B7}}PUCE\par
{\pntext\f1\'B7\tab}PUCE\par
{\pntext\f1\'B7\tab}PUCE\par
\pard\ul\b\i\strike MELANGE\par
\par
\par
\pard{\pntext\f1\'B7\tab}{
\pn\pnlvlblt\pnf1\pnindent0{\pntxtb\'B7}}\cf1\ulnone\b0\i0\strike0 bla bla test\par
{\pntext\f1\'B7\tab}essai\cf0\par
}

Before each "PUCE" (French for "bullet"), I should get an <li> tag or a span bullet.

Great tool :)
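Once bullet paragraphs are recognised, they still need to be grouped into a single list. A minimal sketch of that grouping step, assuming paragraphs have already been extracted and flagged as bullets when they began with a {\pntext ...} group, as each "PUCE" line in the sample does. paragraphsToHtml is a hypothetical helper, not part of rtf-html-php:

```php
<?php
// Each element is [text, isBullet]; consecutive bullets share one <ul>.
function paragraphsToHtml(array $paragraphs): string
{
    $html = '';
    $inList = false;
    foreach ($paragraphs as [$text, $isBullet]) {
        if ($isBullet && !$inList) {
            $html .= '<ul>';   // first bullet of a run opens the list
            $inList = true;
        } elseif (!$isBullet && $inList) {
            $html .= '</ul>';  // first non-bullet paragraph closes it
            $inList = false;
        }
        $escaped = htmlspecialchars($text);
        $html .= $isBullet ? "<li>$escaped</li>" : "<p>$escaped</p>";
    }
    if ($inList) {
        $html .= '</ul>';      // the list ran to the end of the document
    }
    return $html;
}

echo paragraphsToHtml([['SAUT DE LIGNE', false], ['PUCE', true], ['PUCE', true], ['MELANGE', false]]);
// <p>SAUT DE LIGNE</p><ul><li>PUCE</li><li>PUCE</li></ul><p>MELANGE</p>
```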

Testing RTF before parsing

Is there some hidden method for checking whether RTF is valid before parsing it? My error handler is being flooded with notices because my third-party RTF provider (an old system) hands over badly formatted RTF.

It would be great if I could do something like this:

$document = new Document();
if ($document->valid($rtf)) {
    $parsed = $document->Parse($rtf);
}

Also, it would be nice if Document didn't both call trigger_error and throw an exception. That hands the error to my error handler twice (well, only once if I catch the exception). An exception alone should be enough, I would imagine? :)
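Whether or not the library gains such a method, a cheap structural pre-check can reject obviously broken input before it reaches the parser. A minimal sketch; looksLikeRtf is a hypothetical helper, not part of rtf-html-php. It only checks the {\rtf header and brace balance (skipping backslash-escaped characters such as \{ and \}), ignores \bin binary data, and cannot prove that the RTF is well formed:

```php
<?php
function looksLikeRtf(string $rtf): bool
{
    $rtf = trim($rtf);
    if (strncmp($rtf, '{\rtf', 5) !== 0) {
        return false;
    }
    $depth = 0;
    $len = strlen($rtf);
    for ($i = 0; $i < $len; $i++) {
        $c = $rtf[$i];
        if ($c === '\\') {
            $i++; // skip the escaped character (or the first letter of a control word)
        } elseif ($c === '{') {
            $depth++;
        } elseif ($c === '}') {
            $depth--;
            // The outermost group must close exactly at the end of the input.
            if ($depth < 0 || ($depth === 0 && $i !== $len - 1)) {
                return false;
            }
        }
    }
    return $depth === 0;
}

var_dump(looksLikeRtf('{\rtf1 Hello}'));      // bool(true)
var_dump(looksLikeRtf('{\rtf1 {\b bold}'));   // bool(false)
```

Calling code could skip the conversion (and log once) when this returns false, instead of letting every malformed document reach the error handler.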

RTF strikethrough text not shown as struck through in HTML

Awesome tool you've built here. I'm having great success with it; I'm just wondering why one thing isn't quite working correctly. In the following RTF, the text that has strikethrough applied to it (the words "for a work project with") is NOT shown as struck-through text in the HTML, but I can see that the CSS line is there:

text-decoration: strikethrough;

but the CSS value that actually strikes text through is:

text-decoration: line-through;

`{\rtf1\ansi\ansicpg1252\cocoartf1504\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;\red255\green0\blue0;\red0\green45\blue153;}
{\*\expandedcolortbl;\csgray\c100000;\csgenericrgb\c100000\c0\c0;\csgenericrgb\c0\c17647\c60000;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0

\f0\fs36 \cf0 This is a test
\fs24
\b \ul RTF file
\b0 \ulnone that I need to
\i \cf2 covert to HTML
\i0 \cf0 f\strike \strikec0 or a work project with\strike0\striked0
\b \cf3 some coworkers.}`
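A minimal sketch of mapping RTF character formatting to CSS with the valid keyword. cssForState and its $state array (flags toggled by \b, \i, \ul and \strike) are hypothetical, not rtf-html-php's actual internals. The point is the value: CSS has no strikethrough keyword, and line-through can share one text-decoration declaration with underline:

```php
<?php
function cssForState(array $state): string
{
    $css = [];
    if (!empty($state['bold'])) {
        $css[] = 'font-weight: bold;';
    }
    if (!empty($state['italic'])) {
        $css[] = 'font-style: italic;';
    }
    // text-decoration accepts several keywords at once, e.g. "underline line-through".
    $decorations = [];
    if (!empty($state['underline'])) {
        $decorations[] = 'underline';
    }
    if (!empty($state['strike'])) {
        $decorations[] = 'line-through'; // "strikethrough" is not a CSS keyword
    }
    if ($decorations) {
        $css[] = 'text-decoration: ' . implode(' ', $decorations) . ';';
    }
    return implode(' ', $css);
}

echo cssForState(['strike' => true]); // text-decoration: line-through;
```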
