henck / rtf-html-php
RTF to HTML converter in PHP
Home Page: http://www.independent-software.com
License: GNU General Public License v2.0
Are there plans to add listtext control word support?
Hello,
There is a problem with lists: the color is lost in the HTML output.
Here is an example of a list whose text is entirely blue in the RTF:
{\rtf1\ansi\ansicpg1252\deff0\deflang1036{\fonttbl{\f0\fnil\fprq2\fcharset0 Calibri;}{$ {\colortbl ;\red0\green112\blue192;} {\*\generator Msftedit 5.41.21.2510;}\viewkind4\uc1\pard\cf1\f0\fs24 Test list:\par \pard\fi-360\li720 -\tab Item1\par -\tab Item2\par -\tab Item3\par \pard\cf0\b\f1\par }
And here is the HTML output (note that the list items are black instead of blue):
<p><span style="font-family:Calibri;font-size:16px;color:#0070c0;">Test list:</span></p>
<p><span style="font-family:Calibri;">- Item1</span></p>
<p><span style="font-family:Calibri;">- Item2</span></p>
<p><span style="font-family:Calibri;">- Item3</span></p><br>
Thank you in advance for your help.
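For reference, the RTF specification defines \pard as resetting paragraph properties only; character properties such as \cf (color) persist until \plain or the end of the enclosing group. In the sample, that means the list items should stay blue after the second \pard. A minimal sketch of state handling that keeps the two kinds of properties apart, using a hypothetical FormatState class rather than the library's own classes:

```php
<?php
// Hypothetical FormatState class (not the library's code): paragraph and
// character properties are tracked separately, so \pard clears only the
// former and \plain only the latter.
final class FormatState
{
    public array $paragraph = []; // e.g. ['li' => 720, 'fi' => -360]
    public array $character = []; // e.g. ['cf' => 1, 'b' => true]

    public function apply(string $word, ?int $param): void
    {
        switch ($word) {
            case 'pard':
                $this->paragraph = []; // paragraph properties only
                break;
            case 'plain':
                $this->character = []; // character properties only
                break;
            case 'li':
            case 'fi':
                $this->paragraph[$word] = $param ?? 0;
                break;
            case 'cf':
                $this->character['cf'] = $param ?? 0;
                break;
            case 'b':
                $this->character['b'] = ($param ?? 1) !== 0;
                break;
        }
    }
}

// Control words from the sample, in order: \pard\cf1 ... \pard\fi-360\li720
$state = new FormatState();
foreach ([['pard', null], ['cf', 1], ['pard', null], ['fi', -360], ['li', 720]] as [$word, $param]) {
    $state->apply($word, $param);
}
echo $state->character['cf'], PHP_EOL; // 1: color index 1 (the blue) survives the second \pard
```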
I am trying to convert this text to HTML:
{\rtf1\ansi\ansicpg1252\deff0\deflang1046{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}}
\viewkind4\uc1\pard\f0\fs20 D
\par BEM SEM QUEIXAS , MELHORA DA CEFALEIA, DE F\'c9RIAS
\par COSNCIENTE, ORIENTADA, MOVENDO 4 MEMBROS, SEM PLEGIAS
\par
\par RNM: CISTO PINEAL 1,0X0,7 CM
\par
\par CD: ORIENTA\'c7\'d5ES + RETORNO +/- 3 MESES COM NOVA RNM PROGRAMADA PARA 6 MESES
\par
\par
\par 20/05/16 D
\par PACIENTE COM MELHORA DA CEFALEIA SE ORGAMIZANDO MELHOR NO ESTUDOS
\par CONSCIENTE, ORIENTADA, MOVENDO 4 MEMBROS, SEM PLEGIAS[
\par
\par CD: ORIENTA\'c7\'d5ES + ATIVIDADE F\'cdSICA
\par }
But the result is:
tf1onttbl0nilcharset0 MS Sans Serif;�iewkind40s20 D
BEM SEM QUEIXAS , MELHORA DA CEFALEIA, DE FÉRIASCOSNCIENTE, ORIENTADA, MOVENDO 4 MEMBROS, SEM PLEGIAS
RNM: CISTO PINEAL 1,0X0,7 CM
CD: ORIENTAÇÕES + RETORNO +/- 3 MESES COM NOVA RNM PROGRAMADA PARA 6 MESES
20/05/16 D
PACIENTE COM MELHORA DA CEFALEIA SE ORGAMIZANDO MELHOR NO ESTUDOS
CONSCIENTE, ORIENTADA, MOVENDO 4 MEMBROS, SEM PLEGIAS[
CD: ORIENTAÇÕES + ATIVIDADE FÍSICA
The first line is broken :/
Hello,
I'm doing some tests with this simple code:
require("rtf2html.php");
$reader = new RtfReader();
$rtf = file_get_contents("temp/cs.rtf");
if($cb = $reader->Parse($rtf))
echo $cb;
But I'm always getting "1" as the response from Parse($rtf).
Curiously, if I add $reader->root->dump();
then I can see the RTF tree on screen...
What do you think could be happening here?
Thank you so much for your help.
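One likely explanation, judging from the other reports on this page that use the same API: Parse() appears to return a success flag, and PHP prints boolean true as "1" when echoed; the HTML itself comes from a formatter run over $reader->root. The sketch below only demonstrates the PHP behaviour; the library calls in the comments follow the usage shown in another report here and assume rtf2html.php is loaded.

```php
<?php
// PHP prints boolean true as "1" and false as an empty string when echoed,
// which is what a success flag returned by Parse() looks like on screen.
$parsed = true; // stand-in for the return value of $reader->Parse($rtf)
echo $parsed, PHP_EOL;      // prints 1
var_dump((string) $parsed); // string(1) "1"

// Usage as in the other RtfReader/RtfHtml report on this page (requires rtf2html.php):
// $reader = new RtfReader();
// if ($reader->Parse($rtf)) {
//     $formatter = new RtfHtml();
//     echo $formatter->Format($reader->root);
// }
```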
Just noticed that the exception in Document is missing either a use statement or a leading backslash (\).
My application just blew up with:
Uncaught Error: Class 'RtfHtmlPhp\Exception' not found in .../vendor/henck/rtf-to-html/src/Document.php:30
It should be:
throw new \Exception($err);
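The underlying PHP rule: inside a namespace, an unqualified class name such as Exception is resolved relative to the current namespace, so throw new Exception inside RtfHtmlPhp looks for RtfHtmlPhp\Exception. A minimal sketch, using a hypothetical namespace RtfHtmlPhpDemo in place of the library's:

```php
<?php
namespace RtfHtmlPhpDemo; // hypothetical namespace standing in for RtfHtmlPhp

// Here an unqualified `new Exception` would mean RtfHtmlPhpDemo\Exception,
// which does not exist. The leading backslash selects the global class
// (adding `use Exception;` at the top would work as well).
function fail(string $err): void
{
    throw new \Exception($err);
}

try {
    fail('Parse error: Tried to read past end of input; RTF is probably truncated.');
} catch (\Exception $e) {
    echo get_class($e), PHP_EOL; // prints Exception
}
```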
Hello @bretto36
I see that the composer package for this project is maintained by you. I'd like to update it to reflect the latest commits, but you are now managing the vendor name "henck".
Also, the package name on Packagist is "rtf-to-html", while it should be "rtf-html-php", to conform to the name of this Github repository.
What can be done? Can you make me a maintainer of the package, as well?
There's a "continue 2" on line 703 of rtf-html-php.php. This is a code smell, and the function ExtractFontTable should be refactored.
My document has an image, but the image is not converted to HTML. Please help me.
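For context, RTF stores pictures in \pict groups as hex-encoded bytes, with a control word such as \pngblip or \jpegblip naming the format; browsers can display PNG and JPEG directly through a data URI. A minimal sketch of that conversion, assuming the hex payload and the format word have already been extracted from the \pict group (pictToImgTag is a hypothetical helper, not part of the library):

```php
<?php
// Hypothetical helper: turn the hex payload of an RTF \pict group into an
// <img> tag with a data URI. RTF usually wraps the hex across many lines.
function pictToImgTag(string $hexData, string $blip): ?string
{
    $mime = ['pngblip' => 'image/png', 'jpegblip' => 'image/jpeg'][$blip] ?? null;
    if ($mime === null) {
        return null; // e.g. \wmetafile or \emfblip: no direct browser equivalent
    }
    $bytes = hex2bin(preg_replace('/\s+/', '', $hexData)); // drop line breaks first
    if ($bytes === false) {
        return null; // malformed hex
    }
    return '<img src="data:' . $mime . ';base64,' . base64_encode($bytes) . '">';
}

echo pictToImgTag("8950\n4e47", 'pngblip'), PHP_EOL;
// <img src="data:image/png;base64,iVBORw==">
```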
I am getting this error after one RTF document has been converted, when I move on to a new document to convert:
$document = new \RtfHtmlPhp\Document($note_object->rtf);
$formatter = new \RtfHtmlPhp\Html\HtmlFormatter('UTF-8');
$plain_text = (strip_tags($formatter->Format($document)));
After I convert my RTF document to plain text, I send it via SFTP, but when I move on to the next document, the conversion crashes with the message 'Parse error: RTF text outside of group'.
I'm not sure what is causing this, as I know that the object being sent into the formatter is indeed RTF.
More debug info:
Trace:
Array
(
[0] => Array
(
[args] => Array
(
[0] => 1024
[1] => Parse error: RTF text outside of group.
[2] => /var/www/html/mysite/vendor/henck/rtf-to-html/src/Document.php
[3] => 286
[4] => Array
(
[text] => RtfHtmlPhp\Text Object
(
[text] => This was created by the edito
)
[terminate] =>
[err] => Parse error: RTF text outside of group.
)
)
)
[1] => Array
(
[file] => /var/www/html/mysite/vendor/henck/rtf-to-html/src/Document.php
[line] => 286
[function] => trigger_error
[args] => Array
(
[0] => Parse error: RTF text outside of group.
)
)
[2] => Array
(
[file] => /var/www/html/mysite/vendor/henck/rtf-to-html/src/Document.php
[line] => 326
[function] => ParseText
[class] => RtfHtmlPhp\Document
[type] => ->
[args] => Array
(
)
)
[3] => Array
(
[file] => /var/www/html/mysite/vendor/henck/rtf-to-html/src/Document.php
[line] => 17
[function] => Parse
[class] => RtfHtmlPhp\Document
[type] => ->
[args] => Array
(
[0] => This was created by the editor
)
)
[4] => Array
(
[file] => /var/www/html/mysite/private/Controllers/Jobs/Healthelink/Dischargesummary.php
[line] => 105
[function] => __construct
[class] => RtfHtmlPhp\Document
[type] => ->
[args] => Array
(
[0] => This was created by the editor
)
)
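Note that in the trace, the argument passed to the Document constructor is "This was created by the editor", which is plain text rather than RTF. A cheap guard is to check for the "{\rtf" header before parsing. A minimal sketch (looksLikeRtf is a hypothetical helper):

```php
<?php
// Hypothetical helper: RTF documents start with "{\rtf", optionally preceded
// by whitespace or a UTF-8 BOM. Anything else is treated as plain text.
function looksLikeRtf(?string $text): bool
{
    if ($text === null) {
        return false;
    }
    $text = ltrim($text, "\xEF\xBB\xBF \t\r\n"); // strip BOM bytes and leading whitespace
    return strncmp($text, '{\rtf', 5) === 0;
}

foreach (['{\rtf1\ansi Hello\par}', 'This was created by the editor', null] as $input) {
    var_dump(looksLikeRtf($input));
}
```

A row that fails the check could then be passed through as plain text instead of being handed to new \RtfHtmlPhp\Document(...).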
There does not seem to be a way to store the HTML in a string or array, either directly or by returning it from a function. I'm not sure if it is related, but if I echo in a foreach loop, I only get two of my six strings.
$html = ''; // initialize the accumulator before appending to it with .=
foreach($letter_para as $para){
$reader = new RtfReader();
$reader->Parse($para["ParagraphText"]);
$formatter = new RtfHtml();
//$reader->root->dump();
$html .= $formatter->Format($reader->root);
}
The dump does show data, but again only for two of the six.
Thanks for the effort and thank you for sharing!
Hi,
It is a really good parser, but I've got a question: what if, instead of converting to HTML, I wanted to parse the document into multiple PHP variables so I could reuse them for something else?
How could I do that?
Thanks
Hi,
This is an RTF sample:
{\rtf1\ansi\ansicpg1252\deff0\deflang1036{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}{\f1\fnil\fcharset2 Symbol;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\f0\fs17 SAUT DE LIGNE\par
SAUT DE LIGNE\par
\b GRAS\b0
\i ITALIQUE\i0 \ul SOULIGNE \cf1\ulnone\strike BARRE\cf0\strike0\par
\pard{\pntext\f1\'B7\tab}{\pn\pnlvlblt\pnf1\pnindent0{\pntxtb\'B7}}PUCE\par
{\pntext\f1\'B7\tab}PUCE\par
{\pntext\f1\'B7\tab}PUCE\par
\pard\ul\b\i\strike MELANGE\par
\par
\par
\pard{\pntext\f1\'B7\tab}{\pn\pnlvlblt\pnf1\pnindent0{\pntxtb\'B7}}\cf1\ulnone\b0\i0\strike0 bla bla test\par
{\pntext\f1\'B7\tab}essai\cf0\par
}
Before each "PUCE", I should get an
Great tools :)
Hi!
After converting RTF to HTML, I get:
...<span style='font-size: 16px;'>î</span><span style='font-size: 16px;'> </span><span style='font-size: 16px;'>å</span><span style='font-size: 16px;'>ê</span><span style='font-size: 16px;'>î</span><span style='font-size: 16px;'>í</span><span style='font-size: 16px;'>î</span><span style='font-size: 16px;'>ì</span><span style='font-size: 16px;'>³</span><span style='font-size: 16px;'>÷</span><span style='font-size: 16px;'>í</span>....
and this code is displayed with wrong characters, like:
â¿õäè âñ³èì ð³âí³â. åðíó
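The characters in the output look like Windows-1251 (Cyrillic) bytes decoded as Windows-1252: byte 0xEE is "î" in 1252 but "о" in 1251. In RTF, the code page for \'hh bytes can come from the font's \fcharsetN value (204 means Cyrillic), falling back to \ansicpgN; the RTF source is not shown here, so this is a likely cause rather than a confirmed one. A minimal sketch, assuming the iconv extension is available; the table lists a subset of the standard \fcharset numbers and decodeByte is a hypothetical helper:

```php
<?php
// Subset of the standard RTF \fcharsetN values and their Windows code pages.
const FCHARSET_TO_CODEPAGE = [
    0   => 1252, // ANSI (Western European)
    161 => 1253, // Greek
    162 => 1254, // Turkish
    177 => 1255, // Hebrew
    178 => 1256, // Arabic
    186 => 1257, // Baltic
    204 => 1251, // Russian / Cyrillic
    238 => 1250, // Eastern European
];

// Hypothetical helper: decode one \'hh byte using the font's charset,
// falling back to the document's \ansicpg code page.
function decodeByte(int $byte, int $fcharset, int $ansicpg = 1252): string
{
    $codepage = FCHARSET_TO_CODEPAGE[$fcharset] ?? $ansicpg;
    return iconv('CP' . $codepage, 'UTF-8', chr($byte));
}

// Bytes behind "î åêîíîì³÷í" in the report, decoded as Cyrillic:
$bytes = [0xEE, 0x20, 0xE5, 0xEA, 0xEE, 0xED, 0xEE, 0xEC, 0xB3, 0xF7, 0xED];
echo implode('', array_map(fn (int $b): string => decodeByte($b, 204), $bytes)), PHP_EOL;
// о економічн
```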
If $group is null (I don't know how that happens, but it can), this causes a fatal error:
PHP Fatal error: Call to a member function GetType() on a non-object in C:....\lib\rtf-html-php\rtf-html-php.php on line 400
I have added the following lines to make it work:
if (!isset($group))
return;
Would anyone care to add support for hyperlinks?
E.g.
from RTF: {\field{\*\fldinst HYPERLINK "http://www.google.com/"}{\fldrslt search}}
to HTML: <a href="http://www.google.com/">search</a>
Cheers
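In RTF, a hyperlink is a \field group whose \fldinst holds the HYPERLINK instruction and whose \fldrslt holds the visible text. A standalone sketch of the requested mapping for the simple, non-nested case in the example, using a regular expression on the raw RTF (convertHyperlinkFields is hypothetical; real documents often nest formatting inside \fldrslt, which this does not handle, and wiring it into the formatter is left open):

```php
<?php
// Hypothetical sketch: map a flat {\field{\*\fldinst HYPERLINK "url"}{\fldrslt text}}
// group to an HTML anchor, escaping both the URL and the link text.
function convertHyperlinkFields(string $rtf): string
{
    $pattern = '/\{\\\\field\{\\\\\*\\\\fldinst\s+HYPERLINK\s+"([^"]*)"\s*\}\{\\\\fldrslt\s+([^{}]*)\}\}/';
    return preg_replace_callback($pattern, function (array $m): string {
        $href = htmlspecialchars($m[1], ENT_QUOTES);
        $text = htmlspecialchars(trim($m[2]), ENT_QUOTES);
        return '<a href="' . $href . '">' . $text . '</a>';
    }, $rtf);
}

echo convertHyperlinkFields('{\field{\*\fldinst HYPERLINK "http://www.google.com/"}{\fldrslt search}}'), PHP_EOL;
// <a href="http://www.google.com/">search</a>
```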
Hi all,
Thanks! It works perfectly for English text, but not when the RTF contains Chinese characters, which are stored as Unicode.
Here is my code:
$rtf = '{\rtf1\ansi\ansicpg1252\uc0\deff0{\fonttbl {\f0\fswiss\fcharset0\fprq2 Arial;} {\f1\fnil\fcharset0\fprq2 SimSun;} {\f2\froman\fcharset2\fprq2 Symbol;}} {\colortbl;\red0\green0\blue0;\red255\green255\blue255;} {\stylesheet{\s0\itap0\f0\fs24 [Normal];}{\*\cs10\additive Default Paragraph Font;}} {\*\generator TX_RTF32 11.0.401.501;} \deftab1134\paperw11907\paperh16443\margl567\margt567\margr567\margb567\pard\itap0\plain\f1\fs20\loch\f1\hich\f1\u20320\u22909\u21527\par }';
$result = $reader->Parse($rtf);
$formatter = new RtfHtml();
$test = $formatter->Format($reader->root);
and it gives me this result:
◊u22909◊par
I expect to get \u20320\u22909\u21527, which I can then translate back into Chinese characters.
Has anyone here had a similar issue, and what is the solution?
Cheers,
Jack
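Each \uN carries a decimal UTF-16 code unit written as a signed 16-bit number, so values above 32767 appear as negatives. Since this sample declares \uc0, no fallback character follows each escape. A minimal sketch of the decoding, assuming the mbstring extension (mb_chr, PHP 7.2+); decodeUnicodeEscapes is hypothetical and does not combine surrogate pairs:

```php
<?php
// Hypothetical helper: replace each \uN escape (plus its optional space
// delimiter) with the UTF-8 character for code unit N.
function decodeUnicodeEscapes(string $text): string
{
    return preg_replace_callback('/\\\\u(-?\d+) ?/', function (array $m): string {
        $code = (int) $m[1];
        if ($code < 0) {
            $code += 65536; // values above 32767 are written as negatives
        }
        return mb_chr($code, 'UTF-8');
    }, $text);
}

echo decodeUnicodeEscapes('\u20320\u22909\u21527'), PHP_EOL; // 你好吗
```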
Couldn't install the project using Composer; it gives me this error:
[InvalidArgumentException] Could not find package henck/rtf-to-html at any version for your minimum-stability (stable). Check the package spelling or your minimum-stability
I'm using this command: 'composer require henck/rtf-html-php'
Is there some hidden method for checking whether RTF is valid before parsing it? My error handler is being flooded with notices because my third-party RTF provider (an old system) hands over badly formatted RTF.
It would be great if I could do something like this:
$document = new Document();
if ($document->valid($rtf)) {
$parsed = $document->Parse($rtf);
}
It would also be nice if Document didn't both call trigger_error and throw an exception. This hands the error to my error handler twice (well, only once if I catch it). An exception should be enough, I would imagine :)
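No such method appears in the code shown in these reports, but a cheap structural pre-check is easy to write. A minimal sketch, assuming that "valid" only needs to mean "starts with {\rtf and has balanced braces" (isStructurallyValidRtf is hypothetical; it ignores \binN binary data and does not prove the document is well-formed):

```php
<?php
// Hypothetical pre-check: "{\rtf" header plus balanced braces, where escaped
// characters (\{, \}, \\) are skipped so they do not count as group delimiters.
function isStructurallyValidRtf(string $rtf): bool
{
    $rtf = ltrim($rtf);
    if (strncmp($rtf, '{\rtf', 5) !== 0) {
        return false;
    }
    $depth = 0;
    $len = strlen($rtf);
    for ($i = 0; $i < $len; $i++) {
        $c = $rtf[$i];
        if ($c === '\\') {
            $i++; // skip the escaped character or a control word's first letter
        } elseif ($c === '{') {
            $depth++;
        } elseif ($c === '}') {
            if (--$depth < 0) {
                return false; // closing brace without a matching opening one
            }
        }
    }
    return $depth === 0; // unclosed groups suggest truncated input
}
```

Used as a guard, e.g. if (isStructurallyValidRtf($rtf)) { $document = new \RtfHtmlPhp\Document($rtf); }, it would keep plain text and obviously truncated input away from the parser.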
If you pass in plain text rather than RTF (which can happen when you fetch content from a database), the function loops endlessly.
Notice: Undefined property: RtfHtml::$output, line 610
Hello,
When I try to install via composer, I get this message :
Could not find package henck/rtf-to-html at any version for your minimum-stability (stable). Check the package spelling or your minimum-stability
What can I do?
Awesome tool you've built here. I'm having great success with it; I'm just wondering why one thing isn't working correctly. In the following RTF, the text with strikethrough applied ("for a work project with") is NOT shown as struck-through text in the HTML, but I can see that the CSS line is there in the HTML:
text-decoration: strikethrough;
but it looks like the CSS value for struck-through text should be:
text-decoration: line-through
`{\rtf1\ansi\ansicpg1252\cocoartf1504\cocoasubrtf600
{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;\red255\green0\blue0;\red0\green45\blue153;}
{\*\expandedcolortbl;\csgray\c100000;\csgenericrgb\c100000\c0\c0;\csgenericrgb\c0\c17647\c60000;}
\margl1440\margr1440\vieww10800\viewh8400\viewkind0
\pard\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\pardirnatural\partightenfactor0
\f0\fs36 \cf0 This is a test
\fs24
\b \ul RTF file
\b0 \ulnone that I need to
\i \cf2 covert to HTML
\i0 \cf0 f\strike \strikec0 or a work project with\strike0\striked0
\b \cf3 some coworkers.}`
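The report is right: "line-through" is the CSS keyword, and browsers drop a text-decoration declaration with the unknown value "strikethrough". A minimal sketch of mapping the RTF \ul and \strike flags to a single declaration (textDecorationCss is hypothetical), which also covers text that is both underlined and struck through:

```php
<?php
// Hypothetical helper: build one text-decoration declaration from RTF flags.
function textDecorationCss(bool $underline, bool $strike): string
{
    $values = [];
    if ($underline) {
        $values[] = 'underline';    // \ul ... \ulnone
    }
    if ($strike) {
        $values[] = 'line-through'; // \strike ... \strike0
    }
    return $values ? 'text-decoration: ' . implode(' ', $values) . ';' : '';
}

echo textDecorationCss(false, true), PHP_EOL; // text-decoration: line-through;
echo textDecorationCss(true, true), PHP_EOL;  // text-decoration: underline line-through;
```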
Can you find a solution for me? The table content is not converted by this script.
Hi,
This is my RTF string:
{\rtf1\deff0{\fonttbl{\f0 Calibri;}{\f1 Microsoft Sans Serif;}}{\colortbl ;\red0\green0\blue255 ;}{\*\defchp \fs22}{\stylesheet {\ql\fs22 Normal;}{\*\cs1\fs22 Default Paragraph Font;}{\*\cs2\sbasedon1\fs22 Line Number;}{\*\cs3\ul\fs22\cf1 Hyperlink;}{\*\ts4\tsrowd\fs22\ql\tscellpaddfl3\tscellpaddl108\tscellpaddfb3\tscellpaddfr3\tscellpaddr108\tscellpaddft3\tsvertalt\cltxlrtb Normal Table;}{\*\ts5\tsrowd\sbasedon4\fs22\ql\trbrdrt\brdrs\brdrw10\trbrdrl\brdrs\brdrw10\trbrdrb\brdrs\brdrw10\trbrdrr\brdrs\brdrw10\trbrdrh\brdrs\brdrw10\trbrdrv\brdrs\brdrw10\tscellpaddfl3\tscellpaddl108\tscellpaddfr3\tscellpaddr108\tsvertalt\cltxlrtb Table Simple 1;}}{\*\listoverridetable}\nouicompat\splytwnine\htmautsp\sectd\pard\plain\ql{\f1\fs17\cf0 100% polyester r\u233\'e9sistant Les coutures sont renforc\u233\'e9es et les bords sont doubles}\f1\fs17\par}
I get this:
100% polyester r◊'e9sistant Les coutures sont renforc◊'e9es et les bords sont doubles
Instead of this:
100% polyester résistant Les coutures sont renforcées et les bords sont doubles
I think this is because the RTF character "\u233" is not handled properly.
I hope this report will help this amazing tool 👍 !!!
Document.php uses throw new Exception, but with PHP 7.2 or later and a namespace, it needs a leading backslash:
$err = "Parse error: Tried to read past end of input; RTF is probably truncated.";
trigger_error($err);
throw new \Exception($err);
ErrorException : Undefined property: RtfHtml::$output
C:\Users\brett\workspace\feesynergy\vendor\henck\rtf-to-html\rtf-html-php.php:957
{\\rtf1\\deff0{\\fonttbl{\\f0 Times New Roman;}{\\f1 MS Sans Serif;}}{\\colortbl\\red0\\green0\\blue0 ;\\red0\\green0\\blue255 ;}{\\*\\defchp \\f1\\fs20}{\\*\\listoverridetable}{\\stylesheet {\\ql\\f1\\fs20\\cf0 Normal;}{\\*\\cs1\\f1\\fs20\\cf0 Default Paragraph Font;}{\\*\\cs2\\sbasedon1\\f1\\fs20\\cf0 Line Number;}{\\*\\cs3\\ul\\f1\\fs20\\cf1 Hyperlink;}}\\splytwnine\\htmautsp\\sectd\\pard\\plain\\ql{\\b\\f1\\fs20\\cf0 Professional Fees}\\b\\f1\\fs20\\cf0\\par\\pard\\plain\\ql{\\f1\\fs20\\cf0 As per contractual agreement.}\\f1\\fs20\\cf0\\par}
I have an issue with text that contains more than one underlined word or phrase. The first one is displayed fine, but underlining does not work the second (or third) time it is used.
Example:
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil Microsoft Sans Serif;}{\f1\fnil\fcharset0 Microsoft Sans Serif;}} {\colortbl ;\red0\green0\blue0;} \viewkind4\uc1\pard\lang1031\f0\fs17 Filterdichtung f\f1\'fcr Jupiter Gebl\'e4seeinheit \par \par \cf1\ul\b\'dcberschrift 1 - fett + unterstrichen:\cf0\ulnone\b0\par \par Test text 1\par \par \cf1\ul\b\'dcberschrift 2 - fett + unterstrichen:\cf0\ulnone\b0\par \par Test text 2\par \par \cf1\ul\b\'dcberschrift 3 - fett + unterstrichen:\cf0\ulnone\b0\par \par Test text 3\f0\par }
Überschrift 1 - fett + unterstrichen:
is underlined (and bold) nicely.
Überschrift 2 - fett + unterstrichen:
is just bold, but not underlined.
thanks for your help!
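The expected behaviour is that \ul switches underlining on every time it appears and \ulnone (or \ul0) switches it off, with each text run recording the state active when it was read. A minimal sketch over a hand-written token list reduced from the sample above (applyTokens and the token format are hypothetical, with parameters folded into the word name for brevity):

```php
<?php
// Hypothetical sketch: each control word updates a small state map, and every
// text run records a copy of the state active when it was read.
function applyTokens(array $tokens): array
{
    $state = ['ul' => false, 'b' => false];
    $runs = [];
    foreach ($tokens as [$kind, $value]) {
        if ($kind === 'text') {
            $runs[] = ['text' => $value] + $state;
            continue;
        }
        switch ($value) {
            case 'ul':
                $state['ul'] = true;
                break;
            case 'ul0':
            case 'ulnone':
                $state['ul'] = false;
                break;
            case 'b':
                $state['b'] = true;
                break;
            case 'b0':
                $state['b'] = false;
                break;
        }
    }
    return $runs;
}

$runs = applyTokens([
    ['word', 'ul'], ['word', 'b'], ['text', 'Überschrift 1'], ['word', 'ulnone'], ['word', 'b0'],
    ['text', 'Test text 1'],
    ['word', 'ul'], ['word', 'b'], ['text', 'Überschrift 2'], ['word', 'ulnone'], ['word', 'b0'],
]);
foreach ($runs as $run) {
    printf("%s ul=%d b=%d\n", $run['text'], $run['ul'], $run['b']);
}
```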
Hello,
I tried the code, but it does not translate diacritics to UTF-8 (or any other codeset). Example:
Original: Mnoho věcí nechápeme, mnoha věcí se lekáme.
Translated: Mnoho v◊'3fc◊'3f nech◊'3fpeme, mnoha v◊'3fc◊'3f se lek◊'3fme.
I have successfully used an inverse function for RTF generation (see the function below); it would probably be nice to include the inverse function (e.g. rtf_to_utf8) in your project:
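function utf8_to_rtf($utf8_text) {
// Replace each non-ASCII character with an RTF \uN escape plus a "?" fallback.
// preg_replace_callback() replaces the /e modifier, which was removed in PHP 7.
return preg_replace_callback('/[^\x00-\x7F]/u', function ($m) {
$cp = mb_ord($m[0], 'UTF-8');
// Characters outside the BMP become a UTF-16 surrogate pair.
$units = $cp > 0xFFFF
? [0xD800 + (($cp - 0x10000) >> 10), 0xDC00 + (($cp - 0x10000) & 0x3FF)]
: [$cp];
$out = '';
foreach ($units as $unit) {
// \uN takes a signed 16-bit value.
$out .= '\u' . ($unit > 32767 ? $unit - 65536 : $unit) . '?';
}
return $out;
}, $utf8_text);
}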
I am not really familiar with UTF-16, so I am not able to suggest an exact working patch right now :-(
This is returning false.
$reader = new RtfReader();
$rtf = file_get_contents("example.rtf"); // or use a string
$result = $reader->Parse($rtf);
echo "<pre>";
var_dump($result);
Any help?
Hi,
I'm having some problems when I call $document = new Document($evolution);
the following error is raised:
Notice: Parse error: Tried to read past end of input; RTF is probably truncated. in C:\xampp\htdocs\App_prontuario\RtfHtmlPhp\Document.php on line 29
I query a database, take the returned value, and put it in a variable, as you can see below:
`<?php
session_start();
require '../../vendor/autoload.php';
use Classes\Pacient\PacientEvolution\PacientEvolution;
$pacientRegistry = intval($_GET['prontuario']);
$pacientEvolution = new PacientEvolution();
$pacientEvo = $pacientEvolution->findPacientEvolution($pacientRegistry);
$evolution = null;
foreach ($pacientEvo as $key => $value) {
if ($pacientEvo[$key]['EVOLUCAO']) {
$evolution = $pacientEvo[$key]['EVOLUCAO'];
}
}
use RtfHtmlPhp\Document;
$document = new Document($evolution);`
But I don't know what is happening; can someone help me? I've been looking for a solution for three days to show the RTF content, but I haven't managed it yet.
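One thing worth ruling out: in the loop above, $evolution stays null when no row has a non-empty EVOLUCAO value, and the Document constructor then receives null instead of RTF. Another possibility (an assumption, not confirmed in this report) is that the database driver truncates long text columns, which would match the "RTF is probably truncated" message. A minimal guard for the first case, with lastNonEmptyEvolution as a hypothetical helper:

```php
<?php
// Hypothetical guard: return the last non-empty EVOLUCAO value from the query
// rows (as the original loop does) and fail clearly when there is none.
function lastNonEmptyEvolution(array $rows): string
{
    $evolution = null;
    foreach ($rows as $row) {
        if (!empty($row['EVOLUCAO'])) {
            $evolution = $row['EVOLUCAO']; // later rows overwrite earlier ones
        }
    }
    if ($evolution === null) {
        throw new InvalidArgumentException('No RTF content (EVOLUCAO) found for this patient.');
    }
    return $evolution;
}

// Usage with the report's variables:
// $document = new Document(lastNonEmptyEvolution($pacientEvo));
```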
Since support for Unicode encoding was added, I have been facing a number of issues with the package.
I have attached one RTF file that I can't get to convert properly. It produces two bullets, and I think this is because we aren't handling the fact that RTF is only partially Unicode-aware: a writer can specify both an ANSI character AND a Unicode character. I think the package interprets both of them instead of using only one.
http://www.biblioscape.com/rtf15_spec.htm - search for Control word \uN
An RTF writer, when it encounters a Unicode character with no corresponding ANSI character, should output \uN followed by the best ANSI representation it can manage. Also, if the Unicode character translates into an ANSI character stream with count of bytes differing from the current Unicode Character Byte Count, it should emit the \ucN keyword prior to the \uN keyword to notify the reader of the change.
Any help would be greatly appreciated @mhtamala @henck
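A minimal sketch of the reader side of that rule, assuming only \uN, \ucN and \'hh need handling inside a text run: after each \uN, skip as many fallback characters as the most recent \ucN specifies (1 by default). With this, a bullet written as \u8226 followed by a \'95 fallback yields one bullet. decodeRtfUnicode is hypothetical and relies on mbstring's mb_chr (PHP 7.2+):

```php
<?php
// Hypothetical sketch of the \uN / \ucN rule. A fallback character counts as
// one literal byte or one \'hh escape. Simplifications: \ucN is not scoped to
// groups, control words used as fallbacks are not recognised, and surrogate
// pairs are not combined.
function decodeRtfUnicode(string $rtf): string
{
    $out = '';
    $skipCount = 1; // \uc1 is the default
    $len = strlen($rtf);
    $i = 0;
    while ($i < $len) {
        if (preg_match('/\G\\\\uc(\d+) ?/', $rtf, $m, 0, $i)) {
            $skipCount = (int) $m[1];
            $i += strlen($m[0]);
        } elseif (preg_match('/\G\\\\u(-?\d+) ?/', $rtf, $m, 0, $i)) {
            $code = (int) $m[1];
            $out .= mb_chr($code < 0 ? $code + 65536 : $code, 'UTF-8');
            $i += strlen($m[0]);
            // Skip the ANSI fallback so the character is not emitted twice.
            for ($n = 0; $n < $skipCount && $i < $len; $n++) {
                $i += preg_match("/\\G\\\\'[0-9a-fA-F]{2}/", $rtf, $m, 0, $i) ? 4 : 1;
            }
        } else {
            $out .= $rtf[$i];
            $i++;
        }
    }
    return $out;
}

echo decodeRtfUnicode("r\\u233\\'e9sistant"), PHP_EOL;   // résistant
echo decodeRtfUnicode("v\\u283\\'3fc\\u237\\'3f"), PHP_EOL; // věcí
```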
There's a line-break issue caused by a missing break statement in HtmlFormatter.php at line 328.
Currently:
case 'line': $this->output .= "<br/>";// character value (line feed = &#10;) (carriage return = &#13;)
It should be:
case 'line': $this->output .= "<br/>"; break;// character value (line feed = &#10;) (carriage return = &#13;)
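Why the missing break matters: PHP's switch falls through into the next case when a case ends without break, so the following case's statements run too. A minimal sketch; the 'tab' case is a hypothetical stand-in for whichever case follows 'line' in HtmlFormatter.php:

```php
<?php
// Hypothetical reduction of the formatter's switch, showing fall-through.
function render(string $word, bool $withBreak): string
{
    $output = '';
    switch ($word) {
        case 'line':
            $output .= '<br/>';
            if ($withBreak) {
                break; // leaves the switch
            }
            // no break: execution falls through into the next case
        case 'tab':
            $output .= '&nbsp;&nbsp;&nbsp;&nbsp;';
            break;
    }
    return $output;
}

echo render('line', true), PHP_EOL;  // <br/>
echo render('line', false), PHP_EOL; // <br/>&nbsp;&nbsp;&nbsp;&nbsp;
```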
Hi,
let me say first of all that you have done a great job. A nice little tool and it works without external services. I think that's really good!
You write that your parser only interprets the bare essentials that you need on a website. I agree with that as well.
What I would still miss, because I just use it frequently on websites, would be the following things:
Maybe you could reconsider extending your parser accordingly.
Greetz
When a text starts with bold, it is shown in a normal font.
This is the raw text:
{\rtf1\ansi\ansicpg1252\deff0\deflang3082{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
\viewkind4\uc1\pard\b\f0\fs17 VIT.O.BEST WHEY PROTEIN 100% 4LB\b0 . dentro del mundo de los suplementos deportivos es posiblemente es la mejor prote\'edna en cuanto a calidad/precio del re\'f1ido mercado de la nutrici\'f3n deportiva.\par
\par
\par
\b VIT.O.BEST WHEY PROTEIN 100% 24B\b0 . es una prote\'edna perfecta para el uso diario, que adem\'e1s tiene una buena capacidad de absorci\'f3n y una velocidad de absorci\'f3n media.\par
\par
\par
\b VIT.O.BEST WHEY PROTEIN 100%\b0 tiene una gran variedad de sabores deliciosos y muy suaves perfectos para el uso diario sin que terminen siendo empalagosos o \b aburridos. WHEY PROTEIN 100%\b0 est\'e1 disponible en sabor a chocolate, yogur de lim\'f3n , fresa, crema de caf\'e9 y vainilla. \b VIT.O.BEST WHEY PROTEIN 100% \b0 es una prote\'edna muy pura que no contiene az\'facar y est\'e1 totalemente libre de aspartamo.\par
}
This is the dump:
{
WORD rtf (1)
WORD ansi\ansicpg (1252)
WORD deff (0)
WORD deflang (3082)
WORD viewkind (4)
WORD uc (1)
WORD pard\b\f (0)
WORD fs (17)
TEXT VIT.O.BEST WHEY PROTEIN 100% 4LB
WORD b (0)
TEXT . dentro del mundo de los suplementos deportivos es posiblemente es la mejor prote
SYMBOL ' (237)
TEXT na en cuanto a calidad/precio del re
SYMBOL ' (241)
TEXT ido mercado de la nutrici
SYMBOL ' (243)
TEXT n deportiva.
WORD par (1)
WORD par (1)
WORD par (1)
WORD b (1)
TEXT VIT.O.BEST WHEY PROTEIN 100% 24B
WORD b (0)
TEXT . es una prote
SYMBOL ' (237)
TEXT na perfecta para el uso diario, que adem
SYMBOL ' (225)
TEXT s tiene una buena capacidad de absorci
SYMBOL ' (243)
TEXT n y una velocidad de absorci
SYMBOL ' (243)
TEXT n media.
WORD par (1)
WORD par (1)
WORD par (1)
WORD b (1)
TEXT VIT.O.BEST WHEY PROTEIN 100%
WORD b (0)
TEXT tiene una gran variedad de sabores deliciosos y muy suaves perfectos para el uso diario sin que terminen siendo empalagosos o
WORD b (1)
TEXT aburridos. WHEY PROTEIN 100%
WORD b (0)
TEXT est
SYMBOL ' (225)
TEXT disponible en sabor a chocolate, yogur de lim
SYMBOL ' (243)
TEXT n , fresa, crema de caf
SYMBOL ' (233)
TEXT y vainilla.
WORD b (1)
TEXT VIT.O.BEST WHEY PROTEIN 100%
WORD b (0)
TEXT es una prote
SYMBOL ' (237)
TEXT na muy pura que no contiene az
SYMBOL ' (250)
TEXT car y est
SYMBOL ' (225)
TEXT totalemente libre de aspartamo.
WORD par (1)
}
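In the dump, several control words are merged into one entry ("WORD pard\b\f (0)", "WORD ansi\ansicpg (1252)"), so the \b that opens the bold text never appears as a separate word, which would explain why the opening bold is lost. Per the RTF specification, a control word ends at the first character that is not a letter, so the next backslash always starts a new token. A minimal tokenizer sketch following that rule (tokenizeControlWords is hypothetical and skips groups, control symbols such as \' and \{, and binary data):

```php
<?php
// Hypothetical tokenizer: a control word is a backslash, ASCII letters, an
// optional signed integer parameter and an optional single space delimiter;
// any run of characters other than \, { and } is text.
function tokenizeControlWords(string $rtf): array
{
    preg_match_all('/\\\\([a-zA-Z]+)(-?\d+)? ?|([^\\\\{}]+)/', $rtf, $matches, PREG_SET_ORDER);
    $tokens = [];
    foreach ($matches as $m) {
        if (isset($m[3]) && $m[3] !== '') {
            $tokens[] = ['TEXT', $m[3]];
        } else {
            $param = (isset($m[2]) && $m[2] !== '') ? (int) $m[2] : null;
            $tokens[] = ['WORD', $m[1], $param];
        }
    }
    return $tokens;
}

echo json_encode(tokenizeControlWords('\pard\b\f0\fs17 VIT.O.BEST')), PHP_EOL;
// [["WORD","pard",null],["WORD","b",null],["WORD","f",0],["WORD","fs",17],["TEXT","VIT.O.BEST"]]
```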
I'm trying to convert some text that appears to contain currency symbols, and they get converted into the classic "question mark" symbol usually seen for unrecognised characters.
This is my sample rtf text:
{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}{\f1\fnil\fcharset0 Times New Roman;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\f0\fs16 France - Paris
\par Trip 1 \'8065
\par Trip 2 \'8010
\par Trip 3 \'8026
\par }
It gets converted to:
<span style="font-size: 14px;">France - Paris
</span><p>Trip1 �65
</p>Trip 2 �10
<p>Trip 3 �26
</p>
Looking at an RTF guide, it seems that \'80 should be the euro symbol (https://www.oreilly.com/library/view/rtf-pocket-guide/9781449302047/ch04.html).
I've also noticed that not all lines are converted to paragraphs. I'm not sure whether that is a problem with the RTF text itself or with the conversion; the data is imported from an API, so I don't have much control over it.
Thanks
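The document declares \ansicpg1252, and in Windows-1252 byte 0x80 is the euro sign. If that byte is emitted as-is into UTF-8 output, it is an invalid sequence, which browsers usually display as "�". A minimal sketch of decoding \'hh escapes in the declared code page (decodeHexEscapes is hypothetical; it assumes the iconv extension, and bytes undefined in the code page come out empty here):

```php
<?php
// Hypothetical helper: replace each \'hh escape with the UTF-8 form of byte
// hh in the given Windows code page (1252 by default, per \ansicpg1252).
function decodeHexEscapes(string $text, int $codepage = 1252): string
{
    return preg_replace_callback("/\\\\'([0-9a-fA-F]{2})/", function (array $m) use ($codepage): string {
        return iconv('CP' . $codepage, 'UTF-8', chr(hexdec($m[1])));
    }, $text);
}

echo decodeHexEscapes("Trip 1 \\'8065"), PHP_EOL; // Trip 1 €65
```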
Hi,
The conversion basically works, but it produces ugly HTML with lots of span tags where none are needed; EVERY umlaut (ö/ä/ü) is wrapped in span tags!
e.g.
raffte mehr als die H</span><span style='font-size: 14px;'>ä</span><span style='font-size: 14px;'>lfte der Bev</span><span style='font-size: 14px;'>ö</span><span style='font-size: 14px;'>lkerung dahin.
here is my rtf source:
raffte mehr als die H\'e4lfte der Bev\'f6lkerung dahin.
Second problem:
\u8211 should be converted to something like – (&ndash;) or — (&mdash;), but it is converted to ◊ (&loz;).
beklagt sich \u8211 - f\'fcr deutsche Leser
beklagt sich </span>◊<span style='font-size: 14px;'>- f</span><span style='font-size: 14px;'>ü</span><span style='font-size: 14px;'>r deutsche Leser
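For the first problem, a formatter can merge consecutive runs that share the same style before writing any tags. A minimal sketch, assuming runs are available as [style, text] pairs (renderRuns and the run format are hypothetical):

```php
<?php
// Hypothetical sketch: buffer text while the style stays the same and open a
// new <span> only when it changes.
function renderRuns(array $runs): string
{
    $html = '';
    $currentStyle = null;
    $buffer = '';
    foreach ($runs as [$style, $text]) {
        if ($style !== $currentStyle && $buffer !== '') {
            $html .= "<span style='" . $currentStyle . "'>" . $buffer . '</span>';
            $buffer = '';
        }
        $currentStyle = $style;
        $buffer .= htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }
    if ($buffer !== '') {
        $html .= "<span style='" . $currentStyle . "'>" . $buffer . '</span>';
    }
    return $html;
}

echo renderRuns([['font-size: 14px;', 'H'], ['font-size: 14px;', 'ä'], ['font-size: 14px;', 'lfte']]), PHP_EOL;
// <span style='font-size: 14px;'>Hälfte</span>
```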
Notice: Undefined property: RtfHtml::$defaultFont
The RTF translation is wrong in this case:
`<?php
error_reporting(-1);
ini_set('display_errors', 'on');
require __DIR__ . '/vendor/autoload.php';
use RtfHtmlPhp\Document;
use RtfHtmlPhp\Html\HtmlFormatter;
$original = "{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}\viewkind4\uc1\pard\lang1036\f0\fs16 m\'e9lange ma\'efs fran\'e7ais\par}";
$document = new Document($original); // or use a string directly
$formatter = new HtmlFormatter('UTF-8');
$r = $formatter->Format($document);
file_put_contents('rtf.html', $r);
echo $r;
?>`
Result:
tf1onttbl0nilcharset0 Microsoft Sans Serif;�iewkind40s16 mélange maïs français
Expected result:
Something like "Mélange maïs français"
Thanks,
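The garbled start of the result ("tf1onttbl0nil...iewkind40s16") matches PHP's own escape handling: in a double-quoted string, \r, \f and \v are a carriage return, a form feed and a vertical tab, so those RTF control words are damaged before the parser sees them. The same pattern appears in the first-line problem reported earlier on this page. A minimal sketch comparing a double-quoted literal with a nowdoc, which keeps every backslash; the Document and HtmlFormatter calls in the comment follow the code in this report:

```php
<?php
// In double quotes, "\r" and "\f" are a carriage return and a form feed.
$doubleQuoted = "{\rtf1\fonttbl}";
$singleQuoted = '{\rtf1\fonttbl}';
var_dump(strlen($doubleQuoted)); // int(13): "\r" and "\f" became one byte each
var_dump(strlen($singleQuoted)); // int(15): every backslash is kept

// A nowdoc (<<<'RTF') also keeps backslashes literally, including \'e9:
$original = <<<'RTF'
{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}\viewkind4\uc1\pard\lang1036\f0\fs16 m\'e9lange ma\'efs fran\'e7ais\par}
RTF;
var_dump(strpos($original, '\viewkind4') !== false); // bool(true)

// Then, as in the report:
// $document = new Document($original);
// $formatter = new HtmlFormatter('UTF-8');
// echo $formatter->Format($document);
```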
This is a really useful library, thank you for the hard work!
Is it possible to output plain text only, rather than formatted HTML?
Thanks
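One approach, building on the strip_tags() call used in another report above: turn the formatter's <br> and </p> tags into newlines, strip the remaining tags, and decode HTML entities. A minimal sketch, assuming the HTML marks breaks with those two tags (htmlToPlainText is hypothetical):

```php
<?php
// Hypothetical helper: convert formatter HTML to plain text, keeping line and
// paragraph breaks as newlines and decoding entities such as &amp;.
function htmlToPlainText(string $html): string
{
    $html = preg_replace('~<br\s*/?>~i', "\n", $html);
    $html = preg_replace('~</p\s*>~i', "\n", $html);
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim($text);
}

echo htmlToPlainText('<p>Hello</p><p>A &amp; B<br/>C</p>'), PHP_EOL;
```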
I'm lost with this issue.
I'm trying to convert an RTF document (shown below) that contains very simple tables with 4 columns and 3 rows, and the converted document only shows the text contained in the table; the rest of the converted text is shown perfectly.
I've tried different documents and different tables, and I've even tried changing the document encoding, but the result is always the same.
Any idea what could be happening here?
(by the way... on root dump seems that cell and row elements are present)
Thank you so much for your code and help
____ RTF code ___
{\rtf1\ansi\deff0\adeflang1025
{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}{\f1\froman\fprq2\fcharset2 Symbol;}{\f2\fswiss\fprq2\fcharset0 Arial;}{\f3\fnil\fprq0\fcharset128 OpenSymbol{\*\falt Arial Unicode MS};}{\f4\fnil\fprq2\fcharset0 Microsoft YaHei;}{\f5\fnil\fprq2\fcharset0 Lucida Sans;}{\f6\fswiss\fprq0\fcharset0 Lucida Sans;}}
{\colortbl;\red0\green0\blue0;\red255\green255\blue255;\red153\green153\blue153;\red128\green128\blue128;}
{\stylesheet{\s0\snext0\nowidctlpar{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\aspalpha\ltrpar\cf0\kerning1\hich\af7\langfe2052\dbch\af5\afs24\lang1081\loch\f0\fs24\lang3082 Predeterminado;}
{\*\cs15\snext15\hich\af3\dbch\af3\loch\f3 Vi?etas;}
{\s16\sbasedon0\snext17\sb240\sa120\keepn\hich\af4\dbch\af5\afs28\loch\f2\fs28 Encabezado;}
{\s17\sbasedon0\snext17\sb0\sa120 Cuerpo de texto;}
{\s18\sbasedon17\snext18\sb0\sa120\dbch\af6 Lista;}
{\s19\sbasedon0\snext19\sb120\sa120\noline\i\dbch\af6\afs24\ai\fs24 Etiqueta;}
{\s20\sbasedon0\snext20\noline\dbch\af6 Índice;}
{\s21\sbasedon0\snext21\noline Contenido de la tabla;}
{\s22\sbasedon21\snext22\qc\noline\b\ab Encabezado de la tabla;}
}{\*\listtable{\list\listtemplateid1
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u8226 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li720}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9702 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li1080}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9642 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li1440}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u8226 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li1800}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9702 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li2160}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9642 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li2520}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u8226 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li2880}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9702 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li3240}
{\listlevel\levelnfc23\leveljc0\levelstartat1\levelfollow0{\leveltext \'01\u9642 ?;}{\levelnumbers;}\f3\dbch\af3\fi-360\li3600}\listid1}
{\list\listtemplateid2
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-432\li432}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-576\li576}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-720\li720}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-864\li864}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1008\li1008}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1152\li1152}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1296\li1296}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1440\li1440}
{\listlevel\levelnfc0\leveljc0\levelstartat1\levelfollow0{\leveltext \'00;}{\levelnumbers;}\fi-1584\li1584}\listid2}
}{\listoverridetable{\listoverride\listid1\listoverridecount0\ls1}{\listoverride\listid2\listoverridecount0\ls2}}{\info{\creatim\yr2019\mo6\dy13\hr11\min46}{\revtim\yr2019\mo6\dy13\hr11\min46}{\printim\yr0\mo0\dy0\hr0\min0}{\comment OpenOffice}{\vern4150}}\deftab709
{\*\pgdsctbl
{\pgdsc0\pgdscuse195\pgwsxn11906\pghsxn16838\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\pgdscnxt0 Predeterminado;}}
\formshade\paperh16838\paperw11906\margl1134\margr1134\margt1134\margb1134\sectd\sbknone\sectunlocked1\pgndec\pgwsxn11906\pghsxn16838\marglsxn1134\margrsxn1134\margtsxn1134\margbsxn1134\ftnbj\ftnstart1\ftnrstcont\ftnnar\aenddoc\aftnrstcont\aftnstart1\aftnnrlc
\pgndec\pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse nisi tortor, tristique cursus viverra non, suscipit sed lorem. Ut tincidunt pharetra neque, at dignissim lorem mollis id. Nullam sodales facilisis dolor, et lacinia neque vulputate nec.}
\par \pard\plain \s17\sb0\sa120{\ul\ulc0\b\ab\rtlch \ltrch\loch\lang3082
Aenean nulla augue, finibus id tortor eget, pellentesque scelerisque sapien}{\rtlch \ltrch\loch\lang3082
. Vivamus facilisis bibendum erat, eu fringilla sapien vestibulum sit amet. In purus ante, mollis sit amet leo id, }
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\b\ab\rtlch \ltrch\loch\lang3082
Aliquam quis augue at turpis eleifend consequat sit amet quis sem. Proin sed augue sagittis, gravida ipsum quis, accumsan velit. Sed laoreet hendrerit interdum. Proin blandit interdum orci sed faucibus. Donec in eleifend neque. Etiam auctor orci at eros aliquam ornare vitae sed eros}{\rtlch \ltrch\loch\lang3082
. Aliquam pulvinar, urna sed sollicitudin euismod, velit magna sagittis massa, vel eleifend leo tortor ac nunc. Sed sit amet est felis. Sed mauris enim, auctor et erat finibus, placerat iaculis ex. }
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\listtext\pard\plain \hich\af3\dbch\af3\loch\f3 \'95\tab}\ilvl0\ls1 \li720\ri0\lin720\rin0\fi-360{\rtlch \ltrch\loch\lang3082
A}
\par \pard\plain \s17\sb0\sa120{\listtext\pard\plain \hich\af3\dbch\af3\loch\f3 \'95\tab}\ilvl0\ls1 \li720\ri0\lin720\rin0\fi-360{\rtlch \ltrch\loch\lang3082
B}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
lkjlkjlkjlkjlkj d\'f1lskfj d\'f1lfkjd sf\'f1lkdsj fl\'f1dksjf dsl\'f1kfj ds\'f1lfkjs aflkasdj flksdj fsdl\'f1kfj sdlkf jsdlkf jsdl\'f1kf jsdlkf\'f1 jsdlfk sdjf\'f1lksdjflk}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \trowd\trql\ltrrow\trpaddft3\trpaddt55\trpaddfl3\trpaddl55\trpaddfb3\trpaddb55\trpaddfr3\trpaddr55\clbrdrt\brdrs\brdrw1\brdrcf1\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clcbpat3\cellx2409\clbrdrt\brdrs\brdrw1\brdrcf1\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clcbpat3\cellx4818\clbrdrt\brdrs\brdrw1\brdrcf1\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clcbpat3\cellx7227\clbrdrt\brdrs\brdrw1\brdrcf1\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clbrdrr\brdrs\brdrw1\brdrcf1\clcbpat3\cellx9636\pard\plain \s21\noline\intbl{\cf2\rtlch \ltrch\loch\lang3082
SSS}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
GGG}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
DDD}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
EEEE}\cell\row\trowd\trql\ltrrow\trpaddft3\trpaddt55\trpaddfl3\trpaddl55\trpaddfb3\trpaddb55\trpaddfr3\trpaddr55\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx2409\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx4818\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx7227\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clbrdrr\brdrs\brdrw1\brdrcf1\cellx9636\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
AAA}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
aaaaabbbccc}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
MmmmNNN}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
ddd}\cell\row\trowd\trql\ltrrow\trpaddft3\trpaddt55\trpaddfl3\trpaddl55\trpaddfb3\trpaddb55\trpaddfr3\trpaddr55\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx2409\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx4818\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx7227\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clbrdrr\brdrs\brdrw1\brdrcf1\cellx9636\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
AAA}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
Aaaaayy}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
eyyy}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
eyyyy}\cell\row\trowd\trql\ltrrow\trpaddft3\trpaddt55\trpaddfl3\trpaddl55\trpaddfb3\trpaddb55\trpaddfr3\trpaddr55\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx2409\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx4818\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\cellx7227\clbrdrl\brdrs\brdrw1\brdrcf1\clbrdrb\brdrs\brdrw1\brdrcf1\clbrdrr\brdrs\brdrw1\brdrcf1\cellx9636\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
AAA}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
AAyy}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
yyye}\cell\pard\plain \s21\noline\intbl{\rtlch \ltrch\loch\lang3082
eyyy}\cell\row\pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120{\rtlch \ltrch\loch\lang3082
}
\par \pard\plain \s17\sb0\sa120\sb0\sa120{\rtlch \ltrch\loch\lang3082
Maecenas tempor sapien a est scelerisque sagittis. Nam ut congue augue, ut bibendum nisl. Vivamus ullamcorper ex ac faucibus pulvinar. Mauris ante nulla, hendrerit vitae congue sit amet, facilisis a justo. Suspendisse imperdiet lectus enim, ac luctus mauris efficitur eget. Aliquam sagittis }
\par }
A great script, but unfortunately it does not handle umlauts such as ä, ö, ü, etc.
For example, "können" comes out as "kamp;apos;f6nnen".
Can someone help me here? Thank you and best regards, Dieter
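In RTF, `\'f6` is a hex escape for one byte in the document's code page (0xF6 is "ö" in Windows-1252, the page declared by `\ansicpg1252`). The garbled output suggests, though this is only an assumption, that the apostrophe of the escape was HTML-escaped instead of the escape being decoded. Below is a minimal Python sketch of the intended order: decode the hex escapes first, then HTML-escape the text once. The function names are hypothetical and this is not the library's code; it also assumes a single-byte code page.

```python
import html
import re

# Matches an RTF hex escape such as \'f6: a backslash, an apostrophe, two hex digits.
HEX_ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(rtf_text, codepage="cp1252"):
    """Replace each \\'hh escape with the character it encodes.

    Assumes a single-byte code page (cp1252, as declared by \\ansicpg1252);
    decoding byte by byte would be wrong for multi-byte code pages.
    """
    def replace(match):
        byte_value = int(match.group(1), 16)
        # errors="replace" guards against bytes the code page leaves undefined.
        return bytes([byte_value]).decode(codepage, errors="replace")

    return HEX_ESCAPE.sub(replace, rtf_text)


def text_run_to_html(rtf_text, codepage="cp1252"):
    """Decode escapes first, then HTML-escape the result exactly once."""
    return html.escape(decode_hex_escapes(rtf_text, codepage), quote=False)


print(text_run_to_html(r"k\'f6nnen"))
```

Doing the HTML escaping last means the `'` inside `\'f6` never reaches the HTML escaper, so it cannot turn into `&apos;`.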
The conversion of escaped characters is wrong.
For example, I have the following RTF:
{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1031{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 6.3.9600}\viewkind4\uc1
\pard\sa200\sl276\slmult1\f0\fs22\lang7 Hello \{World\}\par
}
The HTML snippet looks like this:
<span style="font-size: 19px;">Hello {{World}}</span><p>
But the correct result would be:
<span style="font-size: 19px;">Hello {World}</span><p>
And it's the same with other characters, for example the backslash:
{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1031{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 6.3.9600}\viewkind4\uc1
\pard\sa200\sl276\slmult1\f0\fs22\lang7 C:\\Windows\par
}
This results in a string with a double backslash:
<span style="font-size: 19px;">C:\\Windows</span><p>
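In RTF, `\{`, `\}` and `\\` are control symbols that each stand for a single literal character, so `Hello \{World\}` should become `Hello {World}` and `C:\\Windows` should become `C:\Windows`. The following is a standalone Python sketch of just that step, assuming it is applied to plain-text runs; the function name is hypothetical and this is not how the library is structured. It scans left to right so an escaped backslash is consumed exactly once.

```python
def unescape_control_symbols(text):
    """Turn the RTF control symbols \\\\, \\{ and \\} into literal \\, { and }.

    Scanning left to right guarantees that an escaped backslash is consumed
    once and cannot pair up with the character that follows it.
    Other backslash sequences (control words such as \\par) are left as-is.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in "\\{}":
            out.append(text[i + 1])  # emit the literal character once
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(unescape_control_symbols(r"Hello \{World\}"))
print(unescape_control_symbols(r"C:\\Windows"))
```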
I have an .rtf file with a table of data. I can read all the cells and build an array after parsing.
According to this example, the array should contain 14 items in total, but I am getting 13.
In other words, the empty cell of the table is not being read. Please help me read this empty cell as well.
Here is the table (the empty cell is the one under "Scored"):
| Seq. | ID | Key | Scored | Num Options | Domain | Flags |
|---|---|---|---|---|---|---|
| 1 | 1 | C | | 4 | ScienceG8V1 | 2 |
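In RTF, every `\cell` closes one table cell and `\row` closes the row, so an empty cell is simply a `\cell` with no text before it. A parser that collects cells only when it sees text will drop it. Below is a minimal Python sketch, assuming the row has already been reduced to plain text separated by `\cell` (formatting groups removed). The function name is hypothetical and this is not the library's table code. It counts cells by their `\cell` terminators, so empty cells are kept.

```python
import re

# \cell ends one table cell. The \b stops \cellx2409 (a cell-boundary definition)
# from matching; the optional space is the control word's delimiter.
CELL_END = re.compile(r"\\cell\b ?")


def split_row_cells(row_rtf):
    """Return the cell texts of one simplified RTF table row, keeping empty cells.

    Assumes formatting groups have already been removed, so the row looks like
    'text\\cell text\\cell ... \\row'.
    """
    body = row_rtf.split("\\row", 1)[0]
    # Each \cell closes a cell, so every piece before a \cell is a cell;
    # the piece after the last \cell is not a cell and is dropped.
    pieces = CELL_END.split(body)[:-1]
    return [piece.strip() for piece in pieces]


data = r"1\cell 1\cell C\cell \cell 4\cell ScienceG8V1\cell 2\cell\row"
print(split_row_cells(data))
```

With the header row included, the two rows yield 7 + 7 = 14 entries, matching the expected count.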
The converter is great. Very useful, thanks for sharing.
One issue I've been struggling with is converting special characters from RTF to HTML, e.g.
(RTF text)
Aufbaut\'fcre\par (\'fc is the special character in hex)
The converted text looks like this:
<span....>Aufbaut
ü
re
Converting this back from HTML to RTF results in the following:
Aufbau_'fc_re (underscores represent whitespace)
Is there any way to get the whole word, including the special characters, in one span instead of having it split into multiple spans?
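One general way to avoid the split is to merge consecutive text runs that carry identical formatting before emitting HTML, so a decoded `ü` joins its neighbours in one span. Below is a minimal Python sketch of that merging step. The `Run` type and both functions are hypothetical illustrations and do not reflect the library's internal data model.

```python
import html
from dataclasses import dataclass


@dataclass
class Run:
    """A hypothetical text run: some text plus the inline CSS it should carry."""
    text: str
    style: str


def merge_runs(runs):
    """Join consecutive runs whose style strings are identical."""
    merged = []
    for run in runs:
        if merged and merged[-1].style == run.style:
            # Same formatting as the previous run: extend it instead of opening a new span.
            merged[-1] = Run(merged[-1].text + run.text, run.style)
        else:
            merged.append(Run(run.text, run.style))
    return merged


def runs_to_html(runs):
    """Render each merged run as one <span>."""
    return "".join(
        f'<span style="{html.escape(run.style)}">{html.escape(run.text, quote=False)}</span>'
        for run in merge_runs(runs)
    )


style = "font-family:Calibri;"
print(runs_to_html([Run("Aufbaut", style), Run("ü", style), Run("re", style)]))
```

Runs with different styles stay separate, so genuine formatting changes inside a word are still preserved.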
Hi,
Please bump the version number so that a released version includes commit e4cff76. Currently Composer installs the version that causes a fatal error.
I am finding that it mostly does not work. Here is an example:
{ tf1ansideff0{fonttbl{f0fnilfcharset0 Arial;}{f1fnil Arial;}} viewkind4uc1pardlang2057fs18 Handbuilt bowl with large crab in blacks and yellowsf1par }
Running that through your routine results in:
tf1ansideff0fonttblf0fnilfcharset0 Arial;f1fnil Arial;viewkind4uc1pardlang2057fs18 Handbuilt bowl with large crab in blacks and yellowsf1par
whereas what is needed is:
Handbuilt bowl with large crab in blacks and yellows
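The input shown above contains no backslashes at all, so words like `ansi` and `fonttbl` are no longer control words and a converter can only pass them through as text. A cheap guard is to check that the input still starts with the `{\rtf` group before converting. The Python sketch below is a hypothetical heuristic, not part of the library.

```python
def looks_like_rtf(text):
    """Heuristic: a well-formed RTF document begins with the group "{\\rtf"."""
    # Leading whitespace is tolerated; anything else suggests the markup was
    # damaged (for example, backslashes stripped) or the input is not RTF.
    return text.lstrip().startswith("{\\rtf")


print(looks_like_rtf(r"{\rtf1\ansi\deff0 Handbuilt bowl\par }"))
print(looks_like_rtf("{ tf1ansideff0{fonttbl{f0fnilfcharset0 Arial;}}"))
```

If the RTF is embedded as a string literal in source code, make sure escape processing does not consume the backslashes. In Python, for example, `"{\rtf1"` turns `\r` into a carriage return, while the raw string `r"{\rtf1"` keeps it. Whether that is what happened in this report is only an assumption.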