maranget / hevea
Hevea is a fast LaTeX to HTML translator
Home Page: http://hevea.inria.fr
License: Other
Hevea optimizes out (option: -O) some CSS classes that use
pseudo-element selectors; without optimization the generated
HTML code is fine.
Here comes an example:
\documentclass{article}
\usepackage{hevea}
\newstyle{:root}{--initialcapwidth: 60\%}% not optimized away
\newstyle{.initialcap}{margin: auto; width: var(--initialcapwidth)}
%% Bug: next two newstyles are evicted by the optimizer (`-O').
\newstyle{.initialcap::first-letter}{font-size: 250\%}
\newstyle{.initialcap::first-line}{line-height: 120\%}
\newenvironment{initialcap}{\begin{divstyle}{initialcap}}{\end{divstyle}}
\begin{document}
\begin{initialcap}
Well, you know or don't you kennet or haven't I told you every
telling has a taling and that's the he and the she of it. Look,
look, the dusk is growing! My branches lofty are taking root. And
my cold cher's gone ashley. Fieluhr? Filou! What age is at? It
saon is late.
\end{initialcap}
\end{document}
Classes :root and .initialcap survive the optimization phase,
whereas .initialcap::first-letter and .initialcap::first-line are lost.
Pretty much so that you can take the output and insert it into a site yourself, without manually stripping it first.
Also note that CSS class names may in that case collide with ones already in use by the user.
Build environment:
openSUSE 15.3
ocaml 4.05.0
ocamlbuild 0.14.0
hevea 2.35 source code
make builds a long list of modules (so in principle the build environment seems to be sane :-) ), but eventually fails with
Any hints how to fix this would be appreciated.
Hello, a number of typos which do not affect the functionality. Patch here:
https://salsa.debian.org/ocaml-team/hevea/-/blob/master/debian/patches/typos
Hello,
In Hevea "README" it is indicated that one can choose TARGET=byte if "ocamlbuild" is not installed.
But, in that case, script "ocn.sh" still calls "ocamlbuild".
Is there a way to correct that?
Thanks.
Olivier Cessenat.
Almost minimal example:
\documentstyle{article}
\begin{document}
\begin{tabular}{p{0.25\linewidth}@{\qquad}l}
Head1 & Head2 \\
\relax% macro call plus following blank line[s] screw up the output.
Data11 & Data12 \\
Data21 & Data22 \\
\end{tabular}
\end{document}
Translate with hevea tabular-unexpected-p.tex.
Excerpt, manually reformatted for clarity:
<table style="border-spacing:6px;border-collapse:separate;" class="cellpading0">
<tr>
<td style="vertical-align:top;text-align:left;" > Head1</td>
<td style="text-align:center;white-space:nowrap" >    </td>
<td style="vertical-align:top;text-align:left;white-space:nowrap" >Head2 </td>
</tr>
<tr>
<td style="vertical-align:top;text-align:left;" >
<p>Data11<td style="text-align:center;white-space:nowrap" >    </td></p>
</td>
<td style="vertical-align:top;text-align:left;white-space:nowrap" >Data12 </td>
</tr>
<tr>
<td style="vertical-align:top;text-align:left;" > Data21</td>
<td style="text-align:center;white-space:nowrap" >    </td>
<td style="vertical-align:top;text-align:left;white-space:nowrap" >Data22 </td>
</tr>
</table>
A p-element sneaks into the second tr, which harbours a td.
HeVeA leaves space-eating mode (at least) after macros \label and \index.
Here is a sufficiently minimum-buggy example:
\documentclass{article}
\usepackage{imakeidx}
\makeindex
\begin{document}
\noindent \textbackslash label
\begin{itemize}
\item Foo
\item\label{it:bar}Bar
\item \label{it:baz}Baz
\item\label{it:bazoo} Bazoo
\item \label{it:bazooka} Bazooka
\end{itemize}
\noindent \textbackslash index
\begin{itemize}
\item Snafoo
\item\index{snabar}Snabar
\item \index{snabaz}Snabaz
\item\index{snabazoo} Snabazoo
\item \index{snabazooka} Snabazooka
\end{itemize}
\noindent \textbackslash label \& \textbackslash index
\begin{itemize}
\item Snarffoo
\item\label{it:snarfbar}\index{snarfbar}Snarfbar
\item \label{it:snarfbaz}\index{snarfbaz}Snarfbaz
\item\label{it:snarfbazoo} \index{snarfbazoo}Snarfbazoo
\item\label{it:snarfbazooka}\index{snarfbazooka} Snarfbazooka
\item \label{it:huzzaburra} \index{huzzaburra} Huzzaburra
\end{itemize}
\end{document}
The problem shows up for those \item entries where there is white-space
between the non-text-generating macro and the actual text of the
item. TeX/LaTeX consistently eats up all whitespace after \item
whether interrupted by a macro or not.
Exemplary output of the first block:
<ul class="itemize">
<li class="li-itemize">Foo</li>
<li class="li-itemize"><a id="it:bar"></a>Bar</li>
<li class="li-itemize"><a id="it:baz"></a>Baz</li>
<li class="li-itemize"><a id="it:bazoo"></a> Bazoo</li>
<li class="li-itemize"><a id="it:bazooka"></a> Bazooka</li>
</ul>
Hi everybody -
I want to translate the LaTeX macro \rule[raise_len]{wdth}{hght} as faithfully as possible.
Currently my most promising approach uses SVG. The question is: do we allow for SVG or
does it exclude too many browsers/applications?
Comments welcome!
Section 5.3 "Comments", includes:
For user convenience, comment equivalents to the latexonly and toimage environment are also provided:
\begin{latexonly} would not work correctly in the preamble. In the preamble, you should use %BEGIN LATEX .
This is more than user convenience. I suggest updating that part of the documentation.
This follows up from ocaml/ocaml#10254
When Hevea is applied to the OCaml manual, it produces section titles like this one:
'Chapter\u2004\u200d2\u2003The module system'
The \u200d is "Zero Width Joiner", and AFAIU is supposed to be used to combine two letters (producing ligatures, for instance).
Somewhat strangely here it has the effect that Firefox changes the font of the following char "2".
So I was wondering, maybe this is an incorrect use of this "ZWJ".
When extending the documentation in #21 I wanted to write
!`H\aa{}mb\"urg\'ef\o{}\~ns!
to show that the glyphs of the latin1 font are present. However,
I found that the inverted exclamation mark was translated to
something weird: ਐ which ought to be ¡.
Further investigation showed that there are also problems with
\t{oo}: Warning: Command not found: \t
\c{o}: Warning: Application of '\c' on 'o' failed
which both are mentioned in "LaTeX User's Guide & Reference
Manual" on page 39n (and translate correctly with LaTeX).
This is "just for the record". We have to be careful when using the
dot-less 'i' and the dot-less 'j'.
This example exhibits most of the quirks.
\documentclass{article}
\begin{document}
Dot-less (mathematical) ``$i$'' and ``$j$'':
\[
\imath, \jmath; \vec{\imath}, \vec{\jmath}; \hat{\imath}, \hat{\jmath}; \bar{\imath}, \bar{\jmath};
\]
\begin{itemize}
\item ``i''
\begin{itemize}
\item Accented ``i'': \^i, \'i, \`i;
same in italics -- \textit{``i'': \^i, \'i, \`i};
\item Accented dot-less ``\i'': \^{\i}, \'{\i}, \`{\i};
same in italics -- \textit{``\i'': \^{\i}, \'{\i}, \`{\i}};
\item Accented dot-less, math mode ``$\imath$'': $\vec{\imath}$,
$\hat{\imath}$, and $\bar{\imath}$;
\end{itemize}
\item ``j''
\begin{itemize}
\item Accented ``j'': \^j, \'j, \`j;
same in italics -- \textit{``j'': \^j, \'j, \`j};
\item Accented dot-less ``\j'': \^{\j}, \'{\j}, \`{\j};
same in italics -- \textit{``\j'': \^{\j}, \'{\j}, \`{\j}};
\item Accented dot-less, math mode ``$\jmath$'': $\vec{\jmath}$,
$\hat{\jmath}$, and $\bar{\jmath}$;
\end{itemize}
\end{itemize}
\end{document}
Please compare with the LaTeX rendering for the reference translation.
HEVEA fails to recognize the tag names and comments in HTML source code that is typeset using the listings package. The example input file to reproduce the problem is included below.
I processed the input file with pdflatex as follows.
$ pdflatex example.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.16 (TeX Live 2015/Debian)
(preloaded format=pdflatex)
...
Output written on example.pdf (1 page, 172393 bytes).
Transcript written on example.log.
This command produced a PDF page that displays:
I processed the input file with hevea as follows.
$ ~/opt/hevea-2.32/bin/hevea -version
hevea 2.32+01 of 2018-07-04
library directory: /home/john/opt/hevea-2.32/lib/hevea
$ ~/opt/hevea-2.32/bin/hevea -fix -O -exec xxdate.exe -o example.html example.tex
Exclude comment 'comment'
./example.tex:42: Warning: keyval, unknown key: 'tag'
./example.tex:42: Warning: keyval, unknown key: 'MoreSelectCharTable'
Fixpoint reached in 1 step(s)
This command produced an HTML page that displays:
Notice that neither the opening and closing tags of the HTML elements nor the comment are highlighted in the HTML file as they are in the PDF.
The following LaTeX source was used as the example.tex input file.
\documentclass[12pt,a5paper]{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage[mono=false]{libertine}
\usepackage[sf]{titlesec}
\usepackage[usenames,dvipsnames]{color}
\usepackage{listings}
\usepackage{hevea}
%BEGIN LATEX
\lstset{basicstyle={\ttfamily\footnotesize}}
\lstset{showstringspaces=false}
\lstset{basewidth=0.5em}
%END LATEX
\newstyle{.source}{font-family:monospace;white-space:pre;}
\lstdefinestyle{syntax}{
keywordstyle={\color{NavyBlue}\bfseries},
commentstyle={\color{OliveGreen}\itshape},
stringstyle={\color{Maroon}\slshape}}
\lstnewenvironment{source}[1]{
\setenvclass{lstlisting}{source}
\lstset{language={#1},style=syntax}}{}
\begin{document}
\noindent
This is the source code of a Java program:
\begin{source}{Java}
public class HelloWorldApp {
public static void main(String[] args) {
// Prints the string to the console
System.out.println("Hello World!");
}
}
\end{source}
\noindent
This is a fragment of HTML source code:
\begin{source}{HTML}
<!-- Fleurons are stylized flowers. -->
<div class="center">
<img class="fleuron" src="images/fleuron.svg"
width="45" height="16" alt="* * *">
</div>
\end{source}
\end{document}
Hello, some of the comments in *.hva files still have iso8859-1 encoded characters, and could migrate to utf8. This might also be the case for some example files but I didn't look into these yet. Patch here:
https://salsa.debian.org/ocaml-team/hevea/-/blob/master/debian/patches/typos
I note that the Hevea in OPAM is 2.29, while the released Hevea is 2.32. Perhaps this could be updated? If necessary I may be able to help.
If we define an empty \chaptername with LaTeX in styles report or book,
the (non-breaking) space between \chaptername and \thechapter goes away,
too. That way chapter numbers are set flush right as are all section
numbers. Hevea keeps the space, which skews the layout. The following
patch fixes this problem.
The patch is so minute that it does not warrant a PR -- IMHO.
--- a/html/book.hva
+++ b/html/book.hva
@@ -1,5 +1,6 @@
\ifstyleloaded\relax
\else
+\input{ifthen.hva}
\input{bookcommon.hva}
\newcommand{\@book@attr}[1]{\@secid\envclass@attr{#1}}
\@makesection
@@ -10,7 +11,8 @@
\newstyle{.part}{margin:2ex auto;text-align:center}
\@makesection
{\chapter}{-1}{chapter}
- {\@open{h1}{\@book@attr{chapter}}}{\chaptername~\thechapter}{\quad}{\@close{h1}}
+ {\@open{h1}{\@book@attr{chapter}}}%
+ {\ifthenelse{\equal{\chaptername}{}}{}{\chaptername~}\thechapter}{\quad}{\@close{h1}}
\@makesection
{\section}{0}{section}
{\@open{h2}{\@book@attr{section}}}{\thesection}{\quad}{\@close{h2}}%
In the OCaml manual, and likely in other documents too, the authors use labels containing colons, such as c:core-xamples, which are used verbatim in HTML, generating <a id="c:core-xamples"></a>. While this is valid HTML, these IDs cannot be used in CSS selectors as they stand. It would be nice, possibly behind an option, if all IDs generated by hevea were compatible with CSS.
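One way to picture the request is a small filter that maps a label to an ID needing no escaping in a selector. The sketch below is only an illustration: the function name css_safe_id and the choice of a hyphen as replacement character are assumptions, not anything hevea provides. Without such a mapping, the existing ID can still be addressed from a stylesheet by escaping the colon, as in #c\:core-xamples.

```shell
#!/bin/sh
# Hypothetical helper, not part of hevea: turn a LaTeX label into an ID
# that can be written in a CSS selector without escaping.
css_safe_id() {
    # Replace every character outside [A-Za-z0-9_-] with a hyphen.
    printf '%s\n' "$1" | sed 's/[^A-Za-z0-9_-]/-/g'
}

css_safe_id 'c:core-xamples'
```

A mapping like this can make two distinct labels collide (a:b and a-b both become a-b), and an ID starting with a digit would still need escaping in CSS, so a real implementation would have to handle both cases.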
Consider the following foo.tex file:
\documentstyle{article}
\begin{document}
\begin{rawhtml}
<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a>
\end{rawhtml}
\end{document}
Run it through hevea then hacha:
$ hevea foo.tex
$ hacha foo.html
foo.html:58: Error while reading HTML: intag: attribute syntax
Adios
Replace <img .../> by <img ...> and the problem goes away.
In HTML 5, it is correct to close empty elements such as img by writing <img ... />. This should not cause hacha to crash.
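Until hacha accepts the trailing slash, the manual replacement described above could be scripted. This is a rough sketch under the assumption that "/>" only ever occurs at the end of a tag in the file; the function name strip_self_closing is made up for the example.

```shell
#!/bin/sh
# Workaround sketch: rewrite XML-style "<img ... />" as "<img ...>".
# The function name is illustrative; it is not a hevea or hacha feature.
strip_self_closing() {
    # Drop optional whitespace plus the slash directly before ">".
    sed 's|[[:space:]]*/>|>|g'
}

printf '%s\n' '<img alt="x" src="a.png" />' | strip_self_closing
```

Applied to the case above, this would be something like strip_self_closing < foo.html > foo-fixed.html followed by hacha foo-fixed.html. The substitution is purely textual, so it would also touch a "/>" that appears in running text or inside an attribute value.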
If we redirect hevea output to stdout, one or more
auxiliary files appear in the current working directory.
mkdir d
cd d
hevea -o - <<EOF > /dev/null
\documentclass{article}
\begin{document}
\section{S}
\end{document}
EOF
ls
This is particularly unfunny if $PWD has no write permission.
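As a stop-gap, the command can be run from a throw-away directory so that any auxiliary files land there. This is a sketch of a workaround, not a fix: the helper name run_in_scratch is invented here, and mktemp -d is widely available but not specified by POSIX.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command inside a temporary directory and
# remove that directory afterwards, keeping the command's exit status.
run_in_scratch() {
    scratch=$(mktemp -d) || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$scratch" && "$@" )
    status=$?
    rm -rf "$scratch"
    return "$status"
}

# Example with a harmless command; the printed directory is the scratch one.
run_in_scratch pwd
```

With hevea this would read run_in_scratch hevea -o - < doc.tex > doc.html, since the redirections are resolved before the directory change. Documents that pull in other files through relative paths would not be found from the scratch directory, and any auxiliary data hevea wants to reuse between runs is discarded.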
Compiling the following document with hevea -text
\documentclass{book}
\begin{document}
\begin{quote}
\subsubsection{A subsubsection}
\end{quote}
\end{document}
results in the following output
A subsubsection
\00�\00---------------
The invalid characters inserted at the start are random (they change with the contents of the file).
This is probably one of the issues affecting the OCaml manual in ocaml/ocaml#9242 .
(I may look at the issue myself in the upcoming week).
We would benefit from some flexible support for the lang attribute
to get automatic hyphenation into Hevea-translated documents.
For a start, having <html lang="LANG"> with LANG cleverly filled in
should be enough for monolingual documents. Multilingual documents
require a finer-grained application, which can already be achieved
through the built-in macro \@open@par[ATTRIBUTES].
How should I add a <div> around the navigation links?
Currently, the navigation links from \linkstext/\setlinkstext output to HTML like this:
<body >
<a href="thinkstats2001.html"><img src="" alt="Previous" class="navarrow prevarrow"></a>
<a href="index.html"><img src="" alt="Up" class="navarrow uparrow"></a>
<a href="thinkstats2003.html"><img src="" alt="Next" class="navarrow nextarrow"></a>
<hr>
Instead, in order to style them correctly, use a responsive grid, and include role=navigation for screen readers, I would like to do something like the following:
<body >
<div class="navarrows" role="navigation"><ul>
<li><a class="navarrow" href="thinkstats2001.html" title="Previous">Previous 🡄</a></li>
<li><a class="navarrow" href="index.html" title="Up">Up 🡅</a></li>
<li><a class="navarrow" href="thinkstats2003.html" title="Next">Next 🡆</a></li>
</ul></div>
I've tried this, which somewhat works, but (1) there doesn't seem to be a good way to style the outer <a> with e.g. {text-decoration:none} and (2) I need to wrap these in a <div> in order to include the navlinks in the responsive grid:
\setlinkstext
{\imgsrc[alt="Previous" class="navarrow prevarrow"]{}}
{\imgsrc[alt="Up" class="navarrow uparrow"]{}}
{\imgsrc[alt="Next" class="navarrow nextarrow"]{}}
I am not at all well-versed in LaTeX. So, in a macros.hva, I've tried various permutations of this with no success:
% NAVIGATION LINKS
\let\oldsetlinkstext=\setlinkstext
% \newcommand{\setlinkstext}[3]
\renewcommand{\setlinkstext}[3]{%
\begin{rawhtml}
<div class="navarrows">
\end{rawhtml}
\oldsetlinkstext{#1}{#2}{#3}
\begin{rawhtml}
</div>
\end{rawhtml}}
From html/hevea.hva:
\newsavebox{\@linkstext}
\newcommand{\setlinkstext}[3]
{\sbox{\@linkstext}
{\@printnostyle{<!--SETENV }%
\@link@arg{PREVTXT}{#1}%
\@link@arg{UPTXT}{#2}%
\@link@arg{NEXTTXT}{#3}%
\@printnostyle{-->
}}}
Also, I'll share what I have so far for a responsive CSS grid that works on mobile in a comment.
It would be helpful to have (1) links from the table of contents entries to the respective pages or (just before) the heading; and (2) a TOC as PDF bookmarks.
There's a way to create a TOC with nested bookmarks in the PDF. Years ago, I did this somehow with Sphinx to Latex to PDF (with pdflatex, IIRC).
\usepackage[]{hyperref}
\hypersetup{ pdfpagemode=UseOutlines, }
hyperref
bookmarks
\hypertarget
hevea 2.32 does not compile with ocaml 4.07.1 and ocamlbuild 0.13.0
sh ocb.sh opt
File "./check402.ml", line 1:
Error: Reference to undefined global `Stdlib__sys'
/usr/bin/ocamldep.opt -modules auxx.ml > auxx.ml.depends
/usr/bin/ocamldep.opt -modules misc.ml > misc.ml.depends
/usr/bin/ocamldep.opt -modules location.ml > location.ml.depends
/usr/bin/ocamldep.opt -modules myLexing.ml > myLexing.ml.depends
/usr/bin/ocamldep.opt -modules bytes.ml > bytes.ml.depends
/usr/bin/ocamldep.opt -modules bytes.mli > bytes.mli.depends
/usr/bin/ocamlc.opt -c -w +a-3-4-9-41-45 -annot -safe-string -o bytes.cmi bytes.mli
/usr/bin/ocamlopt.opt -c -w +a-3-4-9-41-45 -annot -safe-string -o bytes.cmx bytes.ml
Please help to fix!
Odoc is a documentation generator for OCaml. It reads doc comments, delimited with (** ... *), and outputs HTML. Since version 1.4.0 it generates nice and shiny output like that:
I think it might be beneficial to unify the look of the hevea output, which is also used to generate the HTML version of the OCaml manual, with that of odoc.
It would be helpful for SEO and for usability to include the "top title" in the page titles:
Current <title>: "Distributions"
Proposed <title>: "Distributions [sep] Think Stats"
Possible separators: —, –, ·
Unfortunately, I don't know how to do this w/ LaTeX.
e.g. Think Stats defines \thetitle, \thesubtitle, and \theversion commands that return strings, but IDK what a good convention would be. https://github.com/AllenDowney/ThinkStats2/blob/c76e1ecdd56a47bbb7ed13ed2fafe1eb1274f3d9/book/book.tex#L45-L47
Maybe a redefineable command?
Lines 300 to 305 in ee94951
Currently the first item of an itemize environment is not rendered exactly as the other ones: there is a line break after <li...>, as in the following example:
<ul class="itemize"><li class="li-itemize">
item1
</li><li class="li-itemize">item2
</li><li class="li-itemize">item3
</li></ul>
This turns out to be a problem for some CSS list styles, like the "inside" style, where the line break is converted into a space.
Would it be possible to simply remove this line break, so as to obtain:
<ul class="itemize"><li class="li-itemize">item1
</li><li class="li-itemize">item2
</li><li class="li-itemize">item3
</li></ul>
Hevea constructs multi-line symbols (e.g. automatically-sized
parentheses) in display math mode out of several glyphs.
Sometimes the pieces do not meet exactly, but leave minute
gaps, which are visually distracting.
After some experiments I have discovered that these gaps
persist even with
line-height: default;
Reducing the line-height to an ad-hoc value of 1.1 "defragments"
all multi-line symbols in my tests. The actual value may still be
too high, but I would like to suggest the following patch anyhow.
--- a/html/hevea.hva
+++ b/html/hevea.hva
@@ -587,7 +587,7 @@
%%Style of display
\def\@orange{\@print{#}FF8000}
\def\@magenta{fuchsia}
-\def\@dtstyle{border-collapse:separate;border-spacing:\@barsz;width:auto;}
+\def\@dtstyle{border-collapse:separate;border-spacing:\@barsz;line-height:1.1;width:auto;}
\def\@dcstyle{white-space:nowrap;padding:0px;}%;width:auto;}
\newstyle{.vdisplay}{\@dtstyle{} empty-cells:show; border:2px solid red;}
\newstyle{.vdcell}{\@dcstyle{} border:2px solid green;}
Here is a minimal example:
\documentclass{article}
\begin{document}\small
\begin{enumerate}
\renewcommand{\labelenumi}{(\arabic{enumi})}
\item An item.
\end{enumerate}
\end{document}
TeX is very user-friendly in this case by dropping all white-space
after the opening of the enumerate environment and
skipping over non-text generation code up to the first \item.
Found by xmllint.
I recently worked on getting a Hevea output for the Menhir manual -- see the corresponding merge request. There was a bug in the visual output that I had to fix, and I think this is a problem with Hevea: I believe that the spacing of the tabbing environment is wrong, at least for this use-case.
Menhir uses tabbing to show a textual representation of parse trees, see Figure 6 in my HTML rendering. In the PDF, it looks roughly as follows:
expression
IF expression THEN expression
IF expression THEN expression . ELSE expression
But the default translation of tabbing by Hevea uses the class cellpadding0 for HTML tables generated by tabbing, which sets padding:0; for the <td> cells, which results in the columns sitting next to each other with no space in between. The Hevea output looks like this:
expression
IF expression THENexpression
IF expression THEN expression . ELSE expression
For the Menhir manual, I implemented a workaround, which is to replace all occurrences of cellpadding0 by cellpadding1 in the generated HTML -- see the corresponding commit. This gives me a rendering closer to what LaTeX generates.
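The replacement itself is a one-line text substitution. The sketch below shows one possible form of it; the function name use_cellpadding1 is illustrative, and the actual commit in the Menhir repository may do this differently.

```shell
#!/bin/sh
# Sketch of the workaround: swap the table class in hevea's HTML output.
# The function name is made up for this example.
use_cellpadding1() {
    sed 's/cellpadding0/cellpadding1/g'
}

printf '%s\n' '<table class="cellpadding0">' | use_cellpadding1
```

Used as a filter, for instance use_cellpadding1 < manual.html > manual-fixed.html, it rewrites every occurrence of the string, including any that might appear in running text or in the embedded stylesheet.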
As far as I can tell, cellpadding0 is only used for the tabbing environment (open_tabbing in latexscan.mll). What is a reason for not using cellpadding1 there instead? In my case, using cellpadding0 is a rendering bug, but are there other cases where having any padding gives worse rendering? (Where LaTeX would itself not insert any spacing when using tabbing?)
I uploaded with this report a small ZIP archive repro-hevea-tabbing.zip which contains a minimal reproduction case. I also uploaded online the following files from this example:
cellpadding0
with cellpadding1: it looks less wrong
Hi, I (quite accidentally) noticed that the imagen program is using predictable paths under /tmp:
COM=/tmp/imagen-com.$$
TMPPNG=/tmp/imagen-tmp-png.$$
In particular, $COM is used to run code:
${GS} ${GSOPTS} -sOutputFile="| sh ${COM} > ${FINAL}" -
There are some well-known exploits that can take advantage of the code above. Basically, someone else on the machine can create those files before you do. Then he owns those files and can write malicious commands into them.
The use of the current PID in the filename helps a little bit, but ultimately PIDs are guessable, and all of them are known beforehand -- they're just sequential integers, after all. So in the end it's still pretty easy for someone to take over the temporary files that you want to use.
Fortunately there is a standard C function called mkstemp() that can create those files securely. Unfortunately, it's not easily available in POSIX shell script. The following trick can be used however:
posix_mktemp(){
# Securely create a temporary file under ${TMPDIR} using the
# template "tmpXXXXXX", and echo the result to stdout. This works
# around the absence of "mktemp" in the POSIX standard.
printf 'mkstemp(template)' | m4 -D template="${TMPDIR:-/tmp}/tmpXXXXXX"
}
COM=$(posix_mktemp)
TMPPNG=$(posix_mktemp)
Since m4 is POSIX, and since it has an interface to the C mkstemp() function, we let m4 create the temporary file instead. This avoids the security issues because the file is created atomically under a name that cannot be predicted.