michaelrsweet / htmldoc
HTML Conversion Software
Home Page: https://www.msweet.org/htmldoc
License: GNU General Public License v2.0
Version: 1.8.24rc2
Original reporter:
I would like to produce the same output with 1.8.24rc2 or rc1 as with 1.8.23, but I always get an outline with the TOC on the left side of the window. I tried --no-toc with no success.
cat me.html | ./htmldoc.pl me_new2.pdf
$prog='/usr/htmldoc/htmldoc_1.8.24rc2 --no-localfiles -t pdf --left .5in --right .5in --top .5in --bottom .5in --webpage - >' . $ARGV[0];
system("$prog");
Version: 1.9-feature
Original reporter:
How to change the starting page number in the header/footer?
Using the option: --footer ..1
it always begins with 1.
My problem is that I have a program that prints a log book. Every day I need to print the new pages to add to the book, so if yesterday I printed pages 1 to 10, today the new page numbers need to start from 11.
Can you add this new feature?
Thanks in advance for your help.
Regards.
Francesco Scarabelli
Version: 1.9-feature
Original reporter: Michael Sweet
Support index generation via external files and via markup.
Version: 1.8cvs
Original reporter: Michael Sweet
HTMLDOC's finder integration does not appear to be working - we need to use the open callback provided by FLTK to get the filename.
Version: 1.8.23
Original reporter: Michael Sweet
[posted by William S Fulton to the htmldoc.general group]
Hi
I've another testcase. This time it shows how htmldoc generates html that is not strictly correct. I'm using HTML TIDY (also called tidy) to check the html generated. HTML TIDY is available from the web standards body - http://www.w3.org. Alternatively the htmldoc generated code can be run through one of the validator sites like http://www.htmlhelp.com/tools/validator/
william@linux:/temp/htmldoctest/tidyoutput> cat test.book
#HTMLDOC 1.8.23
-t html -f test_output.html --book --toclevels 3 --no-numbered --toctitle "Table of Contents" --title --titleimage test.png --linkcolor #0000ff --linkstyle underline --fontsize 10.0 --fontspacing 1.2 --headingfont Helvetica --bodyfont Times --headfootsize 10.0 --headfootfont Helvetica --charset iso-8859-1 --browserwidth 680
file1.html
william@linux:/temp/htmldoctest/tidyoutput> cat file1.html
Three problems.
I've found tidy quite good for checking the htmldoc generated html as it warns about duplicate anchors. So it is a good way of checking that the globalisation of the local anchor names that htmldoc does doesn't cause any problems.
William
Version: 1.8.24rc1
Original reporter: Michael Sweet
It appears that the document outline is not being generated properly. The Users Manual outline consists of the text of each heading...
Version: 1.9-feature
Original reporter: Michael Sweet
Currently HTMLDOC enables interpolation of non-bitmap images. Some customers have complained about this.
Maybe add a new _HD_INTERPOLATE attribute for images, such that we can control this in the HTML with the default being "no"?
Version: 1.9-feature
Original reporter: Michael Sweet
Support background images in tables (tie into CSS1 support)
Version: 1.9-feature
Original reporter: Michael Sweet
Support in-line image maps (MAP elements).
Version: 1.8-current
Original reporter: Michael Sweet
Michael,
I added a fix to parse_paragraph to handle the case of inlined
images w/o any whitespace between them -- parse_paragraph was
putting them on the same line, even though their combined width
would exceed format_width.
this most frequently happened in a table row w/ a set width, that ==
to the width of the images to go in the row, blowing out the side of
the row.
the following is from htmldoc 1.9: [ps-pdf.cxx v.1.99],
my patch is actually to an earlier version w/o margins ...
line 4349:
    if (temp->element == HD_ELEMENT_IMG)
    {
      if ((border = htmlGetVariable(temp, (uchar *)"BORDER")) != NULL)
        borderspace = atof((char *)border);
      else if (temp->link)
        borderspace = 1;
      else
        borderspace = 0;

      borderspace *= PagePrintWidth / _htmlBrowserWidth;
      temp_width += 2 * borderspace;
    }

    prev = temp;
    temp = temp->next;
    temp_width += prev->width;

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
    /* if we're at/beyond the known limits of our working width, break
       out.
       KM-PATCH.
    */
    if ( temp_width >= format_width ) {
      break;
    }
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    if (prev->element == HD_ELEMENT_BR)
      break;
  }
since I'm working from 1.8xx format_width would probably be replaced
w/ margins->width() ... ?
Have a good 4th,
Mark
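The idea behind the patch can be illustrated outside HTMLDOC with a small sketch. This is not the parse_paragraph code: wrap_inline_widths is a hypothetical function, and it uses a simplified greedy rule (start a new line before an item that would overflow) rather than the exact break condition in the patch.

```python
def wrap_inline_widths(widths, format_width):
    """Group inline item widths into lines no wider than format_width.

    Illustration only. An item wider than format_width cannot be made to
    fit, so it still gets a line of its own.
    """
    lines = []
    current = []
    current_width = 0.0
    for width in widths:
        # Start a new line if adding this item would exceed the limit.
        if current and current_width + width > format_width:
            lines.append(current)
            current = []
            current_width = 0.0
        current.append(width)
        current_width += width
    if current:
        lines.append(current)
    return lines
```

With three images of width 100 and a working width of 200, the first two share a line and the third wraps, which is the behaviour the patch is aiming for in fixed-width table rows.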
Version: 1.10-feature
Original reporter: Michael Sweet
Add support for TrueType fonts - reading font information and embedding in PS and PDF output.
Version: 1.9-feature
Original reporter: Michael Sweet
Support additional external Type1 (ASCII) fonts.
Version: 1.8.24rc2
Original reporter:
Any items that appear as links (i.e. within tags) do not appear in the output.
The easiest way to see this is to run the testsuite/links.html file through htmldoc. The command I used was:
htmldoc --outfile links.pdf links.html
None of the links appear in the PDF.
I see this on both Linux (Fedora Core 2) and Solaris 8, both compiled with gcc.
I looked at both 1.8.24rc1 and 1.8.23, and both of those versions process the links correctly.
Version: 1.10-feature
Original reporter:
Looking to properly 'bundle' images in html pages with htmldoc's '-t html' option, but images with absolute paths are omitted.
Would be great to grab them too, not just relative paths.
Htmldoc gurus, what say you?
Thanks,
Micki
Version: 1.8.24rc2
Original reporter:
The default output format of htmldoc is inconsistent and doesn't match the man page.
The man page states that if htmldoc is called as:
htmldoc [options] filename1.html
then the PDF is printed on the stdout. But that is not what I see. This example (on Linux Fedora Core 2) causes HTML to be printed on stdout.
<TITLE>Link Test</TITLE>
This is a link test.
Click on this image:
< .
Link Test
This is a link test.
This is the target of the link.
If I specify the format to be PDF, then it works as expected.
[ogxsvt@USDAL04516 htmldoc-1.8.24rc2]$ htmldoc/htmldoc -t pdf14 testsuite/links.html > links.pdf
PAGES: 4
BYTES: 169334
[ogxsvt@USDAL04516 htmldoc-1.8.24rc2]$ file links.pdf
links.pdf: PDF document, version 1.4
Trying to use htmldoc as a filter also requires specifying the format.
[ogxsvt@USDAL04516 htmldoc-1.8.24rc2]$ htmldoc/htmldoc - < testsuite/links.html > links.pdf
[ogxsvt@USDAL04516 htmldoc-1.8.24rc2]$ file links.pdf
links.pdf: HTML document text
[ogxsvt@USDAL04516 htmldoc-1.8.24rc2]$ htmldoc/htmldoc -t pdf14 - < testsuite/links.html > links.pdf
PAGES: 4
BYTES: 169174
[ogxsvt@USDAL04516 htmldoc-1.8.24rc2]$ file links.pdf
links.pdf: PDF document, version 1.4
But if the --outfile option is specified the --format/-t option is no longer required and PDF is generated.
[ogxsvt@USDAL04516 htmldoc-1.8.24rc2]$ htmldoc/htmldoc --outfile links.pdf testsuite/links.html
PAGES: 4
BYTES: 169334
[ogxsvt@USDAL04516 htmldoc-1.8.24rc2]$ file links.pdf
links.pdf: PDF document, version 1.3
I tried this on both 1.8.23 and 1.8.24rc2 and saw the same behavior.
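Since the transcripts above suggest the output format depends on whether --outfile is given, a wrapper script may be safer naming the format explicitly on every call. A minimal sketch, assuming a hypothetical helper build_htmldoc_argv (not part of HTMLDOC) and using only options that appear in this report (-t, --outfile, and - for standard input):

```python
def build_htmldoc_argv(source="-", fmt="pdf14", outfile=None, program="htmldoc"):
    """Return an htmldoc argument list that always names the output format.

    Hypothetical helper for illustration. Only -t, --outfile and "-"
    (read from standard input) are taken from the report above.
    """
    argv = [program, "-t", fmt]
    if outfile is not None:
        # Write to a file instead of standard output.
        argv += ["--outfile", outfile]
    argv.append(source)
    return argv
```

The list can be handed to subprocess.run as-is, which also avoids building a shell command string by hand.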
Version: 1.8.24rc1
Original reporter: Michael Sweet
The current screenshots are out-of-date.
Also, from Steve:
Figure 3-1: Finger pointer for Title/Image field (labeled 3) and change Browse pointer to 4.
A screen shot of the page tab when opened.
The screen shot numbers all need to be advanced by one: instead of 4, the number is now 5, and so on.
If we don't have time to modify the screen shots, we can kill the finger pointer of the title/image field (as asked for earlier) and improvise.
Version: 1.10-feature
Original reporter:
It would be nice to have an option to fit a webpage onto one single page (PDF or PS).
My problem is that I have some text followed by an image with fixed dimensions in an HTML page. If the text is too long, the image gets pushed to the second page of the produced PDF. Since it's hard to determine the height of the text in order to adapt the image dimensions, htmldoc could do the job by scaling the whole page down if necessary.
Version: 1.8.24rc3
Original reporter:
Removing jpegtran.c and pngtest.c from the project file was sufficient to compile the application successfully.
Version: 1.8.23
Original reporter:
Since 1.8.23, I have repeatedly seen ERR014 (table too wide; truncation); I get this on a wide range of documents, but it can be seen using command line below.
Because of this, I have previously stayed at 1.8.22; since 1.8.24rc1 introduces some features I want, I'm changing my code to discard this error (as it stands, if I see anything other than BYTES and PAGES I stop operation); not sure if this is a "false positive", and since I can work around it, it can't be critical, but I thought I'd mention it in case it is more of a problem than it appears.
Ignoring this error, the resultant PDFs appear to be OK.
Sorry the command line is so long, but it goes through a long process to get this far, and I thought I'd stick with the examples I know...
Command line example (obviously you'll need to fix output / working paths, but the URL should be OK, as it is publicly available):
HTMLDOC -f "D:\PDFTemp\24rc1.pdf" -t pdf --webpage --size A4 --portrait --no-toc --left 1.00in --right 0.5in --top 0.5in --bottom 0.5in --header .t. --footer cl: --logoimage "/_RMVirtual/Images/rmlogo.gif" --fontspacing 1.2 --fontsize 11.0 --headingfont Helvetica --bodyfont Helvetica --headfootfont Helvetica --browserwidth 800 --no-links --linkstyle plain --pagemode outline --path "http://www.rm.com/" --pagelayout one --no-numbered --nup 1 --color --no-pscommands --no-xrxcomments --compression=9 --jpeg=0 --charset 8859-1 --no-embedfonts --firstpage c1 --pageeffect none --pageduration 10 --effectduration 1.0 --encryption --permissions print --permissions no-modify --permissions copy --permissions no-annotate --owner-password "" --user-password "" "http://www.rm.com"
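The workaround described above (stop on anything other than BYTES and PAGES, except for an error known to be harmless) can be sketched as a small output filter. This is an illustration only: check_htmldoc_output is a hypothetical helper, and the assumption that an error line begins with its code followed by a colon is based on the error name quoted in this report, not on a checked log.

```python
def check_htmldoc_output(lines, ignored_codes=("ERR014",)):
    """Return the output lines that should be treated as real problems.

    PAGES:/BYTES: status lines and blank lines are accepted. Lines whose
    leading code (the text before the first colon, an assumed format) is in
    ignored_codes are skipped. Everything else is reported.
    """
    problems = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(("PAGES:", "BYTES:")):
            continue
        code = line.split(":", 1)[0].strip()
        if code in ignored_codes:
            continue
        problems.append(line)
    return problems
```

A caller would abort the conversion only when the returned list is non-empty, so ERR014 no longer stops the run while any other message still does.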
Version: 1.9-feature
Original reporter: Bill Janssen
It would be useful if the URL of the document could be an element of a header or footer, by adding another code (perhaps "u") to the list of elements that can make up a header or footer.
Version: 1.8cvs
Original reporter: Michael Sweet
The HTMLDOC users manual shortcut opens Notepad instead of Adobe Reader.
Version: 1.8cvs
Original reporter: Michael Sweet
GUI does not save "strict HTML" setting in book files.
Version: 1.8.23
Original reporter:
When you have more than 1 ul list in your document, the 2nd and subsequent lists pick up the alphanumerical index where the previous list at that nested level left off. You end up with lists which look like this after converted to pdf:
List closed, with some text here.
Version: 1.10-feature
Original reporter: Michael Sweet
Add code to do rewriting of strings per Unicode recommendations.
Version: 1.9-feature
Original reporter:
It would be really nice to have an option to specify that table rows are not allowed to break across pages. If a row doesn't completely fit on the current page then print the entire row on the next page.
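The requested behaviour amounts to a simple pagination rule. A minimal sketch, assuming the row heights and the usable page height are already known in the same units; paginate_rows is a hypothetical function, not an HTMLDOC API:

```python
def paginate_rows(row_heights, page_height, first_page_used=0.0):
    """Assign whole table rows to pages without splitting any row.

    Returns a list of pages, each a list of row indexes. first_page_used is
    the space already taken on the first page by content above the table.
    A row taller than page_height cannot fit anywhere, so it still ends up
    on a page by itself.
    """
    pages = [[]]
    used = first_page_used
    for index, height in enumerate(row_heights):
        # Move to a fresh page when the row does not fit in the space left,
        # unless the current page is still completely empty.
        if used + height > page_height and (pages[-1] or used > 0):
            pages.append([])
            used = 0.0
        pages[-1].append(index)
        used += height
    return pages
```

An empty first entry in the result means the table starts on the following page because not even its first row fit below the preceding content.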
Version: 1.9-feature
Original reporter: Michael Sweet
[Reference PR #5021]
Customer would like to see a booklet mode, such that the pages are rendered to be printed and folded to produce a booklet.
Key features desired:
- No scaling of pages, just double or quadruple the output page size and rotate.
- A 2-up mode for printing double-sided, saddle-stitched booklets
- A 4-up mode for printing single-sided booklets that are folded twice like what you do with typical PC greeting card software.
Implementation suggestion: implement as number up with values -2 and -4.
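For the 2-up saddle-stitched case, the page ordering is the conventional booklet imposition. The sketch below is an illustration of that ordering only, not the requested implementation: saddle_stitch_order is a hypothetical function, and the 4-up double-fold mode is not covered.

```python
def saddle_stitch_order(page_count):
    """Return (left, right) page pairs for each sheet side of a 2-up booklet.

    Sides are listed sheet by sheet, front then back, outermost sheet first.
    Page numbers start at 1. None marks a blank page added to pad the total
    to a multiple of four.
    """
    if page_count < 1:
        return []
    total = -(-page_count // 4) * 4  # round up to a multiple of 4

    def page(number):
        # Pages past the end of the document are blanks.
        return number if number <= page_count else None

    sides = []
    for sheet in range(total // 4):
        # Front of the sheet: last remaining page on the left, first on the right.
        sides.append((page(total - 2 * sheet), page(2 * sheet + 1)))
        # Back of the sheet: next page on the left, next-to-last on the right.
        sides.append((page(2 * sheet + 2), page(total - 2 * sheet - 1)))
    return sides
```

For an 8-page document this yields the pairs (8, 1), (2, 7), (6, 3), (4, 5), which is the order needed when the printed sheets are stacked and folded once.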
Version: 1.9-feature
Original reporter: Michael Sweet
Use UTF-8 as the native (internal) encoding and output encoding.
Support other input encodings which map to UTF-8 on input.
Version: 1.9-feature
Original reporter: Michael Sweet
In 1.8.23, the specified header is not shown on pages that contain a
My Question: Is there a possibility to disable this "feature" in 1.8.24?
Thanks,
Tom
Version: 1.9-feature
Original reporter:
Hi,
We badly need to convert HTML to PDF with style support.
We understand that HTML styles are supported in HTMLDOC 1.9.
When can we tentatively expect HTMLDOC version 1.9, which supports the style attribute?
Thanks in advance.
Vani.
Version: 1.9-feature
Original reporter: Michael Sweet
Add new built-in text editor with syntax highlighting, other features.
Version: 1.10-feature
Original reporter: Michael Sweet
Add code to subset fonts to produce smaller output.
Version: 1.10-feature
Original reporter: Michael Sweet
Add new file/filter classes to handle all file and network IO as well as filtering (compression/encoding).
Version: 1.8.24rc3
Original reporter: Michael Sweet
Apache 2.0.30 and higher need the AcceptPathInfo option enabled for the HTMLDOC CGI mode to work.
Version: 1.10-feature
Original reporter: Michael Sweet
Add code to do character pair kerning from font metrics.
Version: 1.9-feature
Original reporter: Michael Sweet
Support INPUT and TEXTAREA.
PostScript output gets boxes for text, password, checkbox, radio, and textarea fields.
PDF output gets corresponding annotation objects.
Version: 1.8.24rc1
Original reporter:
It's very touchy: alter the content of the cells, or the number of rows and columns, and you either get ERR014 or it succeeds.
It seems to be a combination of the width and a page break across a table. Remove one row and all is OK; remove 3 columns and all is OK.
Commandline:
-t pdf --webpage --no-title --no-toc --no-links --footer l1. --logoimage powerby_en.gif --left 18 --right 18 --top 18 --bottom 18 --header .t. --landscape --size letter -f out.pdf
Version: 1.8.24rc3
Original reporter: Bill Janssen
1.8.24rc3 seems to generate PDF that AFPL ghostscript 8.00 can no longer render. In particular, words seem to be spaced too far apart. 1.8.23 works just fine here. To see this effect, try the following with both 1.8.23 and 1.8.24rc3:
htmldoc --header "..1" --footer "t.D" --headfootsize 8 --size Letter --no-strict --webpage --links --linkstyle plain --linkcolor "#80" -f "/tmp/foo.pdf" -t pdf14 'http://plato.stanford.edu/entries/albert-saxony/'
followed by
/sw/bin/gs -sDEVICE=png16m -sOutputFile="/tmp/page%05d.png" -dUseCropBox -dBATCH -r300 -dNOPAUSE "/tmp/foo.pdf"
Then examine /tmp/foo.pdf and /tmp/page00001.png side by side.
I see that when I run this with htmldoc 1.8.23, ghostscript will output the following messages, which do not occur when using htmldoc 1.8.24rc3:
Loading NimbusRomNo9L-Regu font from /sw/share/ghostscript/fonts/n021003l.pfb... 1980148 674232 1534544 247129 2 done.
Loading NimbusSanL-Bold font from /sw/share/ghostscript/fonts/n019004l.pfb... 2020340 719089 1554640 259891 2 done.
Loading NimbusRomNo9L-ReguItal font from /sw/share/ghostscript/fonts/n021023l.pfb... 2080628 777654 1574736 276879 2 done.
Loading StandardSymL font from /sw/share/ghostscript/fonts/s050000l.pfb... 2120820 818813 1574736 282618 2 done.
Loading NimbusSanL-Regu font from /sw/share/ghostscript/fonts/n019003l.pfb... 2161012 866744 1574736 286801 2 done.
I've tried this only on MacOS X, but do not believe it to be a platform-specific problem.
Version: 1.8-current
Original reporter:
When outputting to PDF with the --gray option, colors still appear in the output for table row and cell background colors. I'm not sure about anywhere else, as I haven't tried it.
I have reproduced this with the CVS version on Linux, and with RC1 on FreeBSD.
Below is a sample document and the command line used to reproduce this.
Doc:
col1 | col2 | col3 | col4 |
data1 | data2 | data3 | data4 |
Commandline:
htmldoc -t pdf --gray --webpage --no-title --no-toc --no-links --footer l1. --logoimage powerby_en.gif --left 18 --right 18 --top 18 --bottom 18 --header .t. --size letter -f comp comp
Version: 1.9-feature
Original reporter: Michael Sweet
Rewrite documentation.
Version: 1.8.23
Original reporter:
Hi,
When we try to convert an HTML file that uses a span tag with a style attribute for formatting (font-size, bold, etc.), the formatting is not reflected in the generated PDF.
Here is the sample HTML code which was tried.
Form A
"Form A" in the PDF is not bold.
Find attached the generated PDF.
Can anyone please tell us if something is wrong with the HTML?
Thanks in advance.
Version: 1.9-feature
Original reporter: Michael Sweet
Support reading of style data from an external file, in-line in a STYLE element, and in-line in a STYLE attribute.
Support the following CSS1 attributes: background*, border*, clear, color, display, float, font*, height, letter-spacing, line-height, list-style*, margin*, padding*, page-break-*, text-align, vertical-align, white-space, width, and word-spacing.
Support selection by element, class, pseudo-class (:hover, etc.), and ID. No generic attribute-based selection, and no media selectors.
Support @import.
Version: 1.8.23
Original reporter:
If background color is specified on a row, and a cell has no data, no background color is applied to the cell.
Version: 1.8.24rc2
Original reporter:
I see this on both Solaris 8 and Linux (Fedora Core 2).
Running configure like this:
./configure --prefix=/usr/local
does not install the binaries into the correct directory. The executables get installed into /usr/bin instead of /usr/local/bin/. This happens because the exec_prefix variable in the Makedefs file gets set to /usr instead of to what the prefix was set to.
The fix is a simple patch to the configure.in file.
--- configure.in.old 2004-10-29 10:08:09.190269825 -0500
+++ configure.in 2004-10-29 10:08:37.802443737 -0500
@@ -275,7 +275,7 @@
fi
if test "$exec_prefix" = "NONE"; then
if test "$bindir" = "${exec_prefix}/bin"; then
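The patch body is cut off above, so the exact change is not shown. The usual autoconf convention the report relies on is that exec_prefix is "NONE" when it is not given on the command line and should then fall back to prefix. A sketch of that fallback in Python, with a hypothetical resolve_bindir helper standing in for the shell logic:

```python
def resolve_bindir(prefix, exec_prefix="NONE", bindir="${exec_prefix}/bin"):
    """Mimic the expected fallback: an unset exec_prefix inherits prefix.

    Illustration only. "NONE" is the placeholder autoconf uses for an
    option that was not supplied to configure.
    """
    if exec_prefix == "NONE":
        exec_prefix = prefix
    # Expand the default bindir template against the resolved exec_prefix.
    return bindir.replace("${exec_prefix}", exec_prefix)
```

Under this rule, configuring with --prefix=/usr/local alone resolves the binary directory to /usr/local/bin rather than /usr/bin.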
Version: 1.9-feature
Original reporter:
Cygwin appends .exe onto binaries that it compiles, so this patch teaches the EPM descriptor to look for that.
Version: 1.9-feature
Original reporter: Michael Sweet
Support THEAD to repeat header row(s) at the top of each page on which a table appears.
Version: 1.8cvs
Original reporter: Michael Sweet
The Windows installer does not add the HTMLDOC program folder to the path environment variable, preventing easy access to htmldoc.exe from the command-line.
Version: 1.9-current
Original reporter:
Htmldoc puts an extra blank line before the first list item. Attached are the test files: test.html, test.book, and test.pdf.
This happened on Linux RH8 with 1.8.23. It also happened with 1.8.24rc3.
Version: 1.9-feature
Original reporter: Michael Sweet
Use new margin classes to support floating objects (e.g. left/right aligned images and tables) that cross block boundaries.
Version: 1.9-feature
Original reporter: Michael Sweet
Add crop marks and other standard printing marks to the PostScript and PDF output.
Also add OPI comments, etc. for PostScript output as needed for commercial printing.
Version: 1.9-feature
Original reporter: Michael Sweet
Support output of text, images, tables, etc. using style data.