NAME
html2latex -- convert HTML markup to LaTeX markup
SYNOPSIS
html2latex [opt ...] [file ...]
DESCRIPTION
For each file argument, html2latex converts the text from
HTML markup to LaTeX markup. If no files are specified, a usage
message is printed. Input is taken from standard input for files
named -. Output is written to a similarly named file with a
.tex extension (html2latex recognises
.html extensions).
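The naming rule above can be sketched as a small shell function. This is only an illustration, not code taken from html2latex: the helper name tex_name is hypothetical, and it assumes that a trailing .html is replaced by .tex while any other name simply has .tex appended (the latter matches the html-intro example in EXAMPLES).

```shell
# Hypothetical helper: guess the output file name html2latex would choose.
# Assumption: a trailing .html is replaced by .tex; otherwise .tex is appended.
tex_name() {
  case "$1" in
    *.html) printf '%s.tex\n' "${1%.html}" ;;  # strip .html, then add .tex
    *)      printf '%s.tex\n' "$1" ;;          # no .html suffix: append .tex
  esac
}

tex_name file.html
tex_name html-intro
```

Under these assumptions the two calls print file.tex and html-intro.tex respectively.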
Options modify the action of html2latex. The options are:
- -n
- Number sections.
- -p
- Place page breaks after the title page (if present) and the
table of contents (if present).
- -c
- Generate a table of contents.
- -s
- Create no files -- LaTeX is output to stdout.
- -t Title
- Generate a title page, with the title ``Title''.
- -a Author
- Generate a title page, with the author ``Author''.
- -h Header
- Place the text ``Header'' after \begin{document}.
- -f Footer
- Place the text ``Footer'' before \end{document}.
- -o Options
- Specify the options to \documentstyle.
EXAMPLES
An example of use is
html2latex -n - < file.html | less
This converts file.html to LaTeX and pages through the
output. The sections (corresponding to heading tags in the HTML
source) will be numbered.
Another example is
html2latex -t 'Introduction to HTML' -a gnat -p -c -o \
'[bookman]{article}' html-intro
This takes input from the file html-intro, writing to
html-intro.tex, and adds a title page (with title
Introduction to HTML and author gnat)
and table of contents with page-breaks after both. The sections of
the document are not numbered. The LaTeX source includes the line
\documentstyle[bookman]{article}.
SEE ALSO
latex(1)
BUGS
Currently the only HTML tags supported are: TITLE, H1, H2, H3, H4, H5,
H6, UL, OL, DL, DT, DD, LI, B, I, U, EM, STRONG, CODE, SAMP, KBD, VAR,
DFN, CITE, LISTING. The only recognised SGML escapes are &amp;,
&lt; and &gt;. ADDRESS tags are handled badly.
The COMPACT attribute to a DL tag is not recognised.
MENU and DIR styles are not handled well.
TITLE text is ignored.
Currently PRE tags are not handled at all.
The entire file is read into memory. For long HTML documents on
machines with little memory, this may cause problems.
CREDITS
Nathan Torkington adapted the HTML parser from NCSA's Xmosaic package
(file://ncsa.uiuc.edu/Web/xmosaic) and wrote the conversion
code. The HTML parser code is subject to the NCSA restrictions. The
conversion code is subject to the VUW restrictions. Enquiries should
be sent via e-mail to Nathan.Torkington@vuw.ac.nz.