Ignore a specific span type

I am generating HTML output for ePub from LaTeX source, but I am having difficulty eliminating the "Chapter " at the start of each chapter.

You can try to suppress both of those tags using CSS, either in an included .css file or by inserting the rule into the HTML between <style> and </style>:

span.titlemark, span.titlemark + br {
    display: none;
}

But you would have to test that on all devices to see if their renderers correctly handle it.¹

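If you don't want to go to the effort of testing this, it is better to remove both nodes altogether by parsing and rewriting the generated HTML. Using Python (written for 2.x; the version below also runs under 3) and the BeautifulSoup package², you can do:

from __future__ import print_function

import io
import sys

from bs4 import BeautifulSoup

# Read the generated HTML file named on the command line.
with io.open(sys.argv[1]) as fp:
    # "html.parser" ships with Python; other parsers can be substituted here.
    soup = BeautifulSoup(fp, "html.parser")

for node in soup.select("span.titlemark"):
    print(node.get_text())  # show what is being removed
    # Also drop an empty <br /> that directly follows the span.
    sibling = node.find_next_sibling()
    if sibling is not None and sibling.name == 'br' and not sibling.get_text():
        sibling.extract()
    node.extract()

# Overwrite the input file with the cleaned document.
with io.open(sys.argv[1], 'w') as fp:
    fp.write(soup.decode())  # decode() returns a unicode string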

to get rid of both.³ BeautifulSoup supports several HTML/XML parsers; depending on the type and quality of the htlatex output, you might need to experiment with the alternatives to get better or faster results.
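
To make experimenting with parsers a bit easier, here is a minimal sketch that tries several of them in order and uses the first one that is installed. The names CANDIDATE_PARSERS, make_soup and strip_titlemarks are made up for this illustration, and the order of preference is only an assumption; adjust it to whatever works best on your htlatex output.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Parsers to try, in order of preference (an assumption, reorder freely).
# "html.parser" ships with Python; lxml and html5lib are separate installs.
CANDIDATE_PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(markup, parsers=CANDIDATE_PARSERS):
    """Return (soup, parser_name) for the first parser that is installed."""
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            continue  # this parser is not installed, try the next one
    raise RuntimeError("none of the requested parsers is available")


def strip_titlemarks(markup, parsers=CANDIDATE_PARSERS):
    """Remove every span.titlemark plus a directly following <br />."""
    soup, _ = make_soup(markup, parsers)
    for node in soup.select("span.titlemark"):
        sibling = node.find_next_sibling()
        if sibling is not None and sibling.name == "br":
            sibling.extract()
        node.extract()
    return soup.decode()
```

Calling strip_titlemarks(html, parsers=("html.parser",)) pins a single parser, which is handy when comparing the output of one parser against another.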

htlatex is a shell script, so you could make a copy (/usr/local/bin/htlatexstrip) and add a call to the Python script as a postprocessing step in there.
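
If you would rather not edit a copy of the shell script, a small Python wrapper can do the same two steps. This is only a sketch: the script name strip_titlemark.py is hypothetical (it stands for wherever you saved the stripping script), and it assumes htlatex writes an .html file with the same basename as the .tex file into the current directory.

```python
import os
import subprocess
import sys


def html_name_for(tex_path):
    """Guess the HTML file htlatex produces (assumed: same basename, cwd)."""
    base = os.path.splitext(os.path.basename(tex_path))[0]
    return base + ".html"


def build_commands(tex_path, extra_args=(), strip_script="strip_titlemark.py"):
    """Return the two command lines to run: htlatex, then the strip script."""
    htlatex_cmd = ["htlatex", tex_path] + list(extra_args)
    # strip_script is a hypothetical file name for the BeautifulSoup script.
    strip_cmd = [sys.executable, strip_script, html_name_for(tex_path)]
    return [htlatex_cmd, strip_cmd]


def main(argv):
    for cmd in build_commands(argv[1], argv[2:]):
        subprocess.check_call(cmd)  # stop at the first step that fails


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Any arguments after the .tex file name are passed through to htlatex unchanged.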

¹ The X + Y selector matches a Y element that immediately follows an X element, so it also suppresses the sibling <br /> node
² Install with pip install beautifulsoup4 or easy_install beautifulsoup4
³ I am sure you can do something like that easily in Perl (or Ruby) as well, I just don't know how
