I have a document and almost all of the content is in HTML format.


If the resulting XML document is well-formed, you can use XSLT to transform it into the form you need. See http://metalab.unc.edu/xml/books/bible/updates/14.html for a relatively quick explanation of XSLT. You can download an XSLT processor (Xalan) at http://xml.apache.org/xalan/getstarted.html#download.
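As a rough sketch of the XSLT route, the shell function below writes a small XSLT 1.0 stylesheet that copies the document unchanged except that it drops the <foo> and </foo> tags while keeping what is inside them. The function name write_strip_foo_xsl, the file name strip-foo.xsl, and the element name foo are placeholders of mine. The example run in the comments assumes xsltproc is installed; any XSLT 1.0 processor, Xalan included, should accept the same stylesheet.

```shell
# Write an XSLT 1.0 stylesheet that copies everything but unwraps <foo>.
# The default file name strip-foo.xsl and the element name foo are placeholders.
write_strip_foo_xsl() {
  cat > "${1:-strip-foo.xsl}" <<'EOF'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- For <foo>, output only its children, dropping the tags themselves. -->
  <xsl:template match="foo">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
EOF
}

# Example run (assumes xsltproc is available):
#   write_strip_foo_xsl strip-foo.xsl
#   xsltproc strip-foo.xsl input.xml > output.xml
```

Unlike a text substitution, this also handles <foo> tags that carry attributes or span several lines, because the processor works on the parsed document rather than on the raw text.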

If the tags are simple enough, you can write a script or program to remove them. For example:

sed -e 's/<foo>//g' -e 's:</foo>::g' input.xml > output.xml
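If there are several such tags, the same substitution can be wrapped in a small function that takes the tag name as an argument. This is only a sketch: strip_tag is a name I made up, and it assumes the tag name contains no characters special to sed and that the tags have no attributes.

```shell
# Remove the opening and closing forms of one tag, keeping the text between them.
# Usage: strip_tag NAME < input > output
# Assumes NAME is a plain tag name and the tags have no attributes.
strip_tag() {
  # </\{0,1\} matches "<" followed by an optional "/", so one
  # expression covers both <NAME> and </NAME>.
  sed -e "s:</\{0,1\}$1>::g"
}

# Example: remove <foo> and <bar> tags in one pipeline.
#   strip_tag foo < input.xml | strip_tag bar > output.xml
```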