What is the best way to convert HTML to XML, separating the content (data) from the presentation in HTML? Are there any Java APIs that we can make use of?

Brill Pappin

Try Cocoon and ECS, or Jetspeed from the Apache group (java.apache.org).

Thijs Stalenhoef adds:

There aren't really any APIs specifically for doing this. What do you want to do with the HTML data after it has been converted to XML? If all you want to do is display it again using different style sheets, you should consider converting the HTML to XHTML. XHTML is simply an XML-compliant form of HTML.
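
To make the difference concrete, here is a minimal sketch (not part of the original answer) that feeds two fragments to the XML parser that ships with Java. The class name XhtmlCheck and the sample markup are made up for illustration, and the samples leave out the DOCTYPE and namespace so that nothing has to be fetched from the network. The point is only that XHTML-style markup with every tag closed goes through an ordinary XML parser, while typical HTML with an unclosed tag does not.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for this example only.
public class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML. */
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup: it is not well-formed XML.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // XHTML style: the empty element is explicitly closed.
        String xhtml = "<html><body><p>Hello<br/>world</p></body></html>";
        // Plain HTML style: <br> is never closed.
        String html = "<html><body><p>Hello<br>world</p></body></html>";

        System.out.println(isWellFormed(xhtml)); // prints true
        System.out.println(isWellFormed(html));  // prints false
    }
}
```

Once a page passes a check like this, any XML tool (a DOM parser, XPath, XSLT) can be pointed at it.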

If you want to convert the HTML to XML that complies with a DTD or schema of your own making, that can pose many problems. It all depends on the HTML. Do all the HTML files use the same "template"? Then it is possible to write a program to convert them. If they are all different, it is probably easier to do it by hand.
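
For the "same template" case, a conversion program could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the pages are taken to be well-formed XHTML already (without a namespace or DOCTYPE), the template is assumed to put the data in an h1 with class "title" and a span with class "price", and the target vocabulary (product, name, price) is invented. It uses only the DOM, XPath and transformer classes bundled with Java.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical converter for one assumed page template.
public class TemplateToXml {

    /** Pulls the data out of one template page and returns it as XML text. */
    public static String convert(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document page = builder.parse(new InputSource(new StringReader(xhtml)));

            // The XPath expressions encode where the template keeps its data.
            XPath xpath = XPathFactory.newInstance().newXPath();
            String name = xpath.evaluate("//h1[@class='title']", page).trim();
            String price = xpath.evaluate("//span[@class='price']", page).trim();

            // Build the output document in the made-up target vocabulary.
            Document out = builder.newDocument();
            Element product = out.createElement("product");
            out.appendChild(product);
            appendTextElement(out, product, "name", name);
            appendTextElement(out, product, "price", price);

            // Serialize the DOM tree to a string, without the XML declaration.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter text = new StringWriter();
            serializer.transform(new DOMSource(out), new StreamResult(text));
            return text.toString();
        } catch (Exception e) {
            // Covers parse, XPath and serialization failures alike in this sketch.
            throw new IllegalArgumentException("Could not convert page", e);
        }
    }

    private static void appendTextElement(Document doc, Element parent,
                                          String tag, String value) {
        Element child = doc.createElement(tag);
        child.setTextContent(value);
        parent.appendChild(child);
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<h1 class=\"title\">Widget</h1>"
                + "<p>Price: <span class=\"price\">9.99</span></p>"
                + "</body></html>";
        System.out.println(convert(page));
    }
}
```

This only works because every page is assumed to keep its data in the same places. When each page has its own layout, the XPath expressions would have to be rewritten per page, which is why converting by hand may be the better option there.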