I want to parse HTML documents from the web into a generic form that is easy to process. I was thinking of using an XML parser with the HTML DTD from the W3C. Does this sound sensible, or am I missing something?

Davanum Srinivas

Use JTidy. It can convert HTML documents into XHTML/XML, which an XML parser can then read.
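To illustrate why the cleanup step matters, here is a minimal sketch that uses only the JDK's built-in XML parser. It does not call JTidy itself: the XHTML string is a hand-written stand-in for the kind of output a tidying tool would produce, and the class and method names (`HtmlAsXmlDemo`, `parseAsXml`) are hypothetical. It shows that typical "tag soup" HTML is rejected by an XML parser, while the well-formed XHTML equivalent parses into a DOM.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class HtmlAsXmlDemo {

    /**
     * Tries to parse the markup with the JDK XML parser.
     * Returns the DOM document, or null if the markup is not well-formed XML.
     */
    public static Document parseAsXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            return null; // not well-formed: this is what raw HTML usually triggers
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Legal-looking HTML, but <br> and the first <p> are never closed.
        String tagSoup = "<html><body><p>First<br><p>Second</body></html>";
        // Hand-written stand-in for what a tidying step would emit.
        String xhtml = "<html><body><p>First<br/></p><p>Second</p></body></html>";

        System.out.println("tag soup parsed: " + (parseAsXml(tagSoup) != null));

        Document doc = parseAsXml(xhtml);
        System.out.println("XHTML paragraphs: "
                + doc.getElementsByTagName("p").getLength());
    }
}
```

In practice the tidying tool would sit between fetching the page and calling the XML parser, so that the parser only ever sees well-formed input.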

