I want to parse HTML documents from the web into a generic structure that a program can work with. I was thinking of using an XML parser with the HTML DTD from the W3C. Does this sound sensible, or am I missing something?

Davanum Srinivas

HTML found on the web is rarely well-formed XML, so a strict XML parser will reject most pages. Use JTidy, which can convert HTML documents into XHTML/XML:

http://lempinen.net/sami/jtidy/
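
To see why a cleanup step is needed before handing web pages to an XML parser, here is a minimal sketch that uses only the JDK's built-in XML parser (not JTidy itself). The class name WellFormedCheck, the method isWellFormedXml, and the two sample strings are made up for illustration. It feeds the parser one snippet of typical "tag soup" HTML and one XHTML equivalent of the kind a tidy tool is meant to produce.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, for illustration only.
class WellFormedCheck {

    // Returns true if the JDK's XML parser accepts the markup, false if it rejects it.
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler print to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignore warnings */ }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The markup is not well-formed XML.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Common on the web: <br> and <p> are never closed.
        String tagSoup = "<html><body><p>one<br>two</body></html>";
        // The same content with every element closed.
        String xhtml = "<html><body><p>one<br/>two</p></body></html>";

        System.out.println(isWellFormedXml(tagSoup)); // prints "false"
        System.out.println(isWellFormedXml(xhtml));   // prints "true"
    }
}
```

Supplying the HTML DTD does not change the first result, because well-formedness is checked before any validation against a DTD takes place. Once the input has been converted to XHTML, the ordinary XML parser and DOM APIs can be used on it.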
