Problem with HTTPUnit

Posted By:   matthew_magliocca
Posted On:   Wednesday, December 6, 2006 10:21 AM


OK, for this project I need to be able to build DOMs from web pages. The project requires me to save a copy to our own server so that a user can edit it without touching the one online, and then I need to build a DOM from the saved, modified copy.



I can make the saved copy easily enough, but for no reason I can find, HTTPUnit chokes on it. It is nearly identical to the original version except for some new tags.



The odd thing is that, for my saved copy, it claims a JS function called MM_preloadImages is undefined, yet it does not object to this function when I build a DOM from the online copy. Also, new branches of the DOM appear in places where no user editing occurred at all. Is there something I'm doing wrong here?
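One possibility worth ruling out, and it is only an assumption since nothing above confirms it: if MM_preloadImages lives in an external script referenced by a relative URL, that reference would resolve against the saved copy's new location rather than the original site. A minimal sketch that adds a `<base href>` pointing at the original page; `BaseHref`, `withBase`, and the example URL are hypothetical, and whether HTTPUnit honors the tag would still need to be tested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BaseHref {
    // Matches an opening <head> tag in any letter case, with or without attributes.
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the HTML with a <base> tag inserted right after the opening <head>.
     * If the page has no <head> tag, the HTML is returned unchanged.
     */
    static String withBase(String html, String originalUrl) {
        Matcher m = HEAD_OPEN.matcher(html);
        if (!m.find()) {
            return html;
        }
        String baseTag = "<base href=\"" + originalUrl + "\">";
        return html.substring(0, m.end()) + baseTag + html.substring(m.end());
    }
}
```

Usage would look like `BaseHref.withBase(savedHtml, "http://example.com/dir/page.html")` before writing the copy to disk, where the URL stands in for wherever the page was originally fetched from.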



I use URLConnection.getInputStream() to read the web page into a String so the user can work on it. Then I write it out to disk on the server. When I run HTTPUnit on the new version, it objects to the function that was there all along, and I find new branches and nodes that weren't there the first time. There are no user additions or modifications in the areas where I find these nodes.
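For reference, here is a minimal sketch of a fetch-and-save round trip that should not alter the markup on its own. It assumes, without anything above confirming it, that line-by-line reading or default-charset decoding could be one source of differences. `PageCopy`, `fetchBytes`, `decode`, and `save` are hypothetical names, and the charset has to be supplied by the caller because the page's real encoding is not known here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class PageCopy {
    /** Reads every byte from the URL as-is, without interpreting line breaks. */
    static byte[] fetchBytes(URL source) throws IOException {
        try (InputStream in = source.openConnection().getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Decodes the raw bytes with an explicit charset so the user can edit the text. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** Writes the (possibly edited) text back out using the same explicit charset. */
    static void save(String html, Path target, Charset charset) throws IOException {
        Files.write(target, html.getBytes(charset));
    }
}
```

If the text is not edited between `decode` and `save`, and the charset matches the page, the saved file should be byte-for-byte identical to what the server sent. Comparing the two files is a quick way to check whether the saving step is what changes the page.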





I have noticed some odd formatting issues in my saved copy, but I'm not sure whether these should cause a problem, since I thought HTML largely ignored whitespace.





The pages look identical to me, at least in content, though I will admit I'm having trouble getting the HTML comments for some reason. Could that be significant? Is there some issue with how I'm saving the web page that alters it to HTTPUnit's detriment? I don't think the editing could be doing it, because that's not where the issue turns up.

Thanks a ton
