Hi, I need to develop a Java program that lists all .doc (MS Word) files and searches through each file for a keyword.
Posted By:   Kula_reddy
Posted On:   Friday, December 7, 2001 03:15 PM

Hi, I need to develop a Java program that lists all .doc (MS Word) files and searches through each file for a keyword.


When I look for the keyword in a Word file using input streams, it is not detected even though the word is present in the document. The input stream coming from the Word document is binary and the keyword is just text. Is there any way I can convert the binary to text and then look for the keyword?


If you have any other solution for searching for a keyword in Word and RTF files, please reply to kula-s@mailcity.com

Re: Hi, I need to develop a Java program that lists all .doc (MS Word) files and searches through each file for a keyword.

Posted By:   Simon_Ablett  
Posted On:   Thursday, December 20, 2001 05:09 AM

An ASCII text file contains only printable characters between ASCII values 32 and 126, plus the CR (ASCII code 13), LF (ASCII code 10) and TAB (ASCII code 9) characters. Binary files can contain any value between 0 and 255 (the full range of a single byte). So it doesn't make sense to 'convert binary to text'. What you are probably better off doing is telling your comparator method to ignore all characters that are not text, i.e. that are greater than 126 or less than 32 (other than CR, LF and TAB). That should work most of the time.
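As a rough illustration of the 'ignore everything that is not text' idea, here is a minimal sketch in Java. The class name DocKeywordSearch and its methods are made up for this example, and it assumes each file is small enough to read into memory in one go. A side effect worth noting: if a file happens to store its text as two bytes per character with a zero high byte, dropping the zero bytes also leaves plain ASCII behind, so the same filter copes with that case. It is not a real Word parser, and because bytes are simply discarded, text on either side of binary data can run together and produce the occasional false match.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Locale;

// Hypothetical helper for this thread: filters a binary file down to its
// text-like bytes and searches what is left.
final class DocKeywordSearch {

    // Keep only TAB, LF, CR and printable ASCII (32..126); drop every other byte.
    static String printableText(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte b : data) {
            int c = b & 0xFF; // Java bytes are signed, so map to 0..255 first
            if ((c >= 32 && c <= 126) || c == 9 || c == 10 || c == 13) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Case-insensitive search of the filtered text.
    static boolean containsKeyword(byte[] data, String keyword) {
        String haystack = printableText(data).toLowerCase(Locale.ROOT);
        return haystack.contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Usage: java DocKeywordSearch <directory> <keyword>
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java DocKeywordSearch <directory> <keyword>");
            return;
        }
        // List the .doc files directly inside the given directory (no recursion).
        File[] docs = new File(args[0]).listFiles(
                (dir, name) -> name.toLowerCase(Locale.ROOT).endsWith(".doc"));
        if (docs == null) {
            System.err.println("not a readable directory: " + args[0]);
            return;
        }
        for (File doc : docs) {
            byte[] data = Files.readAllBytes(doc.toPath());
            if (containsKeyword(data, args[1])) {
                System.out.println(doc.getPath());
            }
        }
    }
}
```

Run against a directory, it prints the path of every .doc file whose filtered contents contain the keyword.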

There is, however, an added complication in that Word documents often include 'change information'. Try creating a small Word file containing only 'Hello World'. Save it. Open it again and change 'World' to 'Everyone' before saving it again. Now view the file through a 'dump' program. You may find the old text is still in there, which could cause your search program to report successful finds for words that are no longer visible in the document.
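If you don't have a 'dump' program to hand, something along these lines would do for the experiment above. It is only a sketch: the class name PrintableRuns and the minimum-length parameter are my own choices, and it only shows runs of single-byte printable characters, so text stored as two bytes per character will not show up as a run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical 'dump' helper: lists the runs of printable ASCII found in a file.
final class PrintableRuns {

    // Return every run of at least minLength consecutive printable bytes (32..126).
    static List<String> extract(byte[] data, int minLength) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF; // treat the byte as unsigned
            if (c >= 32 && c <= 126) {
                current.append((char) c);
            } else {
                // A non-printable byte ends the current run.
                if (current.length() >= minLength) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        if (current.length() >= minLength) {
            runs.add(current.toString());
        }
        return runs;
    }

    // Usage: java PrintableRuns <file>
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java PrintableRuns <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        for (String run : extract(data, 4)) {
            System.out.println(run);
        }
    }
}
```

If both the old and the new wording turn up in the output for the same file, that is the 'change information' at work, and a simple byte-level search will match on either.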

Hope that this helps.

Regards.