Re: Hi, I need to develop a java program that lists all .doc(ms word) files and searches thru each and every file for the keyword.
Thursday, December 20, 2001 05:09 AM
An ASCII text file contains only printable characters, ASCII values 32 to 126, plus the CR (ASCII code 13), LF (ASCII code 10) and TAB (ASCII code 9) characters. Binary files can contain any value between 0 and 255 (the full range of a single byte). So it doesn't make sense to 'convert binary to text'. You are probably better off telling your comparator method to ignore all characters that are not text, i.e. those greater than 126 or less than 32, apart from TAB, LF and CR. That should work most of the time.
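Here is a rough sketch of that idea in Java, in case it helps. It assumes a Java 7 or later runtime (for java.nio.file), treats DEL (127) as non-text, and matches case-insensitively; the class and method names are made up for illustration. It knows nothing about the Word file format, it just throws away the non-text bytes and searches what is left.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of any library.
final class DocKeywordSearch {

    // True for bytes treated as text: printable ASCII plus TAB, LF and CR.
    static boolean isTextByte(int b) {
        return (b >= 32 && b <= 126) || b == 9 || b == 10 || b == 13;
    }

    // Drops every non-text byte and returns the remainder as a String.
    static String extractText(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        for (byte raw : data) {
            int b = raw & 0xFF; // Java bytes are signed; map to 0..255
            if (isTextByte(b)) {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }

    // Case-insensitive keyword test on the filtered bytes (an assumption;
    // remove the toLowerCase calls for an exact match).
    static boolean containsKeyword(byte[] data, String keyword) {
        return extractText(data).toLowerCase(Locale.ROOT)
                .contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Lists the *.doc files directly inside dir (no recursion) that match.
    // The glob is case-sensitive on case-sensitive file systems.
    static List<Path> findMatches(Path dir, String keyword) throws IOException {
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(dir, "*.doc")) {
            for (Path doc : docs) {
                if (Files.isRegularFile(doc)
                        && containsKeyword(Files.readAllBytes(doc), keyword)) {
                    matches.add(doc);
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DocKeywordSearch <directory> <keyword>");
            return;
        }
        for (Path match : findMatches(Paths.get(args[0]), args[1])) {
            System.out.println(match);
        }
    }
}
```

Reading each file fully into memory keeps the sketch short; for very large files you would want to filter a stream instead.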
There is, however, an added complication: Word documents often include 'change information'. Try creating a small Word file containing only 'Hello World' and save it. Open it again, change 'World' to 'Everyone', and save it again. Now view the file through a 'dump' program! Text that is no longer shown in the document may still be stored in the file, so your search program could report a successful find for a keyword that is not visible anywhere in the document.
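If you don't have a 'dump' program to hand, something like the following would do for that experiment. It is only a minimal sketch: the HexDump class name and the 16-bytes-per-row layout are my own choices, and it assumes Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical minimal hex dump, for eyeballing what a file really contains.
final class HexDump {

    // Formats data as rows of 16 bytes: offset, hex values, then the
    // printable ASCII characters (everything else is shown as '.').
    static String dump(byte[] data) {
        StringBuilder out = new StringBuilder();
        for (int offset = 0; offset < data.length; offset += 16) {
            int end = Math.min(offset + 16, data.length);
            StringBuilder hex = new StringBuilder();
            StringBuilder text = new StringBuilder();
            for (int i = offset; i < end; i++) {
                int b = data[i] & 0xFF; // treat the byte as unsigned
                hex.append(String.format("%02x ", b));
                text.append(b >= 32 && b <= 126 ? (char) b : '.');
            }
            out.append(String.format("%08x  %-48s %s%n", offset, hex, text));
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java HexDump <file>");
            return;
        }
        System.out.print(dump(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Run it on the file after each save and compare the right-hand column of the two dumps.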
Hope that this helps.