Saturday, April 21, 2007 04:01 PM
...Files may be of type Java, COBOL, C etc...
So you're getting a file containing arbitrary source code, and you have to read it and guess which language it is written in? What a weird problem, though it does sound like fun. At least you must have a list of the possible languages.
Maybe an initial brute-force way would be to run a compiler, preprocessor, or syntax checker for each language you might be expecting, one at a time, on the start of the source file, until one of them doesn't blow up on invalid syntax. You could find existing tools or write the checkers yourself, I guess, with some regexps or with lex/flex/ANTLR etc. You probably want to start with the languages that have the most restricted syntax first, at least for the start of a source file, since a strict checker is less likely to accept a file written in some other language by accident.
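Here is a rough Python sketch of that "try each checker in turn" idea, just to make it concrete. Everything in it is an assumption on my part: the function names (`looks_like_cobol`, `guess_language`, etc.) are made up, and the regexps are crude stand-ins for real compilers or syntax checkers. They only look at the start of the file, and the list is ordered with the most rigid-looking syntax first.

```python
import re

# Hypothetical, deliberately crude stand-ins for real syntax checkers.
# Each returns True if the start of the file looks plausible for that language.

def looks_like_cobol(head: str) -> bool:
    # COBOL programs open with an IDENTIFICATION DIVISION header,
    # optionally preceded by a six-digit sequence number.
    pattern = r"^\s*(?:\d{6})?\s*(?:IDENTIFICATION|ID)\s+DIVISION\s*\."
    return re.search(pattern, head, re.IGNORECASE | re.MULTILINE) is not None

def looks_like_java(head: str) -> bool:
    # A package/import statement or a class/interface declaration.
    pattern = (r"^\s*(?:package|import)\s+[\w.]+(?:\.\*)?\s*;"
               r"|\b(?:class|interface)\s+\w+")
    return re.search(pattern, head, re.MULTILINE) is not None

def looks_like_c(head: str) -> bool:
    # A preprocessor #include near the top of the file.
    return re.search(r"^\s*#\s*include\s*[<\"]", head, re.MULTILINE) is not None

# Most restricted syntax first, so a loose checker cannot claim
# a file that a stricter one would have recognised.
CHECKERS = [
    ("cobol", looks_like_cobol),
    ("java", looks_like_java),
    ("c", looks_like_c),
]

def guess_language(source: str, head_lines: int = 40):
    """Return the first language whose checker accepts the file's start, else None."""
    head = "\n".join(source.splitlines()[:head_lines])
    for name, check in CHECKERS:
        if check(head):
            return name
    return None

if __name__ == "__main__":
    sample = '#include <stdio.h>\nint main(void) { return 0; }\n'
    print(guess_language(sample))
```

In a real version you would swap each regexp for a call to the actual compiler or parser and treat "no syntax errors" as acceptance; the ordered list and the early return would stay the same.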