Posted By:
Harald_Kirsch
Posted On:
Friday, June 18, 2004 06:20 AM
200,000 regular expressions, generalized from 200,000 protein names, must be matched in text in order to annotate the text with links into a protein database.
My first idea for doing this with ANTLR is to rewrite the regexps as an ANTLR lexer grammar. Each regexp would become a rule, and the rule's action would rewrite the matched token as a link to the database.
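
To make the intended behaviour concrete, here is a minimal sketch in plain Python rather than ANTLR. It only illustrates the semantics I am after (one rule per regexp, matches rewritten as links, everything else copied through unchanged) and says nothing about whether this scales to 200,000 rules. The two patterns, the database IDs, and the example.org URL are made up for illustration.

```python
import re

# Hypothetical patterns keyed by a hypothetical database ID.
# Assumption: the patterns contain no capturing groups of their own.
PATTERNS = {
    "P53": r"\bp53\b",
    "BRCA1": r"\bBRCA[- ]?1\b",
}


def build_matcher(patterns):
    """Combine all patterns into one alternation, one named group per rule."""
    ids = list(patterns.keys())
    parts = [f"(?P<g{i}>{rx})" for i, rx in enumerate(patterns.values())]
    # When several rules could match at the same position,
    # the one listed first in the alternation wins.
    return re.compile("|".join(parts)), ids


def annotate(text, matcher, ids):
    """Replace each match with a link; unmatched text is copied unchanged."""
    def link(m):
        # m.lastgroup is the name of the group that matched, e.g. "g1".
        db_id = ids[int(m.lastgroup[1:])]
        # Placeholder URL scheme, not a real database.
        return f'<a href="https://example.org/protein/{db_id}">{m.group(0)}</a>'

    return matcher.sub(link, text)


matcher, ids = build_matcher(PATTERNS)
print(annotate("p53 binds BRCA1.", matcher, ids))
```

In this sketch, overlapping rules are resolved silently by alternation order, which is exactly the kind of ambiguity I would expect a lexer generator to complain about.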
I know ANTLR only from the overview and the quick examples on its homepage. Before I dig any deeper, I would appreciate hints as to whether this has any chance of success. Concerns arising from my limited knowledge of ANTLR include:
- the "warning:lexical nondeterminism between rules" message
- copying "unmatched" input unchanged to the output
- the sheer size of a lexer with 200,000 rules
Thanks,
Harald.