Help with basic lex: SPACE characters getting grouped into text of WORD tokens.

Posted By:   Nathan_Shearer
Posted On:   Wednesday, January 19, 2011 04:02 AM

With these rules, I keep getting the space characters grouped into the text of the WORD tokens.

			
text_line : SPACE* WORD (SPACE | WORD | DIRECTIVE)* ( NEWLINE | EOF ) ;

WORD : ~( DIRECTIVE_IND | SPACE | NEWLINE ) TEXT_CHARS* ;

DIRECTIVE : DIRECTIVE_IND TEXT_CHARS+ ;

SPACE : ( ' ' | '\t' )+ ;

NEWLINE : '\r'? '\n' ;

fragment DIRECTIVE_IND : '@' ;

fragment TEXT_CHARS : ( ~( ' ' | '\t' | '\r' | '\n' ) )+ ;


This tokenizes "abcd 1234" to:

  1. "abcd" (WORD)

  2. " 1234" (WORD)



Why does it not produce the following instead?

  1. "abcd" (WORD)

  2. " " (SPACE)

  3. "1234" (WORD)
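
To make the expected output concrete, here is a minimal sketch of the tokenization I am after, written in Python rather than ANTLR. The regular expressions are an assumption about what the rules above are meant to express (SPACE as spaces and tabs, NEWLINE as an optional CR followed by LF), and `tokenize` is just an illustrative helper, not part of any grammar tooling.

```python
import re

# Token classes spelled out as explicit character sets. These mirror the
# intent of the grammar rules, not ANTLR's actual matching behaviour.
TOKEN_SPEC = [
    ("NEWLINE", r"\r?\n"),
    ("SPACE", r"[ \t]+"),
    ("DIRECTIVE", r"@[^ \t\r\n]+"),
    ("WORD", r"[^@ \t\r\n][^ \t\r\n]*"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (token_type, token_text) pairs for the input."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            # e.g. a lone '@' or a bare '\r', which no rule accepts
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("abcd 1234"))
# [('WORD', 'abcd'), ('SPACE', ' '), ('WORD', '1234')]
```

Here WORD's first character excludes the individual characters '@', ' ', '\t', '\r' and '\n' directly, rather than negating references to other token rules, which is the behaviour I expected from the grammar.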



Any help would be greatly appreciated.