
ANTLR "missing XXX at" and "mismatched input '<EOF>'": analysis of several error conditions (FserSuN's blog)


The ANTLR version used in this article is 4.7.2.

1 Ambiguous lexical rules

1.1 Problem Background

The demo is the property file example from the first section of Chapter 7. I modified the original grammar as follows.

grammar Demo;

 file : prop+;
 prop : ID '=' STRING;

 ID : [a-z]+;
 STRING : [a-z0-9]+;
 

Validating the input text in the Grammar Console produced errors.

Input:

abc=abc
 

Error message:

line 1:4 missing STRING at 'abc'
 line 1:7 mismatched input '<EOF>' expecting '='
 

The parse tree at this point is as follows:
[Image: parse tree for the input abc=abc]

1.2 Analysis

When I entered the character stream abc=abc, I had not used these tools or reviewed the theory for a long time. Reading the grammar file from top to bottom, I wrongly assumed that the generated token stream would be ID, =, STRING, matched in that order: the first abc matches ID, = matches the literal '=', and the final abc matches the lexical definition of STRING.

After going back to the book and thinking it over, I concluded that this had to be a token recognition problem. Recall that the whole pipeline is character stream -> token stream -> parse tree. After generating the code with the ANTLR tool, print the token stream to see what type each token has.

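import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;

public class AntlrTest {

    public static void main(String[] args) throws Exception {
        String input = "abc=abc";
        CharStream charStream = CharStreams.fromString(input);
        // DemoLexer is the lexer class ANTLR generates from the Demo grammar
        DemoLexer demoLexer = new DemoLexer(charStream);
        // Print every token the lexer produces, including its type
        System.out.println(demoLexer.getAllTokens());
    }
}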
 

The output is as follows

[[@-1,0:2='abc',<2>,1:0], [@-1,3:3='=',<1>,1:3], [@-1,4:6='abc',<2>,1:4]]
 

The number in angle brackets is the token type. The generated lexer source defines the types as follows.

 public static final int
 T__0=1, ID=2, STRING=3;
 

Therefore, the abc after the equals sign is recognized as an ID (type 2), not as a STRING. When the parser reaches STRING in the prop rule, it finds an ID instead and reports the missing STRING. The cause is that ID and STRING both match abc with the same length, and ANTLR resolves such a tie in favor of the rule defined first, which is ID. With the grammar above, a value is only recognized as STRING if it contains at least one digit.
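ANTLR lexers take the longest possible match at each position, and when several rules match the same length, the rule defined first wins. The sketch below reproduces that selection for the three token types of this grammar. It is a simplified model built on java.util.regex, not ANTLR's actual implementation, and the names LongestMatchDemo, rules and tokenize are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of lexer rule selection; this is not ANTLR code.
class LongestMatchDemo {

    // Rules in definition order: the implicit '=' literal, then ID, then STRING.
    static Map<String, Pattern> rules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("'='", Pattern.compile("="));
        rules.put("ID", Pattern.compile("[a-z]+"));
        rules.put("STRING", Pattern.compile("[a-z0-9]+"));
        return rules;
    }

    // Returns token descriptions such as "ID(abc)".
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestLength = 0;
            for (Map.Entry<String, Pattern> rule : rules().entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() - pos > bestLength) {
                    bestRule = rule.getKey();
                    bestLength = m.end() - pos;
                }
            }
            if (bestRule == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestRule + "(" + input.substring(pos, pos + bestLength) + ")");
            pos += bestLength;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc=abc"));  // the second abc comes out as ID
        System.out.println(tokenize("abc=abc1")); // the digit makes STRING the longer match
    }
}
```

In this model, a value made only of lowercase letters always comes out as ID, which matches the token types printed by the real lexer above.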

1.3 Summary

  • Be clear about the overall pipeline: character stream -> token stream -> parse tree
  • Be clear about which token type the lexer actually assigns to each piece of input

2 Calling the lexer's getAllTokens results in a missing token error

2.1 Problem Background

While practicing with the CSV example in Chapter 8, I simplified the grammar as follows.

grammar Demo;
 csv : WORD;
 WORD : [0-9a-z]+;
 WS : [ \t\n]+ -> skip ;
 

Test code

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

public class AntlrTest {

    public static void main(String[] args) throws Exception {
        String input = "1";
        CharStream charStream = CharStreams.fromString(input);
        DemoLexer demoLexer = new DemoLexer(charStream);
        // This call reads every token up to EOF from the lexer
        System.out.println(demoLexer.getAllTokens());
        CommonTokenStream commonTokenStream = new CommonTokenStream(demoLexer);
        DemoParser demoParser = new DemoParser(commonTokenStream);
        demoParser.csv();
    }
}
 

Error output

line 1:1 missing WORD at '<EOF>'
 

2.2 Analysis

This problem seemed very strange at first. Everything went back to normal once I removed the call to demoLexer.getAllTokens(), so I looked at the source code of getAllTokens. It repeatedly calls the lexer's nextToken method to fetch the next token until EOF is reached.

public List<? extends Token> getAllTokens() {
    List<Token> tokens = new ArrayList<Token>();
    Token t = nextToken();
    while ( t.getType()!=Token.EOF ) {
        tokens.add(t);
        t = nextToken();
    }
    return tokens;
}
 

Inside nextToken, EOF is returned whenever _hitEOF is true. Once getAllTokens has run, the _hitEOF flag is set to true:

if (_hitEOF) {
    emitEOF();
    return _token;
}
 

When demoParser.csv() is called to build the parse tree, the parser also pulls its tokens from the token stream, which in turn asks the lexer for them. Because getAllTokens has already consumed the whole input, every further request returns EOF. The match method calls getCurrentToken and gets EOF instead of WORD, so the error missing WORD at '<EOF>' is reported. Removing the getAllTokens call makes the problem go away.

public Token match(int ttype) throws RecognitionException {
    Token t = getCurrentToken();
    if ( t.getType()==ttype ) {
        if ( ttype==Token.EOF ) {
            matchedEOF = true;
        }
        _errHandler.reportMatch(this);
        consume();
    }
    else {
        t = _errHandler.recoverInline(this);
        if ( _buildParseTrees && t.getTokenIndex()==-1 ) {
            // we must have conjured up a new token during single token insertion
            // if it's not the current symbol
            _ctx.addErrorNode(createErrorNode(_ctx,t));
        }
    }
    return t;
}
 	
"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
    at org.antlr.v4.runtime.Parser.match(Parser.java:198)
    at antcode.DemoParser.csv(DemoParser.java:119)
    at complie.AntlrTest.main(AntlrTest.java:20)

"Finalizer@759" daemon prio=8 tid=0x3 nid=NA waiting
  java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Object.java:-1)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)

"Reference Handler@760" daemon prio=10 tid=0x2 nid=NA waiting
  java.lang.Thread.State: WAITING
    at java.lang.Object.wait(Object.java:-1)
    at java.lang.Object.wait(Object.java:502)
    at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)

"Signal Dispatcher@758" daemon prio=9 tid=0x4 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
 

2.3 Summary

Avoid calling the lexer's getAllTokens() method before building the parse tree. It consumes the entire input, so the parser afterwards receives only EOF and no tokens.
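The effect can be illustrated with a toy lexer that, like the real one, remembers that it has reached the end of its input. This is a self-contained sketch and not ANTLR code: OneShotLexerDemo and ToyLexer are hypothetical names, and the toy treats every character as one token. It shows that after getAllTokens has drained the input, the next consumer sees only EOF until the lexer is reset.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a lexer that can only be drained once; this is not ANTLR code.
class OneShotLexerDemo {

    static final String EOF = "<EOF>";

    // Hypothetical stand-in for a generated lexer; each character is one token.
    static class ToyLexer {
        private final String input;
        private int pos = 0;
        private boolean hitEOF = false;

        ToyLexer(String input) {
            this.input = input;
        }

        // Once the end of input has been reached, keep returning EOF.
        String nextToken() {
            if (hitEOF || pos >= input.length()) {
                hitEOF = true;
                return EOF;
            }
            return String.valueOf(input.charAt(pos++));
        }

        // Same loop shape as getAllTokens: read until EOF, consuming the input.
        List<String> getAllTokens() {
            List<String> tokens = new ArrayList<>();
            for (String t = nextToken(); !t.equals(EOF); t = nextToken()) {
                tokens.add(t);
            }
            return tokens;
        }

        // Rewind the input and clear the EOF flag.
        void reset() {
            pos = 0;
            hitEOF = false;
        }
    }

    public static void main(String[] args) {
        ToyLexer lexer = new ToyLexer("1");
        System.out.println(lexer.getAllTokens()); // drains the lexer
        System.out.println(lexer.nextToken());    // a later consumer sees only EOF
        lexer.reset();
        System.out.println(lexer.nextToken());    // after a reset the token is back
    }
}
```

With the real runtime, two options that should avoid the problem are calling the lexer's reset() before handing it to CommonTokenStream, or printing the tokens from the token stream itself (fill() followed by getTokens()) and then passing that same stream to the parser. Both are suggestions that were not tested against the example above.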


This article is from the internet and does not represent the position of 1024programmer. Please indicate the source when reprinting: https://www.1024programmer.com/antlr-missing-xxx-at-and-mismatched-input-analysis-of-several-error-conditions_mismatched-input-expecting-antlr_fsersuns-blog/
