The ANTLR version used in this article is 4.7.2.
1 Ambiguous lexical rules
1.1 Problem Background
The demo is the property-file example from the first section of Chapter 7. I modified the original grammar as follows.
grammar Demo;
file : prop+;
prop : ID '=' STRING;
ID : [a-z]+;
STRING : [a-z0-9]+;
Validating the input text in the grammar console produced errors.
Input:
abc=abc
Error message:
line 1:4 missing STRING at 'abc'
line 1:7 mismatched input '<EOF>' expecting '='
[Figure: parse tree produced for the input abc=abc]
1.2 Analysis
I had not used these tools or reviewed the theory for a long time, so when I entered the character stream abc=abc I read the grammar file literally and wrongly assumed that the generated token stream would be ID, =, STRING, matched in that order. That is, I expected ID to match abc, '=' to match the character =, and the final abc to be matched by the lexical definition of STRING, because the parser rule asks for a STRING at that position.
After going back to the book and thinking it over, I concluded that this had to be a token recognition error. Recall that the overall pipeline is character stream -> token stream -> parse tree. After generating the code with the ANTLR tool, I printed the token stream to see what type each token has.
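import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;

// DemoLexer is the lexer class that ANTLR generates from the Demo grammar.
public class AntlrTest {
    public static void main(String[] args) throws Exception {
        String input = "abc=abc";
        CharStream charStream = CharStreams.fromString(input);
        DemoLexer demoLexer = new DemoLexer(charStream);
        // Print every token the lexer produces, without involving the parser.
        System.out.println(demoLexer.getAllTokens());
    }
}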
The output is as follows:
[[@-1,0:2='abc',<2>,1:0], [@-1,3:3='=',<1>,1:3], [@-1,4:6='abc',<2>,1:4]]
The number in angle brackets is the token type. The constants in the generated lexer source show what each type number means.
public static final int
T__0=1, ID=2, STRING=3;
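So the abc after the equals sign is recognized as an ID (type 2), not as a STRING (type 3). When both ID and STRING can match the same text, the ANTLR lexer prefers the longest match and, on a tie, the rule that is defined first, which here is ID. The parser therefore never receives a STRING where the grammar requires one. With the grammar above, the value must contain at least one digit (for example abc=abc1 or abc=123) to be lexed as STRING and avoid this error.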
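The tie-breaking behavior can be illustrated without ANTLR. The following is a minimal sketch that uses only java.util.regex; the class RulePriorityDemo and its tokenize method are hypothetical names made up for this illustration and are not part of the ANTLR runtime or of the generated DemoLexer. It assumes the two policies described above: take the longest match, and on a tie keep the rule declared first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a generated lexer; this is NOT ANTLR code.
class RulePriorityDemo {
    // Rules in declaration order, mirroring the grammar: '=' (T__0), ID, STRING.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("T__0", Pattern.compile("="));
        RULES.put("ID", Pattern.compile("[a-z]+"));
        RULES.put("STRING", Pattern.compile("[a-z0-9]+"));
    }

    // Returns "TYPE:text" entries; longest match wins, ties go to the earlier rule.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestType = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // Strictly greater: an equally long later rule never replaces an earlier one.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestType = rule.getKey();
                    bestLen = m.end() - pos;
                }
            }
            if (bestType == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestType + ":" + input.substring(pos, pos + bestLen));
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc=abc"));  // [ID:abc, T__0:=, ID:abc]
        System.out.println(tokenize("abc=abc1")); // [ID:abc, T__0:=, STRING:abc1]
    }
}
```

In the first input both ID and STRING match three characters, so the earlier rule ID wins. In the second input STRING matches four characters and wins on length, which is why adding a digit to the value avoids the error.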
1.3 Summary
- Keep the overall pipeline in mind: character stream -> token stream -> parse tree
- Check which token type the lexer actually assigns to each piece of input
2 Calling Lexer's getAllTokens results in a parse error
2.1 Problem Background
This exercise uses the CSV example from Chapter 8. After simplification, the grammar is as follows.
grammar Demo;
csv : WORD;
WORD : [0-9a-z]+;
WS : [ \t\n]+ -> skip ;
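Test code:
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

// DemoLexer and DemoParser are the classes that ANTLR generates from the Demo grammar.
public class AntlrTest {
    public static void main(String[] args) throws Exception {
        String input = "1";
        CharStream charStream = CharStreams.fromString(input);
        DemoLexer demoLexer = new DemoLexer(charStream);
        // This call reads every token up to EOF from the lexer.
        System.out.println(demoLexer.getAllTokens());
        CommonTokenStream commonTokenStream = new CommonTokenStream(demoLexer);
        DemoParser demoParser = new DemoParser(commonTokenStream);
        demoParser.csv();
    }
}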
Error output:
line 1:1 missing WORD at '<EOF>'
2.2 Analysis
This problem looked very strange at first. Everything returned to normal after removing the demoLexer.getAllTokens() call, so I checked the source code of getAllTokens. It repeatedly calls the lexer's nextToken method to fetch the next token until it reaches EOF:
public List<? extends Token> getAllTokens() {
    List<Token> tokens = new ArrayList<Token>();
    Token t = nextToken();
    while ( t.getType()!=Token.EOF ) {
        tokens.add(t);
        t = nextToken();
    }
    return tokens;
}
Inside nextToken, EOF is returned as soon as _hitEOF is true. Once getAllTokens has run to the end of the input, the _hitEOF flag is set to true:
if (_hitEOF) {
emitEOF();
return _token;
}
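The effect of that flag can be reproduced in isolation. The sketch below is a simplified imitation written for this article, not ANTLR source: MiniLexer, its whitespace-based splitting, and the string "<EOF>" used as an end marker are all assumptions made for illustration. It only shows that a lexer with a sticky end-of-input flag keeps answering EOF after getAllTokens has drained it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature lexer; it only imitates the _hitEOF behavior described above.
class OneShotLexerDemo {
    static final String EOF = "<EOF>";

    static class MiniLexer {
        private final String[] words;
        private int next = 0;
        private boolean hitEOF = false; // plays the role of _hitEOF

        MiniLexer(String input) {
            String trimmed = input.trim();
            this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        // Returns the next word, or EOF forever once the input is exhausted.
        String nextToken() {
            if (hitEOF || next >= words.length) {
                hitEOF = true; // once set, it stays set
                return EOF;
            }
            return words[next++];
        }

        // Same loop shape as getAllTokens: read until EOF and collect the rest.
        List<String> getAllTokens() {
            List<String> tokens = new ArrayList<>();
            String t = nextToken();
            while (!t.equals(EOF)) {
                tokens.add(t);
                t = nextToken();
            }
            return tokens;
        }
    }

    public static void main(String[] args) {
        MiniLexer lexer = new MiniLexer("1");
        System.out.println(lexer.getAllTokens()); // [1]
        System.out.println(lexer.nextToken());    // <EOF>
    }
}
```

Any consumer that asks the same lexer object for a token after the getAllTokens call, as a parser would, starts at EOF.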
When demoParser.csv() builds the parse tree, it also pulls tokens through the token stream, which reads from the same lexer. Because getAllTokens has already consumed the whole input, the first token the parser receives is EOF. The match method calls getCurrentToken, gets EOF instead of WORD, and reports the missing WORD error shown above. Removing the getAllTokens call restores normal behavior.
public Token match(int ttype) throws RecognitionException {
    Token t = getCurrentToken();
    if ( t.getType()==ttype ) {
        if ( ttype==Token.EOF ) {
            matchedEOF = true;
        }
        _errHandler.reportMatch(this);
        consume();
    }
    else {
        t = _errHandler.recoverInline(this);
        if ( _buildParseTrees && t.getTokenIndex()==-1 ) {
            // we must have conjured up a new token during single token insertion
            // if it's not the current symbol
            _ctx.addErrorNode(createErrorNode(_ctx,t));
        }
    }
    return t;
}
The thread dump taken at this point shows Parser.match being called from DemoParser.csv:
"main@1" prio=5 tid=0x1 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
      at org.antlr.v4.runtime.Parser.match(Parser.java:198)
      at antcode.DemoParser.csv(DemoParser.java:119)
      at complie.AntlrTest.main(AntlrTest.java:20)
"Finalizer@759" daemon prio=8 tid=0x3 nid=NA waiting
  java.lang.Thread.State: WAITING
      at java.lang.Object.wait(Object.java:-1)
      at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
      at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
      at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler@760" daemon prio=10 tid=0x2 nid=NA waiting
  java.lang.Thread.State: WAITING
      at java.lang.Object.wait(Object.java:-1)
      at java.lang.Object.wait(Object.java:502)
      at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
      at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
"Signal Dispatcher@758" daemon prio=9 tid=0x4 nid=NA runnable
  java.lang.Thread.State: RUNNABLE
2.3 Summary
Avoid calling the lexer's getAllTokens() method before building the parse tree. The call consumes the lexer's entire input, so the parser afterwards receives only EOF and cannot get any real tokens.
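If the tokens still need to be printed for debugging, one common approach is to buffer them once and let both the printing code and the parser read from the buffer. The following is a minimal sketch of that idea in plain Java; TokenBuffer and its methods are hypothetical names invented for this illustration, and the string "<EOF>" is an assumed end marker. It is not the ANTLR implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical buffer; the names are illustrative and not part of the ANTLR runtime.
class BufferedTokensDemo {
    static final String EOF = "<EOF>";

    static class TokenBuffer {
        private final List<String> tokens = new ArrayList<>();
        private int cursor = 0;

        // Drains the one-shot source exactly once and appends an EOF marker.
        TokenBuffer(Iterator<String> source) {
            while (source.hasNext()) {
                tokens.add(source.next());
            }
            tokens.add(EOF);
        }

        // Read-only view for printing; it does not move the cursor.
        List<String> getTokens() {
            return Collections.unmodifiableList(tokens);
        }

        // What a parser would look at: the token under the cursor.
        String current() {
            return tokens.get(cursor);
        }

        // Advances the cursor, but never past the EOF marker.
        void consume() {
            if (cursor < tokens.size() - 1) {
                cursor++;
            }
        }
    }

    public static void main(String[] args) {
        Iterator<String> oneShot = Arrays.asList("1").iterator();
        TokenBuffer buffer = new TokenBuffer(oneShot);
        System.out.println(buffer.getTokens()); // [1, <EOF>]
        System.out.println(buffer.current());   // 1
    }
}
```

Printing the buffered list leaves the cursor at the first token, so a parser reading from the buffer still starts at the beginning. In the ANTLR runtime, CommonTokenStream plays a comparable buffering role, and its fill() and getTokens() methods can be used to inspect the tokens before handing the same stream to the parser; check this against the runtime version in use.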