Unclear on how the tool outputs lexical/syntactical errors

Jun 30, 2012 at 2:47 PM
Edited Jun 30, 2012 at 2:55 PM

I have noticed that the lexer replaces unexpected terminals with white space as it reads tokens from the input stream. However, I haven't found a way to get the lexer to actually output the errors. Using Hime.Redist 0.4.0, I've tried using ParserError, but something seems off with its constructor: the compiler won't let me create an instance because it says there's no constructor that takes 2 arguments (when there probably should be!). If it's alright, I'd like an explanation of how the tool actually manages errors, because I don't quite understand it yet. I do know that it handles errors, which is great!

For now, though, I'm mainly interested in lexical errors.

Coordinator
Jun 30, 2012 at 3:23 PM

Hello,

Thank you for your input. First, if you run the himecc tool without any options, the generated parser uses RNGLR. Because the RNGLR implementation is still experimental in this version, you can force the use of the LALR(1) method by passing the -m LALR1 option.

This is important because the behavior of the generated parser regarding errors depends on the parsing method used (RNGLR or LALR). The behavior of the lexer, however, does not change and is as follows:

The lexer scans your input and tries to match the tokens you specified in your grammar. When the lexer cannot match any token at some point, it drops the problematic character, sends an error to the parser, and continues with the next character in the input. All parsers have an Errors property, which is the list of all the lexer and parser errors encountered. The lexer never replaces characters in the input.
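To make the drop-and-report behavior concrete, here is a minimal C# sketch of a toy lexer that recognizes only integers and "+". It is not the Hime.Redist implementation: ToyLexer, LexError and their members are hypothetical names chosen for illustration. When no token matches, the sketch reports an error through a callback, drops the character, and continues with the next one, without modifying the input itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical error record; NOT the Hime.Redist ParserError class.
public sealed class LexError
{
    public int Position { get; }
    public char Character { get; }

    public LexError(int position, char character)
    {
        Position = position;
        Character = character;
    }

    public string Message => $"Unexpected character '{Character}' at position {Position}";
}

// Hypothetical toy lexer illustrating the "drop, report, continue" strategy.
public sealed class ToyLexer
{
    private readonly string input;
    private int pos;

    // Callback invoked once per lexical error (similar in spirit to an OnError handler).
    public Action<LexError> OnError { get; set; }

    public ToyLexer(string input)
    {
        this.input = input;
    }

    // Returns the next token (a run of digits or a single '+'), or null at end of input.
    public string GetNextToken()
    {
        while (pos < input.Length)
        {
            char c = input[pos];
            if (char.IsWhiteSpace(c))
            {
                pos++; // separators are skipped silently
                continue;
            }
            if (char.IsDigit(c))
            {
                int start = pos;
                while (pos < input.Length && char.IsDigit(input[pos])) pos++;
                return input.Substring(start, pos - start);
            }
            if (c == '+')
            {
                pos++;
                return "+";
            }
            // No token matches here: report the error, drop the character, keep scanning.
            OnError?.Invoke(new LexError(pos, c));
            pos++;
        }
        return null;
    }
}
```

Scanning "12 + #3" with this sketch yields the tokens 12, + and 3, and reports a single error for '#' at position 5.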

Now, the lexer is quite “dumb” in that it does not know the context in which it is matching a token (this is normal). Sometimes, because of errors in the grammar or in the input, the lexer passes the parser a token that the parser does not expect. The first thing the parser does is report it, so that it is available in its Errors property. In addition, LALR parsers try to recover from the error with the following strategy. An LALR parser first tries to ignore the unexpected token and continue as if it were not there. If this works, the parser continues. If it fails, the parser tries to insert one of the expected tokens (which will be empty, because the lexer did not match it in the input) and then continue with the unexpected token. If this works, the parser continues. If it fails, the parser finally tries to insert one expected token and then continue without the unexpected token. If even that fails, the parser halts and throws an exception.
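The order of the three recovery attempts can be summarized in a small C# sketch. It only models the sequence of attempts described above, not Hime's actual parser code: the canContinue predicate (standing in for running the LALR automaton over a short token sequence from the current state), the Recovery type and the token names are all hypothetical. As a simplification, the sketch checks only the single token that follows the unexpected one.

```csharp
using System;
using System.Collections.Generic;

public enum RecoveryKind { DropUnexpected, InsertBefore, ReplaceUnexpected }

// Hypothetical description of how the parser recovered.
public sealed class Recovery
{
    public RecoveryKind Kind { get; }
    public string Inserted { get; } // null when no token is inserted

    public Recovery(RecoveryKind kind, string inserted)
    {
        Kind = kind;
        Inserted = inserted;
    }
}

public static class LalrRecoverySketch
{
    // unexpected: the token the parser did not expect
    // next:       the token that follows it in the input
    // expected:   tokens the parser would accept in its current state
    // canContinue: hypothetical check of whether parsing can proceed over a token sequence
    public static Recovery Recover(
        string unexpected,
        string next,
        IReadOnlyList<string> expected,
        Func<IReadOnlyList<string>, bool> canContinue)
    {
        // 1. Ignore the unexpected token and continue as if it were not there.
        if (canContinue(new[] { next }))
            return new Recovery(RecoveryKind.DropUnexpected, null);

        // 2. Insert an (empty) expected token, then continue with the unexpected token.
        foreach (var e in expected)
            if (canContinue(new[] { e, unexpected }))
                return new Recovery(RecoveryKind.InsertBefore, e);

        // 3. Insert an expected token and continue without the unexpected token.
        foreach (var e in expected)
            if (canContinue(new[] { e, next }))
                return new Recovery(RecoveryKind.ReplaceUnexpected, e);

        // 4. Nothing worked: halt with an exception.
        throw new InvalidOperationException($"Cannot recover from unexpected token '{unexpected}'");
    }
}
```

Because skipping is tried first, a stray token is simply discarded whenever the following input still fits; inserting a token is only considered when skipping fails.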

By contrast, the RNGLR parser has no error recovery: when it cannot continue, it simply halts and throws an exception. The reason RNGLR parsers have no error recovery procedure is that it is much more complicated :)

So, to summarize and respond to your issues:

- The Errors property of the parser contains all the lexer and parser errors.
- Only parsers generated with the -m LALR1 or -m LR1 option may insert or replace tokens; in that case, check the Errors property.
- The problem with the constructor does not ring a bell. Can you be more specific (which options did you pass to the himecc tool)?

I hope it helps!

Laurent

Jun 30, 2012 at 3:32 PM
Edited Jun 30, 2012 at 3:35 PM

Hello, thanks again for the explanation.

My main issue, though, is that I'm trying to integrate the generated xxxLexer file into my own program, which takes an input stream and reads tokens from it. I've gotten the lexer to read input properly, but I cannot get it to output the error messages.

As you mentioned, and as I have noticed, the LALR parser (well, the lexer, since that's the only thing I've included in my program so far) does ignore/drop the unexpected token. However, I really want the error itself to be written to the Console, so that I can report that the erroneous token was dropped/ignored.

As for the constructor, I have been trying to instantiate it using:

ParserError error = new ParserError(lexer.CurrentLine, lexer.CurrentColumn);

But the compiler reports that there's no constructor that takes 2 arguments. I also noticed that the constructor's access level wouldn't allow this in the first place, which is why I'm confused.

Coordinator
Jun 30, 2012 at 3:52 PM

OK, I think I see your point. The lexer is not intended to be used on its own, but you can still do so as follows:

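public class Program
{
    static void Main()
    {
        (new Program()).Execute();
    }

    private void Execute()
    {
        // Test_Lexer is the lexer class generated from the grammar
        Test_Lexer lexer = new Test_Lexer("myinput");
        // Register a handler that the lexer calls for each lexical error
        lexer.OnError = new Hime.Redist.Parsers.OnErrorHandler(OnLexicalError);
        // Read a single token; call GetNextToken repeatedly to scan the whole input
        lexer.GetNextToken();
    }

    private void OnLexicalError(Hime.Redist.Parsers.ParserError lexicalError)
    {
        // Print the description of the lexical error
        System.Console.WriteLine(lexicalError.Message);
    }
}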

However, I don't see why you need to instantiate the ParserError class. Instances of this class are created by the lexers and parsers. And you are indeed correct that its constructor is not accessible outside the library (this is a design decision).

Jul 1, 2012 at 12:05 AM

Derp.

Okay, thanks a lot. I was just really confused about how to actually use the error handler properly. I figured that the lexer(s) and the parser would create the errors automatically, but I wasn't sure how to actually /call/ it. Whoops! Thanks so much!