How to output detected rules on console using the parser?

Jul 20, 2012 at 4:08 AM

Hello again, I am inquiring this time on the mechanics of how the parser actually takes the input and matches the terminals with the syntactic rule. As far as I know, it can output the parsed result as AST, but I want to be able to output the parsed result as if it were a line; for example, if we use the MathExp grammar provided as a sample grammar:

		exp_atom-> NUMBER {OnNumber}
			| '('exp ')' ;

How do we get the parser to output that it detected "exp_atom" in a line as plain text instead of as an AST? Basically, I want to emulate what I did with the lexer except with the parser, but once again I am not so clear on this.

Coordinator
Jul 20, 2012 at 4:41 AM
Edited Jul 20, 2012 at 4:43 AM

Hello,

I don't have a clear idea of what you want to do. Do you mean that, in the provided example, you want to write the exp_atom name on the console?

One way to achieve this is to get the full AST and walk it to output the nodes' names. You can get the name of the symbol as a string as follows:

SyntaxTreeNode node = ... ;
string name = node.Symbol.Name;
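
To make the walk concrete, here is a minimal sketch of a depth-first traversal that collects node names. It does not use the library types: Node is a hypothetical stand-in for SyntaxTreeNode that models only a name and a list of children, and TreeWalker and CollectNames are names made up for this illustration. How the real node class exposes its children is an assumption you should check against the runtime library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the library's SyntaxTreeNode: only the symbol
// name and the child list are modelled here.
class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

static class TreeWalker
{
    // Depth-first, pre-order walk: a node's name is recorded before the
    // names of its children.
    public static List<string> CollectNames(Node root)
    {
        List<string> names = new List<string>();
        Visit(root, names);
        return names;
    }

    private static void Visit(Node node, List<string> names)
    {
        names.Add(node.Name);
        foreach (Node child in node.Children)
            Visit(child, names);
    }
}
```

Calling TreeWalker.CollectNames on a hand-built tree such as new Node("exp_atom", new Node("NUMBER")) returns the names in pre-order, parent first; each one can then be written with Console.WriteLine.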

A second way, similar to a SAX parsing of XML is to put an action symbol at the end of the syntactic rules you are interested in:

exp_atom-> NUMBER {OnExpAtom}
    | '('exp ')' {OnExpAtom} ;

You will have to provide an implementation of the OnExpAtom method afterward.
It will be called by the parser when it reduces the corresponding rule.
It is passed the syntax tree node corresponding to the rule, that is to say the exp_atom node in the example.
You can then output however you please the content.
Note that in this example rule, the OnExpAtom is specified in the two choices of the rule so that the corresponding method will be called in both cases.

Does it answer your question ?
Again, if it does not, could you provide an example of the output you expect in your example?
Best,
Laurent

Jul 20, 2012 at 6:20 AM
In the provided example, I want to be able to have the parser read a given line and then output that "exp_atom" was detected on the console, for example:

Line 1: "exp_atom" detected
Line 2: "exp_op0" detected
Line 3: "exp_op1" detected
.
.
.
Line x: ...

With the method you provided, this would mean I would have to produce an AST for every single line in a text input and walk the AST to output the nodes' names, right?

Coordinator
Jul 20, 2012 at 7:12 AM

In the released version an AST is always produced (provided the input is valid), so you can always walk it afterward. However, as mentioned in the previous post, you can modify the example grammar as follows:

exp_atom-> NUMBER {OnRule}
	| '('exp ')' {OnRule} ;

exp_op0	-> exp_atom {OnRule}
	|  exp_op0 '*' exp_atom {OnRule}
	|  exp_op0 '/' exp_atom {OnRule};

exp_op1	-> exp_op0 {OnRule}
	|  exp_op1 '+' exp_op0 {OnRule}
	|  exp_op1 '-' exp_op0 {OnRule};

exp	-> exp_op1 {OnRule} ;

Then, you will have to implement the Actions interface generated within the parser class:

class MyActions : MathExpParser.Actions
{
	public void OnRule(SyntaxTreeNode SubRoot)
	{
		Console.WriteLine(SubRoot.Symbol.Name + " detected");
	}
}

Then, pass an instance of a class implementing this interface to the constructor of the parser. This should output what you expect.

MathExpParser parser = new MathExpParser(lexer, new MyActions());

Laurent

Jul 20, 2012 at 7:21 AM

Thank you for the response! ^^

I understand how it works now.

Jul 21, 2012 at 2:47 PM

Okay, I've correctly passed the instance of the actions to the constructor of the parser but it looks like I might have done something wrong with how I had the parser read the input.

while((input = sr.ReadLine()) != null) {

String line = input;

lexer = new pseudo_cLexer(line);
parser = new pseudo_cParser(lexer,new ParserActions());
            	
//parser.Analyse();
//Am I doing it right here? 
           
}

With the code snippet above, I tried to do it line by line since that's how I did it with the lexer, but I am pretty sure I did it wrong here and I didn't actually get it like I said I did. Whoops. As far as I know, my error lies somewhere in how I pass the input to the parser and how I have the parser read it, as the error I get occurs at runtime.

Coordinator
Jul 21, 2012 at 3:45 PM

You do not need to read your input line by line and pass each line to a new parser instance. That approach fails because a new lexer and parser are created for every line, so a grammar rule that spans several lines can never be matched. For example, a class definition in C# generally spans multiple lines and is (at the top level) a single rule.

What you should do is modify your grammar as follows:

SEPARATOR -> (WHITE_SPACE | NEWLINE) + ;

In this way, line separators are just normal separators that can happen anywhere in a rule. Then modify your code:

lexer = new pseudo_cLexer(reader);
parser = new pseudo_cParser(lexer,new ParserActions());
parser.Analyse();

Lexers can take a text reader as a parameter in their constructor instead of a simple string. In fact, the constructors taking a string as a parameter are only there for convenience.

Determining the line number where a rule has been matched is a bit tricky because the rule may span multiple lines. What you can do is walk the sub-AST given to the callback to find the first token (in the Symbol property of the nodes). Tokens are of class SymbolTextToken, which gives you access to the Line and Column properties containing the info you need.
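
As a rough sketch of that search, the code below looks for the leftmost token under a node. It is written against hypothetical stand-in classes (Symbol, TextToken, Node), not the library types: TextToken only mirrors the Line and Column properties mentioned above, and the Children list is an assumption about how child nodes are reached. Positions and FirstToken are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the library types.
class Symbol
{
    public string Name;
    public Symbol(string name) { Name = name; }
}

// Stand-in for a text token: a symbol that knows where it was read.
class TextToken : Symbol
{
    public int Line;
    public int Column;

    public TextToken(string name, int line, int column) : base(name)
    {
        Line = line;
        Column = column;
    }
}

class Node
{
    public Symbol Symbol;
    public List<Node> Children = new List<Node>();

    public Node(Symbol symbol, params Node[] children)
    {
        Symbol = symbol;
        Children.AddRange(children);
    }
}

static class Positions
{
    // Returns the leftmost token under 'node' (depth-first search), or
    // null when the subtree contains no token at all.
    public static TextToken FirstToken(Node node)
    {
        TextToken token = node.Symbol as TextToken;
        if (token != null)
            return token;
        foreach (Node child in node.Children)
        {
            TextToken found = FirstToken(child);
            if (found != null)
                return found;
        }
        return null;
    }
}
```

Inside a callback such as OnRule, the same idea would be applied to the node passed in, and the Line of the token found would be printed next to the rule name. The null case matters for rules that match empty input.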

I hope this helps.

Jul 22, 2012 at 1:24 AM
Edited Jul 22, 2012 at 1:24 AM

http://i.imgur.com/heupJ.png

I have taken the necessary steps both to modify the grammar and to modify the code, but this time I am getting this NullReferenceException. This time I'm really not sure what could be wrong here.

Coordinator
Jul 22, 2012 at 5:59 AM

You sure have an interesting case! From what I see, it comes from the static constructor of the parser class, but I cannot tell more without seeing the code. Could you send me your grammar? Also, which command line did you use to generate this parser?

Jul 22, 2012 at 6:07 AM
Edited Jul 22, 2012 at 6:08 AM

Yes, I had a feeling it comes from the static constructor; that is where most issues regarding the error come from, but again, I wasn't sure.

Grammar: http://pastebin.com/YjD1CmjQ <-- This grammar has an error in it, maybe that has something to do with it?
Code (program.cs): http://pastebin.com/x8T61twc
Code Snippet - Actions (Below)

namespace parser
{

	public class ParserActions : pseudo_cParser.Actions
	{
		public void OnRule(SyntaxTreeNode SubRoot)
		{
			
			Console.WriteLine(SubRoot.Symbol.ToString() + " found");
			
		}
	}
}


Command Line - Default - himecc PseudoC.gram

Coordinator
Jul 22, 2012 at 6:37 AM

Yes, there was an issue with the parsing method you are currently using (RNGLR). It has been fixed in the codebase but unfortunately there has been no release since.

To circumvent this issue, you can use the -m LALR1 option on the command line to generate your parser. The downside is that it is a more restrictive method than the default one. However, I tried it with your grammar and it works.

Alternatively, if you absolutely need the RNGLR method because your grammar becomes ambiguous at a later time, you can either:

  • Use the -m LRStar option to use the LR(*) parsing method, which is more powerful than LALR but still in a prototype state
  • Check out the source code of the tool at the last revision of the default branch, build the tool with the bug fixes and use whatever parsing method you want. There is a Visual Studio solution, so it's easy.

Well, you just reminded me that it has been a long time since the last release, so I'll try to build a stable version today.

Jul 22, 2012 at 6:51 AM
Edited Jul 22, 2012 at 12:09 PM

EDIT 2: I tried using the bug-fixed tool and it seemed to work okay, until I tried to run parser.Analyse(); and ran into some issue, probably it has to do with the input text.

I'm not sure that it's actually performing the actions as stated in ParserActions() which is what I passed... Speaking of, I'm still not fully clear on how to actually perform the traversals in the AST. For some reason, during parser.Analyse(), it seems to ignore the OnRule action I defined in the Actions interface class because nothing is being written to the console.

EDIT 3: I think the problem is that parser.Analyse() isn't actually being performed properly...

EDIT 4: Hitting the error again:

Not sure what's wrong here...

try {
    lexer = new pseudo_cLexer(reader);
    Console.WriteLine("----- Lexer Integrity Test -----");
    Console.WriteLine("Complete");
    Console.WriteLine("----- Parser -----");
    parser = new pseudo_cParser(lexer, action);
    Console.WriteLine("Parser set up complete");
    root = parser.Analyse();
    if (root != null)
        Console.WriteLine("Complete");
    else
        Console.WriteLine("Error in parsing");
}
catch (Exception ex) {
    Console.WriteLine(ex.ToString());
}

 

Coordinator
Jul 24, 2012 at 6:01 AM

Sorry about the delay, Codeplex doesn't send me an email for edits, just for new posts ...

There is nothing wrong with your code. On the other hand, I've modified your grammar as follows:

http://pastebin.com/SEUAhE4a

I compiled it with the -m LALR1 option using version 0.5.0. Then I tested it with the input:

using test.hd;
using testx.hd;

It works and I got the output:

I mainly changed the definition of NEWLINE to support Windows, Mac and Linux line endings, which may otherwise cause problems, and removed '\n' from IMPORT_STMT as it is unneeded.

I also changed the grammar axiom to IMPORTS to show that line endings are working fine.

However I fail to see how you got your error. Does the above work for you?

Jul 28, 2012 at 1:08 AM

Sorry for the huge delay, I have been unable to check back due to lots of university work.

The above only works for me until it tries to read anything past IMPORTS.

I assume this is because of the grammar axiom being set to IMPORTS.

However, I tried to make the grammar axiom to be PROGRAM, but it could NOT generate the parser and lexer objects at all... It said there were errors in trying to generate the terminal symbols.