Rascal/Concepts/SyntaxDefinitionAndParsing

Name Rascal/Concepts/SyntaxDefinitionAndParsing

Synopsis Syntax definition and parser generation for new languages.

Description All source code analysis projects need to extract information directly from the source code. There are two main approaches to this:

Lexical information: Use regular expressions to extract useful but somewhat superficial, flat information. Rascal supports this through regular expression patterns; see Patterns/Regular.
Structured information: Use syntax analysis to extract the complete, nested structure of the source code in the form of a syntax tree. Rascal can manipulate parse trees directly, and it also enables user-defined mappings from parse trees to abstract syntax trees.
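
As a rough illustration of the lexical approach, the sketch below collects every run of digits from a string with a regular expression pattern. The module name LexicalSketch and the function integerLiterals are hypothetical names chosen for this example; they are not part of the Rascal library.

```rascal
module LexicalSketch  // hypothetical module name

// Hypothetical helper: collect every run of decimal digits in a source string.
// The regular expression pattern on the left of := is matched against the
// string on the right. Inside a comprehension, each successive match binds
// the named group n and contributes one element to the result.
list[str] integerLiterals(str src) = [ n | /<n:[0-9]+>/ := src ];
```

The result is a flat list of strings: nothing about nesting or operator precedence is recovered, which is the limitation that the structured approach addresses.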

Using SyntaxDefinitions you can define the syntax of any (programming) language. Then Rascal:

will generate the parser, and
will provide pattern matching and pattern construction on parse trees and abstract syntax trees, see Patterns/Abstract and Patterns/Concrete.

Examples Let's use the Exp language as an example. It contains the following elements:

Integer constants, e.g., 123.
A multiplication operator, e.g., 3*4.
An addition operator, e.g., 3+4.
Multiplication is left-associative and has precedence over addition.
Addition is left-associative.
Parentheses can be used to override the precedence of the operators.

Here are some examples:

123
2+3+4
2+3*4
(2+3)*4

The EXP language can be defined as follows:

module demo::lang::Exp::Concrete::WithLayout::Syntax

layout Whitespace = [\t-\n\r\ ]*; 
    
lexical IntegerLiteral = [0-9]+;           

start syntax Exp 
  = IntegerLiteral          
  | bracket "(" Exp ")"     
  > left Exp "*" Exp        
  > left Exp "+" Exp        
  ;

Now you may parse and manipulate programs in the EXP language. Let's demonstrate parsing an expression:

rascal>import demo::lang::Exp::Concrete::WithLayout::Syntax;
ok
rascal>import ParseTree;
ok
rascal>parse(#start[Exp], "2+3*4");
start(sort("Exp")): `2+3*4`
Tree: appl(prod(start(sort("Exp")),[layouts("Whitespace"),label("top",sort("Exp")),layouts("Whitespace")],{}),[appl(prod(layouts("Whitespace"),[\iter-star(\char-class([range(9,10),range(13,13),range(32,32)]))],{}),[appl(regular(\iter-star(\char-class([range(9,10),range(13,13),range(32,32)]))),[])[@loc=|file://-|(0,0,<1,0>,<1,0>)]])[@loc=|file://-|(0,0,<1,0>,<1,0>)],appl(prod(sort("Exp"),[sort("Exp"),layouts("Whitespace"),lit("+"),layouts("Whitespace"),sort("Exp")],{assoc(left())}),[appl(prod(sort("Exp"),[lex("IntegerLiteral")],{}),[appl(prod(lex("IntegerLiteral"),[iter(\char-class([range(48,57)]))],{}),[appl(regular(iter(\char-class([range(48,57)]))),[char(50)])[@loc=|file://-|(0,1,<1,0>,<1,1>)]])[@loc=|file://-|(0,1,<1,0>,<1,1>)]])[@loc=|file://-|(0,1,<1,0>,<1,1>)],appl(prod(layouts("Whitespace"),[\iter-star(\char-class([range(9,10),range(13,13),range(32,32)]))],{}),[appl(regular(\iter-star(\char-class([range(9,10),range(13,13),range(32,32)]))),[])[@loc=|file://-|(1,0,<1,1>,<1,1>)]])[@loc=|file://-|(1,0,<1,1>,<1,1>)],appl(prod(lit("+"),[\char-class([range(43,43)])],{}),[char(43)]),appl(prod(layouts("Whitespace"),[\iter-star(\char-class([range(9,10),range(13,13),range(32,32)]))],{}),[appl(regular(\iter-star(\char-class([range(9,10),range(13,13),range(32,32)]))),[])[@loc=|file://-|(2,0,<1,2>,<1,2>)]])[@loc=|file://-|(2,0,<1,2>,<1,2>)],appl(prod(sort("Exp"),[sort("Exp"),layouts("Whitespace"),lit("*"),layouts("Whitespace"),sort("Exp")],{assoc(left())}),[appl(prod(sort("Exp"),[lex("IntegerLiteral")],{}),[appl(prod(lex("IntegerLiteral"),[iter(\char-class([range(48,57)]))],{}),[appl(regular(iter(\char-class([range(48,57)]))),[char(51)])[@loc=|file://-|(2,1,<1,2>,<1,3>)]])[@loc=|file://-|(2,1,<1,2>,<1,3>)]])[@loc=|file://-|(2,1,<1,2>,<1,3>)],appl(prod(layouts("Whitespace"),[\iter-star(\char-class([range(9,10),range(13,13),range(32,32)]))],{}),[appl(regular(\iter-star(\char-class([range(9,10),range(13,13),range(32,32)]))),[])[@loc=|file://-|(3,0,<1,3>,<1,3>)]])[@loc=|file:/
/-|(3,0,<1,3>,<1,3>)],appl(prod(lit("*"),[\char-class([range(42,42)])],{}),[char(42)]),appl(prod(layouts("Whitespace"),[\iter-star(\char-class([range(9,10),range(13,13)...

First we import the syntax definition and the ParseTree module, which provides the parsing functionality. Then we parse 2+3*4 using the start symbol Exp.

Don't be worried, we are just showing the resulting parse tree here. It is intended for programs and not for humans. The points we want to make are:

Rascal grammars are relatively easy to read and write (unfortunately, writing grammars will never become simple).
Parser generation is completely implicit.
Given a syntax definition, it can be used immediately for parsing.
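
To give an impression of pattern matching on parse trees, here is a minimal sketch of an evaluator written with concrete patterns over the Exp grammar above. The module name EvalSketch and the function eval are assumptions made for this example; the Recipes referred to below may organize this differently.

```rascal
module EvalSketch  // hypothetical module name

import demo::lang::Exp::Concrete::WithLayout::Syntax;
import ParseTree;
import String;

// Parse the text with the start symbol, then evaluate the Exp inside it.
// The field top selects the Exp tree from the start[Exp] wrapper.
int eval(str txt) = eval(parse(#start[Exp], txt).top);

// One alternative per production of Exp. The text between backquotes is a
// concrete pattern: a fragment of Exp syntax with typed holes.
int eval((Exp)`<IntegerLiteral l>`) = toInt("<l>");
int eval((Exp)`(<Exp e>)`) = eval(e);
int eval((Exp)`<Exp l> * <Exp r>`) = eval(l) * eval(r);
int eval((Exp)`<Exp l> + <Exp r>`) = eval(l) + eval(r);
```

Because the grammar gives multiplication priority over addition, the parse tree for 2+3*4 has the addition at the top, so this sketch would compute 2 + (3 * 4).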

See Recipes:Exp for a more extensive presentation of the EXP language and Recipes:Languages for other language examples.
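
A user-defined mapping from parse trees to abstract syntax trees, as mentioned in the Description, could look roughly as follows. The data type AExp, its constructors, and the function names are hypothetical and chosen only for this sketch.

```rascal
module ASTSketch  // hypothetical module name

import demo::lang::Exp::Concrete::WithLayout::Syntax;
import ParseTree;
import String;

// Hypothetical abstract syntax for the Exp language.
// It is named AExp to avoid a clash with the syntax type Exp.
data AExp
  = con(int n)
  | mul(AExp lhs, AExp rhs)
  | add(AExp lhs, AExp rhs)
  ;

// Map each concrete production to an abstract constructor.
// Brackets leave no trace in the abstract syntax tree.
AExp toAST((Exp)`<IntegerLiteral l>`) = con(toInt("<l>"));
AExp toAST((Exp)`(<Exp e>)`) = toAST(e);
AExp toAST((Exp)`<Exp l> * <Exp r>`) = mul(toAST(l), toAST(r));
AExp toAST((Exp)`<Exp l> + <Exp r>`) = add(toAST(l), toAST(r));

// Evaluate the abstract syntax tree using abstract patterns.
int eval(con(int n)) = n;
int eval(mul(AExp l, AExp r)) = eval(l) * eval(r);
int eval(add(AExp l, AExp r)) = eval(l) + eval(r);
```

With these definitions, an expression such as toAST(parse(#start[Exp], "2+3*4").top) would be expected to yield a tree with add at the root. Layout and brackets are dropped, which is the usual reason for working on abstract rather than concrete trees.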

Pitfalls

The SyntaxDefinition feature has recently been designed and implemented and is still going through some growing pains. This includes both implementation and documentation.
