Rascal/Libraries/Prelude/ParseTree

Synopsis Library functions for parse trees.

Usage import ParseTree;

Description A concrete syntax tree or parse tree is an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. In Rascal parse trees, the interior nodes are labeled by rules of the grammar, while the leaf nodes are labeled by terminals (characters) of the grammar.

Tree is the universal parse tree data type in Rascal and can be used to represent parse trees for any language.

Tree is a subtype of the type Node.
All SyntaxDefinition types (non-terminals) are subtypes of Tree.
All ConcreteSyntax expressions produce parse trees whose types are non-terminals.
Trees can be annotated in various ways; see IDEConstruction features. Most importantly, the \loc annotation always points to the source location of a parse tree or subtree.

Parse trees are usually analyzed and constructed using ConcreteSyntax expressions and patterns.

Advanced users may want to create tools that analyze any parse tree, regardless of the SyntaxDefinition that generated it. Such tools can manipulate parse trees on the abstract level.

The full definitions of Tree, Production and Symbol are given in Tree. A parse tree is a nested tree structure of type Tree.

Most internal nodes are applications (appl) of a Production to a list of children Tree nodes. Production is the abstract representation of a SyntaxDefinition rule, which consists of a definition of an alternative for a Symbol by a list of Symbols.
The leaves of a parse tree are always characters (char), each identified by its integer Unicode code point.
Some internal nodes encode ambiguity (amb) by pointing to a set of alternative Tree nodes.
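
The three kinds of nodes described above can be handled by matching directly on the constructors of Tree. The following is a minimal sketch, not part of the library: the module name TreeWalk and the function countChars are hypothetical, and the sketch assumes that all alternatives of an ambiguity node cover the same input.

```rascal
module TreeWalk // hypothetical module name

import ParseTree;
import Set; // for getOneFrom

// Count the character leaves below a parse tree.

// Application node: sum the counts of all children.
int countChars(appl(_, list[Tree] args)) = (0 | it + countChars(a) | Tree a <- args);

// Character leaf: counts as one.
int countChars(char(_)) = 1;

// Ambiguity node: count a single alternative only (assumption: every
// alternative spans the same characters).
int countChars(amb(set[Tree] alts)) = countChars(getOneFrom(alts));

// Any other kind of node is ignored in this sketch.
default int countChars(Tree _) = 0;
```

Counting only one alternative of an amb node avoids counting the same input characters once per alternative.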

The Production and Symbol types are an abstract notation for rules in SyntaxDefinitions, while the Tree type is the actual notation for parse trees.

Parse trees are called parse forests when they contain amb nodes.
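
Whether a given tree is a parse forest can be checked with a descendant pattern. This is only a sketch: the module name ForestCheck and the function isForest are hypothetical names, not library functions.

```rascal
module ForestCheck // hypothetical module name

import ParseTree;

// True if the tree contains at least one ambiguity node at any depth.
bool isForest(Tree t) = /amb(_) := t;
```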

You can analyze and manipulate parse trees in three ways:

Directly on the Tree level, just like any other AlgebraicDataType
Using ConcreteSyntax
Using Actions

The type of a parse tree is the symbol that its production produces, e.g. appl(prod(sort("A"),[],{}),[]) has type A. Each such non-terminal type has Tree as its immediate super-type.
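
To illustrate the relation between a production and the type of its tree, the sketch below extracts the defined symbol from an application node. The module name TypeOfTree and the function definedSymbol are hypothetical. The sketch only covers trees built with a prod production, so calling it on any other tree fails.

```rascal
module TypeOfTree // hypothetical module name

import ParseTree;

// Return the symbol defined by the production at the root of an
// application node. Only prod productions are handled here.
Symbol definedSymbol(appl(prod(Symbol s, _, _), _)) = s;

// For the example tree from the text:
//   definedSymbol(appl(prod(sort("A"), [], {}), []))
// the pattern binds s to sort("A").
```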

The ParseTree library provides:

associativity: Choice under associativity is flattened.
Condition: constructors for declaring preconditions and postconditions on symbols
doc: Annotate a parse tree node with a documentation string.
docs: Annotate a parse tree node with documentation strings for several locations.
implode: Implode a parse tree according to a given (ADT) type.
isNonTerminalType:
link: Annotate a parse tree node with the target of a reference.
links: Annotate a parse tree node with multiple targets for a reference.
loc: Annotate a parse tree node with a source location.
message: Annotate a parse tree node with an (error) message.
messages: Annotate a parse tree node with a list of (error) messages.
parse: Parse input text (from a string or a location) and return a parse tree.
priority: Nested priority is flattened.
Production:
saveParser: Save the current object parser to a file.
Symbol:
Tree: The Tree data type as produced by the parser.
treeAt: Select the innermost Tree of a given type which is enclosed by a given location.
TreeSearchResult: Tree search result type for treeAt.
unparse: Yield the string of characters that form the leaves of the given parse tree.
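
As a brief illustration of how parse and unparse fit together, the sketch below parses a string and turns the resulting tree back into text. The module name Demo, the one-rule grammar A and the function roundTrip are assumptions made for this example only.

```rascal
module Demo // hypothetical module name

import ParseTree;

// A deliberately tiny grammar, used only for illustration.
syntax A = "a";

// Parse the input and check that unparse gives back the original text.
bool roundTrip() {
  Tree t = parse(#A, "a"); // a value of type A is also a Tree
  return unparse(t) == "a";
}
```

Assigning the result of parse to a variable of type Tree works because every non-terminal type is a subtype of Tree.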

Pitfalls For historical reasons the name of the annotation is "loc" and this interferes with the Rascal keyword loc for the type of Locations. Therefore the annotation name has to be escaped as \loc when it is declared or used.
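
The escaping can be sketched as follows, assuming the annotation syntax of the Rascal version this page describes. The module name WhereIs and the function whereIs are hypothetical.

```rascal
module WhereIs // hypothetical module name

import ParseTree;

// Read the source location annotation of a tree.
// The backslash is required: an unescaped loc is the keyword for the
// location type. This sketch fails if the tree carries no such annotation.
loc whereIs(Tree t) = t@\loc;
```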
