Tree Construction Reference
Lark builds a tree automatically based on the structure of the grammar, where each rule that is matched becomes a branch (node) in the tree, and its children are its matches, in the order of matching.
For example, the rule
node: child1 child2 will create a tree node with two children. If it is matched as part of another rule (i.e. if it isn’t the root), the new rule’s tree node will become its parent.
item* will result in a list of items, equivalent to writing
item item item ...
item? will return the item if it matched, or nothing.
maybe_placeholders=True (the default), then using
[item] will return the item if it matched, or the value
None, if it didn’t.
 behaves like
Terminals are always values in the tree, never branches.
Lark filters out certain types of terminals by default, considering them punctuation:
Terminals that won’t appear in the tree are:
Unnamed literals (like
Terminals whose name starts with an underscore (like
Terminals that will appear in the tree are:
Unnamed regular expressions (like
Named terminals whose name starts with a letter (like
Note: Terminals composed of literals and other terminals always include the entire match without filtering any part.
start: PNAME pname PNAME: "(" NAME ")" pname: "(" NAME ")" NAME: /\w+/ %ignore /\s+/
Lark will parse “(Hello) (World)” as:
start (Hello) pname World
Rules prefixed with
! will retain all their literals regardless.
expr: "(" expr ")" | NAME+ NAME: /\w+/ %ignore " "
Lark will parse “((hello world))” as:
expr expr expr "hello" "world"
The brackets do not appear in the tree by design. The words appear because they are matched by a named terminal.
Shaping the tree
Users can alter the automatic construction of the tree using a collection of grammar features.
Rules whose name begins with an underscore will be inlined into their containing rule.
start: "(" _greet ")" _greet: /\w+/ /\w+/
Lark will parse “(hello world)” as:
start "hello" "world"
Rules that receive a question mark (?) at the beginning of their definition, will be inlined if they have a single child, after filtering.
start: greet greet ?greet: "(" /\w+/ ")" | /\w+/ /\w+/
Lark will parse “hello world (planet)” as:
start greet "hello" "world" "planet"
Rules that begin with an exclamation mark will keep all their terminals (they won’t get filtered).
!expr: "(" expr ")" | NAME+ NAME: /\w+/ %ignore " "
Will parse “((hello world))” as:
expr ( expr ( expr hello world ) )
! prefix is usually a “code smell”, and may point to a flaw in your grammar design.
Aliases - options in a rule can receive an alias. It will be then used as the branch name for the option, instead of the rule name.
start: greet greet greet: "hello" | "world" -> planet
Lark will parse “hello world” as:
start greet planet