Working with the SPPF

When parsing with Earley, Lark provides the ambiguity='forest' option to obtain the shared packed parse forest (SPPF) produced by the parser as an alternative to it being automatically converted to a tree.

Lark provides a few tools to facilitate working with the SPPF. Here are some things to consider when deciding whether or not to use the SPPF.

Pros

  • Efficient storage of highly ambiguous parses
  • Precise handling of ambiguities
  • Custom rule prioritizers
  • Ability to handle infinite ambiguities
  • Directly transform forest -> object instead of forest -> tree -> object

Cons

  • More complex than working with a tree
  • SPPF may contain nodes corresponding to rules generated internally
  • Loss of Lark grammar features:
    • Rules starting with ‘_’ are not inlined in the SPPF
    • Rules starting with ‘?’ are never inlined in the SPPF
    • All tokens will appear in the SPPF

SymbolNode

class lark.parsers.earley_forest.SymbolNode(s, start, end)

A Symbol Node represents a symbol (or Intermediate LR0).

Symbol nodes are keyed by the symbol (s). For intermediate nodes s will be an LR0, stored as a tuple of (rule, ptr). For completed symbol nodes, s will be a string representing the non-terminal origin (i.e. the left hand side of the rule).

The children of a Symbol or Intermediate Node will always be Packed Nodes; with each Packed Node child representing a single derivation of a production.

Hence a Symbol Node with a single child is unambiguous.

Variables:
  • s – A Symbol, or a tuple of (rule, ptr) for an intermediate node.
  • start – The index of the start of the substring matched by this symbol (inclusive).
  • end – The index of the end of the substring matched by this symbol (exclusive).
  • is_intermediate – True if this node is an intermediate node.
  • priority – The priority of the node’s symbol.
is_ambiguous

Returns True if this node is ambiguous.

children

Returns a list of this node’s children sorted from greatest to least priority.

PackedNode

class lark.parsers.earley_forest.PackedNode(parent, s, rule, start, left, right)

A Packed Node represents a single derivation in a symbol node.

Variables:
  • rule – The rule associated with this node.
  • parent – The parent of this node.
  • left – The left child of this node. None if one does not exist.
  • right – The right child of this node. None if one does not exist.
  • priority – The priority of this node.
children

Returns a list of this node’s children.

ForestVisitor

class lark.parsers.earley_forest.ForestVisitor

An abstract base class for building forest visitors.

This class performs a controllable depth-first walk of an SPPF. The visitor will not enter cycles and will backtrack if one is encountered. Subclasses are notified of cycles through the on_cycle method.

Behavior for visit events is defined by overriding the visit*node* functions.

The walk is controlled by the return values of the visit*node_in methods. Returning a node(s) will schedule them to be visited. The visitor will begin to backtrack if no nodes are returned.

visit_token_node(node)

Called when a Token is visited. Token nodes are always leaves.

visit_symbol_node_in(node)

Called when a symbol node is visited. Nodes that are returned will be scheduled to be visited. If visit_intermediate_node_in is not implemented, this function will be called for intermediate nodes as well.

visit_symbol_node_out(node)

Called after all nodes returned from a corresponding visit_symbol_node_in call have been visited. If visit_intermediate_node_out is not implemented, this function will be called for intermediate nodes as well.

visit_packed_node_in(node)

Called when a packed node is visited. Nodes that are returned will be scheduled to be visited.

visit_packed_node_out(node)

Called after all nodes returned from a corresponding visit_packed_node_in call have been visited.

on_cycle(node, path)

Called when a cycle is encountered.

Parameters:
  • node – The node that causes a cycle.
  • path – The list of nodes being visited: nodes that have been entered but not exited. The first element is the root in a forest visit, and the last element is the node visited most recently. path should be treated as read-only.
get_cycle_in_path(node, path)

A utility function for use in on_cycle to obtain a slice of path that only contains the nodes that make up the cycle.

ForestTransformer

class lark.parsers.earley_forest.ForestTransformer

The base class for a bottom-up forest transformation. Most users will want to use TreeForestTransformer instead as it has a friendlier interface and covers most use cases.

Transformations are applied via inheritance and overriding of the transform*node methods.

transform_token_node receives a Token as an argument. All other methods receive the node that is being transformed and a list of the results of the transformations of that node’s children. The return value of these methods are the resulting transformations.

If Discard is raised in a node’s transformation, no data from that node will be passed to its parent’s transformation.

transform(root)

Perform a transformation on an SPPF.

transform_symbol_node(node, data)

Transform a symbol node.

transform_intermediate_node(node, data)

Transform an intermediate node.

transform_packed_node(node, data)

Transform a packed node.

transform_token_node(node)

Transform a Token.

TreeForestTransformer

class lark.parsers.earley_forest.TreeForestTransformer(tree_class=<class 'lark.tree.Tree'>, prioritizer=<lark.parsers.earley_forest.ForestSumVisitor object>, resolve_ambiguity=True)

A ForestTransformer with a tree Transformer-like interface. By default, it will construct a tree.

Methods provided via inheritance are called based on the rule/symbol names of nodes in the forest.

Methods that act on rules will receive a list of the results of the transformations of the rule’s children. By default, trees and tokens.

Methods that act on tokens will receive a token.

Alternatively, methods that act on rules may be annotated with handles_ambiguity. In this case, the function will receive a list of all the transformations of all the derivations of the rule. By default, a list of trees where each tree.data is equal to the rule name or one of its aliases.

Non-tree transformations are made possible by override of __default__, __default_token__, and __default_ambig__.

Note

Tree shaping features such as inlined rules and token filtering are not built into the transformation. Positions are also not propagated.

Parameters:
  • tree_class – The tree class to use for construction
  • prioritizer – A ForestVisitor that manipulates the priorities of nodes in the SPPF.
  • resolve_ambiguity – If True, ambiguities will be resolved based on priorities.
__default__(name, data)

Default operation on tree (for override).

Returns a tree with name with data as children.

__default_ambig__(name, data)

Default operation on ambiguous rule (for override).

Wraps data in an ‘_ambig_’ node if it contains more than one element.

__default_token__(node)

Default operation on Token (for override).

Returns node.

handles_ambiguity

lark.parsers.earley_forest.handles_ambiguity(func)

Decorator for methods of subclasses of TreeForestTransformer. Denotes that the method should receive a list of transformed derivations.