Working with the SPPF¶
When parsing with Earley, Lark provides the ambiguity='forest'
option
to obtain the shared packed parse forest (SPPF) produced by the parser as
an alternative to it being automatically converted to a tree.
Lark provides a few tools to facilitate working with the SPPF. Here are some things to consider when deciding whether or not to use the SPPF.
Pros
Efficient storage of highly ambiguous parses
Precise handling of ambiguities
Custom rule prioritizers
Ability to handle infinite ambiguities
Directly transform forest -> object instead of forest -> tree -> object
Cons
More complex than working with a tree
SPPF may contain nodes corresponding to rules generated internally
Loss of Lark grammar features:
Rules starting with ‘_’ are not inlined in the SPPF
Rules starting with ‘?’ are never inlined in the SPPF
All tokens will appear in the SPPF
SymbolNode¶
- class lark.parsers.earley_forest.SymbolNode(s, start, end)¶
A Symbol Node represents a symbol (or Intermediate LR0).
Symbol nodes are keyed by the symbol (s). For intermediate nodes s will be an LR0, stored as a tuple of (rule, ptr). For completed symbol nodes, s will be a string representing the non-terminal origin (i.e. the left hand side of the rule).
The children of a Symbol or Intermediate Node will always be Packed Nodes; with each Packed Node child representing a single derivation of a production.
Hence a Symbol Node with a single child is unambiguous.
- Parameters
s – A Symbol, or a tuple of (rule, ptr) for an intermediate node.
start – The index of the start of the substring matched by this symbol (inclusive).
end – The index of the end of the substring matched by this symbol (exclusive).
- Properties:
is_intermediate: True if this node is an intermediate node. priority: The priority of the node’s symbol.
- property is_ambiguous¶
Returns True if this node is ambiguous.
- property children¶
Returns a list of this node’s children sorted from greatest to least priority.
PackedNode¶
- class lark.parsers.earley_forest.PackedNode(parent, s, rule, start, left, right)¶
A Packed Node represents a single derivation in a symbol node.
- Parameters
rule – The rule associated with this node.
parent – The parent of this node.
left – The left child of this node.
None
if one does not exist.right – The right child of this node.
None
if one does not exist.priority – The priority of this node.
- property children¶
Returns a list of this node’s children.
ForestVisitor¶
- class lark.parsers.earley_forest.ForestVisitor(single_visit=False)¶
An abstract base class for building forest visitors.
This class performs a controllable depth-first walk of an SPPF. The visitor will not enter cycles and will backtrack if one is encountered. Subclasses are notified of cycles through the
on_cycle
method.Behavior for visit events is defined by overriding the
visit*node*
functions.The walk is controlled by the return values of the
visit*node_in
methods. Returning a node(s) will schedule them to be visited. The visitor will begin to backtrack if no nodes are returned.- Parameters
single_visit – If
True
, non-Token nodes will only be visited once.
- visit_token_node(node)¶
Called when a
Token
is visited.Token
nodes are always leaves.
- visit_symbol_node_in(node)¶
Called when a symbol node is visited. Nodes that are returned will be scheduled to be visited. If
visit_intermediate_node_in
is not implemented, this function will be called for intermediate nodes as well.
- visit_symbol_node_out(node)¶
Called after all nodes returned from a corresponding
visit_symbol_node_in
call have been visited. Ifvisit_intermediate_node_out
is not implemented, this function will be called for intermediate nodes as well.
- visit_packed_node_in(node)¶
Called when a packed node is visited. Nodes that are returned will be scheduled to be visited.
- visit_packed_node_out(node)¶
Called after all nodes returned from a corresponding
visit_packed_node_in
call have been visited.
- on_cycle(node, path)¶
Called when a cycle is encountered.
- Parameters
node – The node that causes a cycle.
path – The list of nodes being visited: nodes that have been entered but not exited. The first element is the root in a forest visit, and the last element is the node visited most recently.
path
should be treated as read-only.
- get_cycle_in_path(node, path)¶
A utility function for use in
on_cycle
to obtain a slice ofpath
that only contains the nodes that make up the cycle.
ForestTransformer¶
- class lark.parsers.earley_forest.ForestTransformer¶
The base class for a bottom-up forest transformation. Most users will want to use
TreeForestTransformer
instead as it has a friendlier interface and covers most use cases.Transformations are applied via inheritance and overriding of the
transform*node
methods.transform_token_node
receives aToken
as an argument. All other methods receive the node that is being transformed and a list of the results of the transformations of that node’s children. The return value of these methods are the resulting transformations.If
Discard
is raised in a node’s transformation, no data from that node will be passed to its parent’s transformation.- transform(root)¶
Perform a transformation on an SPPF.
- transform_symbol_node(node, data)¶
Transform a symbol node.
- transform_intermediate_node(node, data)¶
Transform an intermediate node.
- transform_packed_node(node, data)¶
Transform a packed node.
- transform_token_node(node)¶
Transform a
Token
.
TreeForestTransformer¶
- class lark.parsers.earley_forest.TreeForestTransformer(tree_class=<class 'lark.tree.Tree'>, prioritizer=<lark.parsers.earley_forest.ForestSumVisitor object>, resolve_ambiguity=True, use_cache=False)¶
A
ForestTransformer
with a treeTransformer
-like interface. By default, it will construct a tree.Methods provided via inheritance are called based on the rule/symbol names of nodes in the forest.
Methods that act on rules will receive a list of the results of the transformations of the rule’s children. By default, trees and tokens.
Methods that act on tokens will receive a token.
Alternatively, methods that act on rules may be annotated with
handles_ambiguity
. In this case, the function will receive a list of all the transformations of all the derivations of the rule. By default, a list of trees where each tree.data is equal to the rule name or one of its aliases.Non-tree transformations are made possible by override of
__default__
,__default_token__
, and__default_ambig__
.Note
Tree shaping features such as inlined rules and token filtering are not built into the transformation. Positions are also not propagated.
- Parameters
tree_class – The tree class to use for construction
prioritizer – A
ForestVisitor
that manipulates the priorities of nodes in the SPPF.resolve_ambiguity – If True, ambiguities will be resolved based on priorities.
use_cache (bool) – If True, caches the results of some transformations, potentially improving performance when
resolve_ambiguity==False
. Only use if you know what you are doing: i.e. All transformation functions are pure and referentially transparent.
- __default__(name, data)¶
Default operation on tree (for override).
Returns a tree with name with data as children.
- __default_ambig__(name, data)¶
Default operation on ambiguous rule (for override).
Wraps data in an ‘_ambig_’ node if it contains more than one element.
- __default_token__(node)¶
Default operation on
Token
(for override).Returns
node
.
handles_ambiguity¶
- lark.parsers.earley_forest.handles_ambiguity(func)¶
Decorator for methods of subclasses of
TreeForestTransformer
. Denotes that the method should receive a list of transformed derivations.