Working with the SPPF¶
When parsing with Earley, Lark provides the ambiguity='forest'
option
to obtain the shared packed parse forest (SPPF) produced by the parser as
an alternative to it being automatically converted to a tree.
Lark provides a few tools to facilitate working with the SPPF. Here are some things to consider when deciding whether or not to use the SPPF.
Pros
- Efficient storage of highly ambiguous parses
- Precise handling of ambiguities
- Custom rule prioritizers
- Ability to handle infinite ambiguities
- Directly transform forest -> object instead of forest -> tree -> object
Cons
- More complex than working with a tree
- SPPF may contain nodes corresponding to rules generated internally
- Loss of Lark grammar features:
- Rules starting with ‘_’ are not inlined in the SPPF
- Rules starting with ‘?’ are never inlined in the SPPF
- All tokens will appear in the SPPF
SymbolNode¶
-
class
lark.parsers.earley_forest.
SymbolNode
(s, start, end)¶ A Symbol Node represents a symbol (or Intermediate LR0).
Symbol nodes are keyed by the symbol (s). For intermediate nodes s will be an LR0, stored as a tuple of (rule, ptr). For completed symbol nodes, s will be a string representing the non-terminal origin (i.e. the left hand side of the rule).
The children of a Symbol or Intermediate Node will always be Packed Nodes; with each Packed Node child representing a single derivation of a production.
Hence a Symbol Node with a single child is unambiguous.
Parameters: - s – A Symbol, or a tuple of (rule, ptr) for an intermediate node.
- start – The index of the start of the substring matched by this symbol (inclusive).
- end – The index of the end of the substring matched by this symbol (exclusive).
- Properties:
- is_intermediate: True if this node is an intermediate node. priority: The priority of the node’s symbol.
-
is_ambiguous
¶ Returns True if this node is ambiguous.
-
children
¶ Returns a list of this node’s children sorted from greatest to least priority.
PackedNode¶
-
class
lark.parsers.earley_forest.
PackedNode
(parent, s, rule, start, left, right)¶ A Packed Node represents a single derivation in a symbol node.
Parameters: - rule – The rule associated with this node.
- parent – The parent of this node.
- left – The left child of this node.
None
if one does not exist. - right – The right child of this node.
None
if one does not exist. - priority – The priority of this node.
-
children
¶ Returns a list of this node’s children.
ForestVisitor¶
-
class
lark.parsers.earley_forest.
ForestVisitor
(single_visit=False)¶ An abstract base class for building forest visitors.
This class performs a controllable depth-first walk of an SPPF. The visitor will not enter cycles and will backtrack if one is encountered. Subclasses are notified of cycles through the
on_cycle
method.Behavior for visit events is defined by overriding the
visit*node*
functions.The walk is controlled by the return values of the
visit*node_in
methods. Returning a node(s) will schedule them to be visited. The visitor will begin to backtrack if no nodes are returned.Parameters: single_visit – If True
, non-Token nodes will only be visited once.-
visit_token_node
(node)¶ Called when a
Token
is visited.Token
nodes are always leaves.
-
visit_symbol_node_in
(node)¶ Called when a symbol node is visited. Nodes that are returned will be scheduled to be visited. If
visit_intermediate_node_in
is not implemented, this function will be called for intermediate nodes as well.
-
visit_symbol_node_out
(node)¶ Called after all nodes returned from a corresponding
visit_symbol_node_in
call have been visited. Ifvisit_intermediate_node_out
is not implemented, this function will be called for intermediate nodes as well.
-
visit_packed_node_in
(node)¶ Called when a packed node is visited. Nodes that are returned will be scheduled to be visited.
-
visit_packed_node_out
(node)¶ Called after all nodes returned from a corresponding
visit_packed_node_in
call have been visited.
-
on_cycle
(node, path)¶ Called when a cycle is encountered.
Parameters: - node – The node that causes a cycle.
- path – The list of nodes being visited: nodes that have been
entered but not exited. The first element is the root in a forest
visit, and the last element is the node visited most recently.
path
should be treated as read-only.
-
get_cycle_in_path
(node, path)¶ A utility function for use in
on_cycle
to obtain a slice ofpath
that only contains the nodes that make up the cycle.
-
ForestTransformer¶
-
class
lark.parsers.earley_forest.
ForestTransformer
¶ The base class for a bottom-up forest transformation. Most users will want to use
TreeForestTransformer
instead as it has a friendlier interface and covers most use cases.Transformations are applied via inheritance and overriding of the
transform*node
methods.transform_token_node
receives aToken
as an argument. All other methods receive the node that is being transformed and a list of the results of the transformations of that node’s children. The return value of these methods are the resulting transformations.If
Discard
is raised in a node’s transformation, no data from that node will be passed to its parent’s transformation.-
transform
(root)¶ Perform a transformation on an SPPF.
-
transform_symbol_node
(node, data)¶ Transform a symbol node.
-
transform_intermediate_node
(node, data)¶ Transform an intermediate node.
-
transform_packed_node
(node, data)¶ Transform a packed node.
-
transform_token_node
(node)¶ Transform a
Token
.
-
TreeForestTransformer¶
-
class
lark.parsers.earley_forest.
TreeForestTransformer
(tree_class=<class 'lark.tree.Tree'>, prioritizer=<lark.parsers.earley_forest.ForestSumVisitor object>, resolve_ambiguity=True, use_cache=False)¶ A
ForestTransformer
with a treeTransformer
-like interface. By default, it will construct a tree.Methods provided via inheritance are called based on the rule/symbol names of nodes in the forest.
Methods that act on rules will receive a list of the results of the transformations of the rule’s children. By default, trees and tokens.
Methods that act on tokens will receive a token.
Alternatively, methods that act on rules may be annotated with
handles_ambiguity
. In this case, the function will receive a list of all the transformations of all the derivations of the rule. By default, a list of trees where each tree.data is equal to the rule name or one of its aliases.Non-tree transformations are made possible by override of
__default__
,__default_token__
, and__default_ambig__
.Note
Tree shaping features such as inlined rules and token filtering are not built into the transformation. Positions are also not propagated.
Parameters: - tree_class – The tree class to use for construction
- prioritizer – A
ForestVisitor
that manipulates the priorities of nodes in the SPPF. - resolve_ambiguity – If True, ambiguities will be resolved based on priorities.
- use_cache (bool) – If True, caches the results of some transformations,
potentially improving performance when
resolve_ambiguity==False
. Only use if you know what you are doing: i.e. All transformation functions are pure and referentially transparent.
-
__default__
(name, data)¶ Default operation on tree (for override).
Returns a tree with name with data as children.
-
__default_ambig__
(name, data)¶ Default operation on ambiguous rule (for override).
Wraps data in an ‘_ambig_’ node if it contains more than one element.
-
__default_token__
(node)¶ Default operation on
Token
(for override).Returns
node
.