Parsers are innately complicated and confusing. They're difficult to understand, difficult to write, and difficult to use. Even experts on the subject can become baffled by the nuances of these complicated state-machines.
Lark's mission is to make the process of writing them as simple and abstract as possible, by following these design principles:
Keep the grammar clean and simple
Don't force the user to decide on things that the parser can figure out on its own
Usability is more important than performance
Performance is still very important
Follow the Zen Of Python, whenever possible and applicable
In accordance with these principles, I arrived at the following design choices:
1. Separation of code and grammar
Grammars are the de-facto reference for your language, and for the structure of your parse-tree. For any non-trivial language, the conflation of code and grammar always turns out convoluted and difficult to read.
The grammars in Lark are EBNF-inspired, so they are especially easy to read & work with.
2. Always build a parse-tree (unless told not to)
Trees are always simpler to work with than state-machines.
Trees allow you to see the "state-machine" visually.
Trees allow your computation to be aware of previous and future states
Trees allow you to process the parse in steps, instead of forcing you to do it all at once.
And anyway, every parse-tree can be replayed as a state-machine, so there is no loss of information.
See this answer in more detail here.
To improve performance, you can skip building the tree when using LALR(1) by providing Lark with a transformer (see the JSON example).
3. Earley is the default
The Earley algorithm can accept any context-free grammar you throw at it (i.e. it can parse any grammar you can write in EBNF). That makes it extremely useful for beginners, who are not aware of the strange and arbitrary restrictions that LALR(1) places on its grammars.
As users grow to understand the structure of their grammar, the scope of their target language and their performance requirements, they may choose to switch over to LALR(1) to gain a huge performance boost, possibly at the cost of some language features.
In short, "Premature optimization is the root of all evil."
Other design features
Automatically resolve terminal collisions whenever possible
Automatically keep track of line & column numbers