Introduction
What is Sipha?
Sipha is a flexible, incremental parsing library for Rust with support for multiple parsing algorithms. It is designed from the ground up for incremental parsing: efficiently re-parsing only the changed portions of your code, which makes it ideal for interactive applications such as IDEs, editors, and language servers.
Unlike traditional parsers that re-parse entire files on every change, Sipha can:
- Reuse unchanged subtrees from previous parses
- Reparse only affected regions instead of the entire file
- Maintain parse caches for efficient updates
- Dramatically improve performance in interactive editing scenarios
Architecture Overview
graph TB
Input[Source Text] --> Lexer[Lexer]
Lexer --> Tokens[Tokens]
Tokens --> Parser[Parser Backend]
Parser --> Tree[Syntax Tree]
Tree --> Incremental[Incremental Parser]
Incremental --> Cache[Parse Cache]
Incremental --> Output[Updated Tree]
subgraph Backends
LL[LL Parser]
LR[LR Parser]
GLR[GLR Parser]
end
Parser -.-> Backends
style Input fill:#e1f5ff,color:#000000
style Tree fill:#c8e6c9,color:#000000
style Output fill:#c8e6c9,color:#000000
style Cache fill:#fff9c4,color:#000000
Project Status
Note
Sipha 0.5.0 provides a stable foundation for incremental parsing. The core API is stable, and we continue to add features and improvements based on user feedback. We welcome contributions!
Key Features
Note
For a detailed feature comparison with other parsing libraries, see the README in the repository.
Sipha provides a comprehensive parsing solution with these key capabilities:
Incremental Parsing (Primary Focus)
Sipha’s standout feature is its incremental parsing capability. When you edit code, Sipha can:
- Reuse unchanged subtrees from previous parses
- Reparse only affected regions instead of the entire file
- Maintain parse caches for efficient updates
- Dramatically improve performance in interactive editing scenarios
This makes Sipha perfect for building language servers, IDEs, and other tools that need to parse code as users type.
Learn more in the Incremental Parsing section.
Additional Features
- Multiple parsing backends: Choose from LL(k), LR, and GLR (via feature flags) - see Parsing Backends
- Immutable syntax trees: Green/red tree representation for efficient manipulation - see Syntax Trees
- Error recovery: Configurable error recovery strategies for robust parsing - see Error Handling
- Flexible grammar definition: Builder API for defining your grammar - see Grammars
- Unicode support: Full Unicode support for identifiers and text (optional)
- Rich diagnostics: Beautiful error messages with miette integration (optional)
- Tree traversal: Visitor patterns and query APIs for working with syntax trees
Use Cases
Sipha is particularly well-suited for:
- Language Servers: Building LSP implementations that need fast, incremental parsing
- IDEs and Editors: Providing real-time syntax analysis and error checking
- Code Analysis Tools: Tools that need to parse and analyze code efficiently
- Interactive Development Tools: Any tool that needs to parse code as users edit it
- Complex Language Parsing: Using GLR backend for languages with inherent ambiguities (like C++)
Comparison with Alternatives
| Feature | Sipha | pest | nom | lalrpop |
|---|---|---|---|---|
| Incremental parsing | ✅ | ❌ | ❌ | ❌ |
| Multiple backends | ✅ | ❌ | ❌ | ❌ |
| Syntax trees | ✅ | ✅ | ❌ | ✅ |
| Error recovery | ✅ | ✅ | ✅ | ✅ |
| Grammar DSL | ❌ | ✅ | ❌ | ✅ |
| Zero-copy | Partial | ✅ | ✅ | ❌ |
When to Use Sipha
Use Sipha when:
- You are building language servers or IDEs
- You need incremental parsing for interactive editing
- You want flexibility in choosing parsing algorithms
- You need rich syntax tree manipulation
- You are parsing ambiguous grammars (use the GLR backend)
When to Consider Alternatives
Consider alternatives when:
- You need a simple one-off parser (pest or nom might be simpler)
- You need maximum performance for batch parsing (nom might be faster)
- You prefer a declarative grammar DSL (pest or lalrpop)
What’s Next?
Ready to get started? Head to Getting Started to install Sipha and build your first parser!
For a deeper dive into specific topics:
- Learn about Core Concepts to understand Sipha’s architecture
- Explore Incremental Parsing to understand Sipha’s key feature
- Check out Examples to see Sipha in action
- Read the Architecture section to understand design decisions
Getting Started
This guide will help you get up and running with Sipha. We’ll cover installation, a quick start example, and an overview of the core concepts.
Installation
Add Sipha to your Cargo.toml:
[dependencies]
sipha = "0.5.0"
Or with specific features:
[dependencies]
sipha = { version = "0.5.0", features = ["diagnostics", "unicode", "backend-ll"] }
Available Features
- backend-ll: Enable LL(k) parser backend (default)
- backend-lr: Enable LR parser backend
- backend-glr: Enable GLR parser backend (requires backend-lr)
- diagnostics: Enable rich error diagnostics with miette
- unicode: Enable full Unicode support for identifiers
- visitor: Enable syntax tree visitor patterns
- query: Enable XPath-like query API for syntax trees
- tree-utils: Enable tree diffing and validation utilities
Quick Start
Let’s build a simple arithmetic expression parser step by step. This example will help you understand the core concepts.
Step 1: Define Your Syntax Kinds
First, define the tokens and non-terminals your parser will use. Sipha uses a unified SyntaxKind trait for both terminals (tokens) and non-terminals (grammar rules):
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
// Terminals (produced by lexer)
Number,
Plus,
Minus,
Multiply,
Divide,
LParen,
RParen,
Whitespace,
Eof,
// Non-terminals (produced by parser)
Expr,
Term,
Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool {
!matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor)
}
fn is_trivia(self) -> bool {
matches!(self, ArithSyntaxKind::Whitespace)
}
}
Step 2: Build a Lexer
Create a lexer to tokenize your input text:
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
Number, Plus, Minus, Multiply, Divide, LParen, RParen, Whitespace, Eof,
Expr, Term, Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor) }
fn is_trivia(self) -> bool { matches!(self, ArithSyntaxKind::Whitespace) }
}
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
let lexer = LexerBuilder::new()
.token(ArithSyntaxKind::Number, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 1,
max: None,
})
.token(ArithSyntaxKind::Plus, Pattern::Literal("+".into()))
.token(ArithSyntaxKind::Minus, Pattern::Literal("-".into()))
.token(ArithSyntaxKind::Multiply, Pattern::Literal("*".into()))
.token(ArithSyntaxKind::Divide, Pattern::Literal("/".into()))
.token(ArithSyntaxKind::LParen, Pattern::Literal("(".into()))
.token(ArithSyntaxKind::RParen, Pattern::Literal(")".into()))
.token(ArithSyntaxKind::Whitespace, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
min: 1,
max: None,
})
.trivia(ArithSyntaxKind::Whitespace)
.build(ArithSyntaxKind::Eof, ArithSyntaxKind::Number)
.expect("Failed to build lexer");
Step 3: Tokenize Input
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
Number, Plus, Minus, Multiply, Divide, LParen, RParen, Whitespace, Eof,
Expr, Term, Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor) }
fn is_trivia(self) -> bool { matches!(self, ArithSyntaxKind::Whitespace) }
}
let lexer = LexerBuilder::new()
.token(ArithSyntaxKind::Number, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 1, max: None,
})
.token(ArithSyntaxKind::Plus, Pattern::Literal("+".into()))
.token(ArithSyntaxKind::Minus, Pattern::Literal("-".into()))
.token(ArithSyntaxKind::Multiply, Pattern::Literal("*".into()))
.token(ArithSyntaxKind::Divide, Pattern::Literal("/".into()))
.token(ArithSyntaxKind::LParen, Pattern::Literal("(".into()))
.token(ArithSyntaxKind::RParen, Pattern::Literal(")".into()))
.token(ArithSyntaxKind::Whitespace, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
min: 1, max: None,
})
.trivia(ArithSyntaxKind::Whitespace)
.build(ArithSyntaxKind::Eof, ArithSyntaxKind::Number)
.expect("Failed to build lexer");
let input = "42 + 10";
let tokens = lexer.tokenize(input)
.expect("Failed to tokenize input");
Step 4: Define Non-Terminals and Build Grammar
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
Number, Plus, Minus, Multiply, Divide, LParen, RParen, Whitespace, Eof,
Expr, Term, Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor) }
fn is_trivia(self) -> bool { matches!(self, ArithSyntaxKind::Whitespace) }
}
use sipha::grammar::{GrammarBuilder, NonTerminal, Expr};
use sipha::lexer::Token as LexerToken;
use sipha::syntax::{TextRange, TextSize};
use std::convert::TryFrom;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ArithNonTerminal {
Expr,
Term,
Factor,
}
impl NonTerminal for ArithNonTerminal {
fn name(&self) -> &str {
match self {
ArithNonTerminal::Expr => "Expr",
ArithNonTerminal::Term => "Term",
ArithNonTerminal::Factor => "Factor",
}
}
}
// Helper function to create tokens with proper ranges
fn create_token(kind: ArithSyntaxKind, text: &str, offset: u32) -> LexerToken<ArithSyntaxKind> {
let len = TextSize::from(u32::try_from(text.len()).unwrap_or(0));
LexerToken::new(kind, text, TextRange::at(TextSize::from(offset), len))
}
// Build your grammar rules
let grammar = GrammarBuilder::new()
.entry_point(ArithNonTerminal::Expr)
// Simple grammar: Expr -> Number
.rule(ArithNonTerminal::Expr, Expr::token(create_token(
ArithSyntaxKind::Number,
"42",
0
)))
.build()
.expect("Failed to build grammar");
Step 5: Parse!
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
Number, Plus, Minus, Multiply, Divide, LParen, RParen, Whitespace, Eof,
Expr, Term, Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor) }
fn is_trivia(self) -> bool { matches!(self, ArithSyntaxKind::Whitespace) }
}
use sipha::grammar::{GrammarBuilder, NonTerminal, Expr};
use sipha::lexer::Token as LexerToken;
use sipha::syntax::{TextRange, TextSize};
use std::convert::TryFrom;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ArithNonTerminal { Expr, Term, Factor, }
impl NonTerminal for ArithNonTerminal {
fn name(&self) -> &str { match self { ArithNonTerminal::Expr => "Expr", ArithNonTerminal::Term => "Term", ArithNonTerminal::Factor => "Factor", } }
}
fn create_token(kind: ArithSyntaxKind, text: &str, offset: u32) -> LexerToken<ArithSyntaxKind> {
let len = TextSize::from(u32::try_from(text.len()).unwrap_or(0));
LexerToken::new(kind, text, TextRange::at(TextSize::from(offset), len))
}
let grammar = GrammarBuilder::new()
.entry_point(ArithNonTerminal::Expr)
.rule(ArithNonTerminal::Expr, Expr::token(create_token(ArithSyntaxKind::Number, "42", 0)))
.build().expect("Failed to build grammar");
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
let lexer = LexerBuilder::new()
.token(ArithSyntaxKind::Number, Pattern::Repeat { pattern: Box::new(Pattern::CharClass(CharSet::digits())), min: 1, max: None })
.token(ArithSyntaxKind::Plus, Pattern::Literal("+".into()))
.token(ArithSyntaxKind::Minus, Pattern::Literal("-".into()))
.token(ArithSyntaxKind::Multiply, Pattern::Literal("*".into()))
.token(ArithSyntaxKind::Divide, Pattern::Literal("/".into()))
.token(ArithSyntaxKind::LParen, Pattern::Literal("(".into()))
.token(ArithSyntaxKind::RParen, Pattern::Literal(")".into()))
.token(ArithSyntaxKind::Whitespace, Pattern::Repeat { pattern: Box::new(Pattern::CharClass(CharSet::whitespace())), min: 1, max: None })
.trivia(ArithSyntaxKind::Whitespace)
.build(ArithSyntaxKind::Eof, ArithSyntaxKind::Number).expect("Failed to build lexer");
let input = "42 + 10";
let tokens = lexer.tokenize(input).expect("Failed to tokenize input");
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)
.expect("Failed to create parser");
let result = parser.parse(&tokens, ArithNonTerminal::Expr);
For a complete working example, see the Basic Arithmetic Example or check out examples/basic_arithmetic.rs in the repository.
Core Concepts Overview
Before diving deeper, here’s a quick overview of Sipha’s core concepts:
Syntax Kinds
Sipha uses a unified SyntaxKind trait for both terminals (tokens) and non-terminals (grammar rules). This design simplifies the API and allows for flexible grammar definitions.
See Syntax Kinds for more details.
Lexers
Lexers convert raw text into tokens. Sipha provides a flexible lexer builder API that supports:
- Pattern matching (literals, character classes, repetitions)
- Trivia handling (whitespace, comments)
- DFA-based tokenization for performance
See Lexers for more details.
Grammars
Grammars define the structure of your language. Sipha uses a builder API to define grammar rules declaratively.
See Grammars for more details.
Parsing Backends
Sipha supports multiple parsing algorithms:
- LL(k): Top-down predictive parsing
- LR: Bottom-up shift-reduce parsing
- GLR: Generalized LR for ambiguous grammars
See Parsing Backends for more details.
Syntax Trees
Sipha uses an immutable green/red tree representation:
- Green trees: Compact, shared representation
- Red trees: Convenient API for traversal
See Syntax Trees for more details.
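As a quick taste (a minimal sketch using the API covered later in Parsing Basics; result is the ParseResult returned by a backend), the red tree is built on demand from the green root:
use sipha::syntax::SyntaxNode;
// result.root is the green tree produced by a parse
let root = SyntaxNode::new_root(result.root.clone());
for child in root.children() {
    println!("{:?}", child.kind());
}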
Incremental Parsing
Sipha’s key feature is incremental parsing—the ability to efficiently re-parse only changed regions. This is essential for interactive applications.
See Incremental Parsing for more details.
Next Steps
Now that you have Sipha installed and understand the basics, you can:
- Read about Core Concepts in detail
- Explore Incremental Parsing to understand Sipha’s key feature
- Check out Examples to see real-world usage
- Learn about Parsing Backends to choose the right algorithm
Syntax Kinds
Sipha uses a unified SyntaxKind trait for both terminals (tokens) and non-terminals (grammar rules). This design simplifies the API and allows for flexible grammar definitions.
Overview
In traditional parser generators, terminals and non-terminals are separate types. Sipha unifies them into a single SyntaxKind enum, which simplifies the API and makes it easier to work with syntax trees.
The SyntaxKind Trait
The SyntaxKind trait is the foundation of Sipha’s type system:
use std::hash::Hash;
use std::fmt::Debug;
pub trait SyntaxKind: Copy + Clone + PartialEq + Eq + Hash + Debug {
fn is_terminal(self) -> bool;
fn is_trivia(self) -> bool;
}
Required Methods
- is_terminal(self) -> bool: Returns true if this kind represents a terminal (token), false if it's a non-terminal (grammar rule).
- is_trivia(self) -> bool: Returns true if this kind represents trivia (whitespace, comments) that should be ignored during parsing.
Defining Your Syntax Kinds
You define syntax kinds as an enum that implements the SyntaxKind trait:
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySyntaxKind {
// Terminals (produced by lexer)
Number,
Plus,
Minus,
LParen,
RParen,
Whitespace,
Eof,
// Non-terminals (produced by parser)
Expr,
Term,
Factor,
}
impl SyntaxKind for MySyntaxKind {
fn is_terminal(self) -> bool {
!matches!(self,
MySyntaxKind::Expr |
MySyntaxKind::Term |
MySyntaxKind::Factor
)
}
fn is_trivia(self) -> bool {
matches!(self, MySyntaxKind::Whitespace)
}
}
Terminals vs Non-Terminals
Terminals
Terminals are produced by the lexer and represent actual tokens in the input text:
- Keywords (e.g., if, while, return)
- Operators (e.g., +, -, *, /)
- Identifiers (e.g., variable names)
- Literals (e.g., numbers, strings)
- Punctuation (e.g., (, ), {, })
- Trivia (e.g., whitespace, comments)
Non-Terminals
Non-terminals are produced by the parser and represent grammar rules:
- Expression nodes (e.g., Expr, Term, Factor)
- Statement nodes (e.g., Stmt, IfStmt, WhileStmt)
- Declaration nodes (e.g., Decl, FnDecl, VarDecl)
Trivia
Trivia are tokens that don’t affect the parse tree structure but are preserved for formatting and error messages:
- Whitespace
- Comments
- Line breaks
Mark trivia kinds with is_trivia() returning true. The parser will automatically skip trivia during parsing, but they’ll still be available in the syntax tree.
Best Practices
- Use descriptive names: Choose names that clearly indicate what the kind represents.
- Group related kinds: Keep terminals and non-terminals organized logically.
- Mark trivia correctly: Ensure is_trivia() returns true for all trivia kinds.
- Use exhaustive matching: When implementing is_terminal(), use exhaustive patterns to catch all cases.
Example: Complete Syntax Kind Definition
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum CalculatorSyntaxKind {
// Terminals
Number,
Plus,
Minus,
Multiply,
Divide,
LParen,
RParen,
Whitespace,
Eof,
// Non-terminals
Expr,
Term,
Factor,
}
impl SyntaxKind for CalculatorSyntaxKind {
fn is_terminal(self) -> bool {
!matches!(
self,
CalculatorSyntaxKind::Expr
| CalculatorSyntaxKind::Term
| CalculatorSyntaxKind::Factor
)
}
fn is_trivia(self) -> bool {
matches!(self, CalculatorSyntaxKind::Whitespace)
}
}
Next Steps
Now that you understand syntax kinds, you can:
- Learn about Lexers to see how terminals are produced
- Explore Grammars to see how non-terminals are used
- Check out Syntax Trees to see how kinds are used in trees
Lexers
Lexers convert raw text into tokens. Sipha provides a flexible lexer builder API that supports pattern matching, trivia handling, and DFA-based tokenization for performance.
Overview
The lexer is the first stage of parsing. It takes source text and converts it into a stream of tokens that the parser can consume. Sipha’s lexer uses a DFA (Deterministic Finite Automaton) for efficient tokenization.
Building a Lexer
Use LexerBuilder to construct a lexer:
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
let lexer = LexerBuilder::new()
.token(MySyntaxKind::Number, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 1,
max: None,
})
.token(MySyntaxKind::Plus, Pattern::Literal("+".into()))
.token(MySyntaxKind::Whitespace, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
min: 1,
max: None,
})
.trivia(MySyntaxKind::Whitespace)
.build(MySyntaxKind::Eof, MySyntaxKind::Number)
.expect("Failed to build lexer");
Patterns
Sipha supports several pattern types for matching tokens:
Literal Patterns
Match exact strings:
.token(MySyntaxKind::Plus, Pattern::Literal("+".into()))
.token(MySyntaxKind::LParen, Pattern::Literal("(".into()))
Character Class Patterns
Match character ranges:
use sipha::lexer::CharSet;
// Match digits [0-9]
.token(MySyntaxKind::Number, Pattern::CharClass(CharSet::digits()))
// Match letters [a-zA-Z]
.token(MySyntaxKind::Ident, Pattern::CharClass(CharSet::new(vec!['a'..='z', 'A'..='Z'])))
// Match whitespace
.token(MySyntaxKind::Whitespace, Pattern::CharClass(CharSet::whitespace()))
Repeat Patterns
Match repeating patterns:
// Match one or more digits
.token(MySyntaxKind::Number, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 1,
max: None, // No maximum
})
// Match zero or more whitespace
.token(MySyntaxKind::Whitespace, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
min: 0,
max: None,
})
// Match exactly 3 digits
.token(MySyntaxKind::ThreeDigits, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 3,
max: Some(3),
})
Regex Patterns
For complex patterns, use regex:
// Match floating point numbers
.token(MySyntaxKind::Float, Pattern::Regex(r"\d+\.\d+".into()))
// Match email addresses
.token(MySyntaxKind::Email, Pattern::Regex(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}".into()))
Any Pattern
Match any single character:
.token(MySyntaxKind::AnyChar, Pattern::Any)
Keywords
Keywords are reserved words that should be matched as literals rather than identifiers:
let lexer = LexerBuilder::new()
.token(MySyntaxKind::Ident, Pattern::CharClass(CharSet::new(vec!['a'..='z', 'A'..='Z'])))
.keyword("if", MySyntaxKind::If)
.keyword("else", MySyntaxKind::Else)
.keyword("while", MySyntaxKind::While)
.build(MySyntaxKind::Eof, MySyntaxKind::Ident)
.expect("Failed to build lexer");
Keywords are matched with higher priority than general patterns, so "if" will match as a keyword rather than an identifier.
Trivia
Trivia are tokens that don’t affect parsing but are preserved in the syntax tree (e.g., whitespace, comments):
let lexer = LexerBuilder::new()
.token(MySyntaxKind::Whitespace, Pattern::CharClass(CharSet::whitespace()))
.token(MySyntaxKind::Comment, Pattern::Regex(r"//.*".into()))
.trivia(MySyntaxKind::Whitespace)
.trivia(MySyntaxKind::Comment)
.build(MySyntaxKind::Eof, MySyntaxKind::Number)
.expect("Failed to build lexer");
The parser will automatically skip trivia during parsing, but they’ll still be available in the syntax tree for formatting and IDE features.
Custom Matchers
For complex tokenization logic, use custom matchers:
let lexer = LexerBuilder::new()
.custom_token(MySyntaxKind::String, |text, pos| {
// Custom string matching logic
if text[pos..].starts_with('"') {
// Find closing quote
let mut end = pos + 1;
while end < text.len() && text.as_bytes()[end] != b'"' {
if text.as_bytes()[end] == b'\\' {
end += 2; // Skip escape sequence
} else {
end += 1;
}
}
if end < text.len() {
Some((end - pos + 1, TokenValue::String(text[pos+1..end].to_string())))
} else {
None // Unterminated string
}
} else {
None
}
})
.build(MySyntaxKind::Eof, MySyntaxKind::Number)
.expect("Failed to build lexer");
Custom matchers return Option<(usize, TokenValue)> where:
- usize is the number of characters matched
- TokenValue is the token's value (if any)
Tokenization
Once the lexer is built, tokenize input:
let input = "42 + 10";
let tokens = lexer.tokenize(input)
.expect("Failed to tokenize input");
for token in &tokens {
println!("{:?}: {}", token.kind, token.text);
}
DFA Construction
Sipha’s lexer uses a DFA (Deterministic Finite Automaton) for efficient tokenization. Token patterns are first compiled to an NFA (Nondeterministic Finite Automaton), and the combined NFA is then converted to a DFA, giving O(n) tokenization performance.
The DFA construction process:
- Pattern to NFA: Each pattern is converted to an NFA
- NFA to DFA: The NFA is converted to a DFA using subset construction
- Optimization: The DFA is optimized for size and performance
Priority and Ambiguity
When multiple patterns could match the same input, Sipha uses priority to resolve ambiguity:
- Keywords have the highest priority (lowest number)
- Patterns are prioritized by order of definition (earlier = higher priority)
- Longest match wins when priorities are equal
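For example (a sketch reusing the builder calls from the Keywords section; the Ident, If, and Eof kinds are assumed variants of your enum), the input "if" lexes as the keyword, while "iffy" lexes as an identifier because the longer match wins:
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
let lexer = LexerBuilder::new()
    // General identifier pattern: one or more letters
    .token(MySyntaxKind::Ident, Pattern::Repeat {
        pattern: Box::new(Pattern::CharClass(CharSet::new(vec!['a'..='z', 'A'..='Z']))),
        min: 1,
        max: None,
    })
    // An exact keyword match takes priority over the identifier pattern
    .keyword("if", MySyntaxKind::If)
    .build(MySyntaxKind::Eof, MySyntaxKind::Ident)
    .expect("Failed to build lexer");
// "if"   -> MySyntaxKind::If    (keyword priority)
// "iffy" -> MySyntaxKind::Ident (longest match)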
Error Handling
The lexer returns LexerError for invalid input:
match lexer.tokenize(input) {
Ok(tokens) => {
// Process tokens
}
Err(error) => {
match error.kind {
LexerErrorKind::UnexpectedChar { .. } => {
// Handle unexpected character
}
LexerErrorKind::InvalidNumber { reason } => {
// Handle invalid number format
}
// ... other error kinds
}
}
}
Best Practices
- Order patterns carefully: More specific patterns should come before general ones
- Use keywords for reserved words: This ensures proper matching priority
- Mark trivia correctly: Ensure all trivia kinds are marked with .trivia()
- Use character classes efficiently: Prefer character classes over regex when possible
- Test edge cases: Test with empty input, very long input, and invalid input
Example: Complete Lexer
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
let lexer = LexerBuilder::new()
// Numbers
.token(MySyntaxKind::Number, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 1,
max: None,
})
// Operators
.token(MySyntaxKind::Plus, Pattern::Literal("+".into()))
.token(MySyntaxKind::Minus, Pattern::Literal("-".into()))
.token(MySyntaxKind::Multiply, Pattern::Literal("*".into()))
.token(MySyntaxKind::Divide, Pattern::Literal("/".into()))
// Parentheses
.token(MySyntaxKind::LParen, Pattern::Literal("(".into()))
.token(MySyntaxKind::RParen, Pattern::Literal(")".into()))
// Whitespace (trivia)
.token(MySyntaxKind::Whitespace, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
min: 1,
max: None,
})
.trivia(MySyntaxKind::Whitespace)
.build(MySyntaxKind::Eof, MySyntaxKind::Number)
.expect("Failed to build lexer");
Next Steps
- Learn about Grammars to see how tokens are used in parsing
- Explore Syntax Trees to see how tokens appear in trees
- Check out Custom Patterns for advanced lexer usage
Grammars
Grammars define the structure of your language. Sipha uses a builder API to define grammar rules declaratively.
Overview
A grammar in Sipha consists of:
- Non-terminals: Grammar rules (e.g., Expr, Stmt, Decl)
- Production rules: How non-terminals are expanded (e.g., Expr -> Term + Expr)
- Entry point: The starting non-terminal for parsing
GrammarBuilder API
Use GrammarBuilder to construct grammars:
use sipha::grammar::{GrammarBuilder, Expr, NonTerminal, Token};
let grammar = GrammarBuilder::new()
.entry_point(MyNonTerminal::Expr)
.rule(MyNonTerminal::Expr, Expr::token(some_token))
.build()
.expect("Failed to build grammar");
Grammar Expressions
Grammar rules use Expr to define right-hand sides:
Token Expressions
Match a specific token:
use sipha::grammar::Expr;
use sipha::lexer::Token;
use sipha::syntax::{TextRange, TextSize};
let token = Token::new(
MySyntaxKind::Number,
"42",
TextRange::at(TextSize::zero(), TextSize::from(2))
);
.rule(MyNonTerminal::Factor, Expr::token(token))
Sequence Expressions
Match a sequence of expressions:
.rule(MyNonTerminal::Expr, Expr::seq(vec![
Expr::non_terminal(MyNonTerminal::Term),
Expr::token(plus_token),
Expr::non_terminal(MyNonTerminal::Expr),
]))
Choice Expressions
Match one of several alternatives:
.rule(MyNonTerminal::Expr, Expr::choice(vec![
Expr::non_terminal(MyNonTerminal::Term),
Expr::non_terminal(MyNonTerminal::Expr),
]))
Optional Expressions
Match an optional expression:
.rule(MyNonTerminal::Stmt, Expr::seq(vec![
Expr::non_terminal(MyNonTerminal::If),
Expr::optional(Box::new(Expr::non_terminal(MyNonTerminal::Else))),
]))
Repeat Expressions
Match zero or more repetitions:
// Zero or more
.rule(MyNonTerminal::Args, Expr::repeat(
Box::new(Expr::non_terminal(MyNonTerminal::Arg)),
0,
None,
))
// One or more
.rule(MyNonTerminal::Stmts, Expr::repeat(
Box::new(Expr::non_terminal(MyNonTerminal::Stmt)),
1,
None,
))
Empty Expression
Match nothing (epsilon):
.rule(MyNonTerminal::Optional, Expr::Empty)
Complete Example: Arithmetic Grammar
use sipha::grammar::{GrammarBuilder, Expr, NonTerminal};
use sipha::lexer::Token as LexerToken;
use sipha::syntax::{TextRange, TextSize};
// Helper function to create tokens with proper ranges
fn create_token(kind: ArithSyntaxKind, text: &str, offset: u32) -> LexerToken<ArithSyntaxKind> {
let len = TextSize::from(u32::try_from(text.len()).unwrap_or(0));
LexerToken::new(kind, text, TextRange::at(TextSize::from(offset), len))
}
let grammar = GrammarBuilder::new()
.entry_point(ArithNonTerminal::Expr)
// Expr -> Term | Expr + Term | Expr - Term
.rule(ArithNonTerminal::Expr, Expr::choice(vec![
Expr::seq(vec![
Expr::non_terminal(ArithNonTerminal::Expr),
Expr::token(create_token(ArithSyntaxKind::Plus, "+", 0)),
Expr::non_terminal(ArithNonTerminal::Term),
]),
Expr::seq(vec![
Expr::non_terminal(ArithNonTerminal::Expr),
Expr::token(create_token(ArithSyntaxKind::Minus, "-", 0)),
Expr::non_terminal(ArithNonTerminal::Term),
]),
Expr::non_terminal(ArithNonTerminal::Term),
]))
// Term -> Factor | Term * Factor | Term / Factor
.rule(ArithNonTerminal::Term, Expr::choice(vec![
Expr::seq(vec![
Expr::non_terminal(ArithNonTerminal::Term),
Expr::token(create_token(ArithSyntaxKind::Multiply, "*", 0)),
Expr::non_terminal(ArithNonTerminal::Factor),
]),
Expr::seq(vec![
Expr::non_terminal(ArithNonTerminal::Term),
Expr::token(create_token(ArithSyntaxKind::Divide, "/", 0)),
Expr::non_terminal(ArithNonTerminal::Factor),
]),
Expr::non_terminal(ArithNonTerminal::Factor),
]))
// Factor -> Number | ( Expr )
.rule(ArithNonTerminal::Factor, Expr::choice(vec![
Expr::token(create_token(ArithSyntaxKind::Number, "42", 0)),
Expr::seq(vec![
Expr::token(create_token(ArithSyntaxKind::LParen, "(", 0)),
Expr::non_terminal(ArithNonTerminal::Expr),
Expr::token(create_token(ArithSyntaxKind::RParen, ")", 0)),
]),
]))
.build()
.expect("Failed to build grammar");
Grammar Validation
Sipha validates grammars before use:
let grammar = GrammarBuilder::new()
.entry_point(MyNonTerminal::Expr)
.rule(MyNonTerminal::Expr, Expr::token(some_token))
.build()
.expect("Failed to build grammar");
// Validate grammar for a specific backend
let errors = LlParser::validate(&grammar);
if !errors.is_empty() {
for error in errors {
eprintln!("Grammar error: {:?}", error);
}
}
Common validation errors:
- Left recursion: Some backends don’t support left recursion
- Ambiguity: Some backends require unambiguous grammars
- Missing rules: Referenced non-terminals must be defined
- Unreachable rules: Rules that can’t be reached from the entry point
Backend Hints
Provide hints to backends about how to handle rules:
use sipha::grammar::hint::PrecedenceHint;
.rule(MyNonTerminal::Expr, Expr::choice(vec![
// Higher precedence
Expr::seq(vec![
Expr::non_terminal(MyNonTerminal::Term),
Expr::token(multiply_token),
Expr::non_terminal(MyNonTerminal::Term),
]).with_hint(PrecedenceHint::new(2)),
// Lower precedence
Expr::seq(vec![
Expr::non_terminal(MyNonTerminal::Expr),
Expr::token(plus_token),
Expr::non_terminal(MyNonTerminal::Term),
]).with_hint(PrecedenceHint::new(1)),
]))
Grammar Analysis
Sipha provides tools for analyzing grammars:
- Left recursion detection: Find rules with left recursion
- Reachability analysis: Find unreachable rules
- First/Follow sets: Compute FIRST and FOLLOW sets for LL parsing
- Conflict detection: Find shift/reduce and reduce/reduce conflicts
See Grammar Analysis for more details.
Best Practices
- Start simple: Begin with a simple grammar and add complexity gradually
- Use meaningful names: Choose clear names for non-terminals
- Factor common patterns: Extract common subexpressions into separate rules
- Handle precedence explicitly: Use precedence hints or grammar structure
- Test thoroughly: Test with various inputs, including edge cases
- Validate early: Validate grammars before using them in parsers
Next Steps
- Learn about Parsing Basics to see how grammars are used
- Explore Parsing Backends to understand backend-specific requirements
- Check out Grammar Analysis for advanced grammar tools
See Also
- Cheat Sheet - Quick reference for grammar expressions
- Examples - Complete grammar examples
- Glossary - Definitions of grammar-related terms
Parsing Basics
This chapter explains how parsing works in Sipha, from creating a parser to understanding parse results.
Overview
Parsing is the process of converting a stream of tokens into a syntax tree according to a grammar. Sipha provides multiple parsing backends, each with different characteristics and capabilities.
Creating a Parser
First, create a parser from a grammar:
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)
.expect("Failed to create parser");
Different backends have different configuration options:
- LL parser: LlConfig with lookahead settings
- LR parser: LrConfig with LALR vs canonical LR options
- GLR parser: GlrConfig with ambiguity handling options
Parsing Input
Parse a stream of tokens:
let result = parser.parse(&tokens, MyNonTerminal::Expr);
The parse method takes:
- Tokens: A slice of tokens from the lexer
- Entry point: The non-terminal to start parsing from
Parse Results
The parse method returns a ParseResult:
pub struct ParseResult<T, N> {
pub root: GreenNode<T::Kind>,
pub errors: Vec<ParseError<T, N>>,
pub warnings: Vec<ParseWarning<T, N>>,
pub metrics: ParseMetrics,
#[cfg(feature = "backend-glr")]
pub forest: Option<ParseForest<T::Kind>>,
}
Root Node
The root field contains the syntax tree root. Even if there are errors, the parser attempts to produce a tree:
let root = SyntaxNode::new_root(result.root.clone());
println!("Root kind: {:?}", root.kind());
Errors
The errors field contains parsing errors:
if !result.errors.is_empty() {
for error in &result.errors {
eprintln!("Parse error at {:?}: {}", error.span, error.message);
}
}
Common error types:
- Unexpected token: Expected one token but found another
- Unexpected EOF: Reached end of input unexpectedly
- Missing token: Expected a token that wasn’t found
- Invalid grammar: Grammar-related errors
Warnings
The warnings field contains non-fatal issues:
for warning in &result.warnings {
eprintln!("Warning: {}", warning.message);
}
Metrics
The metrics field contains parsing statistics:
println!("Tokens consumed: {}", result.metrics.tokens_consumed);
println!("Nodes created: {}", result.metrics.nodes_created);
println!("Parse time: {:?}", result.metrics.parse_time);
Parse Forest (GLR)
When using the GLR backend, the forest field may contain multiple parse trees for ambiguous input:
#[cfg(feature = "backend-glr")]
if let Some(forest) = result.forest {
// Handle ambiguity
let disambiguated = forest.disambiguate(|alternatives| {
// Choose one alternative
alternatives.first().cloned()
});
}
Error Recovery
Sipha parsers attempt to recover from errors and continue parsing:
- Synchronization tokens: Skip to known recovery points
- Delimited recovery: Skip to matching delimiters
- Best-effort parsing: Continue parsing despite errors
See Error Handling for more details.
Working with Syntax Trees
After parsing, work with the syntax tree:
use sipha::syntax::SyntaxNode;
let root = SyntaxNode::new_root(result.root.clone());
// Traverse children
for child in root.children() {
println!("Child: {:?}", child.kind());
}
// Find specific nodes
if let Some(expr) = root.descendants().find(|n| n.kind() == MySyntaxKind::Expr) {
println!("Found expression");
}
See Syntax Trees for more details.
Incremental Parsing
For interactive applications, use incremental parsing:
use sipha::incremental::{IncrementalParser, TextEdit};
let mut incremental_parser = IncrementalParser::new(parser);
// Initial parse
let result1 = incremental_parser.parse_incremental(
&tokens,
None,
&[],
MyNonTerminal::Expr,
Some(&grammar),
);
// After an edit
let edits = vec![TextEdit::replace(range, "new text".into())];
let result2 = incremental_parser.parse_incremental(
&new_tokens,
Some(&result1.root),
&edits,
MyNonTerminal::Expr,
Some(&grammar),
);
See Incremental Parsing for more details.
Backend Selection
Choose a backend based on your needs:
- LL(k): Good for most grammars, supports left-recursion elimination
- LR: Efficient for many grammar types, good error recovery
- GLR: Handles ambiguous grammars, ideal for complex languages
See Choosing a Backend for guidance.
Complete Example
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
use sipha::syntax::SyntaxNode;
// Create parser
let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)
.expect("Failed to create parser");
// Parse
let result = parser.parse(&tokens, MyNonTerminal::Expr);
// Check for errors
if !result.errors.is_empty() {
eprintln!("Parsing failed with {} errors", result.errors.len());
for error in &result.errors {
eprintln!(" {:?}", error);
}
}
// Work with syntax tree
let root = SyntaxNode::new_root(result.root.clone());
println!("Parse successful! Root: {:?}", root.kind());
println!("Created {} nodes in {:?}",
result.metrics.nodes_created,
result.metrics.parse_time);
Next Steps
- Learn about Incremental Parsing for interactive applications
- Explore Parsing Backends to understand different algorithms
- Check out Syntax Trees for tree manipulation
See Also
- Error Handling - Comprehensive error handling guide
- Cheat Sheet - Quick reference for common patterns
- Examples - Working code examples
Incremental Parsing Overview
Incremental parsing is Sipha’s core strength. It allows you to efficiently update your parse tree when code changes, rather than re-parsing everything from scratch.
Why Incremental Parsing?
In interactive applications like IDEs, users edit code frequently. Traditional parsers re-parse the entire file on every change, which can be slow for large files. Incremental parsing:
- Reuses unchanged subtrees from previous parses
- Only re-parses affected regions instead of the entire file
- Maintains parse caches for fast updates
- Scales to large codebases efficiently
This makes Sipha perfect for building language servers, IDEs, and other tools that need to parse code as users type.
Benefits
Performance
Incremental parsing provides significant performance improvements:
- Fast updates: Only re-parse changed regions
- Memory efficient: Shared tree representation
- Scalable: Handles large files efficiently
- IDE-ready: Designed for interactive editing scenarios
User Experience
For interactive applications, incremental parsing means:
- Responsive editing: No lag when typing
- Real-time feedback: Errors and diagnostics update instantly
- Smooth experience: Large files don’t slow down the editor
How It Works (High Level)
- Initial parse: Parse the entire file and cache results
- Edit detection: Identify which regions changed
- Node reuse: Find unchanged subtrees from previous parse
- Incremental re-parse: Only re-parse affected regions
- Tree reconstruction: Combine reused nodes with new parse results
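In code, one edit cycle looks roughly like this (a condensed sketch of the API covered in the Usage chapter; incremental_parser, lexer, tokens, grammar, and MyNonTerminal are the placeholders from the Quick Start):
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
// Step 1: initial parse (no old tree, no edits); results are cached
let first = incremental_parser.parse_incremental(
    &tokens, None, &[], MyNonTerminal::Expr, Some(&grammar));
// Steps 2-5: after an edit, pass the old tree and the edit so only the
// affected region is re-parsed and unchanged subtrees are reused
let edits = vec![TextEdit::replace(
    TextRange::new(TextSize::from(0), TextSize::from(2)),
    "100".into(),
)];
let new_tokens = lexer.tokenize("100 + 10").expect("Failed to tokenize");
let second = incremental_parser.parse_incremental(
    &new_tokens, Some(&first.root), &edits, MyNonTerminal::Expr, Some(&grammar));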
Current Status
Incremental parsing is fully implemented with complete node reuse and cache management:
- ✅ Node reuse: Unchanged subtrees are automatically identified and reused
- ✅ Cache population: Parse results are cached for future reuse
- ✅ Affected range computation: Only affected regions are re-parsed
- ✅ Smart invalidation: Cache entries are invalidated based on edit locations
The parser automatically integrates reusable nodes from previous parses, providing significant performance improvements for interactive editing scenarios.
When to Use Incremental Parsing
Use incremental parsing when:
- Building language servers or IDEs
- Creating interactive code editors
- Building tools that need real-time syntax analysis
- Parsing large files that change frequently
- Providing live error checking and diagnostics
For batch parsing (one-time parsing of files), regular parsing is sufficient and may be faster due to less overhead.
Next Steps
- Learn How It Works for detailed implementation
- See Usage for API examples
- Read Implementation Details for advanced topics
How Incremental Parsing Works
This chapter explains the internal mechanisms of incremental parsing in Sipha.
Overview
Incremental parsing works by:
- Tracking edits: Identifying which parts of the text changed
- Computing affected regions: Determining minimal regions to re-parse
- Finding reusable nodes: Identifying unchanged subtrees from previous parse
- Re-parsing selectively: Only parsing affected regions
- Reconstructing the tree: Combining reused nodes with new parse results
Incremental Parsing Flow
flowchart TD
Start[Text Edit] --> Track[Track Edit]
Track --> Compute[Compute Affected Region]
Compute --> Find[Find Reusable Nodes]
Find --> Check{Reusable<br/>Nodes Found?}
Check -->|Yes| Reuse[Reuse Unchanged Subtrees]
Check -->|No| ParseAll[Parse Affected Region]
Reuse --> ParseGaps[Parse Gaps Only]
ParseGaps --> Combine[Combine Results]
ParseAll --> Combine
Combine --> Update[Update Parse Cache]
Update --> Reconstruct[Reconstruct Tree]
Reconstruct --> End[New Syntax Tree]
style Start fill:#e1f5ff,color:#000000
style Reuse fill:#c8e6c9,color:#000000
style ParseGaps fill:#fff9c4,color:#000000
style Combine fill:#ffccbc,color:#000000
style End fill:#c8e6c9,color:#000000
Edit Tracking
When text is edited, Sipha tracks the changes using TextEdit:
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
let edit = TextEdit::replace(
TextRange::new(TextSize::from(10), TextSize::from(20)),
"new text".into(),
);
Edits can be:
- Replacements: Replace a range with new text
- Insertions: Insert text at a position
- Deletions: Delete a range
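Each kind has a corresponding constructor (the same calls appear in the Usage chapter):
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
// Insert "x" at offset 5
let insert = TextEdit::insert(TextSize::from(5), "x".into());
// Delete the first five characters
let delete = TextEdit::delete(TextRange::new(TextSize::from(0), TextSize::from(5)));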
Affected Region Computation
Sipha computes the minimal region that needs to be re-parsed:
let affected = AffectedRegion::from_edits(&edits, Some(total_len));
The affected region:
- Union: The smallest range covering all edits
- Expanded: The union plus padding for context (default 16 characters)
The expansion ensures that context-dependent parsing (like operator precedence) works correctly.
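Concretely (a sketch; the import path and the use of TextSize for the total length are assumptions based on the snippets above), a single edit at offsets 100..105 in a 500-character file produces a union of 100..105 and an expanded range padded by roughly 16 characters on each side:
use sipha::incremental::{AffectedRegion, TextEdit};
use sipha::syntax::{TextRange, TextSize};
let edits = vec![TextEdit::replace(
    TextRange::new(TextSize::from(100), TextSize::from(105)),
    "hello".into(),
)];
let affected = AffectedRegion::from_edits(&edits, Some(TextSize::from(500)));
// affected.expanded() is the range that find_reusable_nodes compares node ranges against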
Node Reuse
Sipha identifies nodes from the previous parse that can be reused:
let reusable = find_reusable_nodes(old_tree, affected.expanded(), budget);
A node is reusable if:
- It doesn’t intersect with the affected region
- It matches the expected syntax kind (if specified)
- It’s within the reuse budget
Reuse Budget
The reuse budget limits how many nodes to consider for reuse:
use sipha::incremental::ReuseBudget;
// Fixed budget
let budget = ReuseBudget::Fixed(1000);
// Heuristic budget (default)
let budget = ReuseBudget::Heuristic {
max_depth: 20,
max_nodes: 1000,
};
The heuristic budget considers:
- Tree depth: Deeper trees allow more candidates
- File size: Larger files allow more candidates
- Affected region size: Smaller affected regions allow more candidates
Incremental Session
An IncrementalSession captures everything needed for incremental parsing:
use sipha::incremental::IncrementalSession;
let session = IncrementalSession::new(old_tree, &edits);
The session provides:
- Old tree: Previous parse tree
- Edits: Text changes
- Affected region: Computed affected range
- Reusable nodes: List of nodes that can be reused
- Cache access: Optional persistent parse cache
Parse Cache
Sipha maintains a persistent parse cache for cross-session node reuse:
let session = IncrementalSession::with_cache(old_tree, &edits, &cache);
The cache:
- Keys: Rule name, start position, version
- Values: Cached parse nodes
- Versioning: Invalidates old entries based on version
- Eviction: Removes old versions automatically
Cache population happens automatically when using parse_incremental with a grammar:
let result = parser.parse_incremental(
&tokens,
old_tree,
&edits,
entry_point,
Some(&grammar), // Cache is populated automatically
);
Re-parsing Process
When re-parsing incrementally:
- Find reusable nodes: Query the session for reusable nodes at each position
- Parse gaps: Re-parse only the regions not covered by reusable nodes
- Combine results: Merge reused nodes with newly parsed nodes
- Update cache: Store new parse results in the cache
Tree Reconstruction
The final tree is reconstructed by:
- Starting from root: Begin with the old tree root
- Replacing affected subtrees: Replace subtrees in affected regions
- Inserting new nodes: Add newly parsed nodes
- Preserving structure: Maintain tree structure outside affected regions
Performance Characteristics
Incremental parsing performance depends on:
- Edit size: Smaller edits = better performance
- Tree structure: Deeper trees = more reuse opportunities
- Cache hit rate: Higher hit rate = better performance
- Affected region size: Smaller regions = faster re-parsing
Typical performance improvements:
- Small edits (1-10 characters): 10-100x faster
- Medium edits (10-100 characters): 5-20x faster
- Large edits (100+ characters): 2-5x faster
Limitations
Incremental parsing has some limitations:
- Context sensitivity: Some parsing decisions depend on distant context
- Error recovery: Error recovery may require re-parsing larger regions
- Ambiguity: Ambiguous grammars may require full re-parse
- Cache overhead: Cache management has some overhead
Next Steps
- See Usage for API examples
- Read Implementation Details for advanced topics
Incremental Parsing Usage
This chapter shows how to use incremental parsing in your applications.
Basic Usage
Initial Parse
Start with an initial parse:
use sipha::incremental::IncrementalParser;
use sipha::backend::ll::{LlParser, LlConfig};
let config = LlConfig::default();
let parser = LlParser::new(&grammar, config)
.expect("Failed to create parser");
let mut incremental_parser = IncrementalParser::new(parser);
// Initial parse
let result1 = incremental_parser.parse_incremental(
&tokens,
None, // No old tree
&[], // No edits
MyNonTerminal::Expr,
Some(&grammar), // Populate cache
);
After an Edit
When text is edited, re-parse incrementally:
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
// User changed "42" to "100" at position 0
let edits = vec![TextEdit::replace(
TextRange::new(TextSize::from(0), TextSize::from(2)),
"100".into(),
)];
// Re-tokenize (or update tokens incrementally)
let new_tokens = lexer.tokenize(&new_text)
.expect("Failed to tokenize");
// Incremental re-parse
let result2 = incremental_parser.parse_incremental(
&new_tokens,
Some(&result1.root), // Previous tree
&edits, // Text edits
MyNonTerminal::Expr,
Some(&grammar), // Update cache
);
Text Edits
Creating Edits
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
// Replace a range
let edit1 = TextEdit::replace(
TextRange::new(TextSize::from(10), TextSize::from(20)),
"new text".into(),
);
// Insert at a position
let edit2 = TextEdit::insert(
TextSize::from(5),
"inserted".into(),
);
// Delete a range
let edit3 = TextEdit::delete(
TextRange::new(TextSize::from(0), TextSize::from(5)),
);
Multiple Edits
Handle multiple edits at once:
let edits = vec![
TextEdit::replace(range1, "text1".into()),
TextEdit::replace(range2, "text2".into()),
];
let result = incremental_parser.parse_incremental(
&tokens,
Some(&old_tree),
&edits,
entry_point,
Some(&grammar),
);
Reuse Budget
Control how many nodes to consider for reuse:
use sipha::incremental::{IncrementalSession, ReuseBudget};
// Fixed budget
let budget = ReuseBudget::Fixed(500);
let session = IncrementalSession::new_with_budget(
Some(&old_tree),
&edits,
budget,
);
// Heuristic budget (default)
let budget = ReuseBudget::Heuristic {
max_depth: 20,
max_nodes: 1000,
};
Cache Management
Automatic Cache Population
The cache is populated automatically when you provide a grammar:
let result = incremental_parser.parse_incremental(
&tokens,
old_tree,
&edits,
entry_point,
Some(&grammar), // Cache populated automatically
);
Manual Cache Access
Access the cache directly:
let session = IncrementalSession::with_cache(
Some(&old_tree),
&edits,
&incremental_parser.cache, // Access cache
);
Backend Integration
Incremental parsing works with all backends that support it:
// LL parser
let parser = LlParser::new(&grammar, config)?;
let mut incremental = IncrementalParser::new(parser);
// GLR parser
let parser = GlrParser::new(&grammar, config)?;
let mut incremental = IncrementalParser::new(parser);
Error Handling
Handle errors during incremental parsing:
let result = incremental_parser.parse_incremental(
&tokens,
Some(&old_tree),
&edits,
entry_point,
Some(&grammar),
);
if !result.errors.is_empty() {
// Handle errors
for error in &result.errors {
eprintln!("Error: {:?}", error);
}
}
Best Practices
- Always provide grammar: This enables cache population
- Batch edits: Process multiple edits together when possible
- Reuse old tree: Always pass the previous tree for best performance
- Monitor metrics: Check ParseMetrics to understand performance
- Handle errors gracefully: Incremental parsing may produce errors
Example: Language Server
Here’s a complete example for a language server:
use sipha::incremental::{IncrementalParser, TextEdit};
use sipha::syntax::{TextRange, TextSize};
struct LanguageServer {
parser: IncrementalParser<LlParser, MyToken, MyNonTerminal>,
grammar: Grammar<MyToken, MyNonTerminal>,
lexer: CompiledLexer<MySyntaxKind>,
current_tree: Option<GreenNode<MySyntaxKind>>,
}
impl LanguageServer {
fn new() -> Self {
let grammar = build_grammar();
let parser = LlParser::new(&grammar, Default::default()).unwrap();
let lexer = build_lexer();
Self {
parser: IncrementalParser::new(parser),
grammar,
lexer,
current_tree: None,
}
}
fn on_text_changed(&mut self, edits: Vec<TextEdit>, new_text: &str) {
// Tokenize new text
let tokens = self.lexer.tokenize(new_text).unwrap();
// Incremental parse
let result = self.parser.parse_incremental(
&tokens,
self.current_tree.as_ref(),
&edits,
MyNonTerminal::Expr,
Some(&self.grammar),
);
// Update current tree
self.current_tree = Some(result.root.clone());
// Report errors
if !result.errors.is_empty() {
self.report_errors(&result.errors);
}
}
}
Next Steps
- Read Implementation Details for advanced topics
- See Examples for complete examples
Incremental Parsing Implementation Details
This chapter covers advanced implementation details for incremental parsing.
Architecture
Incremental parsing consists of several components:
IncrementalParser
The IncrementalParser wraps a backend parser and manages the parse cache:
pub struct IncrementalParser<P, T, N> {
parser: P,
cache: ParseCache<T::Kind>,
_phantom: PhantomData<(T, N)>,
}
IncrementalSession
The IncrementalSession provides context for incremental parsing:
pub struct IncrementalSession<'a, K: SyntaxKind> {
old_tree: Option<&'a GreenNode<K>>,
edits: &'a [TextEdit],
affected: AffectedRegion,
reusable: Vec<ReuseCandidate<K>>,
reusable_by_pos: HashMap<TextSize, Vec<usize>>,
cache: Option<&'a ParseCache<K>>,
}
ParseCache
The ParseCache stores parse results for reuse:
pub struct ParseCache<K: SyntaxKind> {
version: usize,
nodes: HashMap<CacheKey, Arc<GreenNode<K>>>,
interner: Rodeo,
}
Node Reuse Algorithm
The node reuse algorithm works as follows:
- Compute affected region: Determine minimal region to re-parse
- Expand region: Add padding for context
- Find reusable nodes: Traverse old tree, collecting nodes outside affected region
- Budget filtering: Limit candidates based on budget
- Index by position: Build position index for fast lookup
Finding Reusable Nodes
fn find_reusable_nodes<K>(
root: &GreenNode<K>,
affected: TextRange,
budget: usize,
) -> Vec<ReuseCandidate<K>> {
let mut reusable = Vec::new();
visit_green_spans(root, |span| {
if reusable.len() >= budget {
return;
}
if span.depth == 0 || span.range.intersect(affected).is_some() {
return;
}
if let Some(node_arc) = span.node_arc.clone() {
reusable.push(ReuseCandidate {
node: node_arc,
range: span.range,
depth: span.depth,
});
}
});
reusable
}
Cache Key Structure
Cache keys consist of:
- Rule name: Interned string for the non-terminal
- Start position: Text position where parsing started
- Version: Cache version for invalidation
struct CacheKey {
rule: Spur, // Interned rule name
start: TextSize, // Start position
version: usize, // Cache version
}
Version Management
The cache uses versioning for invalidation:
- Increment version: On each parse, increment cache version
- Evict old entries: Remove entries from old versions
- Keep recent versions: Maintain last N versions (default: 2)
fn evict_old_entries(&mut self, max_versions: usize) {
let min_version = self.version.saturating_sub(max_versions);
self.nodes.retain(|key, _| key.version >= min_version);
}
Backend Integration
Backends implement incremental parsing via parse_with_session:
fn parse_with_session(
&mut self,
input: &[T],
entry: N,
session: &IncrementalSession<'_, T::Kind>,
) -> ParseResult<T, N> {
// Use session to find reusable nodes
// Re-parse only affected regions
// Combine reused and new nodes
}
Performance Optimizations
Position Indexing
Reusable nodes are indexed by position for O(1) lookup:
let mut reusable_by_pos = HashMap::new();
for (idx, candidate) in reusable.iter().enumerate() {
reusable_by_pos
.entry(candidate.range.start())
.or_insert_with(Vec::new)
.push(idx);
}
Budget Calculation
The heuristic budget considers multiple factors:
fn calculate(&self, tree_depth: usize, file_size: TextSize, affected_size: TextSize) -> usize {
    // Simplified excerpt: max_depth and max_nodes are the Heuristic variant's fields
    let file_size = u32::from(file_size) as usize;
    let affected_size = u32::from(affected_size) as usize;
    // Depth factor: deeper trees = more candidates
    let depth_factor = (tree_depth.min(max_depth) * 10).max(50);
    // File size factor: larger files = more candidates
    let size_factor = (file_size / 100).min(max_nodes / 2);
    // Affected region factor: smaller regions = more candidates
    let affected_ratio = affected_size as f64 / file_size as f64;
    let affected_factor = ((1.0 - affected_ratio) * 200.0) as usize;
    (depth_factor + size_factor + affected_factor)
        .min(max_nodes)
        .max(50)
}
Limitations and Trade-offs
Context Sensitivity
Some parsing decisions depend on distant context, requiring larger re-parse regions:
- Operator precedence: May need parent context
- Type inference: May need sibling context
- Scoping: May need ancestor context
Error Recovery
Error recovery may require re-parsing larger regions:
- Synchronization: May skip to distant recovery points
- Best-effort: May need full re-parse for complex errors
Cache Overhead
Cache management has overhead:
- Memory: Cache uses memory for stored nodes
- Lookup: Cache lookups have some overhead
- Eviction: Periodic eviction has CPU cost
Future Improvements
Potential improvements:
- Incremental lexing: Extend incremental capabilities to lexer
- Smarter invalidation: More precise cache invalidation
- Parallel parsing: Parse multiple regions in parallel
- Predictive reuse: Predict which nodes will be reusable
Next Steps
Parsing Backends Overview
Sipha supports multiple parsing algorithms via feature flags. Each backend has different characteristics and is suited for different use cases.
Available Backends
LL(k) Parser (backend-ll)
Top-down predictive parsing with configurable lookahead:
- Supports left-recursion elimination
- Configurable error recovery
- Incremental parsing support
- Good for most grammars
LR Parser (backend-lr)
Bottom-up shift-reduce parsing:
- Efficient for many grammar types
- Good error recovery
- Supports LALR and canonical LR
- Table-based parsing
GLR Parser (backend-glr)
Generalized LR parsing for ambiguous grammars:
- Handles non-deterministic and ambiguous grammars
- Parse forest representation for ambiguity tracking
- Configurable disambiguation strategies
- Incremental parsing support
- Ideal for complex languages like C++
PEG Parser (backend-peg)
Parsing Expression Grammar with ordered choice and memoization:
- Ordered choice semantics (first match wins)
- Packrat parsing with memoization for O(n) performance
- Backtracking with configurable depth limits
- Incremental parsing support
- Ideal for precedence-based languages and interactive tools
ParserBackend Trait
All backends implement the ParserBackend trait:
pub trait ParserBackend<T, N>: Sized + Send
where
T: Token,
N: NonTerminal,
{
type Config: Default + Clone;
type Error: std::error::Error + Send + Sync + 'static;
type State: Send + Sync;
fn new(grammar: &Grammar<T, N>, config: Self::Config) -> Result<Self, Self::Error>;
fn parse(&mut self, input: &[T], entry: N) -> ParseResult<T, N>;
fn parse_incremental(...) -> ParseResult<T, N>;
fn parse_with_session(...) -> ParseResult<T, N>;
fn validate(grammar: &Grammar<T, N>) -> Vec<GrammarError<T, N>>;
fn capabilities() -> BackendCapabilities;
fn state(&self) -> &Self::State;
}
Backend Capabilities
Each backend reports its capabilities:
pub struct BackendCapabilities {
pub name: &'static str,
pub algorithm: Algorithm,
pub supports_left_recursion: bool,
pub supports_ambiguity: bool,
pub supports_incremental: bool,
pub supports_error_recovery: bool,
pub max_lookahead: Option<usize>,
}
Choosing a Backend
flowchart TD
Start[Choose Backend] --> Ambiguous{Grammar<br/>Ambiguous?}
Ambiguous -->|Yes| GLR[Use GLR Parser]
Ambiguous -->|No| UseCase{Primary<br/>Use Case?}
UseCase -->|Expression Parser<br/>Precedence-based| PEG1[Use PEG Parser]
UseCase -->|Interactive Tools<br/>IDEs/Editors| Interactive{Need<br/>Memoization?}
UseCase -->|Batch Processing| LR[Use LR Parser]
UseCase -->|Complex Language| GLR2[Use GLR Parser]
UseCase -->|Simple Grammar| LL[Use LL Parser]
Interactive -->|Yes| PEG2[Use PEG Parser]
Interactive -->|No| LL2[Use LL Parser]
GLR --> End[Selected Backend]
GLR2 --> End
LR --> End
LL --> End
LL2 --> End
PEG1 --> End
PEG2 --> End
style GLR fill:#ffccbc,color:#000000
style GLR2 fill:#ffccbc,color:#000000
style LR fill:#fff9c4,color:#000000
style LL fill:#c8e6c9,color:#000000
style LL2 fill:#c8e6c9,color:#000000
style PEG1 fill:#e1bee7,color:#000000
style PEG2 fill:#e1bee7,color:#000000
style End fill:#e1f5ff,color:#000000
See Choosing a Backend for detailed guidance on selecting the right backend for your use case.
Next Steps
- Learn about LL Parser
- Explore LR Parser
- Check out GLR Parser
- Discover PEG Parser
See Also
- Choosing a Backend - Detailed guidance on backend selection
- Cheat Sheet - Quick reference for backend usage
- Glossary - Definitions of parsing-related terms
LL(k) Parser
The LL(k) parser is a top-down predictive parser with configurable lookahead.
Overview
LL(k) parsing works by:
- Predicting: Using lookahead to predict which production to use
- Matching: Matching tokens against predicted productions
- Recursing: Recursively parsing subexpressions
Configuration
Configure the LL parser with LlConfig:
use sipha::backend::ll::{LlParser, LlConfig};
let config = LlConfig {
lookahead: 1, // k value for LL(k)
left_recursion: true, // Enable left-recursion elimination
error_recovery: true, // Enable error recovery
..Default::default()
};
let mut parser = LlParser::new(&grammar, config)
.expect("Failed to create parser");
Lookahead
The lookahead parameter controls how many tokens to look ahead:
- LL(1): One token of lookahead (most common)
- LL(k): k tokens of lookahead (for grammars that need more context to choose a production)
Higher lookahead values:
- Resolve choices that LL(1) cannot distinguish
- Require larger parse tables
- May be slower
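If LL(1) cannot choose between two productions, raising the lookahead is a configuration change only. A minimal sketch, reusing the LlConfig shown above:
let config = LlConfig {
    lookahead: 2, // LL(2): use two tokens of lookahead
    ..Default::default()
};
let mut parser = LlParser::new(&grammar, config)
    .expect("Failed to create parser");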
Left Recursion
Left recursion elimination allows parsing left-recursive grammars:
// Without elimination: Expr -> Expr + Term (left-recursive)
// With elimination: Automatically transformed
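As an illustration, the standard textbook rewrite (shown here as notation, not Sipha output) turns a left-recursive rule into a right-recursive pair:
// Original (left-recursive):
//   Expr  -> Expr + Term | Term
// After elimination (right-recursive):
//   Expr  -> Term Expr'
//   Expr' -> + Term Expr' | ε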
Error Recovery
Error recovery strategies:
- Synchronization tokens: Skip to known recovery points
- Delimited recovery: Skip to matching delimiters
- Best-effort parsing: Continue parsing despite errors
Usage
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)
.expect("Failed to create parser");
let result = parser.parse(&tokens, MyNonTerminal::Expr);
Incremental Parsing
The LL parser supports incremental parsing:
use sipha::incremental::IncrementalParser;
let mut incremental = IncrementalParser::new(parser);
let result = incremental.parse_incremental(
&tokens,
old_tree,
&edits,
entry_point,
Some(&grammar),
);
Grammar Requirements
LL parsers require:
- No left recursion (unless elimination is enabled)
- No ambiguity (unless lookahead > 1)
- FIRST/FOLLOW sets: Computed automatically
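These requirements can be checked up front with the validate hook on the ParserBackend trait. The sketch below assumes validate can be called on the LL backend with your grammar, and it only reports how many issues were found:
use sipha::backend::ll::LlParser;
use sipha::backend::ParserBackend;

// A hedged sketch: validate the grammar against the LL backend before
// constructing a parser.
let issues = LlParser::validate(&grammar);
if !issues.is_empty() {
    eprintln!("grammar is not LL-compatible: {} issue(s) reported", issues.len());
}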
Parse Table
The LL parser builds a parse table:
- Rows: Non-terminals
- Columns: Terminals (FIRST sets)
- Entries: Production rules to apply
Performance
LL parsing performance:
- Time: O(n) where n is input length
- Space: O(grammar size) for parse table
- Incremental: Fast for small edits
When to Use
Use LL parser when:
- Grammar is LL(k) compatible
- Need predictable performance
- Want simple error messages
- Building interactive tools
Limitations
LL parser limitations:
- Left recursion: Requires elimination (may change parse tree)
- Ambiguity: Requires higher lookahead
- Left associativity: May require grammar transformation
Next Steps
- See Choosing a Backend for comparison
- Check Examples for usage
LR Parser
The LR parser is a bottom-up shift-reduce parser that builds parse trees from leaves to root.
Overview
LR parsing works by:
- Shifting: Moving tokens onto the stack
- Reducing: Replacing matched rules with non-terminals
- Building: Constructing the parse tree bottom-up
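As a worked illustration (independent of any Sipha API), these are the shift-reduce steps for the input 1 + 2 with the rules Expr -> Expr + Term | Term and Term -> Number:
// Stack              Remaining input    Action
// (empty)            1 + 2              shift 1
// 1                  + 2                reduce Term -> Number, then Expr -> Term
// Expr               + 2                shift +
// Expr +             2                  shift 2
// Expr + 2           (end)              reduce Term -> Number
// Expr + Term        (end)              reduce Expr -> Expr + Term
// Expr               (end)              accept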
Configuration
Configure the LR parser with LrConfig:
use sipha::backend::lr::{LrParser, LrConfig};
let config = LrConfig {
use_lalr: true, // Use LALR instead of canonical LR
error_recovery: true, // Enable error recovery
..Default::default()
};
let mut parser = LrParser::new(&grammar, config)
.expect("Failed to create parser");
LALR vs Canonical LR
- LALR: Smaller tables, faster, handles most grammars
- Canonical LR: Larger tables, more precise, handles edge cases
For most grammars, LALR is sufficient and faster.
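Switching is a one-field change; a minimal sketch reusing the LrConfig shown above:
// Fall back to canonical LR(1) tables if LALR state merging introduces
// conflicts for your grammar.
let config = LrConfig {
    use_lalr: false,
    ..Default::default()
};
let mut parser = LrParser::new(&grammar, config)
    .expect("Failed to create parser");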
Usage
use sipha::backend::lr::{LrParser, LrConfig};
use sipha::backend::ParserBackend;
let config = LrConfig::default();
let mut parser = LrParser::new(&grammar, config)
.expect("Failed to create parser");
let result = parser.parse(&tokens, MyNonTerminal::Expr);
Parse Table
The LR parser builds a parse table:
- States: Parser states (items)
- Actions: Shift, reduce, accept, error
- Gotos: State transitions for non-terminals
Table Construction
- Build items: Create LR items from grammar
- Build states: Group items into states
- Compute actions: Determine shift/reduce actions
- Resolve conflicts: Handle shift/reduce and reduce/reduce conflicts
Grammar Requirements
LR parsers require:
- No ambiguity: Must be unambiguous (or use GLR)
- No conflicts: No shift/reduce or reduce/reduce conflicts
Error Recovery
LR parsers support error recovery:
- Synchronization: Skip to known recovery points
- Delimited recovery: Skip to matching delimiters
- Best-effort: Continue parsing despite errors
Performance
LR parsing performance:
- Time: O(n) where n is input length
- Space: O(grammar size) for parse table
- Table size: LALR tables are smaller than canonical LR
When to Use
Use LR parser when:
- Grammar is LR compatible
- Need efficient parsing
- Want good error recovery
- Building batch parsers
Limitations
LR parser limitations:
- Ambiguity: Cannot handle ambiguous grammars (use GLR)
- Table size: Large grammars may have large tables
- Left recursion: Must be handled in grammar
Next Steps
- See GLR Parser for ambiguous grammars
- Check Choosing a Backend for comparison
GLR Parser
The GLR (Generalized LR) parser extends LR parsing to handle ambiguous grammars by maintaining multiple parser stacks and forking on conflicts.
Overview
GLR parsing works by:
- Building LR table: Uses LR(1) state machine
- Maintaining stacks: Keeps multiple parser stacks
- Forking on conflicts: Creates new stacks for ambiguous choices
- Merging stacks: Merges stacks when they converge
- Returning forest: Returns parse forest for ambiguous results
When to Use GLR
Use the GLR backend when:
- Your grammar has inherent ambiguities (e.g., C++ template syntax)
- You need to handle multiple valid parse trees
- You want to disambiguate at parse time using precedence/associativity rules
- Parsing complex languages with inherent ambiguities
Configuration
Configure the GLR parser with GlrConfig:
use sipha::backend::glr::{GlrParser, GlrConfig};
let config = GlrConfig {
max_stacks: 100, // Maximum number of stacks
disambiguation: None, // Disambiguation strategy
..Default::default()
};
let mut parser = GlrParser::new(&grammar, config)
.expect("Failed to create GLR parser");
Usage
use sipha::backend::glr::{GlrParser, GlrConfig};
use sipha::backend::ParserBackend;
let config = GlrConfig::default();
let mut parser = GlrParser::new(&grammar, config)
.expect("Failed to create GLR parser");
let result = parser.parse(&tokens, entry_point);
Handling Ambiguity
GLR parsers can produce parse forests when multiple valid parse trees exist:
if let Some(forest) = result.forest {
// Multiple parse trees exist - disambiguate
let disambiguated = forest.disambiguate(|alternatives| {
// Custom disambiguation logic
alternatives.first().cloned()
});
}
Disambiguation
Sipha provides several disambiguation strategies:
Precedence-based
Resolve conflicts using operator precedence:
use sipha::backend::glr::disambiguate_by_precedence;
let disambiguated = forest.disambiguate(disambiguate_by_precedence(|op| {
match op {
MySyntaxKind::Multiply | MySyntaxKind::Divide => 2,
MySyntaxKind::Plus | MySyntaxKind::Minus => 1,
_ => 0,
}
}));
Associativity-based
Resolve conflicts using operator associativity:
use sipha::backend::glr::disambiguate_by_associativity;
let disambiguated = forest.disambiguate(disambiguate_by_associativity(|op| {
match op {
MySyntaxKind::Minus => Associativity::Left,
MySyntaxKind::Power => Associativity::Right,
_ => Associativity::None,
}
}));
Custom Strategies
Implement your own disambiguation logic:
let disambiguated = forest.disambiguate(|alternatives| {
// Choose based on custom criteria
alternatives
.iter()
.min_by_key(|tree| compute_complexity(tree))
.cloned()
});
Parse Forest
A parse forest represents multiple parse trees:
pub struct ParseForest<K: SyntaxKind> {
roots: Vec<Arc<GreenNode<K>>>,
}
The forest can be:
- Queried: Find all parse trees
- Disambiguated: Choose one tree
- Traversed: Visit all alternatives
Incremental Parsing
The GLR parser supports incremental parsing:
use sipha::incremental::IncrementalParser;
let mut incremental = IncrementalParser::new(parser);
let result = incremental.parse_incremental(
&tokens,
old_tree,
&edits,
entry_point,
Some(&grammar),
);
Performance
GLR parsing performance:
- Time: O(n³) in worst case (ambiguous grammars)
- Space: O(n²) for parse forest
- Stacks: Number of stacks can grow with ambiguity
For unambiguous grammars, GLR performs similarly to LR.
Grammar Requirements
GLR parsers can handle:
- Ambiguous grammars: Multiple valid parse trees
- Non-deterministic grammars: Grammars with conflicts
- Complex languages: Languages like C++ with inherent ambiguities
Limitations
GLR parser limitations:
- Performance: Slower for highly ambiguous grammars
- Memory: Parse forests can be large
- Complexity: More complex than LL or LR
Example: Ambiguous Expression
// Grammar: Expr -> Expr + Expr | Expr * Expr | Number
// Input: "1 + 2 * 3"
// Ambiguous: Could be (1 + 2) * 3 or 1 + (2 * 3)
let result = glr_parser.parse(&tokens, MyNonTerminal::Expr);
if let Some(forest) = result.forest {
// Two parse trees exist
assert_eq!(forest.roots.len(), 2);
// Disambiguate using precedence
let disambiguated = forest.disambiguate(disambiguate_by_precedence(|op| {
match op {
MySyntaxKind::Multiply => 2,
MySyntaxKind::Plus => 1,
_ => 0,
}
}));
// Result: 1 + (2 * 3) = 7
}
Next Steps
- See Choosing a Backend for comparison
- Check Examples for complete examples
PEG Parser
The PEG (Parsing Expression Grammar) parser uses ordered choice and backtracking with memoization (packrat parsing) for efficient parsing.
Overview
PEG parsing works by:
- Ordered Choice: Try alternatives in order, first match wins
- Backtracking: If an alternative fails, backtrack and try the next
- Memoization: Cache parse results for O(n) performance (packrat parsing)
- Greedy Matching: Always match as much as possible
Understanding PEG Grammars
PEG (Parsing Expression Grammar) is a type of formal grammar that describes a language in terms of parsing rules rather than generation rules. Unlike traditional CFG (Context-Free Grammar) parsers, PEG parsers use ordered choice semantics, making them deterministic and unambiguous.
Key Concepts
Ordered Choice (A | B)
In PEG, the choice operator | is ordered and greedy:
- Alternatives are tried left to right
- The first matching alternative wins
- Once an alternative matches, no other alternatives are tried
- This makes PEG grammars unambiguous by definition
Example:
// Grammar: Expr -> Number | Number
// In CFG: This is ambiguous (both alternatives match "42")
// In PEG: First alternative always wins, second is never tried
Expr::Choice(vec![
Expr::token(Number("42")), // This matches first
Expr::token(Number("99")), // Never tried, even if input is "99"
])
Greedy Repetition (A*, A+)
PEG repetition is greedy - it matches as much as possible:
- A* matches zero or more As (as many as possible)
- A+ matches one or more As (as many as possible)
- Repetition always tries to match more before trying less
Example:
// Grammar: Expr -> Number*
// Input: "123"
// Matches: All three digits (greedy)
Expr::Repeat {
expr: Box::new(Expr::token(Number("1"))),
min: 0,
max: None,
}
Backtracking
When an alternative fails, the parser backtracks to try the next alternative:
- Position is restored to where the alternative started
- Previous matches are discarded
- Next alternative is tried from the same position
Example:
// Grammar: Expr -> "ab" | "ac"
// Input: "ac"
// Process:
// 1. Try "ab" - matches "a", fails on "b"
// 2. Backtrack to start
// 3. Try "ac" - matches successfully
Key Differences from CFG
PEG parsers differ from traditional CFG parsers in fundamental ways:
| Feature | CFG | PEG |
|---|---|---|
| Choice | Unordered (all alternatives considered) | Ordered (first match wins) |
| Ambiguity | Can be ambiguous | Always unambiguous |
| Repetition | May be non-greedy | Always greedy |
| Left Recursion | Natural | Requires special handling |
| Determinism | May be non-deterministic | Always deterministic |
Ordered Choice Example
// CFG: Expr -> Number | Number
// This is ambiguous - both alternatives match the same input
// Parser must use disambiguation rules
// PEG: Expr -> Number | Number
// This is unambiguous - first alternative always wins
// Second alternative is never tried if first matches
Expr::Choice(vec![
Expr::token(Number("1")), // Tried first
Expr::token(Number("2")), // Only tried if first fails
])
Precedence Through Ordering
PEG makes operator precedence natural through ordering:
// Lower precedence operators tried first
// Higher precedence operators tried later
// This naturally gives * higher precedence than +
Expr::Choice(vec![
// Addition (lower precedence, tried first)
Expr::Seq(vec![
Expr::Rule(Expr),
Expr::token(Plus),
Expr::Rule(Term),
]),
// Base case (tried last)
Expr::Rule(Term),
])
Term::Choice(vec![
// Multiplication (higher precedence, tried first)
Expr::Seq(vec![
Expr::Rule(Term),
Expr::token(Multiply),
Expr::Rule(Factor),
]),
// Base case (tried last)
Expr::Rule(Factor),
])
Input: 1 + 2 * 3
Parse process:
- Try Expr + Term: matches 1 +, then tries to match Term
- Term tries Term * Factor: matches 2 * 3
- Result: 1 + (2 * 3) ✓ (correct precedence)
Configuration
Configure the PEG parser with PegConfig:
use sipha::backend::peg::{PegParser, PegConfig};
let config = PegConfig {
enable_memoization: true, // Enable packrat parsing
max_memo_size: 10000, // Maximum cache size
error_recovery: true, // Enable error recovery
max_errors: 100, // Maximum errors before giving up
max_backtrack_depth: 1000, // Limit backtracking depth
};
let mut parser = PegParser::new(&grammar, config)
.expect("Failed to create parser");
Memoization (Packrat Parsing)
When enable_memoization is true, the parser caches parse results:
- O(n) Performance: Linear time parsing for many grammars
- Memory Trade-off: Uses more memory for cache
- Cache Size: Controlled by max_memo_size
Without memoization:
- Exponential Worst Case: Can be slow for some grammars
- Less Memory: No cache overhead
Backtracking Depth
The max_backtrack_depth parameter limits backtracking:
- Prevents Infinite Loops: Stops infinite backtracking
- Performance: Lower values may fail on valid inputs
- Default: 1000 (usually sufficient)
Error Recovery
Error recovery strategies:
- Best-effort parsing: Continue parsing despite errors
- Error limits: Stop after max_errors errors
- Recovery points: Use RecoveryPoint expressions in the grammar
Usage
use sipha::backend::peg::{PegParser, PegConfig};
use sipha::backend::ParserBackend;
let config = PegConfig::default();
let mut parser = PegParser::new(&grammar, config)
.expect("Failed to create parser");
let result = parser.parse(&tokens, MyNonTerminal::Expr);
Writing PEG Grammars
Basic Patterns
Sequences
Sequences match expressions in order:
// Match: "a" followed by "b"
Expr::Seq(vec![
Expr::token(Token::A),
Expr::token(Token::B),
])
Optional Elements
Optional expressions match zero or one occurrence:
// Match: "a" optionally followed by "b"
Expr::Seq(vec![
Expr::token(Token::A),
Expr::Opt(Box::new(Expr::token(Token::B))),
])
Repetition
Repetition matches zero or more (or one or more) occurrences:
// Match: One or more "a"s
Expr::Repeat {
expr: Box::new(Expr::token(Token::A)),
min: 1,
max: None,
}
// Match: Zero to three "a"s
Expr::Repeat {
expr: Box::new(Expr::token(Token::A)),
min: 0,
max: Some(3),
}
Separated Lists
Separated lists match items separated by a delimiter:
// Match: "1, 2, 3" or "1, 2, 3,"
Expr::Separated {
item: Box::new(Expr::token(Number)),
separator: Box::new(Expr::token(Comma)),
min: 1,
trailing: TrailingSeparator::Allow,
}
Common Patterns
Operator Precedence
Express precedence by ordering alternatives (lower precedence first):
// Expression with precedence: + < * < ( )
Expr::Choice(vec![
// Addition (lowest precedence, tried first)
Expr::Seq(vec![
Expr::Rule(Expr),
Expr::token(Plus),
Expr::Rule(Term),
]),
// Subtraction
Expr::Seq(vec![
Expr::Rule(Expr),
Expr::token(Minus),
Expr::Rule(Term),
]),
// Base case (tried last)
Expr::Rule(Term),
])
Term::Choice(vec![
// Multiplication (higher precedence)
Expr::Seq(vec![
Expr::Rule(Term),
Expr::token(Multiply),
Expr::Rule(Factor),
]),
// Base case
Expr::Rule(Factor),
])
Factor::Choice(vec![
// Parentheses (highest precedence)
Expr::Delimited {
open: Box::new(Expr::token(LParen)),
content: Box::new(Expr::Rule(Expr)),
close: Box::new(Expr::token(RParen)),
recover: false,
},
// Numbers
Expr::token(Number),
])
Left Associativity
Left associativity is expressed with left-recursive rules (see the note on left recursion below):
// Left-associative: 1 - 2 - 3 = (1 - 2) - 3
Expr::Choice(vec![
Expr::Seq(vec![
Expr::Rule(Expr), // Left-recursive
Expr::token(Minus),
Expr::Rule(Term),
]),
Expr::Rule(Term),
])
Note: Left recursion is detected and handled, but may require special consideration.
Right Associativity
Right associativity requires right-recursive rules:
// Right-associative: 2^3^4 = 2^(3^4)
Expr::Choice(vec![
Expr::Seq(vec![
Expr::Rule(Term),
Expr::token(Power),
Expr::Rule(Expr), // Right-recursive
]),
Expr::Rule(Term),
])
Optional Trailing Elements
Handle optional trailing separators:
// Match: "a, b, c" or "a, b, c,"
Expr::Separated {
item: Box::new(Expr::token(Ident)),
separator: Box::new(Expr::token(Comma)),
min: 1,
trailing: TrailingSeparator::Allow, // or Forbid, or Require
}
Advanced Patterns
Lookahead
Positive lookahead checks if expression matches without consuming:
// Match "a" only if followed by "b" (but don't consume "b")
Expr::Seq(vec![
Expr::token(Token::A),
Expr::Lookahead(Box::new(Expr::token(Token::B))),
])
Negative lookahead checks if expression doesn’t match:
// Match "a" only if NOT followed by "b"
Expr::Seq(vec![
Expr::token(Token::A),
Expr::NotLookahead(Box::new(Expr::token(Token::B))),
])
Error Recovery
Use recovery points to skip to known tokens on error:
// On error, skip to semicolon or brace
Expr::RecoveryPoint {
expr: Box::new(Expr::Rule(Statement)),
sync_tokens: vec![Semicolon, RBrace].into(),
}
Delimited Recovery
Recover from missing closing delimiters:
// Recover from missing closing parenthesis
Expr::Delimited {
open: Box::new(Expr::token(LParen)),
content: Box::new(Expr::Rule(Expr)),
close: Box::new(Expr::token(RParen)),
recover: true, // Skip tokens until closing delimiter found
}
Incremental Parsing
The PEG parser supports incremental parsing with memoization:
use sipha::incremental::IncrementalParser;
let mut incremental = IncrementalParser::new(parser);
let result = incremental.parse_incremental(
&tokens,
old_tree,
&edits,
entry_point,
Some(&grammar),
);
The memoization cache works with incremental parsing:
- Node Reuse: Cached nodes are reused when possible
- Cache Invalidation: Affected regions invalidate cache entries
- Performance: Significant speedup for small edits
Grammar Requirements and Best Practices
What PEG Handles Well
PEG parsers excel at:
- Precedence-based Languages: Operator precedence is natural through ordering
- Expression Parsers: Arithmetic, logical, and comparison expressions
- Structured Data: JSON, XML, configuration files
- Interactive Tools: Fast parsing with memoization for IDEs and editors
- Deterministic Parsing: Always produces one parse tree (no ambiguity)
Left Recursion
PEG parsers detect left recursion, but support for it is limited:
// Left-recursive rule
Expr -> Expr + Term | Term
// PEG detects this and adds a warning
// The parser will fail on left-recursive rules to prevent infinite loops
Best Practice: For left-recursive grammars, consider:
- Using right-recursive alternatives where possible
- Restructuring to avoid left recursion
- Using iterative parsing for specific cases
Example - Avoiding Left Recursion:
// Instead of: Expr -> Expr + Term | Term
// Use iterative approach with repetition:
Expr::Seq(vec![
Expr::Rule(Term),
Expr::Repeat {
expr: Box::new(Expr::Seq(vec![
Expr::token(Plus),
Expr::Rule(Term),
])),
min: 0,
max: None,
},
])
Ambiguity Resolution
PEG grammars are always unambiguous because ordered choice resolves ambiguity:
// CFG: Expr -> Number | Number (ambiguous)
// PEG: Expr -> Number | Number (unambiguous - first wins)
// If you need different behavior, reorder:
Expr::Choice(vec![
Expr::token(Number("42")), // Specific case first
Expr::token(Number("n")), // General case second
])
Best Practice: Order alternatives from most specific to least specific.
Common Pitfalls
1. Order Matters
// Wrong: General case first
Expr::Choice(vec![
Expr::token(Number), // Matches everything
Expr::token(Number("42")), // Never reached
])
// Correct: Specific case first
Expr::Choice(vec![
Expr::token(Number("42")), // Specific case first
Expr::token(Number), // General case second
])
2. Greedy Repetition
// This matches as many as possible
Expr::Repeat {
expr: Box::new(Expr::token(Number)),
min: 0,
max: None,
}
// Input: "123" matches all three digits
3. Backtracking Cost
Excessive backtracking can be slow. Use lookahead to avoid unnecessary backtracking:
// Expensive: Tries "ab", backtracks, tries "ac"
Expr::Choice(vec![
Expr::Seq(vec![Expr::token(A), Expr::token(B)]),
Expr::Seq(vec![Expr::token(A), Expr::token(C)]),
])
// Better: Use lookahead to avoid backtracking
Expr::Seq(vec![
Expr::token(A),
Expr::Choice(vec![
Expr::Lookahead(Box::new(Expr::token(B))),
Expr::token(C),
]),
])
Memoization Details
Packrat parsing caches results at each position:
- Cache Key: (rule, position)
- Cache Value: Parse result (success with node, or failure)
- Cache Size: Bounded by max_memo_size
- Eviction: Old entries evicted when the cache is full
Benefits:
- Linear Time: O(n) for many grammars
- Node Reuse: Cached nodes can be reused
- Incremental: Works well with incremental parsing
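To make the cache shape concrete, here is a conceptual sketch of a packrat memo table. It is not Sipha's internal representation, only an illustration of the (rule, position) keying described above:
use std::collections::HashMap;

// Conceptual entry: either the rule matched and parsing may resume at `end`,
// or the rule is known to fail at this position.
enum MemoEntry {
    Success { end: usize },
    Failure,
}

// Keyed by (rule id, input position): repeated attempts to parse the same
// rule at the same position become a single table lookup.
type MemoTable = HashMap<(u32, usize), MemoEntry>;

fn lookup(memo: &MemoTable, rule: u32, pos: usize) -> Option<&MemoEntry> {
    memo.get(&(rule, pos))
}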
Ordered Choice Details
Ordered choice (A | B) semantics:
- Try A
- If A succeeds, return success (never try B)
- If A fails, backtrack and try B
- If both fail, return failure
This differs from CFG where both alternatives are considered.
Performance
PEG parsing performance:
- With Memoization: O(n) time, O(n) space (packrat parsing)
- Without Memoization: O(n) to exponential time, O(1) space
- Incremental: Fast for small edits (benefits from memoization)
- Memory: Cache size depends on grammar and input
When to Use PEG
Ideal Use Cases
1. Expression Parsers
PEG excels at parsing expressions with operator precedence:
// Natural expression parsing
// Input: "1 + 2 * 3"
// Result: 1 + (2 * 3) (correct precedence)
// Grammar is intuitive - lower precedence tried first
Expr -> Expr + Term | Term
Term -> Term * Factor | Factor
Why PEG: Precedence is expressed naturally through ordering, no need for precedence tables.
2. Interactive Tools (IDEs, Editors)
PEG with memoization provides fast incremental parsing:
- Fast re-parsing: Memoization caches results
- Incremental updates: Only affected regions re-parsed
- Responsive: Good performance for real-time editing
Why PEG: Memoization makes repeated parsing very fast.
3. Structured Data Formats
PEG handles delimited and separated structures well:
// JSON-like structures
Object -> "{" (Pair ("," Pair)*)? "}"
Array -> "[" (Value ("," Value)*)? "]"
Why PEG: Delimited and separated expressions are first-class.
4. Deterministic Parsing
When you need guaranteed unambiguous parsing:
- No ambiguity: Ordered choice ensures one parse tree
- Predictable: Same input always produces same output
- No disambiguation needed: Parser makes choices automatically
Why PEG: Unambiguous by design.
When NOT to Use PEG
1. Truly Ambiguous Grammars
If you need to track or explore ambiguity:
- Use GLR instead: GLR produces parse forests for ambiguous inputs
- PEG hides ambiguity: Only one parse tree is produced
2. Very Large Files Without Memoization
Without memoization, some PEG grammars can be exponential:
- Enable memoization: Use enable_memoization: true
- Or use LR: LR parsers are consistently O(n) without memoization
3. Simple LL(1) Grammars
For simple, unambiguous grammars, LL(k) may be simpler:
- LL is simpler: Table-based, easier to understand
- PEG is more flexible: But adds complexity for simple cases
Limitations
PEG parser limitations:
- Exponential Without Memoization: Can be slow for some grammars
- Memory Usage: Memoization requires significant memory
- Left Recursion: May require special handling
- No Ambiguity Tracking: Ambiguity resolved by ordering (may hide issues)
Comparison with Other Backends
PEG vs LL Parser
| Aspect | PEG | LL(k) |
|---|---|---|
| Approach | Top-down with backtracking | Top-down predictive |
| Choice | Ordered (first match wins) | Based on lookahead |
| Left Recursion | Detected, may need handling | Can eliminate |
| Ambiguity | Resolved by ordering | Requires lookahead > 1 |
| Performance | O(n) with memoization | O(n) always |
| Memory | O(n) with memoization | O(grammar size) |
| Best For | Precedence, interactive tools | Simple grammars, IDEs |
Choose PEG when: You need ordered choice semantics or memoization benefits.
Choose LL when: Your grammar is LL(k) compatible and you want simplicity.
PEG vs LR Parser
| Aspect | PEG | LR |
|---|---|---|
| Approach | Top-down with backtracking | Bottom-up shift-reduce |
| Choice | Ordered | Based on parse table |
| Left Recursion | Detected, may need handling | Natural |
| Backtracking | Yes (with limits) | No |
| Performance | O(n) with memoization | O(n) always |
| Error Recovery | Good | Excellent |
| Best For | Precedence, expressions | Batch parsing, languages |
Choose PEG when: You need ordered choice or top-down semantics.
Choose LR when: You need excellent error recovery or batch processing.
PEG vs GLR Parser
| Aspect | PEG | GLR |
|---|---|---|
| Approach | Top-down deterministic | Bottom-up with ambiguity |
| Choice | Ordered (unambiguous) | All alternatives (ambiguous) |
| Ambiguity | Resolved by ordering | Tracked in parse forest |
| Performance | O(n) with memoization | O(n) to O(n³) |
| Best For | Deterministic parsing | Ambiguous grammars |
Choose PEG when: You want guaranteed unambiguous parsing.
Choose GLR when: Your grammar has inherent ambiguities you need to explore.
Practical Examples
Example 1: Arithmetic Expressions
// Grammar for: 1 + 2 * 3
let grammar = GrammarBuilder::new()
.entry_point(Expr)
.rule(Expr, Expr::Choice(vec![
// Addition (lowest precedence, tried first)
Expr::Seq(vec![
Expr::Rule(Expr),
Expr::token(Plus),
Expr::Rule(Term),
]),
// Base case
Expr::Rule(Term),
]))
.rule(Term, Expr::Choice(vec![
// Multiplication (higher precedence)
Expr::Seq(vec![
Expr::Rule(Term),
Expr::token(Multiply),
Expr::Rule(Factor),
]),
// Base case
Expr::Rule(Factor),
]))
.rule(Factor, Expr::Choice(vec![
// Parentheses
Expr::Delimited {
open: Box::new(Expr::token(LParen)),
content: Box::new(Expr::Rule(Expr)),
close: Box::new(Expr::token(RParen)),
recover: false,
},
// Numbers
Expr::token(Number),
]))
.build()?;
Example 2: JSON-like Lists
// Grammar for: [1, 2, 3] or [1, 2, 3,]
let grammar = GrammarBuilder::new()
.entry_point(Array)
.rule(Array, Expr::Delimited {
open: Box::new(Expr::token(LBracket)),
content: Box::new(Expr::Separated {
item: Box::new(Expr::Rule(Value)),
separator: Box::new(Expr::token(Comma)),
min: 0,
trailing: TrailingSeparator::Allow,
}),
close: Box::new(Expr::token(RBracket)),
recover: true,
})
.build()?;
Example 3: Error Recovery
// Grammar with error recovery
let grammar = GrammarBuilder::new()
.entry_point(Program)
.rule(Program, Expr::Repeat {
expr: Box::new(Expr::RecoveryPoint {
expr: Box::new(Expr::Rule(Statement)),
sync_tokens: vec![Semicolon, RBrace].into(),
}),
min: 0,
max: None,
})
.build()?;
Next Steps
- See Choosing a Backend for comparison
- Check Examples for usage
- Learn about Incremental Parsing
Choosing a Backend
This guide helps you choose the right parsing backend for your use case.
Quick Comparison
| Feature | LL(k) | LR | GLR | PEG |
|---|---|---|---|---|
| Grammar compatibility | LL(k) | LR | Any | Most |
| Ambiguity handling | No | No | Yes | No (ordered choice) |
| Left recursion | With elimination | Yes | Yes | Yes |
| Error recovery | Good | Excellent | Good | Good |
| Incremental parsing | Yes | Yes | Yes | Yes |
| Performance | Fast | Fast | Slower (ambiguous) | Fast (with memo) |
| Memory usage | Small | Medium | Medium | Medium (with memo) |
| Complexity | Low | Medium | High | Medium |
Decision Tree
flowchart TD
Start[Choose Backend] --> Ambiguous{Grammar<br/>Ambiguous?}
Ambiguous -->|Yes| GLR[Use GLR Parser]
Ambiguous -->|No| UseCase{Primary<br/>Use Case?}
UseCase -->|Expression Parser<br/>Precedence-based| PEG1[Use PEG Parser]
UseCase -->|Interactive Tools<br/>IDEs/Editors| Interactive{Need<br/>Memoization?}
UseCase -->|Batch Processing| LR[Use LR Parser]
UseCase -->|Complex Language| GLR2[Use GLR Parser]
UseCase -->|Simple Grammar| LL[Use LL Parser]
Interactive -->|Yes| PEG2[Use PEG Parser]
Interactive -->|No| LL2[Use LL Parser]
GLR --> End[Selected Backend]
GLR2 --> End
LR --> End
LL --> End
LL2 --> End
PEG1 --> End
PEG2 --> End
style GLR fill:#ffccbc,color:#000000
style GLR2 fill:#ffccbc,color:#000000
style LR fill:#fff9c4,color:#000000
style LL fill:#c8e6c9,color:#000000
style LL2 fill:#c8e6c9,color:#000000
style PEG1 fill:#e1bee7,color:#000000
style PEG2 fill:#e1bee7,color:#000000
style End fill:#e1f5ff,color:#000000
Step-by-Step Decision Process
1. Is your grammar ambiguous?
- Yes → Use GLR
- No → Continue to step 2
2. What’s your primary use case?
- Expression parser / Precedence-based → Use PEG (natural ordered choice)
- Interactive tools (IDEs, editors) → Continue to step 3
- Batch parsing → Use LR (efficient, good error recovery)
- Complex language (C++, etc.) → Use GLR (handles ambiguities)
- Simple grammar → Use LL(k) (simplest)
3. For Interactive Tools: Need memoization?
- Yes → Use PEG (memoization provides fast re-parsing)
- No → Use LL(k) (simpler, table-based)
Detailed Scenarios
Scenario 1: Simple Expression Parser
Grammar: Arithmetic expressions with precedence (e.g., 1 + 2 * 3)
Recommendation: PEG or LL(k) with left-recursion elimination
Reasoning:
- Grammar is unambiguous
- PEG’s ordered choice makes precedence natural - lower precedence operators tried first
- Left recursion can be eliminated (LL) or handled (PEG)
- Simple and fast
- Good for interactive tools
PEG Grammar Example:
// Precedence: + < * < ( )
// Lower precedence tried first in PEG
Expr -> Expr + Term | Term // + tried first (lower precedence)
Term -> Term * Factor | Factor // * tried later (higher precedence)
Factor -> ( Expr ) | Number // () tried last (highest precedence)
Why PEG works well: The ordering naturally expresses precedence without needing precedence tables.
Scenario 2: Language Server
Grammar: Full programming language
Recommendation: LL(k), LR, or PEG
Reasoning:
- Need incremental parsing (all support it)
- LL(k) is simpler for error messages
- LR has better error recovery
- PEG has memoization for fast parsing
- Choose based on grammar characteristics
Scenario 3: C++ Parser
Grammar: C++ with template syntax ambiguities
Recommendation: GLR
Reasoning:
- Grammar has inherent ambiguities
- Need to handle multiple parse trees
- Can disambiguate using precedence/associativity
Scenario 4: Batch Parser
Grammar: Any unambiguous grammar
Recommendation: LR
Reasoning:
- Efficient for batch processing
- Good error recovery
- Handles left recursion naturally
Performance Considerations
LL(k)
- Fast: O(n) parsing time
- Small tables: Efficient memory usage
- Good for: Interactive tools, small to medium files
LR
- Fast: O(n) parsing time
- Medium tables: Reasonable memory usage
- Good for: Batch processing, large files
GLR
- Variable: O(n) to O(n³) depending on ambiguity
- Medium to large tables: More memory for forests
- Good for: Ambiguous grammars, complex languages
PEG
- With memoization: O(n) parsing time, O(n) space
- Without memoization: O(n) to exponential time, O(1) space
- Medium memory: Cache size depends on grammar
- Good for: Precedence-based languages, interactive tools
Grammar Compatibility
LL(k) Compatible
- No left recursion (or can be eliminated)
- No ambiguity (or k > 1)
- Can compute FIRST/FOLLOW sets
LR Compatible
- No ambiguity
- No shift/reduce conflicts
- No reduce/reduce conflicts
GLR Compatible
- Any grammar
- Handles all conflicts
- Handles all ambiguities
PEG Compatible
- Most grammars: Very flexible
- Left recursion: Detected and handled (may require restructuring)
- Ordered choice: Resolves ambiguity automatically
- Best for: Precedence-based expressions, structured data, interactive tools
Writing PEG Grammars:
- Order alternatives from most specific to least specific
- Lower precedence operators should be tried first
- Use repetition for lists and sequences
- Consider memoization for performance
Example - Precedence:
// Correct: Lower precedence first
Expr -> Expr + Term | Term // + tried first
Term -> Term * Factor | Factor // * tried later (higher precedence)
// This naturally gives * higher precedence than +
Migration Path
From LL to LR
If you need better error recovery or left recursion:
- Update grammar (if needed)
- Change parser type
- Update configuration
- Test thoroughly
From LR to GLR
If you encounter ambiguities:
- Keep same grammar
- Change parser type
- Add disambiguation logic
- Handle parse forests
From LL/LR to PEG
If you want ordered choice semantics or memoization:
- Reorder alternatives: Most specific first, least specific last
- Adjust precedence: Lower precedence operators tried first
- Enable memoization: Set enable_memoization: true for performance
- Handle left recursion: Restructure if needed (PEG detects it)
- Test thoroughly - ordered choice may change parse results
Example Migration:
// LL/LR: Precedence via grammar structure or tables
// PEG: Precedence via ordering
// Before (LL/LR style - may need precedence hints):
Expr -> Term | Expr + Term
Term -> Factor | Term * Factor
// After (PEG style - ordering expresses precedence):
Expr -> Expr + Term | Term // + tried first (lower precedence)
Term -> Term * Factor | Factor // * tried later (higher precedence)
Recommendations by Use Case
Interactive Tools (IDEs, Editors)
Primary: LL(k) or PEG. Alternative: LR if the grammar requires it.
Reasoning:
- Both LL(k) and PEG are fast and support incremental parsing
- PEG’s memoization can provide significant speedup for repeated parsing
- PEG’s ordered choice makes precedence natural for expression languages
- LL(k) is simpler for straightforward grammars
PEG Advantages for IDEs:
- Memoization caches parse results, making re-parsing very fast
- Incremental parsing benefits from cached nodes
- Ordered choice makes grammar writing intuitive
Batch Processing
Primary: LR. Alternative: LL(k) if the grammar is simpler.
Complex Languages
Primary: GLR. Alternative: Transform the grammar to be unambiguous, or use PEG if the ambiguity can be resolved by ordering.
PEG for Complex Languages:
- Use PEG if ambiguities can be resolved by ordering alternatives
- Use GLR if you need to explore or track multiple parse trees
- PEG is deterministic - always produces one parse tree
Learning/Prototyping
Primary: LL(k) or PEG
Reasoning:
- LL(k): Simplest to understand and use
- PEG: Intuitive grammar writing (ordered choice is natural), good for learning parsing concepts
PEG for Learning:
- Ordered choice is easy to understand
- Precedence is expressed naturally
- No need for precedence tables or complex grammar analysis
Next Steps
- Read the backend-specific documentation: LL Parser, LR Parser, GLR Parser, PEG Parser
- Check Examples for real-world usage
Green/Red Trees
Sipha uses an immutable green/red tree representation for syntax trees. This design provides efficient memory usage and fast tree operations.
Overview
The green/red tree architecture separates:
- Green trees: Compact, shared representation stored in an arena
- Red trees: Convenient API for traversing and querying the tree
This separation provides:
- Efficient memory usage: Shared subtrees reduce memory footprint
- Fast traversal: Optimized for common tree operations
- Immutability: Trees are immutable, enabling safe sharing
Green Trees
Green trees are the storage layer:
- Compact: Minimal memory overhead
- Shared: Subtrees can be shared across multiple trees
- Arena-allocated: Stored in an arena for efficient allocation
- Immutable: Once created, cannot be modified
GreenNode
A GreenNode represents a node in the green tree:
pub struct GreenNode<K: SyntaxKind> {
kind: K,
children: Vec<SyntaxElement<K>>,
// ... internal fields
}
Green nodes contain:
- Kind: The syntax kind (terminal or non-terminal)
- Children: Child elements (nodes or tokens)
- Metadata: Internal metadata for tree operations
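A minimal sketch of building a green node and wrapping it for navigation (MySyntaxKind is a placeholder user-defined kind; the builder API is covered in Tree Manipulation):
use sipha::syntax::{GreenNodeBuilder, SyntaxNode};

// Build a tiny green tree: an Expr node containing a single Number token.
let mut builder = GreenNodeBuilder::new();
builder.start_node(MySyntaxKind::Expr);
builder.token(MySyntaxKind::Number, "42").unwrap();
builder.finish_node().unwrap();
let green = builder.finish().unwrap();
// Wrap the green root in a red SyntaxNode to navigate it.
let root = SyntaxNode::new_root(green);
assert!(root.kind() == MySyntaxKind::Expr);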
Red Trees
Red trees are the API layer:
- Convenient: Easy-to-use API for traversal
- Lazy: Created on-demand from green trees
- Navigable: Rich navigation API (parent, siblings, etc.)
- Queryable: Supports queries and visitors
SyntaxNode
A SyntaxNode represents a node in the red tree:
let root = SyntaxNode::new_root(green_root.clone());
// Navigate
for child in root.children() {
println!("{:?}", child.kind());
}
// Find nodes
if let Some(expr) = root.descendants().find(|n| n.kind() == MySyntaxKind::Expr) {
println!("Found expression");
}
Tree Structure
Elements
Tree elements can be:
- Nodes: Non-terminal nodes (have children)
- Tokens: Terminal nodes (leaf nodes)
pub enum SyntaxElement<K: SyntaxKind> {
Node(SyntaxNode),
Token(SyntaxToken),
}
Hierarchy
Trees form a hierarchy:
Root (NonTerminal)
├── Child1 (NonTerminal)
│ ├── Token1 (Terminal)
│ └── Token2 (Terminal)
└── Child2 (NonTerminal)
└── Token3 (Terminal)
Tree Structure Visualization
graph TD
subgraph Green["Green Tree (Storage)"]
G1[GreenNode: Expr]
G2[GreenNode: Term]
G3[GreenToken: Number]
G4[GreenToken: Plus]
G5[GreenNode: Term]
G6[GreenToken: Number]
G1 --> G2
G1 --> G4
G1 --> G5
G2 --> G3
G5 --> G6
end
subgraph Red["Red Tree (API)"]
R1[SyntaxNode: Expr]
R2[SyntaxNode: Term]
R3[SyntaxToken: Number]
R4[SyntaxToken: Plus]
R5[SyntaxNode: Term]
R6[SyntaxToken: Number]
R1 --> R2
R1 --> R4
R1 --> R5
R2 --> R3
R5 --> R6
end
G1 -.->|"new_root()"| R1
style G1 fill:#c8e6c9,color:#000000
style G2 fill:#c8e6c9,color:#000000
style G3 fill:#ffccbc,color:#000000
style G4 fill:#ffccbc,color:#000000
style G5 fill:#c8e6c9,color:#000000
style G6 fill:#ffccbc,color:#000000
style R1 fill:#e1f5ff,color:#000000
style R2 fill:#e1f5ff,color:#000000
style R3 fill:#fff9c4,color:#000000
style R4 fill:#fff9c4,color:#000000
style R5 fill:#e1f5ff,color:#000000
style R6 fill:#fff9c4,color:#000000
Memory Efficiency
Green/red trees are memory-efficient:
- Sharing: Unchanged subtrees are shared across edits
- Arena allocation: Nodes are allocated in an arena
- Compact representation: Minimal overhead per node
- Incremental updates: Only changed regions are recreated
Design Rationale
The green/red tree design is inspired by:
- rust-analyzer: Uses similar architecture
- rowan: Provides the green/red tree foundation
- IDE requirements: Optimized for interactive editing
Benefits:
- Fast incremental parsing: Reuse green nodes
- Efficient memory: Share unchanged subtrees
- Safe sharing: Immutability enables safe sharing
- Rich API: Red trees provide convenient navigation
Conversion
Convert between green and red trees:
// Green to Red
let red_root = SyntaxNode::new_root(green_node.clone());
// Red to Green
let green_node = red_node.green();
Next Steps
- Learn Working with Trees for traversal and queries
- See Tree Manipulation for building and modifying trees
See Also
- Incremental Parsing - How green/red trees enable incremental parsing
- Glossary - Definitions of tree-related terms
- Architecture - Deep dive into data structures
Working with Syntax Trees
This chapter shows how to traverse, query, and work with syntax trees in Sipha.
Creating a Red Tree
Create a red tree from a parse result:
use sipha::syntax::SyntaxNode;
let root = SyntaxNode::new_root(result.root.clone());
Traversal
Children
Iterate over direct children:
for child in root.children() {
println!("Child: {:?}", child.kind());
}
Descendants
Iterate over all descendants (depth-first):
for descendant in root.descendants() {
println!("Descendant: {:?}", descendant.kind());
}
Ancestors
Iterate over ancestors:
if let Some(node) = root.descendants().find(|n| n.kind() == MySyntaxKind::Expr) {
for ancestor in node.ancestors() {
println!("Ancestor: {:?}", ancestor.kind());
}
}
Siblings
Access siblings:
if let Some(node) = root.descendants().find(|n| n.kind() == MySyntaxKind::Expr) {
if let Some(next) = node.next_sibling() {
println!("Next sibling: {:?}", next.kind());
}
if let Some(prev) = node.prev_sibling() {
println!("Previous sibling: {:?}", prev.kind());
}
}
Querying
Find by Kind
Find nodes by syntax kind:
let exprs: Vec<_> = root.descendants()
.filter(|n| n.kind() == MySyntaxKind::Expr)
.collect();
Find by Position
Find nodes at a specific position:
use sipha::syntax::TextSize;
let pos = TextSize::from(10);
if let Some(node) = root.token_at_offset(pos) {
println!("Token at position {}: {:?}", pos, node.kind());
}
Find Parent
Find the parent of a node:
if let Some(parent) = node.parent() {
println!("Parent: {:?}", parent.kind());
}
Text Access
Node Text
Get the text covered by a node:
let text = node.text();
println!("Node text: {}", text);
Token Text
Get the text of a token:
if let Some(token) = node.first_token() {
println!("Token text: {}", token.text());
}
Text Range
Get the text range of a node:
let range = node.text_range();
println!("Range: {:?}", range);
Visitors
Use visitors to traverse trees:
#[cfg(feature = "visitor")]
use sipha::syntax::{SyntaxVisitor, SyntaxWalker};
struct MyVisitor;
impl SyntaxVisitor for MyVisitor {
fn visit_node(&mut self, node: &SyntaxNode) {
println!("Visiting: {:?}", node.kind());
}
fn visit_token(&mut self, token: &SyntaxToken) {
println!("Token: {:?}", token.kind());
}
}
let mut visitor = MyVisitor;
root.walk(&mut visitor);
Queries
Use XPath-like queries (requires query feature):
#[cfg(feature = "query")]
use sipha::syntax::{QueryBuilder, XPathQuery};
let query = QueryBuilder::new()
.descendant(MySyntaxKind::Expr)
.child(MySyntaxKind::Number)
.build();
let results: Vec<_> = query.matches(&root).collect();
Pattern Matching
Match tree patterns:
// Match specific structure
if let Some(expr) = root.descendants().find(|n| {
n.kind() == MySyntaxKind::Expr &&
n.children().any(|c| c.kind() == MySyntaxKind::Number)
}) {
println!("Found expression with number");
}
Collecting Information
Count Nodes
Count nodes by kind:
let expr_count = root.descendants()
.filter(|n| n.kind() == MySyntaxKind::Expr)
.count();
Collect Text
Collect text from nodes:
let all_text: String = root.descendants()
.filter_map(|n| n.first_token())
.map(|t| t.text().to_string())
.collect();
Best Practices
- Use iterators: Prefer iterator methods over manual traversal
- Cache results: Cache frequently accessed nodes
- Filter early: Filter nodes as early as possible
- Use visitors: Use visitors for complex traversals
- Leverage queries: Use queries for pattern matching
Next Steps
- See Tree Manipulation for building and modifying trees
- Check Examples for real-world usage
Tree Manipulation
This chapter shows how to build and modify syntax trees in Sipha.
Building Trees
GreenNodeBuilder
Use GreenNodeBuilder to build green trees:
use sipha::syntax::{GreenNodeBuilder, SyntaxKind};
let mut builder = GreenNodeBuilder::new();
// Start a node
builder.start_node(MySyntaxKind::Expr);
// Add tokens
builder.token(MySyntaxKind::Number, "42").unwrap();
builder.token(MySyntaxKind::Plus, "+").unwrap();
builder.token(MySyntaxKind::Number, "10").unwrap();
// Finish the node
let expr_node = builder.finish_node().unwrap();
// Build root
builder.start_node(MySyntaxKind::Root);
builder.finish_node().unwrap();
let root = builder.finish().unwrap();
Nested Structures
Build nested structures:
let mut builder = GreenNodeBuilder::new();
builder.start_node(MySyntaxKind::Expr);
// Nested expression
builder.start_node(MySyntaxKind::Term);
builder.token(MySyntaxKind::Number, "1").unwrap();
builder.finish_node().unwrap();
builder.token(MySyntaxKind::Plus, "+").unwrap();
// Another nested expression
builder.start_node(MySyntaxKind::Term);
builder.token(MySyntaxKind::Number, "2").unwrap();
builder.finish_node().unwrap();
builder.finish_node().unwrap();
let root = builder.finish().unwrap();
Modifying Trees
Trees are immutable, so modifications create new trees:
Replacing Nodes
Replace a node by rebuilding:
fn replace_node(old_node: &SyntaxNode, new_kind: MySyntaxKind) -> GreenNode<MySyntaxKind> {
let mut builder = GreenNodeBuilder::new();
builder.start_node(new_kind);
// Copy children
for child in old_node.children() {
match child {
SyntaxElement::Node(n) => {
// Recursively replace
let new_child = replace_node(&n, new_kind);
// Add new child (simplified - actual API may differ)
}
SyntaxElement::Token(t) => {
builder.token(t.kind(), t.text()).unwrap();
}
}
}
builder.finish_node().unwrap();
builder.finish().unwrap()
}
Inserting Nodes
Insert nodes by rebuilding:
fn insert_child(parent: &SyntaxNode, new_child: GreenNode<MySyntaxKind>, index: usize) -> GreenNode<MySyntaxKind> {
let mut builder = GreenNodeBuilder::new();
builder.start_node(parent.kind());
// Copy children before index
for (i, child) in parent.children().enumerate() {
if i == index {
// Insert new child
// (simplified - actual API may differ)
}
// Copy existing child
}
builder.finish_node().unwrap();
builder.finish().unwrap()
}
Tree Transformation
Transform trees using visitors:
struct Transformer;
impl SyntaxVisitor for Transformer {
fn visit_node(&mut self, node: &SyntaxNode) {
// Transform node
if node.kind() == MySyntaxKind::OldKind {
// Replace with new kind
}
}
}
Incremental Updates
For incremental updates, reuse unchanged nodes:
use sipha::incremental::IncrementalParser;
// Initial parse
let result1 = parser.parse_incremental(&tokens, None, &[], entry, Some(&grammar));
// After edit, reuse unchanged nodes
let result2 = parser.parse_incremental(
&new_tokens,
Some(&result1.root),
&edits,
entry,
Some(&grammar),
);
Best Practices
- Use builders: Use
GreenNodeBuilderfor building trees - Reuse nodes: Reuse unchanged nodes in incremental updates
- Immutable operations: All tree operations create new trees
- Cache results: Cache frequently accessed nodes
- Batch modifications: Batch multiple modifications together
Next Steps
- See Working with Trees for traversal
- Check Incremental Parsing for efficient updates
Error Handling Overview
Sipha provides comprehensive error handling with configurable error recovery strategies. This section covers error types, recovery mechanisms, and best practices for handling errors in your parsers.
Overview
Sipha’s error handling system is designed to:
- Provide detailed error information with precise source locations
- Recover from errors gracefully to continue parsing
- Support rich diagnostics with context and suggestions (via the diagnostics feature)
- Distinguish between errors and warnings for different severity levels
Error Handling Flow
flowchart TD
Start[Parse Input] --> Parse[Parser Processes Tokens]
Parse --> Error{Error<br/>Encountered?}
Error -->|No| Success[Parse Success]
Error -->|Yes| Recover{Recovery<br/>Enabled?}
Recover -->|Yes| Attempt[Attempt Recovery]
Recover -->|No| Fail[Report Error]
Attempt --> Success2[Continue Parsing]
Attempt --> Fail2[Recovery Failed]
Success --> Result[Return ParseResult]
Success2 --> Result
Fail --> Result
Fail2 --> Result
style Success fill:#c8e6c9,color:#000000
style Success2 fill:#c8e6c9,color:#000000
style Fail fill:#ffccbc,color:#000000
style Fail2 fill:#ffccbc,color:#000000
style Result fill:#e1f5ff,color:#000000
Key Concepts
Errors vs Warnings
- Errors: Prevent successful parsing or indicate syntax violations
- Warnings: Non-fatal issues that don’t prevent parsing
Error Recovery
Sipha parsers can automatically recover from errors to continue parsing, which is essential for:
- IDEs: Providing real-time feedback as users type
- Language Servers: Maintaining parse state despite errors
- Interactive Tools: Allowing partial parsing results
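A minimal sketch of this workflow, reusing the LL configuration from the backends chapter (tokens, grammar, and entry_point are assumed to be set up as elsewhere in this guide):
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
use sipha::syntax::SyntaxNode;

// With recovery enabled, a best-effort tree is produced even when errors
// occur, so an editor can show both the tree and the diagnostics.
let config = LlConfig {
    error_recovery: true,
    ..Default::default()
};
let mut parser = LlParser::new(&grammar, config)
    .expect("Failed to create parser");
let result = parser.parse(&tokens, entry_point);
let tree = SyntaxNode::new_root(result.root.clone());
let diagnostics = &result.errors;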
Rich Diagnostics
With the diagnostics feature enabled, errors include:
- Source code snippets
- Highlighted error locations
- Suggestions and hints
- Related code locations
What’s Covered
This section covers:
- Error Types: Understanding ParseError, LexerError, and ParseWarning
- Working with Errors: Checking, reporting, and extracting error information
- Error Recovery: Configuring and understanding error recovery strategies
- Rich Diagnostics: Using miette for beautiful error reporting
- Best Practices: Guidelines and common patterns
Quick Start
The most basic error handling:
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
let result = parser.parse(&tokens, entry_point);
// Always check for errors
if !result.errors.is_empty() {
for error in &result.errors {
eprintln!("Error: {}", error);
}
return Err("Parsing failed".into());
}
// Check for warnings
for warning in &result.warnings {
eprintln!("Warning: {}", warning.message);
}
Next Steps
- Learn about Error Types to understand what errors you might encounter
- See Working with Errors for practical error handling patterns
- Explore Error Recovery to configure recovery strategies
Error Types
Sipha defines several error types to represent different kinds of problems that can occur during parsing and lexing.
ParseError
ParseError represents errors that occur during parsing. It’s an enum with four variants:
use sipha::error::ParseError;
use sipha::syntax::{TextRange, TextSize};
// Unexpected token error
let error = ParseError::UnexpectedToken {
span: TextRange::new(TextSize::from(10), TextSize::from(15)),
expected: vec!["identifier".to_string(), "number".to_string()],
};
// Invalid syntax error
let error = ParseError::InvalidSyntax {
span: TextRange::new(TextSize::from(0), TextSize::from(5)),
message: "Invalid expression syntax".to_string(),
};
// Unexpected end of file
let error = ParseError::UnexpectedEof {
span: TextRange::new(TextSize::from(100), TextSize::from(100)),
expected: vec!["}".to_string(), ";".to_string()],
};
// Ambiguity error (GLR parser)
let error = ParseError::Ambiguity {
span: TextRange::new(TextSize::from(20), TextSize::from(25)),
alternatives: vec!["expression1".to_string(), "expression2".to_string()],
};
ParseError Variants
UnexpectedToken
Occurs when the parser encounters a token that doesn’t match any expected production.
- Most common parse error
- Contains a list of expected token types
- Parser can often recover by skipping the unexpected token
Example:
// Input: "let x = 42 +"
// Error: UnexpectedToken with expected: ["number", "identifier", "("]
InvalidSyntax
Occurs when the input violates grammar rules in a way that can’t be attributed to a single unexpected token.
- Contains a descriptive error message
- Used for complex syntax violations
- May require more context to understand
Example:
// Input: "function() { return }"
// Error: InvalidSyntax with message: "Missing return value"
UnexpectedEof
Occurs when the parser reaches the end of input but expects more tokens.
- Contains a list of expected tokens
- Common when delimiters are missing (e.g., unclosed parentheses)
- Indicates incomplete input
Example:
// Input: "if (condition { ... }"
// Error: UnexpectedEof with expected: [")"]
Ambiguity (GLR backend only)
Occurs when the GLR parser finds multiple valid parse trees.
- Contains a list of alternative interpretations
- Requires disambiguation to resolve
- Only occurs with GLR parser backend
Example:
// Input: "1 + 2 * 3" (ambiguous without precedence)
// Error: Ambiguity with alternatives: ["(1 + 2) * 3", "1 + (2 * 3)"]
LexerError
LexerError represents errors that occur during tokenization (lexing):
use sipha::error::{LexerError, LexerErrorKind};
use sipha::syntax::{TextRange, TextSize};
let error = LexerError::new(
TextRange::new(TextSize::from(5), TextSize::from(6)),
LexerErrorKind::UnexpectedChar { char: '!' },
);
LexerErrorKind Variants
UnexpectedChar
An invalid character was encountered that doesn’t match any token pattern.
- Contains the unexpected character
- Usually indicates a typo or unsupported character
Example:
// Input: "let x = 42#"
// Error: UnexpectedChar { char: '#' }
UnterminatedString
A string literal was started but never closed.
- Common when quotes are mismatched
- Indicates incomplete string literal
Example:
// Input: "let s = \"hello world"
// Error: UnterminatedString
InvalidEscape
An invalid escape sequence was found in a string.
- Contains the invalid escape sequence
- Common with malformed escape sequences
Example:
// Input: "let s = \"hello\\zworld\""
// Error: InvalidEscape { escape: "\\z" }
InvalidNumber
A number literal has invalid format.
- Contains a reason explaining the issue
- Common with malformed numeric literals
Example:
// Input: "let x = 123abc"
// Error: InvalidNumber { reason: "Invalid digit 'a' in number" }
UnexpectedEof
End of file was reached unexpectedly during tokenization.
- Usually indicates incomplete input
- Common when input is cut off mid-token
Example:
// Input: "let x = 4" (cut off)
// Error: UnexpectedEof
ParseWarning
ParseWarning represents non-fatal issues that don’t prevent parsing:
use sipha::error::{ParseWarning, Severity};
use sipha::syntax::{TextRange, TextSize};
// Create warnings with different severity levels
let warning = ParseWarning::warning(
TextRange::new(TextSize::from(0), TextSize::from(10)),
"Deprecated syntax used".to_string(),
);
let info = ParseWarning::info(
TextRange::new(TextSize::from(10), TextSize::from(20)),
"Consider using newer syntax".to_string(),
);
let hint = ParseWarning::hint(
TextRange::new(TextSize::from(20), TextSize::from(30)),
"This can be simplified".to_string(),
);
Severity Levels
Warning
Important issues that should be addressed.
- Indicates potential problems
- Should be shown to users
- May indicate deprecated or problematic code
Info
Informational messages.
- Provides context or suggestions
- Less urgent than warnings
- Can be shown optionally
Hint
Suggestions for improvement.
- Optimization opportunities
- Style suggestions
- Can be shown in IDE tooltips
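A small sketch of acting on these levels, using the warnings collection shown in the Quick Start (which levels you surface is a policy choice):
use sipha::error::Severity;

// Report Warning-level items prominently; keep Info and Hint for optional output.
for warning in &result.warnings {
    match warning.severity {
        Severity::Warning => eprintln!("warning: {} at {:?}", warning.message, warning.span),
        Severity::Info | Severity::Hint => println!("note: {}", warning.message),
    }
}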
Converting Between Error Types
Lexer errors can be converted to parse errors:
use sipha::error::{LexerError, ParseError};
let lexer_error: LexerError = /* ... */;
let parse_error: ParseError = lexer_error.into();
This is useful when you want to handle all errors uniformly.
Error Methods
All error types provide useful methods:
ParseError Methods
use sipha::error::ParseError;
// Get the error location
let span = error.span();
// Format expected tokens as a readable string
let expected_str = error.format_expected();
// Returns: "identifier or number" or "identifier, number, or string"
// Get "did you mean" suggestions
if let Some(suggestion) = error.did_you_mean("identifer") {
println!("{}", suggestion); // "Did you mean 'identifier'?"
}
// Get context around the error
if let Some((before, error_span, after)) = error.get_context(source_code) {
println!("Context: ...{}[{}]{}...", before, error_span, after);
}
// Format error with full context
let formatted = error.format_with_context(source_code, Some("actual_token"));
println!("{}", formatted);
LexerError Methods
use sipha::error::LexerError;
// Get the error location
let span = error.span();
// Get the error kind
let kind = error.kind();
Next Steps
- Learn about Working with Errors to see how to use these error types
- See Error Recovery to understand how errors are handled
- Check Best Practices for guidelines on error handling
Working with Errors
This chapter covers how to check, report, and extract information from errors in your parsers.
Working with Parse Results
After parsing, check the ParseResult for errors and warnings:
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
use sipha::syntax::SyntaxNode;
use sipha::error::Severity;
let result = parser.parse(&tokens, entry_point);
// Check for errors
if !result.errors.is_empty() {
eprintln!("Parsing failed with {} errors:", result.errors.len());
for error in &result.errors {
eprintln!(" Error: {}", error);
eprintln!(" Location: {:?}", error.span());
}
}
// Check for warnings
if !result.warnings.is_empty() {
eprintln!("Found {} warnings:", result.warnings.len());
for warning in &result.warnings {
eprintln!(" [{}] {} at {:?}",
match warning.severity {
Severity::Warning => "WARNING",
Severity::Info => "INFO",
Severity::Hint => "HINT",
},
warning.message,
warning.span
);
}
}
// Even with errors, a tree may be available
let root = SyntaxNode::new_root(result.root.clone());
Error Information
Getting Error Details
ParseError provides several methods for extracting information:
use sipha::error::ParseError;
// Get the error location
let span = error.span();
// Format expected tokens as a readable string
let expected_str = error.format_expected();
// Returns: "identifier or number" or "identifier, number, or string"
// Get "did you mean" suggestions
if let Some(suggestion) = error.did_you_mean("identifer") {
println!("{}", suggestion); // "Did you mean 'identifier'?"
}
// Get context around the error
if let Some((before, error_span, after)) = error.get_context(source_code) {
println!("Context: ...{}[{}]{}...", before, error_span, after);
}
// Format error with full context
let formatted = error.format_with_context(source_code, Some("actual_token"));
println!("{}", formatted);
Example: Comprehensive Error Reporting
use sipha::error::ParseError;
fn report_parse_errors(errors: &[ParseError], source: &str) {
for (i, error) in errors.iter().enumerate() {
println!("\nError {}: {}", i + 1, error);
println!(" Location: {:?}", error.span());
// Show context
if let Some((before, error_span, after)) = error.get_context(source) {
println!(" Context: ...{}[{}]{}...", before, error_span, after);
}
// Show expected tokens
match error {
ParseError::UnexpectedToken { expected, .. }
| ParseError::UnexpectedEof { expected, .. } => {
if !expected.is_empty() {
println!(" Expected: {}", error.format_expected());
}
}
ParseError::InvalidSyntax { message, .. } => {
println!(" Details: {}", message);
}
ParseError::Ambiguity { alternatives, .. } => {
println!(" Alternatives: {}", alternatives.join(", "));
}
}
}
}
Handling Lexer Errors
Lexer errors occur during tokenization and can be handled separately:
use sipha::lexer::LexerBuilder;
use sipha::error::{LexerError, LexerErrorKind};
let lexer = LexerBuilder::new()
// ... configure lexer
.build(eof_kind, default_kind)?;
match lexer.tokenize(input) {
Ok(tokens) => {
// Tokenization succeeded
}
Err(errors) => {
// Handle lexer errors
for error in errors {
eprintln!("Lexer error at {:?}: {}", error.span(), error.kind());
match error.kind() {
LexerErrorKind::UnexpectedChar { char } => {
eprintln!(" Unexpected character: '{}'", char);
}
LexerErrorKind::UnterminatedString => {
eprintln!(" String literal not closed");
}
LexerErrorKind::InvalidEscape { escape } => {
eprintln!(" Invalid escape sequence: {}", escape);
}
LexerErrorKind::InvalidNumber { reason } => {
eprintln!(" Invalid number: {}", reason);
}
LexerErrorKind::UnexpectedEof => {
eprintln!(" Unexpected end of file");
}
}
}
}
}
Lexer errors can also be converted to parse errors:
use sipha::error::{LexerError, ParseError};
let lexer_error: LexerError = /* ... */;
let parse_error: ParseError = lexer_error.into();
Error Metrics
ParseResult includes metrics about the parsing process:
let result = parser.parse(&tokens, entry_point);
println!("Parsing metrics:");
println!(" Tokens consumed: {}", result.metrics.tokens_consumed);
println!(" Nodes created: {}", result.metrics.nodes_created);
println!(" Errors recovered: {}", result.metrics.errors_recovered);
println!(" Cache hits: {}", result.metrics.cache_hits);
println!(" Parse time: {:?}", result.metrics.parse_time);
These metrics are useful for:
- Performance analysis: Understanding parse performance
- Debugging: Identifying problematic grammar rules
- Optimization: Finding bottlenecks in parsing
Error Handling in Incremental Parsing
Incremental parsing maintains error information across edits:
use sipha::incremental::IncrementalParser;
let mut incremental = IncrementalParser::new(parser);
// Initial parse
let result1 = incremental.parse_incremental(/* ... */);
// After edit, errors are updated for affected regions
let result2 = incremental.parse_incremental(/* ... */);
// Errors from previous parse are preserved if not affected by edit
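A slightly more concrete sketch of the same flow, assuming the five-argument parse_incremental form shown in the Performance chapter and that the previous tree is passed as an Option (argument types may differ in your version; the edit construction is elided):
use sipha::incremental::{IncrementalParser, TextEdit};
let mut incremental = IncrementalParser::new(parser);
// Initial parse: no previous tree, no edits
let result1 = incremental.parse_incremental(&tokens, None, &[], entry_point, Some(&grammar));
// After an edit: re-tokenize the changed source and describe the change
let edits: Vec<TextEdit> = /* built from the text change */;
let result2 = incremental.parse_incremental(
    &new_tokens,
    Some(&result1.root),
    &edits,
    entry_point,
    Some(&grammar),
);
// Errors in untouched regions are carried over from the previous parse
eprintln!("errors after edit: {}", result2.errors.len());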
Next Steps
- Learn about Error Recovery to understand how errors are handled automatically
- See Rich Diagnostics for beautiful error reporting
- Check Best Practices for guidelines on error handling
Error Recovery
Sipha parsers attempt to recover from errors and continue parsing. This is essential for providing good user experience in IDEs and language servers.
Overview
Error recovery allows parsers to:
- Continue parsing despite errors
- Provide partial results even when input has errors
- Maintain parse state for interactive applications
- Report multiple errors in a single pass
Automatic Recovery
By default, parsers use automatic error recovery:
use sipha::backend::ll::{LlParser, LlConfig};
// Default config includes error recovery
let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)?;
// Parser will attempt to recover from errors
let result = parser.parse(&tokens, entry_point);
// Check how many errors were recovered from
println!("Errors recovered: {}", result.metrics.errors_recovered);
Even when errors occur, the parser will:
- Report the error
- Attempt to recover
- Continue parsing
- Return a parse tree (possibly incomplete)
Recovery Strategies
Parsers use several strategies to recover:
1. Token Skipping
Skip unexpected tokens and continue parsing.
Example:
// Input: "let x = 42 + + 10"
// Parser skips the second "+" and continues
// Error reported, but parsing continues
2. Token Insertion
Insert missing tokens (e.g., semicolons, closing delimiters).
Example:
// Input: "let x = 42 let y = 10"
// Parser may insert a semicolon after "42"
// Error reported, but both statements parsed
3. Rule Skipping
Skip to next statement or block when current rule fails.
Example:
// Input: "let x = invalid; let y = 10;"
// Parser skips the first statement and continues with the second
// Error reported for first statement, second statement parsed successfully
4. Best Effort
Continue parsing despite errors, producing the best possible result.
Example:
// Input with multiple errors
// Parser attempts to parse as much as possible
// All errors reported, partial tree returned
The specific strategy depends on the parser backend and configuration.
Configuring Recovery
Different backends support different recovery options:
use sipha::backend::ll::{LlParser, LlConfig};
let config = LlConfig {
error_recovery: true, // Enable error recovery
max_recovery_attempts: 10, // Limit recovery attempts
..Default::default()
};
let mut parser = LlParser::new(&grammar, config)?;
Recovery Options
error_recovery (bool)
- Enable or disable error recovery
- Default: true
- When disabled, parser stops at first error
max_recovery_attempts (usize)
- Maximum number of recovery attempts per parse
- Default: varies by backend
- Prevents infinite recovery loops
When to Use Recovery
Use Recovery For:
- IDEs and Language Servers: Need to parse incomplete or erroneous code
- Interactive Tools: Users are actively editing code
- Error Reporting: Want to find all errors in one pass
- Partial Results: Need parse tree even with errors
Disable Recovery For:
- Strict Validation: Want to fail fast on errors
- Batch Processing: Errors should stop processing
- Testing: Want to catch all errors immediately
- Performance: Recovery has overhead
Recovery Examples
Example 1: Missing Delimiter
// Input: "if (condition { ... }"
// Parser behavior:
// 1. Reports UnexpectedEof error expecting ")"
// 2. May insert ")" and continue
// 3. Parses the block successfully
// Result: Error reported, but parse tree includes the block
Example 2: Unexpected Token
// Input: "let x = 42 +"
// Parser behavior:
// 1. Reports UnexpectedToken error
// 2. Skips the "+" or inserts placeholder
// 3. Continues parsing
// Result: Error reported, variable declaration partially parsed
Example 3: Invalid Syntax
// Input: "function() { return }"
// Parser behavior:
// 1. Reports InvalidSyntax error
// 2. Marks statement as incomplete
// 3. Continues parsing rest of function
// Result: Error reported, function body partially parsed
Recovery Quality
Recovery quality depends on:
- Grammar structure: Well-structured grammars recover better
- Error location: Errors at statement boundaries recover better
- Parser backend: Different backends have different recovery strategies
- Configuration: Recovery settings affect behavior
Limitations
Error recovery has some limitations:
- May produce incorrect trees: Recovered trees may not match intended structure
- Can mask errors: Multiple errors may be reduced to fewer reported errors
- Performance overhead: Recovery adds processing time
- Not always possible: Some errors can’t be recovered from
Best Practices
- Always check errors: Even with recovery, check result.errors
- Validate recovered trees: Don’t trust recovered parse trees completely
- Limit recovery attempts: Use max_recovery_attempts to prevent loops
- Consider context: Recovery works better with more context
- Test recovery: Test your grammar with various error scenarios
Next Steps
- See Working with Errors for how to handle recovered errors
- Learn about Rich Diagnostics for better error reporting
- Check Best Practices for guidelines on error recovery
Rich Diagnostics
When the diagnostics feature is enabled, errors integrate with miette for beautiful error reporting with source code snippets, highlighted locations, and helpful suggestions.
Enabling Diagnostics
Add the diagnostics feature to your Cargo.toml:
[dependencies]
sipha = { version = "0.5.0", features = ["diagnostics"] }
This enables:
- Rich error formatting with source code snippets
- Highlighted error locations
- Suggestions and hints
- Integration with editor tooling
Using Diagnostics
Errors automatically implement the Diagnostic trait from miette:
#[cfg(feature = "diagnostics")]
use sipha::error::ParseError;
// Errors automatically implement Diagnostic trait
let error: ParseError = /* ... */;
// Print with rich formatting
eprintln!("{:#}", error);
// Errors include:
// - Source code snippets
// - Highlighted error locations
// - Suggestions and hints
// - Related code locations
Example Output
With diagnostics enabled, errors are displayed with rich formatting:
Error: Unexpected token
┌─ example.txt:10:15
│
10 │ let x = 42 +
│ ^ Unexpected token
│
= Expected: identifier, number, or operator
= Help: Did you mean to add an expression after '+'?
Diagnostic Features
Source Code Snippets
Diagnostics automatically include source code around the error:
let error: ParseError = /* ... */;
eprintln!("{:#}", error);
// Shows:
// ┌─ file.rs:42:10
// │
// 42 │ let x = invalid
// │ ^^^^^^^^ Error here
Highlighted Locations
Error locations are highlighted in the source code:
// The error span is automatically highlighted
// Multiple related errors can be shown together
Suggestions
Diagnostics can include suggestions:
// "Did you mean" suggestions
// Fix suggestions
// Helpful hints
Related Locations
Multiple related errors can be shown together:
// Errors at different locations
// Related errors in the same context
// Error chains
Integration with Editors
Diagnostics integrate with editor tooling:
- Language Servers: Errors are reported via LSP
- IDE Integration: Errors shown in editor
- Terminal Output: Beautiful terminal output
- JSON Output: Structured error data
Custom Diagnostic Messages
You can customize diagnostic messages:
use sipha::error::ParseError;
// Create errors with custom messages
let error = ParseError::invalid_syntax_with_suggestion(
span,
"Invalid expression",
"Try adding parentheses around the expression"
);
Diagnostic Severity
Diagnostics support different severity levels:
- Error: Fatal errors that prevent parsing
- Warning: Non-fatal issues (via ParseWarning)
- Info: Informational messages
- Hint: Suggestions for improvement
Example: Full Diagnostic Output
#[cfg(feature = "diagnostics")]
use sipha::error::ParseError;
use miette::Diagnostic;
fn report_with_diagnostics(error: &ParseError, source: &str) {
// Print with full diagnostic formatting
eprintln!("{:#}", error);
// Or use miette's reporting features
// miette::set_hook(/* ... */);
}
Performance Considerations
Diagnostics have minimal performance impact:
- Compile-time: Feature is gated, no overhead when disabled
- Runtime: Small overhead for formatting
- Memory: Source code snippets are included in errors
When to Use Diagnostics
Use diagnostics for:
- IDEs and Language Servers: Better user experience
- Development Tools: Clearer error messages
- Documentation: Better error examples
- Debugging: Easier to understand errors
Next Steps
- See Working with Errors for basic error handling
- Learn about Error Recovery for recovery strategies
- Check Best Practices for guidelines
Best Practices
This chapter provides guidelines and common patterns for error handling in Sipha parsers.
Always Check for Errors
Always check for errors, even if you expect success:
let result = parser.parse(&tokens, entry_point);
// Always check errors, even if you expect success
if !result.errors.is_empty() {
// Handle errors appropriately
return Err(format!("Parsing failed: {} errors", result.errors.len()));
}
Why: Errors can occur even in valid-looking input due to:
- Grammar ambiguities
- Edge cases
- Recovery attempts
- Incomplete input
Provide Context in Error Messages
When reporting errors to users, include:
- Source location: Line and column numbers
- Context: Surrounding code
- Suggestions: What was expected or how to fix it
use sipha::error::ParseError;
fn report_error(error: &ParseError, source: &str) {
println!("Error: {}", error);
println!("Location: {:?}", error.span());
// Show context
if let Some((before, error_span, after)) = error.get_context(source) {
println!("Context: ...{}[{}]{}...", before, error_span, after);
}
// Show suggestions
if let Some(suggestion) = error.did_you_mean("actual_token") {
println!("Suggestion: {}", suggestion);
}
}
Handle Warnings Appropriately
Warnings don’t prevent parsing, but should be addressed:
use sipha::error::Severity;
// Warnings don't prevent parsing, but should be addressed
for warning in &result.warnings {
match warning.severity {
Severity::Warning => {
// Log or report warnings
eprintln!("Warning: {}", warning.message);
}
Severity::Info => {
// Optionally show info messages
// Useful for development tools
}
Severity::Hint => {
// Hints can be shown in IDE tooltips
// Less urgent, more suggestions
}
}
}
Use Error Recovery Wisely
Error recovery is great for IDEs, but consider disabling it for:
- Strict validation: When you want to fail fast
- Batch processing: When errors should stop processing
- Testing: When you want to catch all errors
use sipha::backend::ll::{LlParser, LlConfig};
// For strict validation
let strict_config = LlConfig {
error_recovery: false, // Disable recovery
..Default::default()
};
// For IDE/language server
let ide_config = LlConfig {
error_recovery: true, // Enable recovery
max_recovery_attempts: 100, // Allow many attempts
..Default::default()
};
Leverage Diagnostics
If building an IDE or language server, enable the diagnostics feature for:
- Better error messages
- Source code snippets
- Helpful suggestions
- Integration with editor tooling
[dependencies]
sipha = { version = "0.5.0", features = ["diagnostics"] }
Common Error Patterns
Pattern 1: Missing Delimiter
// Input: "if (condition { ... }"
// Error: UnexpectedEof with expected: [")"]
// Handling:
// 1. Report the error with context
// 2. Show what delimiter is missing
// 3. Suggest where to add it
Pattern 2: Unexpected Token
// Input: "let x = 42 +"
// Error: UnexpectedToken with expected: [number, identifier, "("]
// Handling:
// 1. Show the unexpected token
// 2. List expected alternatives
// 3. Provide "did you mean" suggestions
Pattern 3: Invalid Syntax
// Input: "function() { return }"
// Error: InvalidSyntax with message: "Missing return value"
// Handling:
// 1. Explain the syntax violation
// 2. Show the problematic code
// 3. Suggest how to fix it
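Putting these patterns together, a minimal sketch of a helper that turns a ParseError into a user-facing message using the methods shown earlier (the wording is illustrative, not a fixed format):
use sipha::error::ParseError;
fn user_message(error: &ParseError, source: &str) -> String {
    let mut msg = format!("{} at {:?}", error, error.span());
    // List what the parser expected, if anything
    let expected = error.format_expected();
    if !expected.is_empty() {
        msg.push_str(&format!("\n  expected: {}", expected));
    }
    // Show a little surrounding source for orientation
    if let Some((before, span, after)) = error.get_context(source) {
        msg.push_str(&format!("\n  near: ...{}[{}]{}...", before, span, after));
    }
    msg
}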
Error Handling Checklist
When implementing error handling:
- Check result.errors after every parse
- Check result.warnings for non-fatal issues
- Provide context in error messages
- Use appropriate severity levels for warnings
- Enable diagnostics for better error reporting
- Test error recovery behavior
- Handle lexer errors separately if needed
- Consider error metrics for debugging
Performance Considerations
Error handling has minimal performance impact:
- Error checking: Negligible overhead
- Error recovery: Small overhead, but enables partial results
- Diagnostics: Small overhead for formatting
- Metrics: Minimal overhead for tracking
Testing Error Handling
Test your error handling:
#[test]
fn test_error_handling() {
let result = parser.parse(&invalid_tokens, entry_point);
// Verify errors are reported
assert!(!result.errors.is_empty());
// Verify error details
let error = &result.errors[0];
assert!(error.span().start() > TextSize::zero());
// Verify recovery (if enabled)
if config.error_recovery {
assert!(result.metrics.errors_recovered > 0);
}
}
Error Handling in Production
For production code:
- Log errors: Use a proper logging framework (see the sketch after this list)
- Monitor metrics: Track error rates and recovery success
- User-friendly messages: Translate technical errors to user messages
- Graceful degradation: Continue operation when possible
- Error reporting: Collect error data for improvement
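As a sketch of the first two points, assuming the log crate (or any comparable framework; it is not a Sipha dependency) and the metrics fields shown earlier:
use sipha::backend::ParserBackend;
// Log errors and track recovery metrics after each parse
let result = parser.parse(&tokens, entry_point);
for error in &result.errors {
    log::error!("parse error at {:?}: {}", error.span(), error);
}
log::info!(
    "parsed {} tokens, recovered from {} errors in {:?}",
    result.metrics.tokens_consumed,
    result.metrics.errors_recovered,
    result.metrics.parse_time,
);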
Next Steps
- Review Error Types to understand error variants
- See Working with Errors for practical patterns
- Learn about Error Recovery for recovery strategies
- Explore Rich Diagnostics for better error reporting
Performance
This chapter covers performance optimization tips and benchmarking for Sipha.
General Tips
Use Incremental Parsing
For interactive applications, always use incremental parsing:
let mut incremental = IncrementalParser::new(parser);
// Much faster for small edits
Choose the Right Backend
- LL(k): Fast, good for most grammars
- LR: Fast, good for batch parsing
- GLR: Slower, only use for ambiguous grammars
Optimize Grammar
- Factor common patterns: Extract common subexpressions
- Avoid deep nesting: Flatten deeply nested structures
- Use appropriate lookahead: Don’t use more lookahead than needed
Lexer Performance
Pattern Ordering
Order patterns from most specific to least specific:
// Good: Specific patterns first
.token(MySyntaxKind::Keyword, Pattern::Literal("if".into()))
.token(MySyntaxKind::Ident, Pattern::CharClass(CharSet::letters()))
// Bad: General patterns first
.token(MySyntaxKind::Ident, Pattern::CharClass(CharSet::letters()))
.token(MySyntaxKind::Keyword, Pattern::Literal("if".into()))
Use Character Classes
Prefer character classes over regex when possible:
// Good: Character class
.token(MySyntaxKind::Number, Pattern::CharClass(CharSet::digits()))
// Slower: Regex
.token(MySyntaxKind::Number, Pattern::Regex(r"\d+".into()))
Keyword Trie
Keywords are matched using a trie for O(m) performance where m is keyword length.
Parser Performance
Cache Population
Always provide grammar for cache population:
// Good: Cache populated
parser.parse_incremental(&tokens, old_tree, &edits, entry, Some(&grammar))
// Bad: No cache
parser.parse_incremental(&tokens, old_tree, &edits, entry, None)
Reuse Budget
Adjust reuse budget based on use case:
// For large files
let budget = ReuseBudget::Heuristic {
max_depth: 30,
max_nodes: 2000,
};
// For small files
let budget = ReuseBudget::Fixed(100);
Memory Optimization
Arena Allocation
Green nodes are arena-allocated for efficient memory usage.
Node Sharing
Incremental parsing shares unchanged nodes, reducing memory usage.
Cache Eviction
Cache automatically evicts old versions:
// Keep last 2 versions (default)
cache.evict_old_entries(2);
Benchmarking
Measure Parse Time
let start = std::time::Instant::now();
let result = parser.parse(&tokens, entry);
let duration = start.elapsed();
println!("Parse time: {:?}", duration);
println!("Metrics: {:?}", result.metrics);
Compare Backends
// LL parser
let ll_start = Instant::now();
ll_parser.parse(&tokens, entry);
let ll_time = ll_start.elapsed();
// LR parser
let lr_start = Instant::now();
lr_parser.parse(&tokens, entry);
let lr_time = lr_start.elapsed();
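For repeatable numbers, a benchmark harness helps. A minimal sketch using the criterion crate (an assumed dev-dependency, not part of Sipha; setup_parser is a hypothetical helper):
use criterion::{criterion_group, criterion_main, Criterion};
fn bench_full_parse(c: &mut Criterion) {
    // Build the lexer, grammar, parser, and token stream once, outside the measured loop
    let (mut parser, tokens, entry) = setup_parser();
    c.bench_function("full parse", |b| {
        b.iter(|| parser.parse(&tokens, entry.clone()))
    });
}
criterion_group!(benches, bench_full_parse);
criterion_main!(benches);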
Profiling
Use a Profiler
Profile your parser to find bottlenecks:
# With perf
perf record --call-graph=dwarf cargo run --release
perf report
# With valgrind
valgrind --tool=callgrind cargo run --release
Common Bottlenecks
- Lexer: Pattern matching, especially regex
- Parser: Table lookups, especially for large grammars
- Tree construction: Node allocation and linking
Best Practices
- Profile first: Profile before optimizing
- Measure changes: Benchmark before and after changes
- Use incremental parsing: Always use for interactive apps
- Optimize grammar: Factor and simplify grammar
- Choose right backend: Use appropriate backend for use case
Next Steps
- See Grammar Analysis for grammar optimization
- Check Architecture for design considerations
Custom Patterns
This chapter shows how to extend lexer patterns with custom matching logic.
Custom Matchers
For complex tokenization logic, use custom matchers:
let lexer = LexerBuilder::new()
.custom_token(MySyntaxKind::String, |text, pos| {
// Custom string matching
if text[pos..].starts_with('"') {
let mut end = pos + 1;
while end < text.len() && text.as_bytes()[end] != b'"' {
if text.as_bytes()[end] == b'\\' {
end += 2; // Skip escape sequence
} else {
end += 1;
}
}
if end < text.len() {
let value = text[pos+1..end].to_string();
Some((end - pos + 1, TokenValue::String(value)))
} else {
None // Unterminated string
}
} else {
None
}
})
.build(MySyntaxKind::Eof, MySyntaxKind::Ident)
.expect("Failed to build lexer");
Custom Matcher Signature
Custom matchers have this signature:
Fn(&str, usize) -> Option<(usize, TokenValue)>
Where:
- Input: &str is the text, usize is the starting position
- Output: Option<(usize, TokenValue)>, where:
  - usize is the number of characters matched
  - TokenValue is the token’s value (if any)
TokenValue
Token values can be:
pub enum TokenValue {
Integer(i64),
Float(f64),
String(String),
None,
}
Examples
String Literal
.custom_token(MySyntaxKind::String, |text, pos| {
if !text[pos..].starts_with('"') {
return None;
}
let mut end = pos + 1;
while end < text.len() {
match text.as_bytes()[end] {
b'"' => {
let value = text[pos+1..end].to_string();
return Some((end - pos + 1, TokenValue::String(value)));
}
b'\\' if end + 1 < text.len() => {
end += 2; // Skip escape sequence
}
_ => end += 1,
}
}
None // Unterminated string
})
Floating Point Number
.custom_token(MySyntaxKind::Float, |text, pos| {
let mut end = pos;
let mut has_dot = false;
// Optional sign
if end < text.len() && (text.as_bytes()[end] == b'+' || text.as_bytes()[end] == b'-') {
end += 1;
}
// Digits before dot
while end < text.len() && text.as_bytes()[end].is_ascii_digit() {
end += 1;
}
// Dot
if end < text.len() && text.as_bytes()[end] == b'.' {
has_dot = true;
end += 1;
}
// Digits after dot
while end < text.len() && text.as_bytes()[end].is_ascii_digit() {
end += 1;
}
if has_dot && end > pos {
let value = text[pos..end].parse::<f64>().ok()?;
Some((end - pos, TokenValue::Float(value)))
} else {
None
}
})
Multiline Comment
.custom_token(MySyntaxKind::Comment, |text, pos| {
if !text[pos..].starts_with("/*") {
return None;
}
let mut end = pos + 2;
while end + 1 < text.len() {
if text.as_bytes()[end] == b'*' && text.as_bytes()[end + 1] == b'/' {
let value = text[pos+2..end].to_string();
return Some((end - pos + 2, TokenValue::String(value)));
}
end += 1;
}
None // Unterminated comment
})
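Hexadecimal Integer
Another sketch, not taken from the built-in examples: a matcher that produces TokenValue::Integer for 0x-prefixed literals (MySyntaxKind::Number is a placeholder kind):
.custom_token(MySyntaxKind::Number, |text, pos| {
    // Require a 0x / 0X prefix
    let rest = &text[pos..];
    if !rest.starts_with("0x") && !rest.starts_with("0X") {
        return None;
    }
    // Count the hex digits after the prefix
    let digits = rest[2..].bytes().take_while(|b| b.is_ascii_hexdigit()).count();
    if digits == 0 {
        return None;
    }
    // Parse the digits; overflow simply fails the match
    let value = i64::from_str_radix(&rest[2..2 + digits], 16).ok()?;
    Some((2 + digits, TokenValue::Integer(value)))
})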
Best Practices
- Return early: Return None quickly if the pattern doesn’t match
- Handle edge cases: Handle EOF, invalid sequences, etc.
- Escape sequences: Properly handle escape sequences
- Performance: Keep matching logic efficient
- Error handling: Return None for invalid input
Combining with Regular Patterns
Custom matchers work alongside regular patterns:
let lexer = LexerBuilder::new()
// Regular patterns
.token(MySyntaxKind::Number, Pattern::CharClass(CharSet::digits()))
.token(MySyntaxKind::Plus, Pattern::Literal("+".into()))
// Custom matcher
.custom_token(MySyntaxKind::String, |text, pos| {
// Custom logic; must return Option<(usize, TokenValue)>
None
})
.build(MySyntaxKind::Eof, MySyntaxKind::Ident)
.expect("Failed to build lexer");
Priority
Custom matchers have the same priority as regular patterns based on definition order.
Next Steps
Grammar Analysis
Sipha provides tools for analyzing and optimizing grammars.
Grammar Validation
Validate grammars before use:
use sipha::backend::ll::LlParser;
let errors = LlParser::validate(&grammar);
if !errors.is_empty() {
for error in errors {
eprintln!("Grammar error: {:?}", error);
}
}
Left Recursion Detection
Detect left-recursive rules:
use sipha::grammar::analysis::detect_left_recursion;
let left_recursive = detect_left_recursion(&grammar);
for rule in left_recursive {
eprintln!("Left-recursive rule: {:?}", rule);
}
Reachability Analysis
Find unreachable rules:
use sipha::grammar::analysis::find_unreachable_rules;
let unreachable = find_unreachable_rules(&grammar);
for rule in unreachable {
eprintln!("Unreachable rule: {:?}", rule);
}
FIRST and FOLLOW Sets
Compute FIRST and FOLLOW sets for LL parsing:
use sipha::grammar::analysis::compute_first_sets;
use sipha::grammar::analysis::compute_follow_sets;
let first_sets = compute_first_sets(&grammar);
let follow_sets = compute_follow_sets(&grammar);
Conflict Detection
Detect shift/reduce and reduce/reduce conflicts:
use sipha::backend::lr::LrParser;
let conflicts = LrParser::detect_conflicts(&grammar);
for conflict in conflicts {
eprintln!("Conflict: {:?}", conflict);
}
Grammar Optimization
Factor Common Patterns
Extract common subexpressions:
// Before: Duplicated pattern
.rule(MyNonTerminal::Expr, Expr::choice(vec![
Expr::seq(vec![Expr::non_terminal(MyNonTerminal::Term), Expr::token(plus_token)]),
Expr::seq(vec![Expr::non_terminal(MyNonTerminal::Term), Expr::token(minus_token)]),
]))
// After: Factored
.rule(MyNonTerminal::Expr, Expr::seq(vec![
Expr::non_terminal(MyNonTerminal::Term),
Expr::choice(vec![
Expr::token(plus_token),
Expr::token(minus_token),
]),
]))
Eliminate Left Recursion
Transform left-recursive rules:
// Before: Left-recursive
.rule(MyNonTerminal::Expr, Expr::choice(vec![
Expr::seq(vec![
Expr::non_terminal(MyNonTerminal::Expr),
Expr::token(plus_token),
Expr::non_terminal(MyNonTerminal::Term),
]),
Expr::non_terminal(MyNonTerminal::Term),
]))
// After: Right-recursive
.rule(MyNonTerminal::Expr, Expr::seq(vec![
Expr::non_terminal(MyNonTerminal::Term),
Expr::opt(Box::new(Expr::seq(vec![
Expr::token(plus_token),
Expr::non_terminal(MyNonTerminal::Expr),
]))),
]))
Grammar Visualization
Visualize grammar structure (planned feature):
// Future API
let dot = grammar.to_dot();
std::fs::write("grammar.dot", dot).unwrap();
Best Practices
- Validate early: Validate grammars before use
- Check for conflicts: Detect and resolve conflicts
- Optimize structure: Factor and simplify grammar
- Test thoroughly: Test with various inputs
- Analyze performance: Profile grammar performance
Next Steps
- See Grammars for grammar definition
- Check Performance for optimization tips
Unicode Support
Sipha provides full Unicode support for identifiers and text when the unicode feature is enabled.
Enabling Unicode
Enable Unicode support in Cargo.toml:
[dependencies]
sipha = { version = "0.5.0", features = ["unicode"] }
Unicode Character Classes
Use Unicode character classes in patterns:
use sipha::lexer::{CharSet, Pattern};
// Unicode letters
let unicode_letters = CharSet::unicode_letters();
// Unicode digits
let unicode_digits = CharSet::unicode_digits();
// Unicode identifiers (letters, digits, and connectors)
let unicode_ident = CharSet::unicode_ident_start();
Unicode Identifiers
Match Unicode identifiers:
let lexer = LexerBuilder::new()
.token(MySyntaxKind::Ident, Pattern::CharClass(CharSet::unicode_ident_start()))
.build(MySyntaxKind::Eof, MySyntaxKind::Ident)
.expect("Failed to build lexer");
Unicode Normalization
Sipha handles Unicode normalization automatically:
- NFC: Normalized Form Canonical Composition
- NFD: Normalized Form Canonical Decomposition
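To illustrate what normalization means, here is a small sketch using the unicode-normalization crate (an assumption for demonstration only; it is not necessarily what Sipha uses internally):
use unicode_normalization::UnicodeNormalization;
// "é" written two ways: precomposed vs. base letter plus combining accent
let composed = "\u{00E9}";        // é as a single code point
let decomposed = "e\u{0301}";     // e followed by a combining acute accent
assert_ne!(composed, decomposed); // different byte sequences
// After NFC normalization both spellings compare equal
let a: String = composed.nfc().collect();
let b: String = decomposed.nfc().collect();
assert_eq!(a, b);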
Examples
Japanese Identifiers
let lexer = LexerBuilder::new()
.token(MySyntaxKind::Ident, Pattern::CharClass(CharSet::unicode_letters()))
.build(MySyntaxKind::Eof, MySyntaxKind::Ident)
.expect("Failed to build lexer");
// Can tokenize: 変数名, 関数名, etc.
Emoji in Strings
// Emoji are handled correctly in string literals
let input = r#""Hello 👋 World 🌍""#;
let tokens = lexer.tokenize(input).unwrap();
Performance
Unicode support has minimal performance impact:
- Character classes: Efficient Unicode property lookups
- Normalization: Cached normalization results
- Identifiers: Fast Unicode property checks
Best Practices
- Enable when needed: Only enable the unicode feature if needed
- Use appropriate classes: Use Unicode character classes for identifiers
- Handle normalization: Be aware of Unicode normalization
- Test with Unicode: Test with various Unicode inputs
Next Steps
Adding a Backend
This chapter shows how to implement a new parsing backend for Sipha.
ParserBackend Trait
All backends implement the ParserBackend trait:
pub trait ParserBackend<T, N>: Sized + Send
where
T: Token,
N: NonTerminal,
{
type Config: Default + Clone;
type Error: std::error::Error + Send + Sync + 'static;
type State: Send + Sync;
fn new(grammar: &Grammar<T, N>, config: Self::Config) -> Result<Self, Self::Error>;
fn parse(&mut self, input: &[T], entry: N) -> ParseResult<T, N>;
fn parse_with_session(...) -> ParseResult<T, N>;
fn validate(grammar: &Grammar<T, N>) -> Vec<GrammarError<T, N>>;
fn capabilities() -> BackendCapabilities;
fn state(&self) -> &Self::State;
}
Implementation Steps
1. Define Backend Type
pub struct MyParser<T, N>
where
T: Token,
N: NonTerminal,
{
grammar: Grammar<T, N>,
config: MyConfig,
state: MyState,
}
2. Implement ParserBackend
impl<T, N> ParserBackend<T, N> for MyParser<T, N>
where
T: Token,
N: NonTerminal,
{
type Config = MyConfig;
type Error = MyError;
type State = MyState;
fn new(grammar: &Grammar<T, N>, config: Self::Config) -> Result<Self, Self::Error> {
// Initialize parser
Ok(Self {
grammar: grammar.clone(),
config,
state: MyState::new(),
})
}
fn parse(&mut self, input: &[T], entry: N) -> ParseResult<T, N> {
// Implement parsing logic
// ...
}
fn parse_with_session(
&mut self,
input: &[T],
entry: N,
session: &IncrementalSession<'_, T::Kind>,
) -> ParseResult<T, N> {
// Implement incremental parsing
// ...
}
fn validate(grammar: &Grammar<T, N>) -> Vec<GrammarError<T, N>> {
// Validate grammar compatibility
// ...
}
fn capabilities() -> BackendCapabilities {
BackendCapabilities {
name: "MyParser",
algorithm: Algorithm::PEG, // Or other
supports_left_recursion: true,
supports_ambiguity: false,
supports_incremental: true,
supports_error_recovery: true,
max_lookahead: None,
}
}
fn state(&self) -> &Self::State {
&self.state
}
}
3. Implement Parsing Logic
Implement the core parsing algorithm:
fn parse(&mut self, input: &[T], entry: N) -> ParseResult<T, N> {
let mut builder = GreenNodeBuilder::new();
let mut errors = Vec::new();
let mut tokens = input.iter();
// Parse according to algorithm
// ...
ParseResult {
root: builder.finish().unwrap(),
errors,
warnings: Vec::new(),
metrics: ParseMetrics::default(),
#[cfg(feature = "backend-glr")]
forest: None,
_phantom: PhantomData,
}
}
4. Implement Incremental Parsing
Support incremental parsing:
fn parse_with_session(
&mut self,
input: &[T],
entry: N,
session: &IncrementalSession<'_, T::Kind>,
) -> ParseResult<T, N> {
// Find reusable nodes
// Re-parse only affected regions
// Combine reused and new nodes
// ...
}
5. Implement Validation
Validate grammar compatibility:
fn validate(grammar: &Grammar<T, N>) -> Vec<GrammarError<T, N>> {
let mut errors = Vec::new();
// Check for left recursion
// Check for ambiguity
// Check for other requirements
// ...
errors
}
Example: PEG Parser
Here’s a simplified example of a PEG parser:
pub struct PegParser<T, N> {
grammar: Grammar<T, N>,
config: PegConfig,
state: PegState,
}
impl<T, N> ParserBackend<T, N> for PegParser<T, N>
where
T: Token,
N: NonTerminal,
{
type Config = PegConfig;
type Error = PegError;
type State = PegState;
fn new(grammar: &Grammar<T, N>, config: Self::Config) -> Result<Self, Self::Error> {
Ok(Self {
grammar: grammar.clone(),
config,
state: PegState::new(),
})
}
fn parse(&mut self, input: &[T], entry: N) -> ParseResult<T, N> {
// PEG parsing logic
// ...
}
// ... other methods
}
Best Practices
- Follow existing patterns: Look at LL/LR/GLR implementations
- Handle errors gracefully: Return errors, don’t panic
- Support incremental: Implement incremental parsing if possible
- Validate grammar: Check grammar compatibility
- Document behavior: Document algorithm-specific behavior
Next Steps
- See the existing backends (LL, LR, GLR) for reference
- Check Contributing for development workflow
Custom Syntax Kinds
This chapter covers advanced usage of syntax kinds.
Advanced SyntaxKind Implementation
Additional Methods
The SyntaxKind trait provides optional methods:
impl SyntaxKind for MySyntaxKind {
fn is_terminal(self) -> bool {
// ...
}
fn is_trivia(self) -> bool {
// ...
}
// Optional: Mark keywords
fn is_keyword(self) -> bool {
matches!(self,
MySyntaxKind::If |
MySyntaxKind::While |
MySyntaxKind::For
)
}
// Optional: Mark literals
fn is_literal(self) -> bool {
matches!(self,
MySyntaxKind::Number |
MySyntaxKind::String |
MySyntaxKind::Boolean
)
}
}
Syntax Kind Organization
Grouping Kinds
Organize kinds logically:
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySyntaxKind {
// Keywords
If, While, For, Return,
// Operators
Plus, Minus, Multiply, Divide,
// Punctuation
LParen, RParen, LBrace, RBrace,
// Literals
Number, String, Boolean,
// Identifiers
Ident,
// Trivia
Whitespace, Comment,
// Non-terminals
Expr, Stmt, Block,
// Special
Eof, Error,
}
Using Macros
Use macros to reduce boilerplate:
macro_rules! define_syntax_kinds {
(
keywords: [$($keyword:ident),*],
operators: [$($op:ident),*],
// ...
) => {
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySyntaxKind {
$($keyword,)*
$($op,)*
// ...
}
impl SyntaxKind for MySyntaxKind {
fn is_keyword(self) -> bool {
matches!(self, $(MySyntaxKind::$keyword)|*)
}
// ...
}
};
}
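For example, a hypothetical invocation of the macro above (the categories and kind names are placeholders):
define_syntax_kinds! {
    keywords: [If, While, For, Return],
    operators: [Plus, Minus, Multiply, Divide],
    // ... remaining categories
}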
Syntax Kind Conversion
Converting to/from Strings
impl MySyntaxKind {
pub fn as_str(self) -> &'static str {
match self {
MySyntaxKind::If => "if",
MySyntaxKind::While => "while",
// ...
}
}
pub fn from_str(s: &str) -> Option<Self> {
match s {
"if" => Some(MySyntaxKind::If),
"while" => Some(MySyntaxKind::While),
// ...
_ => None,
}
}
}
Best Practices
- Use descriptive names: Choose clear, descriptive names
- Group logically: Organize kinds by category
- Document purpose: Document what each kind represents
- Use exhaustiveness: Use exhaustive matching in implementations
- Consider extensions: Design for future extensions
Next Steps
- See Syntax Kinds for basics
- Check Examples for real-world usage
Extending Lexers
This chapter shows how to extend lexers with custom tokenization logic.
Custom Tokenization
Custom Matchers
Use custom matchers for complex patterns:
let lexer = LexerBuilder::new()
.custom_token(MySyntaxKind::String, |text, pos| {
// Custom string matching
// ...
})
.build(MySyntaxKind::Eof, MySyntaxKind::Ident)
.expect("Failed to build lexer");
See Custom Patterns for details.
Extending Pattern Types
Adding New Pattern Types
To add new pattern types, extend the Pattern enum:
pub enum Pattern {
Literal(CompactString),
CharClass(CharSet),
Repeat { pattern: Box<Pattern>, min: usize, max: Option<usize> },
Regex(CompactString),
Any,
// Add new pattern types
Custom(CustomPattern),
}
Implementing Pattern Matching
Implement pattern matching logic:
impl Pattern {
fn matches(&self, text: &str, pos: usize) -> Option<usize> {
match self {
Pattern::Custom(custom) => custom.matches(text, pos),
// ... other patterns
}
}
}
Lexer Hooks
Pre-processing
Add pre-processing hooks:
pub struct LexerBuilder<K: SyntaxKind> {
// ...
preprocessors: Vec<Box<dyn Fn(&str) -> String>>,
}
impl<K: SyntaxKind> LexerBuilder<K> {
pub fn preprocess(mut self, f: impl Fn(&str) -> String + 'static) -> Self {
self.preprocessors.push(Box::new(f));
self
}
}
Post-processing
Add post-processing hooks:
impl<K: SyntaxKind> LexerBuilder<K> {
pub fn postprocess(mut self, f: impl Fn(&mut Vec<Token<K>>) + 'static) -> Self {
self.postprocessors.push(Box::new(f));
self
}
}
Incremental Lexing
Token Updates
Update tokens incrementally:
pub struct IncrementalLexer<K: SyntaxKind> {
lexer: CompiledLexer<K>,
old_tokens: Vec<Token<K>>,
}
impl<K: SyntaxKind> IncrementalLexer<K> {
pub fn tokenize_incremental(
&self,
text: &str,
edits: &[TextEdit],
) -> Vec<Token<K>> {
// Re-tokenize only affected regions
// Reuse unchanged tokens
// ...
}
}
Best Practices
- Use custom matchers: For complex patterns, use custom matchers
- Extend patterns carefully: Only add new pattern types if needed
- Test thoroughly: Test with various inputs
- Document behavior: Document custom behavior
- Consider performance: Keep custom logic efficient
Next Steps
- See Lexers for basic usage
- Check Custom Patterns for examples
Contributing
Thank you for your interest in contributing to Sipha! This document provides guidelines and instructions for contributing.
Code of Conduct
This project adheres to the Rust Code of Conduct. By participating, you are expected to uphold this code.
Getting Started
Prerequisites
- Rust toolchain (stable, beta, or nightly)
- cargo (comes with Rust)
- rustfmt and clippy (install with rustup component add rustfmt clippy)
Development Setup
1. Fork the repository and clone your fork:
   git clone https://github.com/yourusername/sipha.git
   cd sipha
2. Build the project:
   cargo build
3. Run the test suite:
   cargo test
4. Run clippy:
   cargo clippy --all-targets --all-features --workspace -- -W clippy::pedantic -W clippy::nursery -W clippy::cargo -D warnings
5. Check formatting:
   cargo fmt --all -- --check
Development Workflow
1. Create a branch for your changes:
   git checkout -b feature/your-feature-name
   # or
   git checkout -b fix/your-bug-fix
2. Make your changes following the coding standards below.
3. Write or update tests for your changes.
4. Ensure all tests pass:
   cargo test --all-features
5. Run clippy and fix any warnings:
   cargo clippy --all-targets --all-features --workspace -- -W clippy::pedantic -W clippy::nursery -W clippy::cargo -D warnings
6. Format your code:
   cargo fmt --all
7. Commit your changes with a clear, descriptive commit message:
   git commit -m "Add feature: description of what you added"
8. Push to your fork:
   git push origin feature/your-feature-name
9. Open a Pull Request on GitHub with a clear description of your changes.
Coding Standards
Code Style
- Follow Rust’s standard formatting conventions. Run cargo fmt before committing.
- Use cargo clippy to catch common issues and follow Rust best practices.
- We use pedantic clippy lints - some are allowed in clippy.toml and lib.rs, but try to follow them when possible.
Documentation
- Document all public APIs with doc comments.
- Include examples in doc comments where helpful.
- Update the README if you add new features or change existing behavior.
Testing
- Write tests for new features and bug fixes.
- Aim for good test coverage, especially for critical paths.
- Include both unit tests and integration tests where appropriate.
- Test with different feature combinations when applicable.
Error Handling
- Prefer Result types over panicking in library code.
- Provide clear, actionable error messages.
- Use appropriate error types from the error module.
Commit Messages
- Use clear, descriptive commit messages.
- Start with a verb in imperative mood (e.g., “Add”, “Fix”, “Update”).
- Reference issue numbers when applicable (e.g., “Fix #123: description”).
Example:
Add incremental parsing cache invalidation
Implements cache invalidation for incremental parsing when text edits
occur. This improves performance by only re-parsing affected regions.
Fixes #42
Pull Request Process
1. Ensure your PR is up to date with the target branch (usually trunk).
2. Write a clear description of what your PR does and why.
3. Reference related issues if applicable.
4. Ensure CI passes - all tests, clippy checks, and formatting checks must pass.
5. Request review from maintainers.
6. Address feedback promptly and update your PR as needed.
Feature Requests
If you have an idea for a new feature:
- Check existing issues to see if it’s already been discussed.
- Open an issue describing the feature, its use case, and potential implementation approach.
- Wait for maintainer feedback before implementing.
Bug Reports
When reporting bugs, please include:
- A clear description of the bug
- Steps to reproduce
- Expected behavior
- Actual behavior
- Your environment (Rust version, OS, etc.)
- Minimal code example if possible
Fuzz Testing
Sipha includes fuzz testing infrastructure to ensure robustness and correctness. Fuzz tests help find edge cases and potential panics in the parser.
Prerequisites
Install cargo-fuzz:
cargo install cargo-fuzz
Running Fuzz Tests
Navigate to the fuzz directory and run the fuzz targets:
cd fuzz
cargo fuzz run parser_fuzz
cargo fuzz run incremental_fuzz
Fuzz Targets
- parser_fuzz: Tests the parser with random byte streams to ensure no panics occur.
- incremental_fuzz: Differential fuzzing that compares incremental parsing results with full re-parsing to ensure correctness.
Adding New Fuzz Targets
To add a new fuzz target:
1. Create a new file in fuzz/fuzz_targets/ with your fuzz target code
2. Add a [[bin]] entry in fuzz/Cargo.toml pointing to your target
3. Run cargo fuzz list to verify the target is registered
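A new target is just a small libfuzzer_sys entry point. A minimal sketch (the file name and the build_lexer helper are hypothetical; the point is to exercise the library without panicking):
// fuzz/fuzz_targets/lexer_fuzz.rs
#![no_main]
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: &[u8]| {
    // Only feed valid UTF-8 to the lexer; the goal is "no panics", not success
    if let Ok(text) = std::str::from_utf8(data) {
        let lexer = build_lexer(); // hypothetical helper that builds a CompiledLexer
        let _ = lexer.tokenize(text);
    }
});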
Questions?
Feel free to open an issue for questions or discussions. We’re happy to help!
License
By contributing to Sipha, you agree that your contributions will be licensed under the MIT License.
Design Principles
This chapter explains the core design principles behind Sipha.
Incremental-First Design
Sipha is designed from the ground up for incremental parsing:
- Node reuse: Unchanged subtrees are automatically reused
- Cache management: Parse results are cached for future reuse
- Affected regions: Only affected regions are re-parsed
- Performance: Optimized for interactive editing scenarios
Immutability
Sipha uses immutable data structures:
- Green trees: Immutable, shareable representation
- Safe sharing: Immutability enables safe sharing across threads
- Functional style: Operations create new trees rather than modifying existing ones
Modularity
Sipha is modular and extensible:
- Backend system: Multiple parsing backends via trait
- Feature flags: Optional features via feature flags
- Pluggable components: Lexers, parsers, and trees are pluggable
Performance
Performance is a key consideration:
- Efficient algorithms: Use efficient parsing algorithms
- Memory efficiency: Shared trees reduce memory usage
- Fast operations: Optimized for common operations
- Incremental updates: Fast updates for interactive applications
Type Safety
Sipha leverages Rust’s type system:
- Generic types: Generic over token and non-terminal types
- Trait bounds: Clear trait bounds for extensibility
- Type safety: Compile-time guarantees for correctness
Error Handling
Sipha provides comprehensive error handling:
- Result types: Use Result for fallible operations
- Error recovery: Configurable error recovery strategies
- Rich diagnostics: Detailed error messages with context
API Design
Sipha’s API is designed for:
- Ease of use: Simple, intuitive API
- Flexibility: Support for various use cases
- Extensibility: Easy to extend and customize
- Documentation: Well-documented with examples
Next Steps
- See Module Structure for codebase organization
- Check Data Structures for key types
Module Structure
This chapter describes the organization of Sipha’s codebase.
Core Modules
syntax
Syntax tree types and operations:
- green.rs: Green tree nodes
- red.rs: Red tree nodes
- kind.rs: Syntax kind trait
- builder.rs: Tree builder
- visitor.rs: Tree visitors
- query.rs: Tree queries
lexer
Lexical analysis:
- builder.rs: Lexer builder
- dfa.rs: DFA construction and tokenization
- token.rs: Token types
- mod.rs: Module exports
grammar
Grammar definition and validation:
- builder.rs: Grammar builder
- expr.rs: Grammar expressions
- validate.rs: Grammar validation
- analysis.rs: Grammar analysis
- docs.rs: Grammar documentation
backend
Parser backends:
- mod.rs: Backend trait and common code
- ll/: LL(k) parser implementation
- lr/: LR parser implementation
- glr/: GLR parser implementation
- common.rs: Common backend utilities
incremental
Incremental parsing support:
- mod.rs: Incremental parser and session
error
Error types and diagnostics:
- mod.rs: Error types and diagnostics
parser
Parser traits and interfaces:
- mod.rs: Parser traits
Workspace Structure
sipha/
├── crates/
│ ├── sipha/ # Main library
│ ├── sipha_derive/ # Procedural macros
│ └── sipha_lsp/ # LSP support
├── fuzz/ # Fuzz testing
└── doc/ # Documentation (this book)
Module Dependencies
syntax
↑
├── lexer
│ ↓
│ grammar
│ ↓
│ backend
│ ↓
│ incremental
│
└── error
Next Steps
- See Data Structures for key types
- Check Design Principles for design rationale
Data Structures
This chapter describes the key data structures in Sipha.
Green Trees
GreenNode
Immutable, shareable tree node:
use sipha::syntax::{SyntaxKind, TextSize};
pub struct GreenNode<K: SyntaxKind> {
kind: K,
text_len: TextSize,
children: GreenChildren<K>,
}
GreenToken
Immutable token:
use sipha::syntax::SyntaxKind;
use compact_str::CompactString;
pub struct GreenToken<K: SyntaxKind> {
kind: K,
text: CompactString,
}
GreenElement
Tree element (node or token):
use std::sync::Arc;
use sipha::syntax::SyntaxKind;
pub enum GreenElement<K: SyntaxKind> {
Node(Arc<GreenNode<K>>),
Token(GreenToken<K>),
}
Red Trees
SyntaxNode
Red tree node for traversal:
use std::sync::Arc;
pub struct SyntaxNode {
green: Arc<GreenNode>,
parent: Option<Arc<SyntaxNode>>,
index: usize,
}
SyntaxToken
Red tree token:
use std::sync::Arc;
pub struct SyntaxToken {
green: GreenToken,
parent: Option<Arc<SyntaxNode>>,
index: usize,
}
Grammar
Grammar
Grammar definition:
use std::collections::HashMap;
use lasso::Rodeo;
pub struct Grammar<T, N> {
rules: HashMap<N, Rule<T, N>>,
entry_point: N,
interner: Rodeo,
}
Rule
Production rule:
pub struct Rule<T, N> {
pub lhs: N,
pub rhs: Expr<T, N>,
pub metadata: RuleMetadata,
}
Expr
Grammar expression:
pub enum Expr<T, N> {
Token(T),
Rule(N),
Seq(Vec<Expr<T, N>>),
Choice(Vec<Expr<T, N>>),
// ... more variants
}
Incremental Parsing
IncrementalParser
Incremental parser wrapper:
pub struct IncrementalParser<P, T, N> {
parser: P,
cache: ParseCache<T::Kind>,
}
IncrementalSession
Incremental parsing session:
pub struct IncrementalSession<'a, K> {
old_tree: Option<&'a GreenNode<K>>,
edits: &'a [TextEdit],
affected: AffectedRegion,
reusable: Vec<ReuseCandidate<K>>,
cache: Option<&'a ParseCache<K>>,
}
ParseCache
Parse result cache:
use std::collections::HashMap;
use std::sync::Arc;
use lasso::Rodeo;
pub struct ParseCache<K> {
version: usize,
nodes: HashMap<CacheKey, Arc<GreenNode<K>>>,
interner: Rodeo,
}
Lexer
CompiledLexer
Compiled lexer:
use std::collections::{HashMap, HashSet};
use compact_str::CompactString;
pub struct CompiledLexer<K> {
rules: Vec<CompiledRule<K>>,
dfa: Option<Dfa>,
keywords: HashMap<CompactString, K>,
trivia_kinds: HashSet<K>,
}
Token
Lexer token:
use compact_str::CompactString;
use sipha::syntax::TextRange;
pub struct Token<K> {
pub kind: K,
pub text: CompactString,
pub range: TextRange,
}
Error Types
ParseError
Parsing error:
use sipha::syntax::TextRange;
pub struct ParseError<T, N> {
pub span: TextRange,
pub message: String,
pub expected: Vec<T>,
pub found: Option<T>,
}
LexerError
Lexer error:
use sipha::syntax::TextRange;
pub struct LexerError {
pub span: TextRange,
pub kind: LexerErrorKind,
}
Next Steps
- See Performance Considerations for optimization
- Check Module Structure for organization
Performance Considerations
This chapter covers performance considerations in Sipha’s design.
Memory Efficiency
Green Tree Sharing
Green trees are designed for sharing:
- Arc-based: Nodes use Arc for reference counting
- Shared subtrees: Unchanged subtrees are shared across edits
- Arena allocation: Nodes can be arena-allocated for efficiency
Compact Representation
Green trees use compact representations:
- Small nodes: Minimal overhead per node
- Compact strings: Use CompactString for short tokens
- Efficient children: Optimized child storage
Parsing Performance
DFA-Based Lexing
Lexer uses DFA for O(n) tokenization:
- Compiled DFA: DFA is compiled once
- Fast matching: O(n) tokenization time
- Efficient patterns: Character classes are efficient
Table-Based Parsing
Parsers use tables for fast lookups:
- Parse tables: Pre-computed parse tables
- Fast lookups: O(1) table lookups
- Compact tables: Optimized table representation
Incremental Performance
Node Reuse
Incremental parsing reuses nodes:
- Budget-based: Limit reuse candidates
- Position indexing: Fast node lookup
- Smart invalidation: Efficient cache invalidation
Cache Management
Parse cache is optimized:
- Version-based: Efficient version management
- Automatic eviction: Old versions are evicted
- Interned strings: Rule names are interned
Optimization Tips
Grammar Optimization
- Factor patterns: Extract common subexpressions
- Minimize nesting: Flatten deeply nested structures
- Use appropriate lookahead: Don’t use more than needed
Lexer Optimization
- Order patterns: Most specific patterns first
- Use character classes: Prefer over regex
- Keyword trie: Keywords use efficient trie
Parser Optimization
- Choose right backend: Use appropriate backend
- Enable cache: Always provide grammar for cache
- Adjust budget: Tune reuse budget for use case
Benchmarking
Measuring Performance
use std::time::Instant;
// Assuming parser, tokens, and entry are already defined
let start = Instant::now();
let result = parser.parse(&tokens, entry);
let duration = start.elapsed();
Profiling
Use profilers to find bottlenecks:
- perf: Linux profiling
- valgrind: Memory and call profiling
- cargo-flamegraph: Flamegraph generation
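cargo-flamegraph is often the quickest way to get a visual profile; for example:
# Install once (provides the `cargo flamegraph` subcommand)
cargo install flamegraph
# Profiles a release build by default and writes flamegraph.svg
cargo flamegraph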
Best Practices
- Profile first: Profile before optimizing
- Measure changes: Benchmark before and after
- Use incremental: Always use for interactive apps
- Optimize grammar: Factor and simplify
- Choose right backend: Use appropriate backend
Next Steps
- See Performance for optimization tips
- Check Design Principles for design rationale
Basic Arithmetic Example
This example walks through building a simple arithmetic expression parser step by step.
Overview
We’ll build a parser for arithmetic expressions like 42 + 10 * 3 with proper operator precedence.
Step 1: Define Syntax Kinds
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
// Terminals
Number,
Plus,
Minus,
Multiply,
Divide,
LParen,
RParen,
Whitespace,
Eof,
// Non-terminals
Expr,
Term,
Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool {
!matches!(self,
ArithSyntaxKind::Expr |
ArithSyntaxKind::Term |
ArithSyntaxKind::Factor
)
}
fn is_trivia(self) -> bool {
matches!(self, ArithSyntaxKind::Whitespace)
}
}
Step 2: Build Lexer
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
Number, Plus, Minus, Multiply, Divide, LParen, RParen, Whitespace, Eof,
Expr, Term, Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor) }
fn is_trivia(self) -> bool { matches!(self, ArithSyntaxKind::Whitespace) }
}
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
let lexer = LexerBuilder::new()
.token(ArithSyntaxKind::Number, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 1,
max: None,
})
.token(ArithSyntaxKind::Plus, Pattern::Literal("+".into()))
.token(ArithSyntaxKind::Minus, Pattern::Literal("-".into()))
.token(ArithSyntaxKind::Multiply, Pattern::Literal("*".into()))
.token(ArithSyntaxKind::Divide, Pattern::Literal("/".into()))
.token(ArithSyntaxKind::LParen, Pattern::Literal("(".into()))
.token(ArithSyntaxKind::RParen, Pattern::Literal(")".into()))
.token(ArithSyntaxKind::Whitespace, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
min: 1,
max: None,
})
.trivia(ArithSyntaxKind::Whitespace)
.build(ArithSyntaxKind::Eof, ArithSyntaxKind::Number)
.expect("Failed to build lexer");
Step 3: Define Grammar
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
Number, Plus, Minus, Multiply, Divide, LParen, RParen, Whitespace, Eof,
Expr, Term, Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor) }
fn is_trivia(self) -> bool { matches!(self, ArithSyntaxKind::Whitespace) }
}
use sipha::grammar::{GrammarBuilder, NonTerminal, Expr};
use sipha::lexer::Token as LexerToken;
use sipha::syntax::{TextRange, TextSize};
use std::convert::TryFrom;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ArithNonTerminal {
Expr,
Term,
Factor,
}
impl NonTerminal for ArithNonTerminal {
fn name(&self) -> &str {
match self {
ArithNonTerminal::Expr => "Expr",
ArithNonTerminal::Term => "Term",
ArithNonTerminal::Factor => "Factor",
}
}
}
// Helper function to create tokens with proper ranges
fn create_token(kind: ArithSyntaxKind, text: &str, offset: u32) -> LexerToken<ArithSyntaxKind> {
let len = TextSize::from(u32::try_from(text.len()).unwrap_or(0));
LexerToken::new(kind, text, TextRange::at(TextSize::from(offset), len))
}
let grammar = GrammarBuilder::new()
.entry_point(ArithNonTerminal::Expr)
// Expr -> Term | Expr + Term | Expr - Term
.rule(ArithNonTerminal::Expr, Expr::choice(vec![
Expr::seq(vec![
Expr::non_terminal(ArithNonTerminal::Expr),
Expr::token(create_token(ArithSyntaxKind::Plus, "+", 0)),
Expr::non_terminal(ArithNonTerminal::Term),
]),
Expr::seq(vec![
Expr::non_terminal(ArithNonTerminal::Expr),
Expr::token(create_token(ArithSyntaxKind::Minus, "-", 0)),
Expr::non_terminal(ArithNonTerminal::Term),
]),
Expr::non_terminal(ArithNonTerminal::Term),
]))
// Term -> Factor | Term * Factor | Term / Factor
.rule(ArithNonTerminal::Term, Expr::choice(vec![
Expr::seq(vec![
Expr::non_terminal(ArithNonTerminal::Term),
Expr::token(create_token(ArithSyntaxKind::Multiply, "*", 0)),
Expr::non_terminal(ArithNonTerminal::Factor),
]),
Expr::seq(vec![
Expr::non_terminal(ArithNonTerminal::Term),
Expr::token(create_token(ArithSyntaxKind::Divide, "/", 0)),
Expr::non_terminal(ArithNonTerminal::Factor),
]),
Expr::non_terminal(ArithNonTerminal::Factor),
]))
// Factor -> Number | ( Expr )
.rule(ArithNonTerminal::Factor, Expr::choice(vec![
Expr::token(create_token(ArithSyntaxKind::Number, "42", 0)),
Expr::seq(vec![
Expr::token(create_token(ArithSyntaxKind::LParen, "(", 0)),
Expr::non_terminal(ArithNonTerminal::Expr),
Expr::token(create_token(ArithSyntaxKind::RParen, ")", 0)),
]),
]))
.build()
.expect("Failed to build grammar");
Step 4: Parse
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind {
Number, Plus, Minus, Multiply, Divide, LParen, RParen, Whitespace, Eof,
Expr, Term, Factor,
}
impl SyntaxKind for ArithSyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, ArithSyntaxKind::Expr | ArithSyntaxKind::Term | ArithSyntaxKind::Factor) }
fn is_trivia(self) -> bool { matches!(self, ArithSyntaxKind::Whitespace) }
}
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
let lexer = LexerBuilder::new()
.token(ArithSyntaxKind::Number, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 1, max: None,
})
.token(ArithSyntaxKind::Plus, Pattern::Literal("+".into()))
.token(ArithSyntaxKind::Minus, Pattern::Literal("-".into()))
.token(ArithSyntaxKind::Multiply, Pattern::Literal("*".into()))
.token(ArithSyntaxKind::Divide, Pattern::Literal("/".into()))
.token(ArithSyntaxKind::LParen, Pattern::Literal("(".into()))
.token(ArithSyntaxKind::RParen, Pattern::Literal(")".into()))
.token(ArithSyntaxKind::Whitespace, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::whitespace())),
min: 1, max: None,
})
.trivia(ArithSyntaxKind::Whitespace)
.build(ArithSyntaxKind::Eof, ArithSyntaxKind::Number)
.expect("Failed to build lexer");
use sipha::grammar::{GrammarBuilder, NonTerminal, Expr};
use sipha::lexer::Token as LexerToken;
use sipha::syntax::{TextRange, TextSize};
use std::convert::TryFrom;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ArithNonTerminal { Expr, Term, Factor, }
impl NonTerminal for ArithNonTerminal {
fn name(&self) -> &str { match self { ArithNonTerminal::Expr => "Expr", ArithNonTerminal::Term => "Term", ArithNonTerminal::Factor => "Factor", } }
}
fn create_token(kind: ArithSyntaxKind, text: &str, offset: u32) -> LexerToken<ArithSyntaxKind> {
let len = TextSize::from(u32::try_from(text.len()).unwrap_or(0));
LexerToken::new(kind, text, TextRange::at(TextSize::from(offset), len))
}
let grammar = GrammarBuilder::new()
.entry_point(ArithNonTerminal::Expr)
.rule(ArithNonTerminal::Expr, Expr::choice(vec![
Expr::seq(vec![Expr::non_terminal(ArithNonTerminal::Expr), Expr::token(create_token(ArithSyntaxKind::Plus, "+", 0)), Expr::non_terminal(ArithNonTerminal::Term)]),
Expr::seq(vec![Expr::non_terminal(ArithNonTerminal::Expr), Expr::token(create_token(ArithSyntaxKind::Minus, "-", 0)), Expr::non_terminal(ArithNonTerminal::Term)]),
Expr::non_terminal(ArithNonTerminal::Term),
]))
.rule(ArithNonTerminal::Term, Expr::choice(vec![
Expr::seq(vec![Expr::non_terminal(ArithNonTerminal::Term), Expr::token(create_token(ArithSyntaxKind::Multiply, "*", 0)), Expr::non_terminal(ArithNonTerminal::Factor)]),
Expr::seq(vec![Expr::non_terminal(ArithNonTerminal::Term), Expr::token(create_token(ArithSyntaxKind::Divide, "/", 0)), Expr::non_terminal(ArithNonTerminal::Factor)]),
Expr::non_terminal(ArithNonTerminal::Factor),
]))
.rule(ArithNonTerminal::Factor, Expr::choice(vec![
Expr::token(create_token(ArithSyntaxKind::Number, "42", 0)),
Expr::seq(vec![Expr::token(create_token(ArithSyntaxKind::LParen, "(", 0)), Expr::non_terminal(ArithNonTerminal::Expr), Expr::token(create_token(ArithSyntaxKind::RParen, ")", 0))]),
]))
.build().expect("Failed to build grammar");
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
use sipha::syntax::SyntaxNode;
let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)
.expect("Failed to create parser");
let input = "42 + 10 * 3";
let tokens = lexer.tokenize(input)
.expect("Failed to tokenize");
let result = parser.parse(&tokens, ArithNonTerminal::Expr);
if !result.errors.is_empty() {
eprintln!("Errors: {:?}", result.errors);
} else {
let root = SyntaxNode::new_root(result.root.clone());
println!("Parse successful!");
println!("Root: {:?}", root.kind());
}
Complete Example
See examples/basic_arithmetic.rs for the complete working example.
Next Steps
- Try Incremental Parsing Example
- Check GLR Example for ambiguous grammars
Incremental Parsing Example
This example demonstrates incremental parsing with Sipha.
Overview
We’ll build a simple language server that uses incremental parsing to efficiently update the parse tree as the user edits code.
Setup
use sipha::incremental::{IncrementalParser, TextEdit};
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::syntax::{TextRange, TextSize, GreenNode};
use sipha::grammar::Grammar;
use sipha::lexer::CompiledLexer;
// These types would be defined in your actual code
// type MyToken = /* ... */;
// type MyNonTerminal = /* ... */;
// type MySyntaxKind = /* ... */;
struct LanguageServer {
parser: IncrementalParser<LlParser, MyToken, MyNonTerminal>,
grammar: Grammar<MyToken, MyNonTerminal>,
lexer: CompiledLexer<MySyntaxKind>,
current_tree: Option<GreenNode<MySyntaxKind>>,
}
Initial Parse
use sipha::incremental::{IncrementalParser, TextEdit};
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::syntax::{TextRange, TextSize, GreenNode};
use sipha::grammar::Grammar;
use sipha::lexer::CompiledLexer;
struct LanguageServer {
parser: IncrementalParser<LlParser, MyToken, MyNonTerminal>,
grammar: Grammar<MyToken, MyNonTerminal>,
lexer: CompiledLexer<MySyntaxKind>,
current_tree: Option<GreenNode<MySyntaxKind>>,
}
type MyToken = ();
type MyNonTerminal = ();
type MySyntaxKind = ();
fn build_grammar() -> Grammar<MyToken, MyNonTerminal> { todo!() }
fn build_lexer() -> CompiledLexer<MySyntaxKind> { todo!() }
impl LanguageServer {
fn new() -> Self {
let grammar = build_grammar();
let parser = LlParser::new(&grammar, Default::default()).unwrap();
let lexer = build_lexer();
Self {
parser: IncrementalParser::new(parser),
grammar,
lexer,
current_tree: None,
}
}
fn initial_parse(&mut self, text: &str) {
let tokens = self.lexer.tokenize(text).unwrap();
let result = self.parser.parse_incremental(
&tokens,
None, // No old tree
&[], // No edits
MyNonTerminal::Expr,
Some(&self.grammar), // Populate cache
);
self.current_tree = Some(result.root.clone());
}
}
Handling Edits
use sipha::incremental::{IncrementalParser, TextEdit};
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::syntax::{TextRange, TextSize, GreenNode};
use sipha::grammar::Grammar;
use sipha::lexer::CompiledLexer;
struct LanguageServer {
parser: IncrementalParser<LlParser, MyToken, MyNonTerminal>,
grammar: Grammar<MyToken, MyNonTerminal>,
lexer: CompiledLexer<MySyntaxKind>,
current_tree: Option<GreenNode<MySyntaxKind>>,
}
type MyToken = ();
type MyNonTerminal = ();
type MySyntaxKind = ();
impl LanguageServer {
fn report_errors(&self, _errors: &[()]) {}
}
impl LanguageServer {
fn on_text_changed(&mut self, edits: Vec<TextEdit>, new_text: &str) {
// Re-tokenize (or update tokens incrementally)
let tokens = self.lexer.tokenize(new_text).unwrap();
// Incremental parse
let result = self.parser.parse_incremental(
&tokens,
self.current_tree.as_ref(), // Previous tree
&edits, // Text edits
MyNonTerminal::Expr,
Some(&self.grammar), // Update cache
);
// Update current tree
self.current_tree = Some(result.root.clone());
// Report errors
if !result.errors.is_empty() {
self.report_errors(&result.errors);
}
}
}
Creating Edits
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
struct LanguageServer;
impl LanguageServer {
fn on_text_changed(&mut self, _edits: Vec<TextEdit>, _new_text: &str) {}
}
let mut server = LanguageServer;
// User changed "42" to "100" at position 0
let edit = TextEdit::replace(
TextRange::new(TextSize::from(0), TextSize::from(2)),
"100".into(),
);
server.on_text_changed(vec![edit], "100 + 10");
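TextEdit also provides insert and delete constructors (see TextEdit under API Overview). Two more sketches, using offsets into the edited text "100 + 10" (the offsets are just illustrative):
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
// User typed " * 2" at the end of "100 + 10" (offset 8)
let insert = TextEdit::insert(TextSize::from(8), " * 2".into());
// User deleted " + 10" (offsets 3..8)
let delete = TextEdit::delete(TextRange::new(TextSize::from(3), TextSize::from(8)));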
Performance
Incremental parsing provides significant performance improvements:
- Small edits: 10-100x faster
- Medium edits: 5-20x faster
- Large edits: 2-5x faster
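To get a feel for these numbers with your own grammar, you can time a full parse against an incremental re-parse after a one-character edit. This is only a sketch: build_lexer, build_grammar, and MyNonTerminal::Root stand in for your own definitions.
use std::time::Instant;
use sipha::backend::ll::LlParser;
use sipha::incremental::{IncrementalParser, TextEdit};
use sipha::syntax::{TextRange, TextSize};
let lexer = build_lexer();     // your lexer
let grammar = build_grammar(); // your grammar
let parser = LlParser::new(&grammar, Default::default()).unwrap();
let mut incremental = IncrementalParser::new(parser);
// A reasonably large input so the difference is visible
let text = "1 + 2 * 3\n".repeat(10_000);
let tokens = lexer.tokenize(&text).unwrap();
// Full parse (also populates the cache)
let start = Instant::now();
let first = incremental.parse_incremental(&tokens, None, &[], MyNonTerminal::Root, Some(&grammar));
println!("full parse: {:?}", start.elapsed());
// One-character edit: change the first "1" to "9"
let edit = TextEdit::replace(TextRange::new(TextSize::from(0), TextSize::from(1)), "9".into());
let edited = format!("9{}", &text[1..]);
let edited_tokens = lexer.tokenize(&edited).unwrap();
let start = Instant::now();
let _second = incremental.parse_incremental(
    &edited_tokens,
    Some(&first.root),
    &[edit],
    MyNonTerminal::Root,
    Some(&grammar),
);
println!("incremental re-parse: {:?}", start.elapsed());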
Complete Example
See examples/incremental_parsing.rs for the complete working example.
Next Steps
- See Incremental Parsing for detailed documentation
- Check Real-World Patterns for more examples
GLR Parsing Example
This example demonstrates using the GLR parser to handle ambiguous grammars.
Overview
We’ll parse an ambiguous expression grammar and disambiguate using precedence rules.
Ambiguous Grammar
// Grammar: Expr -> Expr + Expr | Expr * Expr | Number
// Input: "1 + 2 * 3"
// Ambiguous: Could be (1 + 2) * 3 or 1 + (2 * 3)
Setup
use sipha::grammar::Grammar;
type MyToken = ();
type MyNonTerminal = ();
let grammar: Grammar<MyToken, MyNonTerminal> = todo!();
use sipha::backend::glr::{GlrParser, GlrConfig};
use sipha::backend::glr::disambiguate_by_precedence;
let config = GlrConfig::default();
let mut parser = GlrParser::new(&grammar, config)
.expect("Failed to create GLR parser");
Parsing
use sipha::backend::glr::{GlrParser, GlrConfig};
use sipha::backend::glr::disambiguate_by_precedence;
use sipha::syntax::SyntaxNode;
use sipha::lexer::Token;
type MyToken = ();
type MyNonTerminal = ();
type MySyntaxKind = ();
struct Grammar;
let grammar = Grammar;
let config = GlrConfig::default();
let mut parser = GlrParser::new(&grammar, config).expect("Failed to create GLR parser");
let tokens: Vec<Token<MySyntaxKind>> = vec![];
let result = parser.parse(&tokens, MyNonTerminal::Expr);
// Check for ambiguity
if let Some(forest) = result.forest {
println!("Ambiguous: {} parse trees", forest.roots.len());
// Disambiguate using precedence
let disambiguated = forest.disambiguate(disambiguate_by_precedence(|op| {
match op {
MySyntaxKind::Multiply | MySyntaxKind::Divide => 2,
MySyntaxKind::Plus | MySyntaxKind::Minus => 1,
_ => 0,
}
}));
// Use disambiguated tree
let root = SyntaxNode::new_root(disambiguated);
} else {
// Unambiguous - use result directly
let root = SyntaxNode::new_root(result.root);
}
Disambiguation Strategies
Precedence
use sipha::backend::glr::disambiguate_by_precedence;
type MySyntaxKind = ();
struct Forest;
impl Forest {
fn disambiguate<F>(&self, _f: F) -> () { () }
}
let forest = Forest;
let disambiguated = forest.disambiguate(disambiguate_by_precedence(|op| {
match op {
MySyntaxKind::Multiply => 2,
MySyntaxKind::Plus => 1,
_ => 0,
}
}));
Associativity
use sipha::backend::glr::disambiguate_by_associativity;
use sipha::backend::glr::Associativity;
type MySyntaxKind = ();
struct Forest;
impl Forest {
fn disambiguate<F>(&self, _f: F) -> () { () }
}
let forest = Forest;
let disambiguated = forest.disambiguate(disambiguate_by_associativity(|op| {
match op {
MySyntaxKind::Minus => Associativity::Left,
MySyntaxKind::Power => Associativity::Right,
_ => Associativity::None,
}
}));
Custom
struct Forest;
impl Forest {
fn disambiguate<F>(&self, _f: F) -> () { () }
}
let forest = Forest;
fn compute_complexity(_tree: &()) -> usize { 0 }
let disambiguated = forest.disambiguate(|alternatives| {
// Choose based on custom criteria
alternatives
.iter()
.min_by_key(|tree| compute_complexity(tree))
.cloned()
});
Complete Example
See examples/glr_parsing.rs for the complete working example.
Next Steps
- See GLR Parser for detailed documentation
- Check Real-World Patterns for more examples
PEG Parsing Example
This example demonstrates using the PEG parser with ordered choice and memoization.
Overview
We’ll parse an expression grammar using PEG’s ordered choice semantics and demonstrate the benefits of memoization (packrat parsing).
Ordered Choice Semantics
PEG uses ordered choice where the first matching alternative wins:
// Grammar: Expr -> Expr + Term | Term
// In PEG: First alternative tried first
// If "Expr + Term" matches, "Term" is never tried
This makes precedence natural: lower-precedence operators are tried first.
Setup
use sipha::backend::peg::{PegParser, PegConfig};
use sipha::backend::ParserBackend;
use sipha::grammar::Grammar;
type MyToken = ();
type MyNonTerminal = ();
let grammar: Grammar<MyToken, MyNonTerminal> = todo!();
let config = PegConfig {
enable_memoization: true, // Enable packrat parsing
max_memo_size: 10000, // Cache size limit
error_recovery: true,
max_errors: 100,
max_backtrack_depth: 1000,
};
let mut parser = PegParser::new(&grammar, config)
.expect("Failed to create PEG parser");
Basic Parsing
use sipha::backend::peg::{PegParser, PegConfig};
use sipha::backend::ParserBackend;
type MyToken = ();
type MyNonTerminal = ();
struct Grammar;
let grammar = Grammar;
let config = PegConfig::default();
let mut parser = PegParser::new(&grammar, config).expect("Failed to create PEG parser");
let tokens: Vec<MyToken> = vec![];
let result = parser.parse(&tokens, MyNonTerminal::Expr);
if result.errors.is_empty() {
println!("Parse successful!");
println!("Tokens consumed: {}", result.metrics.tokens_consumed);
println!("Nodes created: {}", result.metrics.nodes_created);
} else {
for error in &result.errors {
eprintln!("Error: {error}");
}
}
Memoization (Packrat Parsing)
Memoization caches parse results for O(n) performance:
use sipha::backend::peg::{PegParser, PegConfig};
type MyToken = ();
type MyNonTerminal = ();
struct Grammar;
let grammar = Grammar;
let config_with_memo = PegConfig {
enable_memoization: true,
..Default::default()
};
let config_without_memo = PegConfig {
enable_memoization: false,
..Default::default()
};
// With memoization: O(n) time, O(n) space
let parser_with = PegParser::new(&grammar, config_with_memo)?;
// Without memoization: May be exponential for some grammars
let parser_without = PegParser::new(&grammar, config_without_memo)?;
Incremental Parsing
PEG works well with incremental parsing thanks to memoization:
use sipha::incremental::IncrementalParser;
use sipha::backend::peg::{PegParser, PegConfig};
type MyToken = ();
type MyNonTerminal = ();
struct Grammar;
let grammar = Grammar;
let parser = PegParser::new(&grammar, PegConfig::default()).expect("parser");
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
let mut incremental = IncrementalParser::new(parser);
// Token streams for the original and edited input (stubbed here)
let tokens: Vec<MyToken> = vec![];
let new_tokens: Vec<MyToken> = vec![];
// Initial parse
let result1 = incremental.parse_incremental(
&tokens,
None,
&[],
MyNonTerminal::Expr,
Some(&grammar),
);
// After edit - benefits from memoization cache
let edits = vec![TextEdit::replace(
TextRange::new(TextSize::from(0), TextSize::from(1)),
"new_text".into(),
)];
let result2 = incremental.parse_incremental(
&new_tokens,
Some(&result1.root),
&edits,
MyNonTerminal::Expr,
Some(&grammar),
);
Ordered Choice for Precedence
PEG’s ordered choice makes operator precedence natural:
// Lower precedence tried first
Expr -> Expr + Term | Term
// Higher precedence tried later
Term -> Term * Factor | Factor
// This naturally gives * higher precedence than +
Performance Comparison
The example demonstrates performance with and without memoization:
- With memoization: Linear time O(n)
- Without memoization: May be exponential for some grammars
- Incremental: Fast for small edits (benefits from cache)
Complete Example
See examples/peg_parsing.rs for the complete working example.
Run it with:
cargo run --example peg_parsing --features backend-peg
Expected Output
The example demonstrates:
- Building a lexer and grammar
- Parsing with ordered choice
- Memoization benefits
- Incremental parsing performance
- Performance comparison with/without memoization
Next Steps
- See PEG Parser for detailed documentation
- Check Choosing a Backend to see when to use PEG
- Explore Incremental Parsing for more details
Real-World Patterns
This chapter covers common patterns and idioms for using Sipha in real-world applications.
Language Server Pattern
Structure
struct LanguageServer {
parser: IncrementalParser<LlParser, Token, NonTerminal>,
grammar: Grammar<Token, NonTerminal>,
lexer: CompiledLexer<SyntaxKind>,
documents: HashMap<Uri, Document>,
}
struct Document {
text: String,
tree: Option<GreenNode<SyntaxKind>>,
version: i32,
}
Handling Changes
impl LanguageServer {
fn did_change(&mut self, uri: Uri, edits: Vec<TextEdit>, version: i32) {
let doc = self.documents.get_mut(&uri).unwrap();
doc.text = apply_edits(&doc.text, &edits);
doc.version = version;
let tokens = self.lexer.tokenize(&doc.text).unwrap();
let result = self.parser.parse_incremental(
&tokens,
doc.tree.as_ref(),
&edits,
MyNonTerminal::Root,
Some(&self.grammar),
);
doc.tree = Some(result.root.clone());
// Publish diagnostics
self.publish_diagnostics(uri, &result.errors);
}
}
Error Reporting Pattern
Collecting Errors
fn collect_errors(result: &ParseResult) -> Vec<Diagnostic> {
result.errors
.iter()
.map(|error| Diagnostic {
range: error.span,
severity: DiagnosticSeverity::Error,
message: error.message.clone(),
// ...
})
.collect()
}
Reporting to User
fn report_errors(errors: &[Diagnostic]) {
for error in errors {
eprintln!("Error at {}: {}", error.range, error.message);
}
}
Tree Traversal Pattern
Finding Nodes
fn find_nodes_by_kind(root: &SyntaxNode, kind: SyntaxKind) -> Vec<SyntaxNode> {
root.descendants()
.filter(|n| n.kind() == kind)
.collect()
}
Collecting Information
fn collect_identifiers(root: &SyntaxNode) -> Vec<String> {
root.descendants()
.filter(|n| n.kind() == SyntaxKind::Ident)
.filter_map(|n| n.first_token())
.map(|t| t.text().to_string())
.collect()
}
Grammar Extension Pattern
Adding Rules
fn extend_grammar(base: GrammarBuilder) -> GrammarBuilder {
    base
        .rule(MyNonTerminal::NewRule, Expr::token(new_token))
        .rule(MyNonTerminal::AnotherRule, Expr::choice(vec![
            Expr::non_terminal(MyNonTerminal::NewRule),
            Expr::non_terminal(MyNonTerminal::ExistingRule),
        ]))
}
Testing Pattern
Test Helpers
fn parse_test(input: &str) -> ParseResult {
let lexer = build_test_lexer();
let grammar = build_test_grammar();
let mut parser = LlParser::new(&grammar, Default::default()).unwrap();
let tokens = lexer.tokenize(input).unwrap();
parser.parse(&tokens, MyNonTerminal::Expr)
}
#[test]
fn test_simple_expression() {
let result = parse_test("1 + 2");
assert!(result.errors.is_empty());
// ...
}
Best Practices
- Use incremental parsing: Always use it for interactive applications
- Cache grammar: Reuse grammar across parses
- Handle errors gracefully: Don’t panic on errors
- Test thoroughly: Test with various inputs
- Profile performance: Profile and optimize hot paths
Next Steps
- See Examples for complete examples
- Check Architecture for design patterns
API Overview
This chapter provides an overview of Sipha’s key types and traits.
Core Traits
SyntaxKind
Trait for syntax kind identifiers:
use std::fmt::Debug;
use std::hash::Hash;
pub trait SyntaxKind: Copy + Clone + PartialEq + Eq + Hash + Debug {
fn is_terminal(self) -> bool;
fn is_trivia(self) -> bool;
}
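For example, a trimmed version of the arithmetic kinds from Getting Started implements it like this:
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ArithSyntaxKind { Number, Plus, Whitespace, Eof, Expr }
impl SyntaxKind for ArithSyntaxKind {
    // Only the Expr node is a non-terminal here
    fn is_terminal(self) -> bool { !matches!(self, ArithSyntaxKind::Expr) }
    // Whitespace is trivia and is skipped during parsing
    fn is_trivia(self) -> bool { matches!(self, ArithSyntaxKind::Whitespace) }
}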
Token
Trait for tokens:
use std::hash::Hash;
use sipha::syntax::{SyntaxKind, TextSize};
use compact_str::CompactString;
pub trait Token: Clone + PartialEq + Eq + Hash {
type Kind: SyntaxKind;
fn kind(&self) -> Self::Kind;
fn text_len(&self) -> TextSize;
fn text(&self) -> CompactString;
}
NonTerminal
Trait for non-terminals:
use std::hash::Hash;
pub trait NonTerminal: Clone + PartialEq + Eq + Hash {
fn name(&self) -> &str;
}
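A minimal implementation mirrors the Getting Started example:
use sipha::grammar::NonTerminal;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum ArithNonTerminal { Expr, Term, Factor }
impl NonTerminal for ArithNonTerminal {
    fn name(&self) -> &str {
        match self {
            ArithNonTerminal::Expr => "Expr",
            ArithNonTerminal::Term => "Term",
            ArithNonTerminal::Factor => "Factor",
        }
    }
}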
ParserBackend
Trait for parser backends:
use sipha::grammar::Grammar;
pub trait ParserBackend<T, N>: Sized + Send
where
T: Token,
N: NonTerminal,
{
type Config: Default + Clone;
type Error: std::error::Error + Send + Sync + 'static;
type State: Send + Sync;
fn new(grammar: &Grammar<T, N>, config: Self::Config) -> Result<Self, Self::Error>;
fn parse(&mut self, input: &[T], entry: N) -> ParseResult<T, N>;
// ... more methods
}
Key Types
Grammar
Grammar definition:
use std::marker::PhantomData;
pub struct Grammar<T, N> {
_phantom: PhantomData<(T, N)>,
}
ParseResult
Parse result:
use sipha::syntax::GreenNode;
pub struct ParseResult<T, N> {
pub root: GreenNode<T::Kind>,
pub errors: Vec<ParseError<T, N>>,
pub warnings: Vec<ParseWarning<T, N>>,
pub metrics: ParseMetrics,
#[cfg(feature = "backend-glr")]
pub forest: Option<ParseForest<T::Kind>>,
}
GreenNode
Green tree node:
use sipha::syntax::SyntaxKind;
pub struct GreenNode<K: SyntaxKind> {
_phantom: std::marker::PhantomData<K>,
}
SyntaxNode
Red tree node:
pub struct SyntaxNode {
// ...
}
Builders
GrammarBuilder
Build grammars:
pub struct GrammarBuilder<T, N> {
// ...
}
impl<T, N> GrammarBuilder<T, N> {
pub fn new() -> Self;
pub fn entry_point(mut self, entry: N) -> Self;
pub fn rule(mut self, lhs: N, rhs: Expr<T, N>) -> Self;
pub fn build(self) -> Result<Grammar<T, N>, GrammarError>;
}
LexerBuilder
Build lexers:
use sipha::syntax::SyntaxKind;
use sipha::lexer::{Pattern, CompiledLexer};
type LexerError = ();
type K = ();
pub struct LexerBuilder<K: SyntaxKind> {
_phantom: std::marker::PhantomData<K>,
}
impl<K: SyntaxKind> LexerBuilder<K> {
pub fn new() -> Self { todo!() }
pub fn token(self, kind: K, pattern: Pattern) -> Self { todo!() }
pub fn keyword(self, text: &str, kind: K) -> Self { todo!() }
pub fn trivia(self, kind: K) -> Self { todo!() }
pub fn build(self, eof_kind: K, ident_kind: K) -> Result<CompiledLexer<K>, LexerError> { todo!() }
}
GreenNodeBuilder
Build green trees:
use sipha::syntax::{GreenNode, SyntaxKind};
type K = ();
type BuilderError = ();
pub struct GreenNodeBuilder {
_phantom: std::marker::PhantomData<()>,
}
impl GreenNodeBuilder {
pub fn new() -> Self { todo!() }
pub fn start_node(&mut self, kind: K) { todo!() }
pub fn token(&mut self, kind: K, text: &str) -> Result<(), BuilderError> { todo!() }
pub fn finish_node(&mut self) -> Result<(), BuilderError> { todo!() }
pub fn finish(self) -> Result<GreenNode<K>, BuilderError> { todo!() }
}
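Usage follows the usual start/token/finish pattern. A rough sketch, reusing the ArithSyntaxKind kinds from Getting Started (import path and error handling are elided here):
let mut builder = GreenNodeBuilder::new();
builder.start_node(ArithSyntaxKind::Expr);
builder.token(ArithSyntaxKind::Number, "42")?;
builder.finish_node()?;
let green = builder.finish()?;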
Incremental Parsing
IncrementalParser
Incremental parser wrapper:
use sipha::syntax::{GreenNode, SyntaxKind};
use sipha::grammar::Grammar;
use sipha::incremental::TextEdit;
type P = ();
type T = ();
type N = ();
type ParseResult<T, N> = ();
pub struct IncrementalParser<P, T, N> {
_phantom: std::marker::PhantomData<(P, T, N)>,
}
impl<P, T, N> IncrementalParser<P, T, N> {
pub fn new(parser: P) -> Self { todo!() }
pub fn parse_incremental(
&mut self,
input: &[T],
old_tree: Option<&GreenNode<T::Kind>>,
edits: &[TextEdit],
entry: N,
grammar: Option<&Grammar<T, N>>,
) -> ParseResult<T, N> { todo!() }
}
TextEdit
Text edit:
use sipha::syntax::{TextRange, TextSize};
pub struct TextEdit {
pub range: TextRange,
pub new_text: String,
}
impl TextEdit {
pub fn replace(range: TextRange, new_text: impl Into<String>) -> Self { todo!() }
pub fn insert(pos: TextSize, text: impl Into<String>) -> Self { todo!() }
pub fn delete(range: TextRange) -> Self { todo!() }
}
Error Types
ParseError
Parsing error:
use sipha::syntax::TextRange;
pub struct ParseError<T, N> {
pub span: TextRange,
pub message: String,
pub expected: Vec<T>,
pub found: Option<T>,
}
LexerError
Lexer error:
use sipha::syntax::TextRange;
pub struct LexerError {
pub span: TextRange,
pub kind: LexerErrorKind,
}
Next Steps
- See Feature Flags for available features
- Check FAQ for common questions
Feature Flags
Sipha uses feature flags to enable optional functionality.
Backend Features
backend-ll
Enable LL(k) parser backend (default).
[dependencies]
sipha = { version = "0.5.0", features = ["backend-ll"] }
backend-lr
Enable LR parser backend.
[dependencies]
sipha = { version = "0.5.0", features = ["backend-lr"] }
backend-glr
Enable GLR parser backend (requires backend-lr).
[dependencies]
sipha = { version = "0.5.0", features = ["backend-lr", "backend-glr"] }
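backend-peg
Enable the PEG parser backend with packrat memoization (used by the PEG Parsing Example).
[dependencies]
sipha = { version = "0.5.0", features = ["backend-peg"] }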
Optional Features
diagnostics
Enable rich error diagnostics with miette.
[dependencies]
sipha = { version = "0.5.0", features = ["diagnostics"] }
unicode
Enable full Unicode support for identifiers.
[dependencies]
sipha = { version = "0.5.0", features = ["unicode"] }
visitor
Enable syntax tree visitor patterns.
[dependencies]
sipha = { version = "0.5.0", features = ["visitor"] }
query
Enable XPath-like query API for syntax trees.
[dependencies]
sipha = { version = "0.5.0", features = ["query"] }
tree-utils
Enable tree diffing and validation utilities.
[dependencies]
sipha = { version = "0.5.0", features = ["tree-utils"] }
Feature Combinations
Minimal
[dependencies]
sipha = "0.5.0" # Only backend-ll
Full
[dependencies]
sipha = { version = "0.5.0", features = [
    "backend-ll",
    "backend-lr",
    "backend-glr",
    "diagnostics",
    "unicode",
    "visitor",
    "query",
    "tree-utils",
] }
Language Server
[dependencies]
sipha = { version = "0.5.0", features = [
    "backend-ll",
    "diagnostics",
    "visitor",
    "query",
] }
Default Features
By default, only backend-ll is enabled.
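To use a non-default backend without also compiling the default one, disable default features in the usual Cargo way (adjust the feature list to your needs):
[dependencies]
sipha = { version = "0.5.0", default-features = false, features = ["backend-lr"] }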
Next Steps
- See API Overview for API reference
- Check FAQ for common questions
FAQ
Frequently asked questions about Sipha.
General
What is Sipha?
Sipha is a flexible, incremental parsing library for Rust with support for multiple parsing algorithms.
Why use Sipha?
Sipha is designed for interactive applications like IDEs and language servers that need fast, incremental parsing.
Is Sipha production-ready?
Sipha 0.5.0 provides a stable core API and is under active development; features and improvements continue to be added based on user feedback.
Parsing
Which backend should I use?
See Choosing a Backend for guidance.
Does Sipha support left recursion?
- LL parser: With left-recursion elimination
- LR parser: Yes, naturally
- GLR parser: Yes, naturally
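For example, left-recursion elimination rewrites Expr -> Expr + Term | Term into the equivalent iterative form Expr -> Term (Plus Term)*, which an LL parser can handle.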
How do I handle ambiguous grammars?
Use the GLR backend. See GLR Parser for details.
Incremental Parsing
When should I use incremental parsing?
Use incremental parsing for interactive applications like IDEs and language servers.
How much faster is incremental parsing?
- Small edits: 10-100x faster
- Medium edits: 5-20x faster
- Large edits: 2-5x faster
Do I need to provide the grammar for incremental parsing?
The grammar argument is optional, but providing it enables cache population, which improves incremental performance.
Performance
How do I optimize performance?
See Performance for optimization tips.
Is Sipha fast?
For batch parsing, Sipha is competitive with other Rust parsing libraries; for interactive editing, incremental re-parses are significantly faster than parsing from scratch.
Grammar
How do I define a grammar?
Use GrammarBuilder. See Grammars for details.
Can I use regex in patterns?
Yes, use Pattern::Regex. See Lexers for details.
Error Handling
How do I handle errors?
Check result.errors after parsing. See Error Handling for details.
Does Sipha recover from errors?
Yes, Sipha supports configurable error recovery strategies.
Contributing
How do I contribute?
See Contributing for guidelines.
Where do I report bugs?
Open an issue on GitHub.
Next Steps
- See Getting Started to get started
- Check Examples for examples
Glossary
This glossary defines key terms used throughout the Sipha documentation.
A
Affected Region
The minimal range of text that needs to be re-parsed after an edit. Sipha computes this automatically by expanding the edit range to include necessary context.
Ambiguous Grammar
A grammar that can produce multiple valid parse trees for the same input. Sipha’s GLR backend can handle ambiguous grammars by producing a parse forest.
Arena Allocation
A memory allocation strategy where objects are allocated in a contiguous region (arena) for better cache locality and faster deallocation. Green trees use arena allocation.
B
Backend
A parsing algorithm implementation. Sipha supports multiple backends: LL(k), LR, and GLR, each with different characteristics.
C
Cache (Parse Cache)
A persistent storage for parse results that enables cross-session node reuse. The cache stores parse nodes keyed by rule name, position, and version.
Choice Expression
A grammar expression that matches one of several alternatives. Represented as Expr::choice(vec![...]).
D
DFA (Deterministic Finite Automaton)
A state machine used by Sipha’s lexer for efficient tokenization. DFAs provide O(n) tokenization time.
Disambiguation
The process of selecting a single parse tree from multiple valid alternatives in an ambiguous grammar. GLR parsers support various disambiguation strategies.
E
Entry Point
The starting non-terminal for parsing. Defined when building a grammar using GrammarBuilder::entry_point().
Error Recovery
The ability of a parser to continue parsing after encountering an error, rather than stopping immediately. Sipha supports configurable recovery strategies.
Expression (Grammar Expression)
A component of a grammar rule’s right-hand side. Can be a token, non-terminal, sequence, choice, optional, repeat, or empty.
F
First Set
The set of terminals that can appear as the first token in a derivation from a non-terminal. Used by LL parsers for prediction.
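For example, in the arithmetic grammar from Getting Started, FIRST(Factor) = { Number, LParen }, and FIRST(Term) and FIRST(Expr) are the same set because every derivation ultimately starts with a Factor.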
Follow Set
The set of terminals that can appear immediately after a non-terminal in a derivation. Used by LL parsers for error recovery.
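In the same grammar, FOLLOW(Expr) = { Plus, Minus, RParen, Eof }.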
Forest (Parse Forest)
A data structure representing multiple parse trees for ambiguous grammars. GLR parsers produce parse forests when ambiguity is detected.
G
GLR (Generalized LR)
A parsing algorithm that extends LR parsing to handle ambiguous and non-deterministic grammars by maintaining multiple parser stacks.
Grammar
A formal description of a language’s syntax, consisting of non-terminals, production rules, and an entry point.
Green Tree
The storage layer of Sipha’s syntax tree representation. Green trees are compact, immutable, and can be shared across multiple red trees.
I
Incremental Parsing
The ability to efficiently re-parse only changed portions of code, reusing unchanged subtrees from previous parses. Sipha’s core feature.
Immutable
Data structures that cannot be modified after creation. Sipha’s syntax trees are immutable, enabling safe sharing and efficient updates.
L
Left Recursion
A grammar rule where a non-terminal appears as the first symbol in its own production. Some backends require left-recursion elimination.
Lexer
The component that converts raw text into tokens. Also called a tokenizer or scanner.
LL(k) Parser
A top-down predictive parser that uses k tokens of lookahead to make parsing decisions.
LR Parser
A bottom-up shift-reduce parser that builds parse trees from the leaves up. Supports LALR and canonical LR variants.
N
Non-Terminal
A grammar symbol that can be expanded into other symbols. Represented as enum variants implementing the NonTerminal trait.
Node Reuse
The process of identifying and reusing unchanged subtrees from previous parses during incremental parsing.
P
Parse Cache
See Cache (Parse Cache).
Parser Backend
See Backend.
Production Rule
A grammar rule that defines how a non-terminal can be expanded. Written as NonTerminal -> Expression.
R
Red Tree
The API layer of Sipha’s syntax tree representation. Red trees provide convenient navigation and query APIs, created on-demand from green trees.
Reuse Budget
A limit on how many nodes to consider for reuse during incremental parsing. Can be fixed or heuristic-based.
S
Sequence Expression
A grammar expression that matches a sequence of expressions in order. Represented as Expr::seq(vec![...]).
Syntax Kind
A unified type representing both terminals (tokens) and non-terminals. All syntax kinds implement the SyntaxKind trait.
Syntax Tree
A tree representation of the syntactic structure of parsed code. Sipha uses green/red trees for efficient representation.
T
Terminal
A grammar symbol that cannot be expanded further. In Sipha, terminals are tokens produced by the lexer.
Token
A unit of lexical analysis, representing a sequence of characters with a specific meaning (e.g., identifier, keyword, operator).
Trivia
Tokens that are syntactically insignificant, such as whitespace and comments. Trivia is automatically skipped during parsing.
U
Unambiguous Grammar
A grammar that produces exactly one parse tree for each valid input. Most parsing backends require unambiguous grammars; GLR does not.
V
Version (Cache Version)
A number used to invalidate old cache entries. When the grammar or lexer changes, the version increments, invalidating old cache entries.
Cheat Sheet
Quick reference for common Sipha patterns and APIs.
Syntax Kinds
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySyntaxKind {
// Terminals
Number, Plus, Minus, Eof,
// Non-terminals
Expr, Term,
}
impl SyntaxKind for MySyntaxKind {
fn is_terminal(self) -> bool {
!matches!(self, Self::Expr | Self::Term)
}
fn is_trivia(self) -> bool {
false // No trivia in this example
}
}
Lexer Builder
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySyntaxKind { Number, Plus, Minus, Eof, Whitespace, Expr, Term, }
impl SyntaxKind for MySyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, Self::Expr | Self::Term) }
fn is_trivia(self) -> bool { matches!(self, Self::Whitespace) }
}
use sipha::lexer::{LexerBuilder, Pattern, CharSet};
let lexer = LexerBuilder::new()
.token(MySyntaxKind::Number, Pattern::Repeat {
pattern: Box::new(Pattern::CharClass(CharSet::digits())),
min: 1,
max: None,
})
.token(MySyntaxKind::Plus, Pattern::Literal("+".into()))
.trivia(MySyntaxKind::Whitespace)
.build(MySyntaxKind::Eof, MySyntaxKind::Number)?;
Grammar Builder
use sipha::syntax::SyntaxKind;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySyntaxKind { Number, Plus, Minus, Eof, Whitespace, Expr, Term, }
impl SyntaxKind for MySyntaxKind {
fn is_terminal(self) -> bool { !matches!(self, Self::Expr | Self::Term) }
fn is_trivia(self) -> bool { matches!(self, Self::Whitespace) }
}
use sipha::grammar::{GrammarBuilder, Expr, NonTerminal};
use sipha::lexer::Token as LexerToken;
use sipha::syntax::{TextRange, TextSize};
use std::convert::TryFrom;
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum MyNonTerminal { Expr, }
impl NonTerminal for MyNonTerminal {
fn name(&self) -> &str { match self { MyNonTerminal::Expr => "Expr", } }
}
// Helper to create tokens
fn token(kind: MySyntaxKind, text: &str, offset: u32) -> LexerToken<MySyntaxKind> {
let len = TextSize::from(u32::try_from(text.len()).unwrap_or(0));
LexerToken::new(kind, text, TextRange::at(TextSize::from(offset), len))
}
let grammar = GrammarBuilder::new()
.entry_point(MyNonTerminal::Expr)
.rule(MyNonTerminal::Expr, Expr::token(token(MySyntaxKind::Number, "1", 0)))
.build()?;
Grammar Expressions
| Expression | Syntax | Description |
|---|---|---|
| Token | Expr::token(token) | Match a specific token |
| Non-terminal | Expr::non_terminal(NT::Expr) | Match a non-terminal |
| Sequence | Expr::seq(vec![...]) | Match expressions in order |
| Choice | Expr::choice(vec![...]) | Match one of several alternatives |
| Optional | Expr::optional(Box::new(...)) | Match zero or one occurrence |
| Repeat | Expr::repeat(Box::new(...), min, max) | Match repeated occurrences |
| Empty | Expr::Empty | Match nothing (epsilon) |
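These constructors compose freely. As a sketch, a parenthesized, comma-separated argument list could be written as a single expression and then registered with .rule(MyNonTerminal::List, list_rule); the token values and the List/Arg non-terminals are placeholders:
// List -> "(" (Arg ("," Arg)*)? ")"
let list_rule = Expr::seq(vec![
    Expr::token(lparen_token),
    Expr::optional(Box::new(Expr::seq(vec![
        Expr::non_terminal(MyNonTerminal::Arg),
        Expr::repeat(
            Box::new(Expr::seq(vec![
                Expr::token(comma_token),
                Expr::non_terminal(MyNonTerminal::Arg),
            ])),
            0,
            None,
        ),
    ]))),
    Expr::token(rparen_token),
]);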
Parser Usage
use sipha::grammar::{GrammarBuilder, Grammar, Expr, NonTerminal};
use sipha::lexer::Token;
type MyToken = ();
type MyNonTerminal = ();
type MySyntaxKind = ();
let grammar: Grammar<MyToken, MyNonTerminal> = todo!();
let tokens: Vec<Token<MySyntaxKind>> = vec![];
use sipha::backend::ll::{LlParser, LlConfig};
use sipha::backend::ParserBackend;
use sipha::syntax::SyntaxNode;
let config = LlConfig::default();
let mut parser = LlParser::new(&grammar, config)?;
let result = parser.parse(&tokens, MyNonTerminal::Expr);
if !result.errors.is_empty() {
// Handle errors
}
let root = SyntaxNode::new_root(result.root.clone());
Incremental Parsing
use sipha::backend::ll::LlParser;
use sipha::grammar::Grammar;
use sipha::lexer::Token;
use sipha::syntax::GreenNode;
type MyToken = ();
type MyNonTerminal = ();
type MySyntaxKind = ();
let parser: LlParser = todo!();
let grammar: Grammar<MyToken, MyNonTerminal> = todo!();
let tokens: Vec<Token<MySyntaxKind>> = vec![];
let old_tree: Option<&GreenNode<MySyntaxKind>> = None;
let entry_point: MyNonTerminal = todo!();
use sipha::incremental::{IncrementalParser, TextEdit};
use sipha::syntax::{TextRange, TextSize};
let mut incremental = IncrementalParser::new(parser);
let edits = vec![TextEdit::replace(
TextRange::new(TextSize::from(0), TextSize::from(2)),
"new".into(),
)];
let result = incremental.parse_incremental(
&tokens,
old_tree,
&edits,
entry_point,
Some(&grammar),
);
Syntax Tree Navigation
use sipha::syntax::{SyntaxNode, GreenNode, SyntaxKind};
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MySyntaxKind { Expr, }
impl SyntaxKind for MySyntaxKind {
fn is_terminal(self) -> bool { false }
fn is_trivia(self) -> bool { false }
}
let green_node: GreenNode<MySyntaxKind> = todo!();
use sipha::syntax::SyntaxNode;
let root = SyntaxNode::new_root(green_node);
// Children
for child in root.children() {
println!("{:?}", child.kind());
}
// Descendants
for node in root.descendants() {
if node.kind() == MySyntaxKind::Expr {
// Found expression
}
}
// Parent
if let Some(parent) = root.parent() {
println!("Parent: {:?}", parent.kind());
}
// Next sibling
if let Some(sibling) = root.next_sibling() {
    println!("Sibling: {:?}", sibling.kind());
}
Error Handling
use sipha::backend::ParserBackend;
type MyToken = ();
type MyNonTerminal = ();
struct Parser;
impl ParserBackend<MyToken, MyNonTerminal> for Parser {
type Config = ();
type Error = ();
type State = ();
fn new(_: &(), _: ()) -> Result<Self, ()> { Ok(Parser) }
fn parse(&mut self, _: &[MyToken], _: MyNonTerminal) -> sipha::backend::ParseResult<MyToken, MyNonTerminal> { todo!() }
}
let mut parser = Parser;
let tokens: Vec<MyToken> = vec![];
let result = parser.parse(&tokens, MyNonTerminal::Expr);
// Check for errors
if !result.errors.is_empty() {
for error in &result.errors {
eprintln!("Error: {} at {:?}", error.message, error.span);
}
}
// Check for warnings
for warning in &result.warnings {
eprintln!("Warning: {}", warning.message);
}
Pattern Matching
| Pattern | Syntax | Example |
|---|---|---|
| Literal | Pattern::Literal("+".into()) | Match exact string |
| Char Class | Pattern::CharClass(CharSet::digits()) | Match character class |
| Repeat | Pattern::Repeat { pattern, min, max } | Match repetition |
| Regex | Pattern::Regex(regex) | Match regex pattern |
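For example, an identifier-like token can be built from the patterns above (the binding name is just illustrative):
use sipha::lexer::{Pattern, CharSet};
// One or more alphanumeric characters
let ident_pattern = Pattern::Repeat {
    pattern: Box::new(Pattern::CharClass(CharSet::alphanumeric())),
    min: 1,
    max: None,
};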
Common Character Sets
use sipha::lexer::CharSet;
let _ = CharSet::digits(); // [0-9]
let _ = CharSet::whitespace(); // \s
let _ = CharSet::letters(); // [a-zA-Z]
let _ = CharSet::alphanumeric(); // [a-zA-Z0-9]
let _ = CharSet::hex_digits(); // [0-9a-fA-F]
Backend Selection
| Use Case | Backend | Reason |
|---|---|---|
| Simple grammar | LL | Easy to use, good error messages |
| Left recursion | LR | Natural support |
| Ambiguous grammar | GLR | Handles ambiguity |
| Maximum performance | LR | Efficient table-based parsing |
Feature Flags
[dependencies]
sipha = { version = "0.5.0", features = [
"backend-ll", # LL(k) parser (default)
"backend-lr", # LR parser
"backend-glr", # GLR parser (requires backend-lr)
"diagnostics", # Rich error diagnostics
"unicode", # Unicode support
"visitor", # Visitor patterns
"query", # XPath-like queries
"tree-utils", # Tree utilities
] }
Common Patterns
Optional Semicolon
use sipha::grammar::{GrammarBuilder, Expr, NonTerminal};
use sipha::lexer::Token;
type MyToken = ();
type MyNonTerminal = ();
type MySyntaxKind = ();
let semicolon_token: Token<MySyntaxKind> = todo!();
let mut builder = GrammarBuilder::new();
builder.rule(MyNonTerminal::Stmt, Expr::seq(vec![
Expr::non_terminal(MyNonTerminal::Expr),
Expr::optional(Box::new(Expr::token(semicolon_token))),
]))
Delimited List
use sipha::grammar::{GrammarBuilder, Expr, NonTerminal};
use sipha::lexer::Token;
type MyToken = ();
type MyNonTerminal = ();
type MySyntaxKind = ();
let open_token: Token<MySyntaxKind> = todo!();
let close_token: Token<MySyntaxKind> = todo!();
let mut builder = GrammarBuilder::new();
builder.rule(MyNonTerminal::List, Expr::seq(vec![
Expr::token(open_token),
Expr::repeat(
Box::new(Expr::non_terminal(MyNonTerminal::Item)),
0,
None,
),
Expr::token(close_token),
]))
Operator Precedence
use sipha::grammar::{GrammarBuilder, Expr, NonTerminal};
use sipha::lexer::Token;
type MyToken = ();
type MyNonTerminal = ();
type MySyntaxKind = ();
let multiply_token: Token<MySyntaxKind> = todo!();
let plus_token: Token<MySyntaxKind> = todo!();
let mut builder = GrammarBuilder::new();
// Higher precedence first
builder.rule(MyNonTerminal::Expr, Expr::choice(vec![
Expr::seq(vec![ // * has higher precedence
Expr::non_terminal(MyNonTerminal::Term),
Expr::token(multiply_token),
Expr::non_terminal(MyNonTerminal::Term),
]),
Expr::seq(vec![ // + has lower precedence
Expr::non_terminal(MyNonTerminal::Expr),
Expr::token(plus_token),
Expr::non_terminal(MyNonTerminal::Term),
]),
Expr::non_terminal(MyNonTerminal::Term),
]))
See Also
- Getting Started - Step-by-step tutorial
- API Documentation - Full API reference
- Examples - Complete working examples