# Performance

This chapter covers performance optimization tips and benchmarking for Sipha.
## General Tips

### Use Incremental Parsing

For interactive applications, always use incremental parsing:
```rust
let mut incremental = IncrementalParser::new(parser);
// Much faster for small edits
```
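What does the edit loop look like in practice? A minimal sketch, assuming the `parse_incremental` signature shown under Cache Population below; the `lex` function, the `next_edits` helper, and the `tree` field on the parse result are illustrative names, not confirmed Sipha API:

```rust
// Sketch only: `lex`, `next_edits`, and `result.tree` are assumed
// names for illustration, not confirmed Sipha API.
let mut old_tree = parser.parse(&tokens, entry).tree;
loop {
    let edits = next_edits();   // edits reported by the host editor
    let tokens = lex(&source);  // re-lex the edited source
    let result =
        parser.parse_incremental(&tokens, old_tree, &edits, entry, Some(&grammar));
    old_tree = result.tree;     // unchanged subtrees are reused next round
}
```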
### Choose the Right Backend
- **LL(k)**: fast; a good default for most grammars
- **LR**: fast; well suited to batch parsing
- **GLR**: slower; use it only for ambiguous grammars
### Optimize the Grammar

- **Factor common patterns**: extract common subexpressions (e.g. rewrite `A → α β | α γ` as `A → α (β | γ)`)
- **Avoid deep nesting**: flatten deeply nested structures
- **Use appropriate lookahead**: don't use more lookahead than the grammar needs
## Lexer Performance

### Pattern Ordering

Order patterns from most specific to least specific. Patterns are tried in order, so a general pattern listed first can shadow a more specific one:
```rust
// Good: Specific patterns first
.token(MySyntaxKind::Keyword, Pattern::Literal("if".into()))
.token(MySyntaxKind::Ident, Pattern::CharClass(CharSet::letters()))

// Bad: General patterns first
.token(MySyntaxKind::Ident, Pattern::CharClass(CharSet::letters()))
.token(MySyntaxKind::Keyword, Pattern::Literal("if".into()))
```
### Use Character Classes

Prefer character classes over regular expressions when possible; a character class is a direct membership test, while a regex runs through a more general matching engine:
```rust
// Good: Character class
.token(MySyntaxKind::Number, Pattern::CharClass(CharSet::digits()))

// Slower: Regex
.token(MySyntaxKind::Number, Pattern::Regex(r"\d+".into()))
```
### Keyword Trie

Keywords are matched using a trie, so a lookup costs O(m), where m is the length of the keyword.
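For intuition, here is a toy keyword trie in plain Rust; it illustrates the idea and is not Sipha's internal implementation:

```rust
use std::collections::HashMap;

// Toy keyword trie: a lookup visits at most one node per character,
// so matching a word of length m costs O(m).
#[derive(Default)]
struct TrieNode {
    children: HashMap<char, TrieNode>,
    is_keyword: bool,
}

impl TrieNode {
    fn insert(&mut self, word: &str) {
        let mut node = self;
        for ch in word.chars() {
            node = node.children.entry(ch).or_default();
        }
        node.is_keyword = true;
    }

    fn contains(&self, word: &str) -> bool {
        let mut node = self;
        for ch in word.chars() {
            match node.children.get(&ch) {
                Some(next) => node = next,
                None => return false,
            }
        }
        node.is_keyword
    }
}
```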
## Parser Performance

### Cache Population

Always pass the grammar so the reuse cache can be populated; with `None`, no subtrees are recorded for reuse on later edits:
```rust
// Good: Cache populated
parser.parse_incremental(&tokens, old_tree, &edits, entry, Some(&grammar))

// Bad: No cache
parser.parse_incremental(&tokens, old_tree, &edits, entry, None)
```
### Reuse Budget

Adjust the reuse budget to the use case:
```rust
// For large files
let budget = ReuseBudget::Heuristic {
    max_depth: 30,
    max_nodes: 2000,
};

// For small files
let budget = ReuseBudget::Fixed(100);
```
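If input sizes vary at runtime, the choice can be wrapped in a small helper. A sketch: the `ReuseBudget` variants are the ones shown above, but the 1 MiB threshold is an arbitrary example, not a tuned recommendation:

```rust
// Hypothetical helper; the 1 MiB cutoff is an arbitrary example.
fn budget_for(source_len: usize) -> ReuseBudget {
    if source_len > 1024 * 1024 {
        ReuseBudget::Heuristic {
            max_depth: 30,
            max_nodes: 2000,
        }
    } else {
        ReuseBudget::Fixed(100)
    }
}
```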
## Memory Optimization

### Arena Allocation

Green nodes are arena-allocated for efficient memory usage.
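To make the idea concrete, here is a toy index-based arena in plain Rust; it illustrates the technique and is not Sipha's allocator:

```rust
// Toy arena: values live in one contiguous Vec and are referenced by
// index, so each allocation is a push and everything is freed at once
// when the arena is dropped.
struct Arena<T> {
    items: Vec<T>,
}

#[derive(Clone, Copy)]
struct Id(usize);

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> Id {
        self.items.push(value);
        Id(self.items.len() - 1)
    }

    fn get(&self, id: Id) -> &T {
        &self.items[id.0]
    }
}
```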
### Node Sharing

Incremental parsing shares unchanged nodes between the old and new trees, reducing memory usage.
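The effect is easy to see with a toy example of structural sharing using `Rc` (an illustration only, not Sipha's green-tree types):

```rust
use std::rc::Rc;

// Cloning an Rc copies a pointer, not the subtree, so the new tree
// can point at the same unchanged child as the old one.
struct Node {
    kind: &'static str,
    children: Vec<Rc<Node>>,
}

fn main() {
    let unchanged = Rc::new(Node { kind: "fn_item", children: vec![] });
    let old_root = Node { kind: "file", children: vec![unchanged.clone()] };
    let new_root = Node { kind: "file", children: vec![unchanged.clone()] };

    // One allocation for `unchanged`, referenced from three places.
    assert_eq!(Rc::strong_count(&unchanged), 3);
    let _ = (old_root, new_root);
}
```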
### Cache Eviction

The cache automatically evicts old versions:
```rust
// Keep the last 2 versions (the default)
cache.evict_old_entries(2);
```
## Benchmarking

### Measure Parse Time
```rust
let start = std::time::Instant::now();
let result = parser.parse(&tokens, entry);
let duration = start.elapsed();

println!("Parse time: {:?}", duration);
println!("Metrics: {:?}", result.metrics);
```
### Compare Backends

Time each backend on the same token stream:

```rust
use std::time::Instant;

// LL parser
let ll_start = Instant::now();
ll_parser.parse(&tokens, entry);
let ll_time = ll_start.elapsed();

// LR parser
let lr_start = Instant::now();
lr_parser.parse(&tokens, entry);
let lr_time = lr_start.elapsed();
```
## Profiling

### Use a Profiler

Profile your parser to find bottlenecks:
```bash
# Build first, then profile the release binary directly
# (replace `your-binary` with your binary's name)
cargo build --release

# With perf
perf record --call-graph=dwarf ./target/release/your-binary
perf report

# With valgrind (callgrind does not follow into child processes by
# default, so profile the binary itself rather than `cargo run`)
valgrind --tool=callgrind ./target/release/your-binary
```
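Flame graphs are often the quickest way to read profiling output. The third-party cargo-flamegraph tool wraps perf; install it with `cargo install flamegraph`:

```bash
# Builds in release mode by default and writes flamegraph.svg
cargo flamegraph
```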
### Common Bottlenecks

- **Lexer**: pattern matching, especially regex
- **Parser**: table lookups, especially for large grammars
- **Tree construction**: node allocation and linking
## Best Practices

- **Profile first**: measure before you optimize
- **Measure changes**: benchmark before and after every change
- **Use incremental parsing**: always, for interactive applications
- **Optimize the grammar**: factor and simplify it
- **Choose the right backend**: match the backend to the use case
## Next Steps

- See Grammar Analysis for grammar optimization
- Check Architecture for design considerations