How Incremental Parsing Works
This chapter explains the internal mechanisms of incremental parsing in Sipha.
Overview
Incremental parsing works by:
- Tracking edits: Identifying which parts of the text changed
- Computing affected regions: Determining minimal regions to re-parse
- Finding reusable nodes: Identifying unchanged subtrees from previous parse
- Re-parsing selectively: Only parsing affected regions
- Reconstructing the tree: Combining reused nodes with new parse results
Incremental Parsing Flow
flowchart TD
Start[Text Edit] --> Track[Track Edit]
Track --> Compute[Compute Affected Region]
Compute --> Find[Find Reusable Nodes]
Find --> Check{Reusable<br/>Nodes Found?}
Check -->|Yes| Reuse[Reuse Unchanged Subtrees]
Check -->|No| ParseAll[Parse Affected Region]
Reuse --> ParseGaps[Parse Gaps Only]
ParseGaps --> Combine[Combine Results]
ParseAll --> Combine
Combine --> Update[Update Parse Cache]
Update --> Reconstruct[Reconstruct Tree]
Reconstruct --> End[New Syntax Tree]
style Start fill:#e1f5ff,color:#000000
style Reuse fill:#c8e6c9,color:#000000
style ParseGaps fill:#fff9c4,color:#000000
style Combine fill:#ffccbc,color:#000000
style End fill:#c8e6c9,color:#000000
Edit Tracking
When text is edited, Sipha tracks the changes using TextEdit:
use sipha::incremental::TextEdit;
use sipha::syntax::{TextRange, TextSize};
let edit = TextEdit::replace(
TextRange::new(TextSize::from(10), TextSize::from(20)),
"new text".into(),
);
Edits can be:
- Replacements: Replace a range with new text
- Insertions: Insert text at a position
- Deletions: Delete a range
Affected Region Computation
Sipha computes the minimal region that needs to be re-parsed:
let affected = AffectedRegion::from_edits(&edits, Some(total_len));
The affected region:
- Union: The smallest range covering all edits
- Expanded: The union plus padding for context (default 16 characters)
The expansion ensures that context-dependent parsing (like operator precedence) works correctly.
Node Reuse
Sipha identifies nodes from the previous parse that can be reused:
let reusable = find_reusable_nodes(old_tree, affected.expanded(), budget);
A node is reusable if:
- It doesn’t intersect with the affected region
- It matches the expected syntax kind (if specified)
- It’s within the reuse budget
Reuse Budget
The reuse budget limits how many nodes to consider for reuse:
use sipha::incremental::ReuseBudget;
// Fixed budget
let budget = ReuseBudget::Fixed(1000);
// Heuristic budget (default)
let budget = ReuseBudget::Heuristic {
max_depth: 20,
max_nodes: 1000,
};
The heuristic budget considers:
- Tree depth: Deeper trees allow more candidates
- File size: Larger files allow more candidates
- Affected region size: Smaller affected regions allow more candidates
Incremental Session
An IncrementalSession captures everything needed for incremental parsing:
use sipha::incremental::IncrementalSession;
let session = IncrementalSession::new(old_tree, &edits);
The session provides:
- Old tree: Previous parse tree
- Edits: Text changes
- Affected region: Computed affected range
- Reusable nodes: List of nodes that can be reused
- Cache access: Optional persistent parse cache
Parse Cache
Sipha maintains a persistent parse cache for cross-session node reuse:
let session = IncrementalSession::with_cache(old_tree, &edits, &cache);
The cache:
- Keys: Rule name, start position, version
- Values: Cached parse nodes
- Versioning: Invalidates old entries based on version
- Eviction: Removes old versions automatically
Cache population happens automatically when using parse_incremental with a grammar:
let result = parser.parse_incremental(
&tokens,
old_tree,
&edits,
entry_point,
Some(&grammar), // Cache is populated automatically
);
Re-parsing Process
When re-parsing incrementally:
- Find reusable nodes: Query the session for reusable nodes at each position
- Parse gaps: Re-parse only the regions not covered by reusable nodes
- Combine results: Merge reused nodes with newly parsed nodes
- Update cache: Store new parse results in the cache
Tree Reconstruction
The final tree is reconstructed by:
- Starting from root: Begin with the old tree root
- Replacing affected subtrees: Replace subtrees in affected regions
- Inserting new nodes: Add newly parsed nodes
- Preserving structure: Maintain tree structure outside affected regions
Performance Characteristics
Incremental parsing performance depends on:
- Edit size: Smaller edits = better performance
- Tree structure: Deeper trees = more reuse opportunities
- Cache hit rate: Higher hit rate = better performance
- Affected region size: Smaller regions = faster re-parsing
Typical performance improvements:
- Small edits (1-10 characters): 10-100x faster
- Medium edits (10-100 characters): 5-20x faster
- Large edits (100+ characters): 2-5x faster
Limitations
Incremental parsing has some limitations:
- Context sensitivity: Some parsing decisions depend on distant context
- Error recovery: Error recovery may require re-parsing larger regions
- Ambiguity: Ambiguous grammars may require full re-parse
- Cache overhead: Cache management has some overhead
Next Steps
- See Usage for API examples
- Read Implementation Details for advanced topics