fix(wasm): resolve incremental edge loss, unnecessary reparses, and V8 crash #938
Conversation
…8 crash

Three WASM-engine bugs fixed:

1. Edge loss (#932): `purgeAndAddReverseDeps` deleted ALL outgoing edges from reverse-dep files, then relied on reparsing to rebuild them — but reparsed data was incomplete. Now saves edge topology before purge and reconnects to new node IDs after insert, preserving all edges without reparsing.
2. Unnecessary reparses (#933): reverse-dep files were added to `parseChanges` and fully reparsed even though only their edge targets changed. The new save-and-reconnect approach eliminates all reverse-dep reparsing — only actually-changed files are parsed.
3. V8 crash (#931): WASM tree objects stored in `symbols._tree` were never cleaned up on error paths, causing V8 to crash during GC of orphaned WASM objects (~40% of builds). Added error-path tree cleanup in the pipeline catch block. Also added a depth guard (`MAX_WALK_DEPTH=500`) in `walkWithVisitors` to prevent stack overflow during deep AST recursion.

Closes #931, closes #932, closes #933

Impact: 8 functions changed, 28 affected
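The save-and-reconnect idea described above can be sketched in isolation. This is a hypothetical, self-contained illustration, not the PR's actual code: `SavedEdge`, `NewNode`, and `reconnect` are stand-in names, and the real implementation resolves IDs via SQL rather than an in-memory search.

```typescript
// Hypothetical sketch of save-and-reconnect: before the purge, snapshot
// edges whose target lives in a changed file; after reparse/insert,
// resolve each saved target to its freshly inserted node ID and
// re-create the edge. Names here are illustrative only.

interface SavedEdge {
  sourceId: number;   // source node survives the purge, so its ID stays valid
  tgtName: string;
  tgtKind: string;
  tgtFile: string;
  edgeKind: string;
}

interface NewNode {
  id: number;
  name: string;
  kind: string;
  file: string;
}

function reconnect(
  saved: SavedEdge[],
  newNodes: NewNode[],
): Array<[number, number, string]> {
  const edges: Array<[number, number, string]> = [];
  for (const e of saved) {
    // Resolve the purged target to its newly inserted counterpart.
    const match = newNodes.find(
      (n) => n.file === e.tgtFile && n.name === e.tgtName && n.kind === e.tgtKind,
    );
    if (match) edges.push([e.sourceId, match.id, e.edgeKind]);
  }
  return edges;
}

const saved: SavedEdge[] = [
  { sourceId: 7, tgtName: 'parse', tgtKind: 'function', tgtFile: 'a.ts', edgeKind: 'calls' },
];
const inserted: NewNode[] = [{ id: 42, name: 'parse', kind: 'function', file: 'a.ts' }];
console.log(reconnect(saved, inserted)); // [[7, 42, 'calls']]
```

The point of the snapshot is that only the target side is re-resolved; the source node was never purged, so its original ID can be reused without reparsing the source file.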
Claude finished @carlos-alm's task — View job

PR Review Complete ✅ — Comprehensive 5-phase review for WASM incremental build fixes.

Phase 1: Root Cause Analysis ✅
- Problem Being Solved: Three critical WASM-engine bugs in incremental builds.
- Why This Approach: The solution is architecturally sound.
- Risk Assessment: Low risk. Changes are isolated to the incremental build pipeline, maintain backward compatibility, and add safety mechanisms.

Phase 2: Configuration Change Detection ✅
- Verdict: Configuration changes are safety enhancements, not quality compromises.

Phase 3: Test Integrity Check ✅
- Verdict: Test integrity is excellent — no test modifications, claims are verifiable.

Phase 4: Code Quality Assessment ✅
- Verdict: High quality implementation with excellent error handling and performance improvements.

Phase 5: Backlog Compliance Rating ✅
- Zero-dep: ✅ — No new runtime dependencies added; uses existing tree-sitter/SQLite/better-sqlite3 infrastructure.
- Foundation-aligned: ✅ — Fully compliant with FOUNDATION.md principles.
- Problem-fit: 5/5 — Directly addresses the core problem.
- Breaking: No — Purely additive changes, no API signature modifications, backward compatible.
- Tier: Tier 1 (zero-dep ✅ + foundation-aligned ✅)

Root Cause Analysis — Problem Being Solved: Three critical WASM incremental build bugs causing edge loss, unnecessary parsing, and V8 crashes.

Critical Concerns: None identified. This is a well-engineered stability and performance improvement.

Final Recommendation
Codegraph Impact Analysis — 9 functions changed → 28 callers affected across 14 files
Greptile Summary

Fixes three WASM incremental-build bugs: edge loss from missing reconnection after purge, unnecessary reverse-dep reparses, and a V8 GC crash from orphaned WASM tree objects.

Confidence Score: 5/5 — Safe to merge. All three bugs are correctly fixed and prior review concerns are fully addressed. All previous P0/P1 findings (stale source-ID dangling edges, silent MAX_WALK_DEPTH truncation, WASM trees not freed on success path) have been resolved. The save-and-reconnect implementation is logically sound: edges are captured before purge, co-changed-file sources are filtered, and new target IDs are resolved deterministically via proximity-based lookup. No new correctness issues found. No files require special attention.

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[detectChanges] --> B{hasPurge AND\nhasReverseDeps?}
    B -- Yes, WASM path --> C[Save edges:\nnon-changed src → changed tgt\ninto savedReverseDepEdges]
    C --> D[purgeFilesFromGraph\nchanged + removed files]
    B -- No --> D
    D --> E[parseFiles\nchanged files ONLY\nno reverse-dep reparse]
    E --> F[insertNodes\nnew node IDs for changed files]
    F --> G[buildEdges Phase 1\nnew edges from fileSymbols]
    G --> H{savedReverseDepEdges\nnon-empty?}
    H -- Yes --> I[reconnectReverseDepEdges\nlookup new target IDs\nORDER BY ABS line - saved_line\nre-insert edges]
    H -- No --> J[runAnalyses]
    I --> J
    J --> K[WASM tree cleanup\ntree.delete on success path]
    K --> L[finalize]
```
Reviews (4): Last reviewed commit: "Merge branch 'main' into fix/wasm-increm…"
```typescript
if (hasReverseDeps && hasPurge) {
  const saveEdgesStmt = db.prepare(`
    SELECT e.source_id, n_tgt.name AS tgt_name, n_tgt.kind AS tgt_kind,
           n_tgt.file AS tgt_file, n_tgt.line AS tgt_line,
           e.kind AS edge_kind, e.confidence, e.dynamic
    FROM edges e
    JOIN nodes n_src ON e.source_id = n_src.id
    JOIN nodes n_tgt ON e.target_id = n_tgt.id
    WHERE n_tgt.file = ? AND n_src.file != n_tgt.file
  `);
  for (const changedPath of changePaths) {
    for (const row of saveEdgesStmt.all(changedPath) as Array<{
      source_id: number;
      tgt_name: string;
      tgt_kind: string;
      tgt_file: string;
      tgt_line: number;
      edge_kind: string;
      confidence: number;
      dynamic: number;
    }>) {
      ctx.savedReverseDepEdges.push({
        sourceId: row.source_id,
        tgtName: row.tgt_name,
        tgtKind: row.tgt_kind,
        tgtFile: row.tgt_file,
        tgtLine: row.tgt_line,
        edgeKind: row.edge_kind,
        confidence: row.confidence,
        dynamic: row.dynamic,
      });
    }
  }
  debug(`Saved ${ctx.savedReverseDepEdges.length} reverse-dep edges for reconnection`);
}
```
Stale source IDs saved for edges between two co-changed files
The save query captures ALL edges whose target is in a changed file (n_tgt.file = ?), including edges whose source is also a file in changePaths. After purgeFilesFromGraph runs, those source nodes are deleted and their IDs are gone. reconnectReverseDepEdges in build-edges.ts then calls batchInsertEdges with the stale sourceId, creating a dangling edge. Since FK constraints are off by default in SQLite, the insert silently succeeds. buildEdges then also creates the correct A_new → B_new edge — leaving a duplicate dangling edge in the graph for every pair of co-changed interdependent files that also have a reverse-dep.
Add n_src.file AS src_file to the SELECT and filter it out in the push loop:
```typescript
const changePathSet = new Set(changePaths);
const saveEdgesStmt = db.prepare(`
  SELECT e.source_id, n_tgt.name AS tgt_name, n_tgt.kind AS tgt_kind,
         n_tgt.file AS tgt_file, n_tgt.line AS tgt_line,
         e.kind AS edge_kind, e.confidence, e.dynamic,
         n_src.file AS src_file
  FROM edges e
  JOIN nodes n_src ON e.source_id = n_src.id
  JOIN nodes n_tgt ON e.target_id = n_tgt.id
  WHERE n_tgt.file = ? AND n_src.file != n_tgt.file
`);
for (const changedPath of changePaths) {
  for (const row of saveEdgesStmt.all(changedPath) as Array<{
    source_id: number; tgt_name: string; tgt_kind: string;
    tgt_file: string; tgt_line: number; edge_kind: string;
    confidence: number; dynamic: number; src_file: string;
  }>) {
    // Skip edges whose source is also being purged — buildEdges will
    // re-create them with correct new IDs.
    if (changePathSet.has(row.src_file)) continue;
    ctx.savedReverseDepEdges.push({
      sourceId: row.source_id,
      tgtName: row.tgt_name,
      tgtKind: row.tgt_kind,
      tgtFile: row.tgt_file,
      tgtLine: row.tgt_line,
      edgeKind: row.edge_kind,
      confidence: row.confidence,
      dynamic: row.dynamic,
    });
  }
}
```
Fixed -- added changePathSet filter to skip edges whose source file is also in changePaths. The save query now selects n_src.file AS src_file and the loop does if (changePathSet.has(row.src_file)) continue before pushing to savedReverseDepEdges. This prevents stale source IDs from producing dangling edges after purge.
```diff
 function walk(node: TreeSitterNode | null, depth: number): void {
-  if (!node) return;
+  if (!node || depth > MAX_WALK_DEPTH) return;
```
Silent truncation at MAX_WALK_DEPTH — no diagnostic log

When the `depth > MAX_WALK_DEPTH` guard bails out, analysis results for that subtree (complexity, CFG, dataflow) will be silently incomplete. Without any log, it is impossible to know whether this guard ever fired or which files hit it. A single `debug()` call on trigger would make this much easier to diagnose.
```diff
 function walk(node: TreeSitterNode | null, depth: number): void {
-  if (!node || depth > MAX_WALK_DEPTH) return;
+  if (!node) return;
+  if (depth > MAX_WALK_DEPTH) {
+    debug(`walkWithVisitors: AST depth limit (${MAX_WALK_DEPTH}) hit — subtree truncated`);
+    return;
+  }
```
Fixed -- added `debug()` import and a log message when MAX_WALK_DEPTH is hit: `` debug(`walkWithVisitors: AST depth limit (${MAX_WALK_DEPTH}) hit -- subtree truncated`) ``. This makes it diagnosable when the guard fires.
```typescript
    if (ctx.allSymbols?.size > 0) {
      for (const [, symbols] of ctx.allSymbols) {
        const tree = symbols._tree as { delete?: () => void } | undefined;
        if (tree && typeof tree.delete === 'function') {
          try {
            tree.delete();
          } catch {
            /* ignore cleanup errors */
          }
        }
        symbols._tree = undefined;
      }
    }
    if (ctx.db) {
      closeDbPair({ db: ctx.db, nativeDb: ctx.nativeDb });
    }
  }
  throw err;
}
```
WASM trees not freed on the success path
The error-path cleanup correctly calls tree.delete() on WASM objects before GC can touch them. But on the success path the same ctx.allSymbols trees stay allocated until the GC decides to collect ctx. For large builds with many parsed files this is a long-lived WASM memory hold. Clearing the trees right after buildEdges (or at the start of runAnalyses) would release WASM memory deterministically on both paths and remove any risk of the same crash pattern appearing on slow-GC runs.
Fixed -- added deterministic WASM tree cleanup on the success path, right after runAnalyses completes (trees are still needed during analysis). Same cleanup pattern as the error-path catch block: iterates ctx.allSymbols, calls tree.delete(), and sets _tree to undefined.
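The cleanup pattern described in this thread can be illustrated with a self-contained sketch. This is a hypothetical stand-in, not the project's actual code: `FileSymbols` and `cleanupTrees` are invented names, and the stub object below substitutes for a real web-tree-sitter WASM tree.

```typescript
// Hypothetical sketch of deterministic WASM-tree cleanup: iterate every
// parsed file's symbols, call tree.delete() to release WASM memory now
// (instead of waiting for GC), and drop the JS reference either way.
interface Disposable {
  delete?: () => void;
}

interface FileSymbols {
  _tree?: Disposable;
}

function cleanupTrees(allSymbols: Map<string, FileSymbols>): number {
  let freed = 0;
  for (const [, symbols] of allSymbols) {
    const tree = symbols._tree;
    if (tree && typeof tree.delete === 'function') {
      try {
        tree.delete(); // deterministic release, safe on both success and error paths
        freed++;
      } catch {
        /* ignore cleanup errors */
      }
    }
    symbols._tree = undefined; // drop the reference even if delete failed
  }
  return freed;
}

// Demo with a stub tree standing in for a WASM tree object.
let deleted = 0;
const symbols = new Map<string, FileSymbols>([
  ['a.ts', { _tree: { delete: () => { deleted++; } } }],
  ['b.ts', {}], // a file with no tree must not throw
]);
console.log(cleanupTrees(symbols), deleted); // 1 1
```

Running the same routine on both the success path (after analyses finish) and the error path (in the catch block) is what makes the release deterministic: no code path leaves a live WASM object for the GC to trip over.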
…ees on success path (#938) Impact: 4 functions changed, 26 affected
…#938) Add ORDER BY ABS(line - ?) to the reconnectReverseDepEdges node lookup so duplicate (name, kind, file) nodes (e.g. TypeScript overloads) pick the closest match by line number instead of relying on non-deterministic LIMIT 1 behavior. Impact: 1 functions changed, 3 affected
Fixed the remaining P2 nit about non-deterministic LIMIT 1 in reconnectReverseDepEdges. The query now uses `ORDER BY ABS(line - ?) LIMIT 1` with the saved tgtLine as a proximity hint, so duplicate (name, kind, file) nodes (e.g. TypeScript overloads) deterministically pick the closest match by line number.
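The proximity rule behind `ORDER BY ABS(line - ?) LIMIT 1` can be mirrored in plain TypeScript. This is an illustrative sketch only; `Candidate` and `closestByLine` are hypothetical names, not part of the PR.

```typescript
// Pure-TS equivalent of SQL's `ORDER BY ABS(line - ?) LIMIT 1`: among
// nodes sharing (name, kind, file), pick the one whose line number is
// closest to the line saved before the purge.
interface Candidate {
  id: number;
  line: number;
}

function closestByLine(candidates: Candidate[], savedLine: number): Candidate | undefined {
  let best: Candidate | undefined;
  let bestDist = Infinity;
  for (const c of candidates) {
    const dist = Math.abs(c.line - savedLine);
    if (dist < bestDist) {
      // strict < keeps the first of two equally distant rows, so ties
      // resolve the same way on every run
      best = c;
      bestDist = dist;
    }
  }
  return best;
}

// Two TypeScript overloads of the same symbol at lines 10 and 80; the
// saved edge pointed at line 78, so the line-80 overload wins.
const overloads: Candidate[] = [
  { id: 1, line: 10 },
  { id: 2, line: 80 },
];
console.log(closestByLine(overloads, 78)?.id); // 2
```

Without the proximity hint, any of the duplicate rows could be returned depending on storage order, which is exactly the non-determinism the review flagged.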
Summary
Fixes three WASM-engine bugs in the incremental build pipeline:
1. Edge loss (#932): `purgeAndAddReverseDeps` deleted ALL outgoing edges from reverse-dep files, then relied on reparsing to rebuild them — but reparsed data was incomplete. Now saves edge topology before purge and reconnects to new node IDs after insert, preserving all edges without reparsing.
2. Unnecessary reparses (#933): reverse-dep files were added to `parseChanges` and fully reparsed even though only their edge targets changed. The new save-and-reconnect approach eliminates all reverse-dep reparsing — only actually-changed files are parsed.
3. V8 crash (#931): WASM tree objects stored in `symbols._tree` were never cleaned up on error paths, causing V8 to crash during GC of orphaned WASM objects (~40% of builds). Added error-path tree cleanup in the pipeline catch block. Also added a depth guard (`MAX_WALK_DEPTH=500`) in `walkWithVisitors` to prevent stack overflow during deep AST recursion.

Changes
- `src/domain/graph/builder/stages/detect-changes.ts`
- `src/domain/graph/builder/stages/build-edges.ts` — `reconnectReverseDepEdges` to reconnect saved edges to new node IDs
- `src/domain/graph/builder/context.ts` — `savedReverseDepEdges` field to `PipelineContext`
- `src/domain/graph/builder/stages/run-analyses.ts`
- `src/domain/graph/builder/pipeline.ts`
- `src/ast-analysis/visitor.ts` — `MAX_WALK_DEPTH` depth guard to prevent stack overflow

Test plan
Closes #931, closes #932, closes #933