Nanopass Roadmap
This document proposes a nanopass-inspired compiler architecture for RukaLang. It is a migration plan, not an all-at-once rewrite.
Assumptions
- The main priorities are maintainability and clarity of invariants.
- Small compile-time regressions are acceptable if we gain much better architecture hygiene.
- We keep the current top-level stage order (syntax -> meta -> elab -> hir -> check -> mir -> codegen) and split inside stages first.
- We preserve current language behavior while we refactor pass structure.
If these assumptions change, we should revise this plan before implementation.
Assumptions confirmed by maintainer (2026-03-18).
Design Goals
- Minimize boilerplate when adding passes and intermediate forms, using macros or other metaprogramming constructs where appropriate.
- Carry source information through the whole pipeline with one shared representation.
- Maximize allocation performance with arena-based storage (cranelift_entity first choice).
- Make pass contracts explicit: each pass declares required input invariants and guaranteed output invariants.
- The framework should live in its own crate so it can later be published and reused in other compilers, but supporting RukaLang remains the top priority.
Core Architecture
1) Pass Interface With Low Boilerplate
Introduce a shared pass API with a small descriptor and one run method.
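pub trait Pass {
    /// IR form consumed by this pass.
    type In;
    /// IR form produced by this pass.
    type Out;
    /// Per-pass error type.
    type Error;
    /// Pass name reported by the runner (logging, timing).
    const NAME: &'static str;

    fn run(&mut self, input: Self::In, cx: &mut PassContext) -> Result<Self::Out, Self::Error>;
}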
PassContext carries shared facilities:
- interners,
- source/provenance tables,
- diagnostics sink,
- reusable scratch arenas,
- stats/timing hooks.
To reduce repeated code, add a small helper macro for pass declaration metadata and a pass runner that wires logging/timing/diagnostic framing once.
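As a rough illustration of the runner idea, the sketch below wires timing and failure framing in one place. It assumes a heavily reduced PassContext (only timings and diagnostics); the run_pass function, the CountTokens pass, and the field names are hypothetical and not taken from the RukaLang sources.

```rust
use std::time::{Duration, Instant};

/// Reduced stand-in for the shared context: only timing and diagnostics
/// are modeled here (field names are assumptions for this sketch).
#[derive(Default)]
pub struct PassContext {
    pub timings: Vec<(&'static str, Duration)>,
    pub diagnostics: Vec<String>,
}

pub trait Pass {
    type In;
    type Out;
    type Error;
    const NAME: &'static str;
    fn run(&mut self, input: Self::In, cx: &mut PassContext) -> Result<Self::Out, Self::Error>;
}

/// Hypothetical runner: records the elapsed time and a failure note once,
/// so individual passes do not repeat this framing.
pub fn run_pass<P: Pass>(
    pass: &mut P,
    input: P::In,
    cx: &mut PassContext,
) -> Result<P::Out, P::Error> {
    let start = Instant::now();
    let result = pass.run(input, cx);
    cx.timings.push((P::NAME, start.elapsed()));
    if result.is_err() {
        cx.diagnostics.push(format!("pass `{}` failed", P::NAME));
    }
    result
}

/// Toy pass used only for illustration: counts whitespace-separated tokens.
pub struct CountTokens;

impl Pass for CountTokens {
    type In = String;
    type Out = usize;
    type Error = String;
    const NAME: &'static str = "demo.count_tokens";

    fn run(&mut self, input: String, _cx: &mut PassContext) -> Result<usize, String> {
        if input.is_empty() {
            return Err("empty input".to_string());
        }
        Ok(input.split_whitespace().count())
    }
}
```

Because run_pass is generic over the pass type, each call is statically dispatched; a declaration macro could later generate the NAME constant and similar metadata.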
2) Shared Source + Provenance Representation
Use one representation for all source and expansion provenance:
- SourceFileId (entity_impl! newtype)
- SpanId pointing into a global span arena
- OriginId for synthetic/generated nodes
Each IR node stores either:
- a direct compact SpanId, or
- an OriginId when generated from other nodes.
OriginId resolves through a provenance graph:
- Origin::Parsed { span }
- Origin::Expanded { from: OriginId, phase: PassId }
- Origin::Lowered { from: OriginId, phase: PassId }
- Origin::Synthesized { reason, parent: Option<OriginId> }
This gives one diagnostics path from any late IR entity back to user source, including meta expansion and lowering.
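A minimal sketch of how such a provenance graph could be walked back to a user span is shown below. It uses plain Vec-backed storage instead of cranelift_entity, and the OriginTable type, its methods, and the `&'static str` type for `reason` are assumptions made for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SpanId(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OriginId(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PassId(pub u32);

#[derive(Clone, Debug)]
pub enum Origin {
    Parsed { span: SpanId },
    Expanded { from: OriginId, phase: PassId },
    Lowered { from: OriginId, phase: PassId },
    // `reason` is modeled as a static string in this sketch.
    Synthesized { reason: &'static str, parent: Option<OriginId> },
}

/// Hypothetical append-only origin table.
#[derive(Default)]
pub struct OriginTable {
    origins: Vec<Origin>,
}

impl OriginTable {
    pub fn push(&mut self, origin: Origin) -> OriginId {
        let id = OriginId(self.origins.len() as u32);
        self.origins.push(origin);
        id
    }

    /// Follow `from`/`parent` links until a parsed span is found.
    /// Returns `None` for a synthesized origin with no parent.
    /// Assumes every link points at an earlier entry, so the walk terminates.
    pub fn resolve_span(&self, mut id: OriginId) -> Option<SpanId> {
        loop {
            match &self.origins[id.0 as usize] {
                Origin::Parsed { span } => return Some(*span),
                Origin::Expanded { from, .. } | Origin::Lowered { from, .. } => id = *from,
                Origin::Synthesized { parent, .. } => id = (*parent)?,
            }
        }
    }
}
```

Keeping the table append-only is one way to guarantee that chains are acyclic, since a new origin can only refer to ids that already exist.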
3) Allocation Model
Use PrimaryMap for arena-owned entities and SecondaryMap/side tables for annotations:
- PrimaryMap<EntityId, Node> for nodes,
- SecondaryMap<EntityId, SpanId> for direct source,
- SecondaryMap<EntityId, OriginId> for provenance,
- compact index-backed vectors for analysis facts.
Guidelines:
- pre-size maps from cheap counts when possible,
- avoid cloning large subtrees across passes; prefer id remapping tables,
- keep per-pass temporary state in scratch arenas owned by PassContext, then clear/reuse.
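The guidelines above can be sketched with std-only stand-ins for PrimaryMap and SecondaryMap. The Arena and SideTable types, the Node enum, and the compact function are hypothetical names for this example; a real implementation would presumably use the cranelift_entity maps instead.

```rust
/// Typed index into an arena (stand-in for an `entity_impl!` newtype).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ExprId(pub u32);

/// Dense arena that owns nodes (stand-in for `PrimaryMap`).
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    /// Pre-size from a cheap count to avoid repeated reallocation.
    pub fn with_capacity(n: usize) -> Self {
        Arena { items: Vec::with_capacity(n) }
    }
    pub fn push(&mut self, item: T) -> ExprId {
        let id = ExprId(self.items.len() as u32);
        self.items.push(item);
        id
    }
    pub fn get(&self, id: ExprId) -> &T {
        &self.items[id.0 as usize]
    }
    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Side table keyed by the same ids (stand-in for `SecondaryMap`).
/// Entries that were never set read as `V::default()`.
pub struct SideTable<V> {
    values: Vec<V>,
}

impl<V: Clone + Default> SideTable<V> {
    pub fn new() -> Self {
        SideTable { values: Vec::new() }
    }
    pub fn set(&mut self, id: ExprId, value: V) {
        let i = id.0 as usize;
        if i >= self.values.len() {
            self.values.resize(i + 1, V::default());
        }
        self.values[i] = value;
    }
    pub fn get(&self, id: ExprId) -> V {
        self.values.get(id.0 as usize).cloned().unwrap_or_default()
    }
}

#[derive(Clone, Debug, PartialEq)]
pub enum Node {
    Int(i64),
    Nop,
}

/// Copy every non-`Nop` node into a fresh arena and return the
/// old-to-new id remapping table (`None` marks a dropped node).
pub fn compact(old: &Arena<Node>) -> (Arena<Node>, Vec<Option<ExprId>>) {
    let mut new = Arena::with_capacity(old.len());
    let mut remap = Vec::with_capacity(old.len());
    for node in &old.items {
        match node {
            Node::Nop => remap.push(None),
            other => remap.push(Some(new.push(other.clone()))),
        }
    }
    (new, remap)
}
```

The remap table lets side tables keyed by old ids be translated lazily, rather than rebuilding every annotation whenever a pass changes structure.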
Pipeline Refactor Plan
Current implementation reference:
- See Pass Inventory for the current typed pass list, execution order, and implementation links.
Implementation Status (2026-03-19)
Completed so far:
- Pass framework landed in src/pass with typed pass execution, pass ids, timing capture, and shared provenance tables (SourceFileId, SpanId, OriginId).
- Top-level production pipeline now runs through typed pass wrappers for all current stages: meta, elab, hir, check, mir, codegen.rust, codegen.wasm.
- Elaboration split-in-progress: core runtime call/template concerns are now explicit subpasses:
  - elab.normalize_runtime_calls_and_spreads
  - elab.validate_runtime_call_args
  - elab.bind_template_call_args
  - elab.instantiate_runtime_function
- Per-pass observability landed:
  - pass timings (--dump-pass-timings)
  - pass snapshots (--dump-pass-snapshots)
  - JSONL snapshots with schema/version (--dump-pass-snapshots-json)
- Provenance side-table implementation started:
  - HIR expression origin side tables
  - MIR local origin side tables
  - origin chains include Parsed -> Expanded(meta) -> Lowered(elab) -> Lowered(hir[/mir])
- Browser/WASM analysis path migrated onto driver-based pipeline hooks.
- CI now includes a browser WASM API smoke check to catch runtime regressions in compile/analyze behavior.
Remaining major work:
- Continue decomposition of elab until major mixed-responsibility blocks are isolated behind stable pass contracts.
- Start check phase split (collect_decls, resolve_signatures, etc.).
- Extend provenance mapping to more node/entity kinds and tighten diagnostic source reconstruction quality.
- Add stronger fixture/snapshot coverage for pass contracts and structured snapshot schema stability.
Phase 0: Infrastructure First
Deliverables:
- pass crate/module with Pass, PassContext, PassId, pass runner.
- Shared provenance tables and IDs (SourceFileId, SpanId, OriginId).
- Compiler driver updates to run a pass list and emit pass timing stats.
Exit criteria:
- current behavior unchanged,
- existing tests pass,
- source spans still appear in diagnostics.
Status:
- Complete.
Phase 1: Split elab Into Explicit Subpasses
Current elab mixes many concerns. First split candidate:
- collect_runtime_templates
- resolve_type_names
- instantiate_runtime_templates
- infer_runtime_expr_types
- normalize_runtime_calls_and_spreads
- runtime_type_validation
Each subpass operates over one arena-backed runtime AST form with side tables, not deep cloning.
Exit criteria:
- golden tests for per-pass output snapshots,
- invariants documented for each pass,
- no language behavior drift.
Status:
- In progress. Core runtime call/template concerns now run as explicit elab subpasses (normalize_runtime_calls_and_spreads, validate_runtime_call_args, bind_template_call_args, instantiate_runtime_function).
Phase 2: Split check Into Independent Analyses
Suggested decomposition:
- collect_decls
- resolve_signatures
- check_expr_and_stmt_types
- check_loans_and_moves
- finalize_checked_program
Store analysis outputs in compact side tables keyed by expression/statement ids.
Exit criteria:
- diagnostics parity for current fixtures,
- checker internals no longer require one giant mutable state object.
Phase 3: Split mir_lower
Suggested decomposition:
- build_function_skeletons
- lower_cfg
- lower_types_and_layout
- insert_runtime_intrinsics
- mir_sanity_validation
Exit criteria:
- MIR graph parity on fixture corpus,
- no codegen regressions in Rust/WASM outputs.
Phase 4: Optional Full Nanopass Expansion
After phases 1-3, we can choose finer granularity pass-by-pass.
Decision gate:
- if a pass still has mixed responsibilities or weak invariants, split again,
- if not, keep current granularity.
This keeps a path to full nanopass architecture without forcing every split immediately.
Boilerplate Reduction Strategy
- Use generated EntityId newtypes (entity_impl!) and common arena wrappers.
- Keep one pass registration table:
  - pass name,
  - input/output type ids,
  - optional debug dump hook.
- Auto-wire pass logging, timing, and panic context in one runner.
- Reuse traversal helpers for common AST/HIR/MIR walk patterns.
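To illustrate the kind of boilerplate a generated id newtype removes, here is a std-only macro sketch in the spirit of entity_impl!. The define_id! macro and the ExprId and LocalId names are hypothetical; the actual plan would most likely rely on the cranelift_entity macro directly.

```rust
/// Hypothetical stand-in for `entity_impl!`: generates a compact `u32`
/// index newtype with constructor, accessor, and a prefixed Debug form.
macro_rules! define_id {
    ($name:ident, $prefix:literal) => {
        #[derive(Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
        pub struct $name(u32);

        impl $name {
            pub fn new(index: usize) -> Self {
                $name(index as u32)
            }
            pub fn index(self) -> usize {
                self.0 as usize
            }
        }

        impl std::fmt::Debug for $name {
            fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                write!(f, "{}{}", $prefix, self.0)
            }
        }
    };
}

// One line per id type instead of a hand-written struct and impls.
define_id!(ExprId, "expr");
define_id!(LocalId, "local");
```

Distinct newtypes also mean an ExprId cannot be passed where a LocalId is expected, which catches index mix-ups at compile time.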
Source/Diagnostics Strategy
- Every emitted diagnostic must carry an OriginId.
- Diagnostics rendering resolves origin chain to best user-facing span.
- If multiple source candidates exist (for generated nodes), render:
  - primary span,
  - one secondary note with expansion/lowering origin.
This keeps diagnostics robust as pass count grows.
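One possible reading of this rendering rule is sketched below: walk the origin chain to the parsed span and keep a single note from the step nearest the diagnosed node. The Rendered struct and render function are hypothetical, and passes are identified by name strings here purely to keep the example short (the plan itself uses PassId).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SpanId(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OriginId(pub usize);

/// Simplified origin graph for this sketch (no `Synthesized` variant).
#[derive(Clone, Debug)]
pub enum Origin {
    Parsed { span: SpanId },
    Expanded { from: OriginId, phase: &'static str },
    Lowered { from: OriginId, phase: &'static str },
}

#[derive(Debug, PartialEq)]
pub struct Rendered {
    /// Best user-facing span.
    pub primary: SpanId,
    /// At most one secondary note describing how the node was generated.
    pub note: Option<String>,
}

/// Walk the chain from `start` back to a parsed span. The note comes from
/// the first generated step encountered, i.e. the one closest to `start`.
pub fn render(origins: &[Origin], start: OriginId) -> Rendered {
    let mut note: Option<String> = None;
    let mut id = start;
    loop {
        match &origins[id.0] {
            Origin::Parsed { span } => return Rendered { primary: *span, note },
            Origin::Expanded { from, phase } => {
                if note.is_none() {
                    note = Some(format!("generated by expansion in `{}`", phase));
                }
                id = *from;
            }
            Origin::Lowered { from, phase } => {
                if note.is_none() {
                    note = Some(format!("generated by lowering in `{}`", phase));
                }
                id = *from;
            }
        }
    }
}
```

Capping the output at one note keeps diagnostics the same size regardless of how many passes a node travelled through.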
Performance Strategy
- Prefer arena ids over owned recursive trees in inner passes.
- Keep hot tables in flat vectors keyed by entity index.
- Batch allocate nodes and annotations per pass; avoid per-node heap allocations.
- Collect and track pass timing/allocation counters from day one of migration.
Validation Plan
At each phase:
- Run cargo test.
- Run ./scripts/ci.sh before PR.
- Add fixture tests for any new diagnostics surface.
- Add pass contract tests:
  - checks for required input invariants,
  - checks for guaranteed output invariants.
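A pass contract test could look roughly like the following sketch, in which a pass exposes its input and output invariants as checkable functions. The Contract trait, the Instr toy IR, and the FlattenSpreads pass are all hypothetical and only loosely inspired by the spread-normalization subpass named in this document.

```rust
/// Toy IR used only for this example.
#[derive(Clone, Debug, PartialEq)]
pub enum Instr {
    Push(i64),
    Spread(Vec<i64>),
}

/// Hypothetical contract trait: a pass states what it requires and
/// what it guarantees as executable checks.
pub trait Contract {
    type In;
    type Out;
    fn check_input(input: &Self::In) -> Result<(), String>;
    fn check_output(output: &Self::Out) -> Result<(), String>;
    fn run(input: Self::In) -> Self::Out;
}

pub struct FlattenSpreads;

impl Contract for FlattenSpreads {
    type In = Vec<Instr>;
    type Out = Vec<Instr>;

    /// Required input invariant: no spread is empty.
    fn check_input(input: &Vec<Instr>) -> Result<(), String> {
        if input.iter().any(|i| matches!(i, Instr::Spread(v) if v.is_empty())) {
            return Err("empty spread".to_string());
        }
        Ok(())
    }

    /// Guaranteed output invariant: no spread remains.
    fn check_output(output: &Vec<Instr>) -> Result<(), String> {
        if output.iter().any(|i| matches!(i, Instr::Spread(_))) {
            return Err("spread survived flattening".to_string());
        }
        Ok(())
    }

    fn run(input: Vec<Instr>) -> Vec<Instr> {
        let mut out = Vec::with_capacity(input.len());
        for instr in input {
            match instr {
                Instr::Spread(values) => out.extend(values.into_iter().map(Instr::Push)),
                other => out.push(other),
            }
        }
        out
    }
}

/// Run a pass with both contract checks around it.
pub fn run_checked<C: Contract>(input: C::In) -> Result<C::Out, String> {
    C::check_input(&input).map_err(|e| format!("input invariant violated: {}", e))?;
    let out = C::run(input);
    C::check_output(&out).map_err(|e| format!("output invariant violated: {}", e))?;
    Ok(out)
}
```

In a test suite, the same check functions can be called directly on fixture inputs and on golden outputs, so each invariant is exercised independently of the pass body.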
Decisions (Confirmed)
- Pass errors use per-pass error enums wrapped by one top-level compiler error type.
- Provenance uses one canonical OriginId path; parsed nodes are Origin::Parsed { span }.
  - Storage policy: keep OriginId in side tables keyed by arena entity ids, not as direct IR node fields.
  - Rationale: lower node-size overhead, less constructor/pattern-match churn, one shared provenance representation.
- Subpasses prefer in-place mutation of the arena-backed IR and its side tables, and emit a new IR only when structure must change.
- Pass registration starts as a static compile-time pass list (typed, no dynamic dispatch).
- Expose pass-level debug dumps through CLI flags in phase 0.
- Persist provenance graph in browser artifacts and revisit graph presentation as pass count grows.
- Use one shared IR node id namespace per stage (not per-module) for maintainability.
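The first decision (per-pass error enums wrapped by one top-level error type) might be sketched as follows. The variant names, the toy elab and check functions, and the compile entry point are hypothetical and exist only to show how `From` conversions let `?` lift per-pass errors into the wrapper.

```rust
use std::fmt;

/// Hypothetical per-pass error enums.
#[derive(Debug, PartialEq)]
pub enum ElabError {
    UnknownTemplate(String),
}

#[derive(Debug, PartialEq)]
pub enum CheckError {
    TypeMismatch { expected: &'static str, found: String },
}

/// Single top-level error type wrapping each pass's enum.
#[derive(Debug, PartialEq)]
pub enum CompilerError {
    Elab(ElabError),
    Check(CheckError),
}

impl From<ElabError> for CompilerError {
    fn from(e: ElabError) -> Self {
        CompilerError::Elab(e)
    }
}

impl From<CheckError> for CompilerError {
    fn from(e: CheckError) -> Self {
        CompilerError::Check(e)
    }
}

impl fmt::Display for CompilerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CompilerError::Elab(ElabError::UnknownTemplate(name)) => {
                write!(f, "elab: unknown template `{}`", name)
            }
            CompilerError::Check(CheckError::TypeMismatch { expected, found }) => {
                write!(f, "check: expected {}, found {}", expected, found)
            }
        }
    }
}

/// Toy "elab" stage: rejects inputs that start with `@`.
fn elab(src: &str) -> Result<String, ElabError> {
    if src.starts_with('@') {
        return Err(ElabError::UnknownTemplate(src[1..].to_string()));
    }
    Ok(src.to_string())
}

/// Toy "check" stage: accepts only integer literals.
fn check(ir: &str) -> Result<i64, CheckError> {
    ir.parse::<i64>().map_err(|_| CheckError::TypeMismatch {
        expected: "int",
        found: ir.to_string(),
    })
}

/// Each `?` converts the per-pass error into `CompilerError` via `From`.
pub fn compile(src: &str) -> Result<i64, CompilerError> {
    let ir = elab(src)?;
    let value = check(&ir)?;
    Ok(value)
}
```

Each pass keeps a small, local error vocabulary, while the driver handles exactly one error type.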
Suggested First Implementation Slice
Keep this first slice small and reversible:
- Add PassContext + provenance ids/tables.
- Wrap existing elab::elaborate_program as a single pass under new runner.
- Split only one elab concern (normalize_runtime_calls_and_spreads) into its own pass.
- Verify diagnostics parity and benchmark compile time on fixture corpus.
If this slice lands cleanly, continue with the rest of phase 1.