Overview

RukaLang is an experimental language with mutable value semantics and staged metaprogramming inspired by MetaOCaml.

The fastest way to understand the current direction is:

As this book grows, these details will be moved into versioned chapters that are validated in CI.

Guide

This section covers day-to-day usage, browser tooling, CI, and contributor workflows.

Build and Run

Build the compiler:

cargo build

Run a canonical example:

cargo run -- examples/basics.rk

Useful CLI invocations while developing:

cargo run -- --dump-ast examples/basics.rk
cargo run -- --dump-hir examples/basics.rk
cargo run -- --dump-tokens examples/basics.rk
cargo run -- --emit-rust=target/out.rs examples/basics.rk
cargo run -- --emit-wat=target/out.wat examples/basics.rk
cargo run -- --emit-wasm=target/out.wasm examples/basics.rk
cargo run -- --run examples/basics.rk

Run the full local CI check suite:

./scripts/ci.sh

Web Playground

The browser playground lives in web/. It loads examples/basics.rk by default, compiles it in WASM when you click Compile, and shows the AST, HIR, MIR, emitted Rust/WAT, and run results from the generated WASM. Only validated WASM is executed; if WASM generation fails, the Rust output and graphs still render, and diagnostics are shown in the output panel.

From the repository root:

cd web
npm install
npm run build:wasm
npm run dev

Local routes match production layout:

npm run dev runs the full mdBook + rustdoc staging once via predev, so Docs and Rustdoc start from current local artifacts. During dev, the docs watcher runs only a lightweight mdBook/theme sync (no rustdoc rebuild), so CSS/theme changes stay in sync without heavy rebuilds.

Run browser E2E tests (Chromium + Firefox):

cd web
npx playwright install chromium firefox
npm run test:e2e

CI and Deploy

Run the full local check suite:

./scripts/ci.sh

./scripts/ci.sh currently runs:

  • cargo test
  • cargo test --doc
  • cargo test --manifest-path crates/rukalang_wasm/Cargo.toml
  • mdbook build docs
  • mdbook test docs
  • mdbook-linkcheck --standalone docs

Browser E2E tests are local-only and are not part of CI:

./scripts/e2e-local.sh

Equivalent manual commands:

cd web
npx playwright install chromium firefox
npm run test:e2e

Codeberg Pages deployment runs from /.forgejo/workflows/pages.yml on push to main, publishes the built site output to the pages branch, and relies on the repository webhook for https://ruka.codeberg.page/RukaLang/ to refresh the site.

The deploy workflow publishes rustdoc with private items enabled using:

cargo doc --no-deps --document-private-items

Published routes:

Documentation Workflow

RukaLang documentation sync is enforced with executable checks:

  • cargo test --doc validates Rust doc examples.
  • cargo test validates rk code fences in documentation chapters.
  • npm --prefix web run test:wasm-api validates browser WASM API smoke behavior.
  • mdbook build docs ensures the book renders.
  • mdbook test docs compiles runnable Rust snippets in the book.
  • mdbook-linkcheck --standalone docs fails on broken links.

For feature changes:

  1. Update API docs near code using /// and //! where relevant.
  2. Update guide/reference pages in the docs source tree when behavior changes.
  3. Keep command examples aligned with cargo run -- --help and tests.

RukaLang snippet fences use these tags:

  • rk: example must compile.
  • rk,run: example must compile and run.
  • rk,fail: example must fail to compile.
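
The tag rules above can be sketched as a small classifier. This is a minimal illustration only: `Expectation` and `classify_fence` are hypothetical names, not the project's actual test harness, and the handling of unknown modifiers is an assumption.

```rust
/// What a docs check is expected to verify for one fenced snippet.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Expectation {
    /// `rk`: the example must compile.
    Compile,
    /// `rk,run`: the example must compile and run.
    CompileAndRun,
    /// `rk,fail`: the example must fail to compile.
    CompileFail,
}

/// Classify a fence info string such as "rk,run".
/// Returns `None` for fences that are not RukaLang snippets.
pub fn classify_fence(info: &str) -> Option<Expectation> {
    let mut parts = info.split(',').map(str::trim);
    if parts.next()? != "rk" {
        return None;
    }
    match parts.next() {
        None => Some(Expectation::Compile),
        Some("run") => Some(Expectation::CompileAndRun),
        Some("fail") => Some(Expectation::CompileFail),
        // Assumption: unknown modifiers are not recognized in this sketch.
        Some(_) => None,
    }
}
```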

Rustdoc link convention for mdBook pages:

  • Use path-relative links rooted at ../../rustdoc/ from docs chapters.
  • Prefer reference-style links at the bottom of the page for readability.
  • Example:
    • in text: [`LowerMirPass`][rustdoc-lower-mir-pass]
    • link target: [rustdoc-lower-mir-pass]: ../../rustdoc/rukalang/driver/passes/struct.LowerMirPass.html

If a change intentionally requires no docs updates, explain why in the pull request.

Run docs checks locally:

mdbook build docs
mdbook test docs
mdbook-linkcheck --standalone docs

./scripts/ci.sh includes these checks.

Reference

Reference pages document stable user-facing behavior and command-line interfaces, including the browser-facing WASM API, staging model, and MIR representation.

Language Design (MVP)

This chapter captures the current language and runtime design direction for RukaLang.

Metaprogramming/staging design notes are maintained in docs/src/reference/metaprogramming.md.

Sections

Array and Slice Design

This page documents the V2 array/slice model as it is implemented today, plus the remaining backend ABI work.

For implementation-level type normalization and coercion policy, see: docs/src/internals/ownership-representation.md.

Goals

  • Make fixed-size and runtime-sized arrays explicit and internally consistent.
  • Reserve the word "slice" for borrowed/view semantics only.
  • Keep ownership mode on TypeRef (T, &T, @T) unchanged.
  • Avoid unnecessary runtime checks and copies.
  • Encode slice values in WASM ABI as two value slots (i32 pointer + i32 length).

Surface Language Model

The source syntax remains:

  • [T; n] for static arrays.
  • [T] for runtime-sized sequence type syntax.

Ownership is still controlled by parameter/local mode markers:

  • T: view mode.
  • &T: mutable borrow mode.
  • @T: owned mode.

For parameters:

  • [T] means read-only view of sequence data.
  • &[T] means mutable borrow of sequence data.
  • @[T] means owned runtime-sized array.
  • [T; n] means a compile-time-sized read-only view.
  • &[T; n] means a compile-time-sized mutable borrow.
  • @[T; n] means owned compile-time-sized array.

Return annotations do not use ownership sigils. Return values are always owned.

Semantic Type Kinds

The redesign uses separate base type concepts. Ownership/mutability is provided by access mode (View, MutBorrow, Owned), not by the base type itself.

  1. StaticArray<T, N>

    • Fixed-size value type.
    • Length known at compile time.
    • No runtime length header is needed for the value itself.
  2. DynamicArray<T>

    • Owned runtime-sized array value.
    • Heap allocated.
    • Carries runtime length metadata as a header before the elements.
  3. Slice<T>

    • Borrow/view only.
    • Runtime representation is (data_ptr, len).
    • May point into static arrays or dynamic arrays.
    • Never owns backing storage.
  4. StaticSlice<T, N>

    • Fixed-extent view window used in normalized ownership paths.
    • No new surface syntax; this is an internal base-type concept used for [T; N] and &[T; N] in non-owned positions.
    • Runtime ABI representation is a thin pointer (data_ptr) because N is compile-time known.
    • May point into static arrays, dynamic arrays, or slice views after checks.

Ownership Interpretation

Given the type constructor shape above:

  • View mode (T) reads from caller-visible storage.
  • Borrowed mode (&T) is mutable borrow.
  • Owned mode (@T) creates an owned value by copying or moving.

Examples:

  • x: [i64; 4] is a read-only view of StaticSlice<i64, 4>.
  • x: &[i64; 4] is a mutable borrow of StaticSlice<i64, 4>.
  • x: @[i64; 4] is an owned StaticArray<i64, 4> value transfer.
  • x: [i64] is a view Slice<i64>.
  • x: &[i64] is a mutable Slice<i64> borrow.
  • x: @[i64] is an owned DynamicArray<i64> transfer.
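
The six examples above amount to a small normalization table. The Rust sketch below models it; the enum and function names (`Shape`, `Sigil`, `normalize`) are illustrative assumptions, the element type is omitted for brevity, and the compiler's real data structures may differ.

```rust
/// Access mode carried next to the base type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum AccessMode { View, MutBorrow, Owned }

/// Base type kinds from this page (element type omitted in this sketch).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BaseTy {
    StaticArray(usize),
    DynamicArray,
    Slice,
    StaticSlice(usize),
}

/// Surface sequence shape: `[T; n]` or `[T]`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Shape { Fixed(usize), Runtime }

/// Surface ownership sigil: none, `&`, or `@`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Sigil { Plain, Amp, At }

/// Map a parameter's surface form to its normalized (BaseTy, AccessMode) pair.
pub fn normalize(sigil: Sigil, shape: Shape) -> (BaseTy, AccessMode) {
    match (sigil, shape) {
        (Sigil::Plain, Shape::Fixed(n)) => (BaseTy::StaticSlice(n), AccessMode::View),
        (Sigil::Amp, Shape::Fixed(n)) => (BaseTy::StaticSlice(n), AccessMode::MutBorrow),
        (Sigil::At, Shape::Fixed(n)) => (BaseTy::StaticArray(n), AccessMode::Owned),
        (Sigil::Plain, Shape::Runtime) => (BaseTy::Slice, AccessMode::View),
        (Sigil::Amp, Shape::Runtime) => (BaseTy::Slice, AccessMode::MutBorrow),
        (Sigil::At, Shape::Runtime) => (BaseTy::DynamicArray, AccessMode::Owned),
    }
}
```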

Read-only versus mutable for view/borrow forms is encoded by ownership mode, not by changing the underlying collection type constructor:

  • View mode (T) means read-only access.
  • Borrow mode (&T) means mutable access.
  • Owned mode (@T) means owned value transfer.

Coercion and Compatibility Rules

Coercions are defined over normalized pairs (BaseTy, AccessMode) and consumed by both checker and MIR lowering.

Value Coercions

  • StaticArray<T, N> -> DynamicArray<T> is allowed.
    • Requires allocation of dynamic storage and element transfer/copy.
  • DynamicArray<T> -> StaticArray<T, N> is allowed.
    • Requires runtime length check (len == N).
    • Traps on failure.
    • Produces static-array storage for the destination.

Normalized view of the same rules:

  • (StaticArray<T, N>, Owned) -> (DynamicArray<T>, Owned) = RequiresMaterialization.
  • (DynamicArray<T>, Owned) -> (StaticArray<T, N>, Owned) = AllowedWithRuntimeCheck(len == N) + RequiresMaterialization.

Borrow/View Coercions

  • StaticArray<T, N> -> Slice<T> is allowed without copying.
  • DynamicArray<T> -> Slice<T> is allowed without copying.
  • StaticArray<T, M> -> StaticSlice<T, N> is allowed when M >= N.
    • No runtime check.
    • No copy.
  • StaticSlice<T, M> -> StaticSlice<T, N> is allowed when M >= N.
    • No runtime check.
    • No copy.
  • Slice<T> -> StaticSlice<T, N> is allowed.
    • Compiler inserts runtime check len >= N unless statically proven.
    • Passing a longer slice is valid.
  • DynamicArray<T> -> StaticSlice<T, N> is allowed.
    • Compiler inserts runtime check len >= N unless statically proven.
    • No copy when used as a borrow/view coercion.
  • StaticSlice<T, N> -> Slice<T> is allowed without copying.
    • Length is materialized as compile-time constant N in generated code.

Normalized view of borrow/view coercions:

  • (StaticArray<T, M>, View|MutBorrow) -> (StaticSlice<T, N>, View|MutBorrow) = AllowedNoCheck when M >= N.
  • (StaticSlice<T, M>, View|MutBorrow) -> (StaticSlice<T, N>, View|MutBorrow) = AllowedNoCheck when M >= N.
  • (Slice<T>, View|MutBorrow) -> (StaticSlice<T, N>, View|MutBorrow) = AllowedWithRuntimeCheck(len >= N) unless statically proven.
  • (DynamicArray<T>, View|MutBorrow) -> (StaticSlice<T, N>, View|MutBorrow) = AllowedWithRuntimeCheck(len >= N) unless statically proven.
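
As a rough illustration, the value and borrow/view rules listed in this section can be written as one lookup over normalized pairs. Everything below is a sketch under stated assumptions: the type and function names are hypothetical, source and target access modes are assumed to match for borrow/view coercions, identity coercions are not modeled, and the "unless statically proven" elision is left out.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum BaseTy { StaticArray(usize), DynamicArray, Slice, StaticSlice(usize) }

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum AccessMode { View, MutBorrow, Owned }

/// Runtime length condition attached to a coercion.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum LenCheck { Exactly(usize), AtLeast(usize) }

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Coercion {
    AllowedNoCheck,
    AllowedWithRuntimeCheck(LenCheck),
    /// New storage is produced, optionally after a length check.
    RequiresMaterialization { check: Option<LenCheck> },
    /// Not covered by the rules listed in this section.
    Rejected,
}

pub fn coerce(from: (BaseTy, AccessMode), to: (BaseTy, AccessMode)) -> Coercion {
    use AccessMode::*;
    use BaseTy::*;
    use Coercion::*;
    match (from, to) {
        // Value coercions between owned arrays.
        ((StaticArray(_), Owned), (DynamicArray, Owned)) => {
            RequiresMaterialization { check: None }
        }
        ((DynamicArray, Owned), (StaticArray(n), Owned)) => {
            RequiresMaterialization { check: Some(LenCheck::Exactly(n)) }
        }
        // Borrow/view coercions (assumption: the access mode is unchanged).
        ((src, a), (dst, b)) if a == b && a != Owned => match (src, dst) {
            (StaticArray(_) | DynamicArray | StaticSlice(_), Slice) => AllowedNoCheck,
            (StaticArray(m) | StaticSlice(m), StaticSlice(n)) if m >= n => AllowedNoCheck,
            (Slice | DynamicArray, StaticSlice(n)) => {
                AllowedWithRuntimeCheck(LenCheck::AtLeast(n))
            }
            _ => Rejected,
        },
        _ => Rejected,
    }
}
```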

Static-Length Reference Behavior

  • &[T; N] or view [T; N] can be represented as a thin pointer.
  • When source length is already known to satisfy >= N, no runtime check is needed.
  • When source length is runtime-known (for example from Slice<T>), compiler inserts a runtime check.

Index and Range Semantics

  • xs[i] reads element i using normal bounds policy.
  • xs[a..b] always produces Slice<T> (borrow/view).
  • Slice ranges always refer to existing storage and carry (ptr, len).
  • Creating an owned array from a range requires an explicit owned conversion path.

No implicit owned copy is created for range results.
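
Rust slices behave similarly, so a loose analogy may help. In the sketch below (hypothetical helper names, Rust semantics rather than RukaLang's), a range produces a borrowed view into existing storage, and an owned result needs an explicit copy step.

```rust
/// Borrow a sub-range: no allocation; the result is a (ptr, len) view
/// into the caller's existing storage. Panics if `xs` has fewer than 3 items.
pub fn middle(xs: &[i64]) -> &[i64] {
    &xs[1..3]
}

/// Producing an owned array from a range takes an explicit conversion.
pub fn middle_owned(xs: &[i64]) -> Vec<i64> {
    xs[1..3].to_vec()
}
```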

Allocation and Storage Model

Static Arrays

  • Default storage for non-boxed static arrays is stack-like aggregate storage.
  • In the direct WASM backend this means shadow-stack local storage when needed.
  • Static arrays do not use dynamic array headers.

Dynamic Arrays

  • Always heap allocated.
  • Always carry runtime length metadata in a header.
  • Release logic follows owned heap object rules.

Slices

  • Non-owning view values only.
  • Represented as pointer+length pair.
  • Static-sized references use StaticSlice<T, N> and are represented as thin pointers; N is carried in type metadata, not runtime payload.
  • Never allocate by themselves.
  • Never require retain/release ownership operations.

WASM ABI Contract (Current)

ABI projection is derived from normalized (BaseTy, AccessMode) forms.

Core Mapping

  • Scalars keep current scalar mapping.
  • Static-array references ([T; N] in view/borrow positions) are thin pointer ABI values (i32).
  • Dynamic-array owned values (@[T]) use pointer ABI value (i32) to heap object with length header followed by the elements.
  • Slice values currently lower as pointer-sized ABI values in direct WASM.
  • Static-array and dynamic-array owned values also lower as pointer-sized ABI values in direct WASM.

Equivalent normalized mapping:

  • (StaticSlice<T, N>, View|MutBorrow) -> one i32 (thin pointer concept in normalization).
  • (Slice<T>, View|MutBorrow) -> currently one i32 runtime handle in direct WASM backend.
  • (DynamicArray<T>, Owned) -> one i32 heap handle.

Returns

  • Borrowed returns are rejected before ABI planning.
  • Owned slice returns currently follow the aggregate out-slot return path in direct WASM.
  • Tuple/struct/static-array aggregate returns use out-slot rules.

Shadow Stack

  • Owned slice values currently participate in shadow-stack aggregate handling in direct WASM.
  • Shadow stack remains for aggregate values that require addressable temporary storage.

MIR and Backend Representation Notes

The implementation is expected to update MIR type/instruction modeling so that:

  • Dynamic arrays and slices are different concepts.
  • Slice-producing instructions return slice-pair values.
  • Call lowering can pass and return slice pairs directly.
  • Heap ownership inference excludes slices.
  • Heap ownership for static arrays only applies when storage is actually heap-owned (for example boxed paths), not by default.

Runtime Check Insertion Policy

Compiler-inserted checks are required whenever static bounds are not proven.

Examples:

  • Slice<T> -> &[T; N] check len >= N.
  • DynamicArray<T> -> StaticArray<T, N> check len == N.
  • Index/range operations maintain existing bounds safety behavior.
  • StaticArray<T, 8> -> &[T; 4] requires no check and no copy.
  • Slice<T> -> &[T; 4] checks len >= 4; on success it passes a thin pointer.

When compile-time facts prove the check condition, the check is omitted.
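
Rust's standard library has conversions with the same shape, which gives a runnable analogy for the two checks. The helper names are hypothetical, and returning `None` stands in for the trap described on this page.

```rust
use std::convert::TryFrom;

/// Analogue of `Slice<T> -> &[T; N]`: check `len >= N`, then hand out a
/// thin reference to the first N elements without copying.
pub fn as_static_view<T, const N: usize>(s: &[T]) -> Option<&[T; N]> {
    // `get(..N)` performs the `len >= N` check; a longer slice is accepted.
    let prefix = s.get(..N)?;
    <&[T; N]>::try_from(prefix).ok()
}

/// Analogue of `DynamicArray<T> -> StaticArray<T, N>`: check `len == N`
/// and move the elements into fixed-size storage.
pub fn into_static_array<T, const N: usize>(v: Vec<T>) -> Option<[T; N]> {
    <[T; N]>::try_from(v).ok()
}
```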

Implementation Status

Implemented:

  1. Ownership normalization uses explicit StaticArray, DynamicArray, Slice, and StaticSlice base kinds.
  2. Shared coercion matrix drives checker and MIR boundary decisions.
  3. Runtime length checks are emitted for:
    • DynamicArray<T> -> StaticArray<T, N> (len == N)
    • Slice<T>|DynamicArray<T> -> StaticSlice<T, N> (len >= N)
  4. Runtime trap path for failed coercion checks uses std::panic.
  5. Return ownership sigils are disallowed; borrowed returns are rejected.

Remaining backend ABI work:

  1. Move direct WASM slice view ABI to explicit (ptr, len) multi-slot passing and returning.
  2. Remove slice dependence on aggregate out-slot/shadow-stack paths where the value can be represented directly in locals/results.

Non-Goals

  • No new user-facing keywords.
  • No parser pre-processing.
  • No syntax split for "minimum-length slice" types in this proposal.

Goals and Core Model

Goals

  • Use mutable value semantics (MVS): values are logically independent, and each value can be mutated.
  • Avoid tracing garbage collection.
  • Avoid manual memory management in user code.
  • Keep copies explicit at the assignment/operator level with predictable eager-copy behavior.
  • Keep memory behavior deterministic.

Core Model

  • Mutable Value Semantics - value semantics, no visible aliasing, no identity, but values are locally mutable.
  • No garbage collector or manual memory management - deterministic ownership and drop in generated code.
  • Second-class references - references may only be created in function and block signatures and may not outlive the function/block invocation.
  • Keep ownership costs predictable - borrows and moves avoid extra copy work.

No-Cycle Constraint

The language disallows ownership cycles, and those cycles should be impossible to create within MVS.

  • Composite ownership graphs must be acyclic.
  • Strong back-references that would create cycles are invalid.

Because cycles are disallowed, ownership-based drop remains sufficient for reclamation.

Ownership, Borrowing, and Types

Type-Level Ownership Modes

  • T: read-only view parameter (default for function parameters).
  • &T: mutable borrow parameter.
  • @T: owned parameter.

Context and Surface Forms

  • In type positions (parameter/return annotations, type terms), Name[...] is a type application/constructor form (for example Pair[i64]).
  • Pointer indirection is written as *T and is non-nullable.
  • Pointer allocation is written as @box(expr) and produces *T when expr: T.
  • Option[T] is a built-in optional type constructor. Use Option[*T] for nullable box references.
  • *T represents one explicit heap edge to inline T, so the immediate payload T cannot itself be another pointer or a built-in heap-handle value.
  • Built-in sequence type forms:
    • [T; N] fixed-size array
    • [T] runtime-sized array
  • In expression positions, name(...) is a call expression.
  • Resolution is context-driven and validated in later semantic passes.

Ownership markers also exist in both spaces:

  • Type-level: T, &T, @T.
  • Local/value-level: let x = expr, let @x = expr, &x, <-x forms.

Copyability Classes

  • Values are copyable by default.
  • Composite types may be explicitly declared linear (non-copyable).
  • In MVP, a user-defined linear type must contain at least one linear field (directly or transitively through contained types).
  • Copyable types may not contain linear fields (transitively).
  • Linear values can be moved and borrowed, but not copied.
  • Copyable containers (Array, slices, Map) reject linear element values in MVP.

Array literals ([a, b, c]) construct fixed-size arrays by default. When a [T] type is expected by context, the same literal form constructs an owned runtime-sized array.

Return annotations do not accept ownership sigils. Return values are always owned.

Local Binding Rules

  • let x = expr introduces a read-only view local.
  • let &x = expr introduces a mutable reference local.
  • let x <- y introduces a read-only local that now owns the moved value from y.
  • let @x = expr introduces an owned local.
  • Plain let locals cannot be assigned through, mutably borrowed, or moved with <-x, even when initialized with <-.
  • Mutable reference locals (let &x = expr) may be assigned through and mutably borrowed.
  • Owned let @x locals may be assigned through, mutably borrowed, and explicitly moved.

Read-only parameters and read-only locals are intentionally different:

  • A parameter of type T may alias caller storage because it is only a read-only view.
  • A local created with let x <- y becomes the new read-only owner of y's storage after the move.

Borrowing Rules

  • Plain T parameters and binders are read-only views. They cannot be assigned through and do not move ownership.
  • &T borrows are exclusive mutable borrows; they cannot coexist with any other mutable access to the same value.
  • Mutable borrow overlap checks are place-aware for struct fields and conservative for index/slice projections.
  • @T parameters receive an owned value. Plain x copies into @T, while <-x moves and invalidates the source binding.
  • Borrowed values are non-owning and may not escape the function or block that created them.

Lifetime and Destruction

  • Heap values are reclaimed when their reference count reaches zero.
  • No tracing collector is used.
  • Freeing a large object graph can still cause a long pause (for now).
  • Generated Rust and WASM implement box mutation with uniqueness checks before mutable borrows.

For compiler implementation details, see Borrow Checking (Internals).

Expression and Call Semantics

Assignment Semantics

The language has two assignment operators:

  • = means logical value copy.
  • <- means ownership transfer that invalidates a named source binding.

Syntax Snapshot (MVP)

  • Function declarations:
    • fn name(p1: T, p2: &U, p3: V) -> ReturnType { ... }
    • Return types are required in function signatures.
    • Return annotations do not accept ownership sigils; return values are always owned.
  • Assignment:
    • let b = a (read-only view local; source stays valid)
    • let @b = a (owned local initialized from a)
    • let b <- a (read-only local that owns the moved value; source is invalid afterward)
    • let @b <- a (owned local initialized by move; source is invalid afterward)
    • let b <- expr and let @b <- expr are invalid when expr is not a named place
    • place.field = expr updates a struct field through a named place path
  • Array and slice types:
    • [T; N] is a fixed-size array type
    • [T] is an owned dynamically sized array type
  • Array literals:
    • [e1, e2, ...] is an array literal (Rust-like)
    • [] is an empty array literal and needs array type context
  • Call arguments:
    • f(<-x) explicit move from named binding
    • f(rvalue_expr) implicit move from rvalue
  • Expression statements:
    • expr; discards the expression result regardless of type
  • Integer arithmetic:
    • unary -x requires x: i64 and returns i64
    • binary +, -, *, /, % require i64 operands and return i64
    • precedence follows * / % above + -, and unary - binds tighter than both

Copy Assignment (=)

  • Source and destination remain valid after assignment.
  • Copy assignment is valid only for copyable values.
  • Runtime performs an eager owned copy for copyable heap-backed values.

Local Declarations

  • let x = expr creates a read-only view local.
  • let &x = expr creates a mutable reference local.
  • let x <- y creates a read-only local that owns the moved value from y.
  • let @x = expr creates an owned local.
  • Assigning to x or borrowing &x requires let @x or let &x = ...; moving <-x still requires let @x.
  • Assigning to x is valid when x was declared with let &x = ....
  • let x <- y invalidates y, but x still remains read-only after the move.

Read-only parameters and read-only locals do not have identical storage semantics:

  • A parameter x: T may read directly from caller-owned storage.
  • A local let x <- y becomes the new read-only owner after the move and does not alias a still-live source binding.

Move Assignment (<-)

  • <- is only valid when the source is a named binding/place.
  • Destination receives ownership and source becomes invalid.
  • <- on rvalues/temporaries is invalid (there is no name to invalidate).
  • Using a moved-from source is a runtime or compile-time error (depending on checker stage).

Arrays and Slices

  • [T; N] is an owned value and participates in =, <-, and <-x like other owned values.
  • [T] is an owned runtime-sized array value and is represented as an owned contiguous sequence.
  • In parameter mode T, [T] means a read-only slice view.
  • In parameter mode &T, [T] means a mutable slice view.
  • Slice parameters can accept array arguments with element-compatible item types.
  • @box(expr) allocates a non-null pointer value of type *T.
  • @array(init, len) constructs heap arrays ([T]) by inferring T from init.
  • @as(T, x) performs compile-time-safe numeric casts only.
  • @intCast(T, x) performs checked integer casts and traps on overflow.
  • @intToFloat(T, x) converts integer values to floating-point values.
  • @trunc(T, x) allows narrowing integer-to-integer and float-to-float casts.

Checked cast edge cases: @intCast(i8, 120i16) succeeds, while @intCast(u8, -1i16) and @intCast(i8, 255u16) trap.
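
The same edge cases can be reproduced with Rust's `TryFrom`, which fails exactly when the value does not fit the target type. The `int_cast` helper is a hypothetical stand-in for `@intCast`, with `None` playing the role of a trap.

```rust
use std::convert::TryFrom;

/// Checked integer cast: `None` where `@intCast` would trap.
pub fn int_cast<T, U>(x: U) -> Option<T>
where
    T: TryFrom<U>,
{
    T::try_from(x).ok()
}
```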

Expression-Oriented Semantics

The language is expression-oriented (Rust-style): control-flow constructs and blocks are expressions.

Unit Result

  • Unit is implicit in syntax (no required () literal in MVP).
  • A block with no tail expression yields unit.
  • while yields unit.

if Expression Rules

  • if (cond) { then } else { other } is an expression.
  • if (cond) { then } (no else) is allowed only if then yields unit.
  • With else, both branches must yield compatible result categories.
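
Since these rules are described as Rust-style, the Rust equivalents may serve as a reference point. Note that Rust does not parenthesize the condition, unlike the `if (cond)` form shown above; the function names are illustrative.

```rust
/// `if` with `else` is an expression; both branches yield an `i64`.
pub fn distance(a: i64, b: i64) -> i64 {
    if a > b { a - b } else { b - a }
}

/// `if` without `else` is accepted because the then-branch yields unit.
pub fn clamp_to_zero(x: &mut i64) {
    if *x < 0 {
        *x = 0;
    }
}
```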

Pointer and Option Semantics

Pointers are non-null owned handles used for explicit indirection.

  • Pointer types are written as *T.
  • @box(expr) constructs a pointer value of type *T when expr: T.
  • Option[T] is the built-in optional type constructor; use Option[*T] when a boxed value may be absent.
  • *expr dereferences a pointer expression and reads the pointee as a value.
  • Field/index projection through a pointer base implicitly dereferences as needed (for example node.next.value where node.next: *Node).
  • Passing &b when b: *T borrows the pointee as &T.
  • &*b is rejected.
  • *T adds exactly one explicit heap indirection whose allocation stores T inline.
  • The immediate payload T cannot itself be another pointer or a built-in heap-handle type such as String or owned runtime-sized array values.

Pointer and Optional Construction Examples

  • let @head = Some(@box(Node { value: 1, next: None() }));
  • match (head) { Some(node) => node.value, None() => 0 };

Result Discard Rules

  • Any expression statement form expr; discards the result.
  • let _ = expr, let _ <- expr, and let @_ <- expr are ordinary bindings to _ (not special discard forms).

Iteration Semantics

The language does not expose first-class references, so iterator design avoids storing borrowed references in user-visible iterator objects.

for Forms

  • for (collection) |x| { ... }
    • Plain binder |x| is a read-only view of each element.
  • for (collection) |&x| { ... }
    • Mutable-borrow binder |&x| follows &T semantics (exclusive mutable borrow per iteration).
  • for (<-collection) |x| { ... }
    • Consuming traversal form; <-collection invalidates the named collection binding.
    • Elements are yielded to x using the normal plain-binder read-only view semantics.

collection may be [T; N] or [T].
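
For readers coming from Rust, the three forms map loosely onto the loops below. This is an analogy rather than a definition: in particular, Rust's consuming loop yields owned elements, whereas the consuming form above still yields read-only views to `x`.

```rust
/// Like `for (collection) |x|`: read-only view of each element.
pub fn sum_view(xs: &[i64]) -> i64 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

/// Like `for (collection) |&x|`: exclusive mutable borrow per iteration.
pub fn double_in_place(xs: &mut [i64]) {
    for x in xs.iter_mut() {
        *x *= 2;
    }
}

/// Like `for (<-collection) |x|`: the collection is consumed by the loop.
pub fn consume(xs: Vec<i64>) -> i64 {
    let mut total = 0;
    for x in xs {
        total += x;
    }
    total
}
```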

Normalization Rule

  • for (<-collection) |item| { ... } consumes collection and invalidates its binding.
  • for (collection) |<-item| { ... } is invalid; iteration does not move elements out directly.

Single-Form Principle

  • Redundant forms are disallowed to keep one canonical way to express behavior.
  • <- is valid on the iterable expression only for consuming traversal from a named collection.

Call Semantics

Index and Slice Access

  • xs[i] reads through a read-only view.
  • xs[a..b] produces a read-only slice view.
  • &xs[i] and &xs[a..b] request mutable borrows.
  • let x = place_expr and let &x = place_expr can bind references to fields/indexed items for the binding scope.
  • Local mutable borrows are checked for overlap: disjoint struct fields may coexist, while index/slice projections on the same root are conservatively treated as overlapping.
  • Copyable indexed elements may still be copied into owned locals or other owned contexts when required.
  • Plain slice views are not copied into owned values implicitly.
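
Rust's borrow checker draws the same line between field and index projections, so it offers a runnable comparison. The sketch uses Rust semantics and hypothetical names; it is not RukaLang code.

```rust
pub struct Pair {
    pub left: i64,
    pub right: i64,
}

/// Disjoint struct fields may be mutably borrowed at the same time.
pub fn bump_both(p: &mut Pair) {
    let l = &mut p.left;
    let r = &mut p.right; // accepted: different field of the same root
    *l += 1;
    *r += 10;
}

/// Two live `&mut xs[i]` borrows on the same root are rejected by rustc,
/// even for different indices, so the slice is split explicitly instead.
/// Panics if `xs` has fewer than two elements.
pub fn bump_first_two(xs: &mut [i64]) {
    let (head, tail) = xs.split_at_mut(1);
    head[0] += 1;
    tail[0] += 10;
}
```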

Argument Forms

  • f(x)
    • Valid for T (read-only view) and @T (owned copy).
  • f(&x)
    • Valid only when parameter type is &T.
  • f(<-x)
    • Explicit move into an owned parameter (@T), invalidating named binding x.
  • f(rvalue_expr)
    • For T, bare rvalues are borrowed for the duration of the call.
    • For @T, bare rvalues are passed as owned temporaries.

Parameter Mode Rules

For a parameter declared as T:

  • Passing f(x) borrows a read-only view.
  • Passing f(&x) or f(<-x) is invalid.
  • The callee cannot assign through the parameter binding.

For a parameter declared as &T:

  • Passing f(&x) creates a mutable borrow.
  • Other argument forms are invalid.

For a parameter declared as @T:

  • Passing f(x) copies into a fresh owned value.
  • Passing f(<-x) moves ownership and invalidates x.
  • Passing f(rvalue) moves the temporary into the callee.
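
The mode rules above can be condensed into one table. The sketch below is a hypothetical model (`ParamMode`, `ArgForm`, `Passing`, and `check_arg` are illustrative names), not the checker's actual code.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ParamMode { View, MutBorrow, Owned } // T, &T, @T

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ArgForm { Plain, Borrow, Move, Rvalue } // f(x), f(&x), f(<-x), f(rvalue_expr)

/// How an accepted argument reaches the callee.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Passing { BorrowView, BorrowMut, CopyIn, MoveIn }

/// Returns `None` when the argument form is invalid for the parameter mode.
pub fn check_arg(mode: ParamMode, arg: ArgForm) -> Option<Passing> {
    match (mode, arg) {
        // T: named values and bare rvalues are borrowed as read-only views.
        (ParamMode::View, ArgForm::Plain | ArgForm::Rvalue) => Some(Passing::BorrowView),
        // &T: only an explicit `&x` is accepted.
        (ParamMode::MutBorrow, ArgForm::Borrow) => Some(Passing::BorrowMut),
        // @T: plain `x` copies; `<-x` and rvalues move.
        (ParamMode::Owned, ArgForm::Plain) => Some(Passing::CopyIn),
        (ParamMode::Owned, ArgForm::Move | ArgForm::Rvalue) => Some(Passing::MoveIn),
        _ => None,
    }
}
```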

<- exists only to invalidate named bindings; temporaries already have no source binding to invalidate.

Copy Strategy

Copyable heap aggregates (String, arrays, records) use eager copy semantics for = and owned argument copies.

Linear values cannot have aliases other than statically-checked borrows.

Storage and Runtime Model

Storage Model (v0.1)

Storage representation is an implementation detail, but the runtime should follow these rules.

  • Semantics stay value-based (no observable pointer identity).
  • Layout for sized types is deterministic (size, align, field offsets).
  • Nested sized structs are stored inline.
  • Built-in indirection type constructor: *T.
  • *T is non-nullable and represented as the address of one heap allocation containing T inline.
  • Option[*T] carries nullable box references at the language level and uses option enum layout with pointer niche optimization in backends.
  • User-defined recursive/non-stack-allocatable types must use explicit *T at recursion/indirection points.
  • v0.1 uses no implicit boxing for user-defined type recursion.
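
Rust's `Option<Box<T>>` is a familiar instance of the pointer niche optimization mentioned above: the `None` case reuses the null address, so the option is no larger than the pointer. The sketch below shows that in Rust as an analogy for `Option[*T]`, with explicit indirection at the recursion point; the `Node` shape and function names are illustrative.

```rust
use std::mem::size_of;

/// A recursive type needs explicit indirection at the recursion point.
pub struct Node {
    pub value: i64,
    pub next: Option<Box<Node>>,
}

/// True when the optional box is exactly pointer-sized (niche optimization).
pub fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<Node>>>() == size_of::<Box<Node>>()
}

/// Walk the list by matching on the optional box.
pub fn sum(list: &Option<Box<Node>>) -> i64 {
    match list {
        Some(node) => node.value + sum(&node.next),
        None => 0,
    }
}
```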

Inline vs Handle-Backed

  • Inline storage is used for small/sized copyable values in locals and parameters.
  • Built-in aggregates (String, slices, arrays) manage their own internal storage and do not require *T.
  • Pointer-backed storage (*T) is used for explicit user-directed indirection and recursive graph-like user-defined data.
  • The immediate payload of *T must not itself be a built-in heap-handle type.

Copyable and Linear Storage

  • Copyable heap-backed values use eager copies at copy boundaries.
  • v0.1 rule: non-stack-allocatable linear user-defined values require explicit *T indirection.
  • Linear values are move-only, but borrow safety still applies.

Iteration and Move-Out

  • Iteration does not move elements out directly.
  • Consuming traversal is expressed by moving the container binding (for (<-collection) |x|).

Runtime Representation

Value Categories

  • Immediate: inline scalar (i64, Bool, small enums).
  • Heap object: pointer to cell with metadata header.

Heap Header (minimum)

  • Type tag/layout id.
  • Flags.

Current implementation status:

  • Generated Rust pointer copies clone pointee values into fresh cells.
  • Generated WASM pointer copies allocate fresh cells, and release frees pointer cells on drop.
  • Pointer release also recursively walks heap-backed pointee payloads before freeing the outer pointer cell.
  • Generated WASM runtime strings are stored in linear memory; literals stay static and owned string values are freed on drop.
  • Generated WASM array storage frees backing allocations on drop.
  • Array release walks nested pointer/string/array elements before freeing the outer storage.
  • Nested tuple/struct aggregate fields are recursively walked during release, then aggregate storage is freed.

Interpreter/VM Requirements

Minimum conceptual bytecode operations:

  • ASSIGN_COPY dst, src
  • ASSIGN_MOVE dst, src
  • BORROW_RO dst, src
  • BORROW_MUT dst, src
  • DROP x
  • field/index mutation ops

The VM must emit precise drops at all scope/control-flow exits.
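
One way to picture these operations is as a Rust enum plus a liveness pass over straight-line code. This is a sketch under assumptions: slots are plain indices, control flow and borrow lifetimes are not modeled, and `first_invalid_use` is a hypothetical helper rather than part of the VM.

```rust
/// Conceptual ownership ops over numbered slots.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Op {
    AssignCopy { dst: usize, src: usize },
    AssignMove { dst: usize, src: usize },
    BorrowRo { dst: usize, src: usize },
    BorrowMut { dst: usize, src: usize },
    Drop { slot: usize },
}

/// Returns the index of the first op that uses a slot that is not live
/// (never initialized, moved-from, or already dropped).
/// `live[i]` is the initial liveness of slot `i`; out-of-range slots panic.
pub fn first_invalid_use(ops: &[Op], live: &mut [bool]) -> Option<usize> {
    for (pc, op) in ops.iter().enumerate() {
        match *op {
            Op::AssignCopy { dst, src }
            | Op::BorrowRo { dst, src }
            | Op::BorrowMut { dst, src } => {
                if !live[src] {
                    return Some(pc);
                }
                live[dst] = true; // source stays valid
            }
            Op::AssignMove { dst, src } => {
                if !live[src] {
                    return Some(pc);
                }
                live[src] = false; // a move invalidates the source
                live[dst] = true;
            }
            Op::Drop { slot } => {
                if !live[slot] {
                    return Some(pc);
                }
                live[slot] = false;
            }
        }
    }
    None
}
```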

Runtime IR Boundary (v0.1 direction)

  • Runtime execution must not rely on source-level TypeExpr annotations.
  • Type/mode validation is performed in a checker pass before runtime IR generation.
  • Checker failures are hard errors and must stop execution.
  • Runtime IR carries only execution data (locals, blocks, ops, control-flow edges), plus dynamic runtime values.
  • Runtime checks remain for dynamic semantics (use-after-move, borrow exclusivity/lifetime, Bool condition enforcement).

Runtime IR Shape (Wasm-like, custom)

  • Use block + terminator CFG shape similar to Wasm control flow.
  • Keep ownership effects explicit as IR ops (Copy, Move, BorrowRo, BorrowMut, Drop).
  • Prefer dense entity IDs (cranelift_entity) and contiguous storage to minimize pointer chasing.
  • Resolve call targets to function IDs during lowering (avoid runtime name lookup in hot paths).
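
A minimal sketch of such a shape is shown below. Plain `u32` newtypes stand in for `cranelift_entity` IDs so the example has no dependencies, and all type and field names are illustrative assumptions rather than the project's IR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LocalId(pub u32);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FuncId(pub u32);

/// Ownership effects are explicit instructions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Inst {
    Copy { dst: LocalId, src: LocalId },
    Move { dst: LocalId, src: LocalId },
    BorrowRo { dst: LocalId, src: LocalId },
    BorrowMut { dst: LocalId, src: LocalId },
    Drop { local: LocalId },
    /// The call target is already resolved to a function ID.
    Call { func: FuncId, args: Vec<LocalId> },
}

/// Every block ends in exactly one terminator.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Terminator {
    Jump(BlockId),
    Branch { cond: LocalId, then_block: BlockId, else_block: BlockId },
    Return,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Block {
    pub insts: Vec<Inst>,
    pub term: Terminator,
}

/// Blocks live in one contiguous Vec; a `BlockId` is an index into it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Function {
    pub blocks: Vec<Block>,
}

impl Function {
    /// Control-flow successors of a block, read from its terminator.
    pub fn successors(&self, id: BlockId) -> Vec<BlockId> {
        match &self.blocks[id.0 as usize].term {
            Terminator::Jump(target) => vec![*target],
            Terminator::Branch { then_block, else_block, .. } => {
                vec![*then_block, *else_block]
            }
            Terminator::Return => Vec::new(),
        }
    }
}
```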

Validation and Diagnostics

Safety Rules

  • Use-after-move is a compile-time error.
  • Writing through read-only borrow is a compile-time error.
  • Borrow escapes are compile-time errors unless explicitly converted into owned values by copying.
  • Mutable alias violations are compile-time errors wherever possible; any need for runtime checking will be explicitly called out in the documentation.
  • No first-class references; references can never escape or otherwise outlive the function invocation, and may only be created through function or block parameters.

Error Model

For the interpreted MVP, runtime checks are acceptable, but they will be called out in comments and documentation.

  • Diagnostics should identify the binding and the operation that failed.
  • Move errors should suggest = (copy) when the user intended to keep the source valid.
  • Borrow errors should suggest & (or loop |&x|) when a mutation was attempted through read-only access.
  • Owned-argument errors should suggest plain x (copyable borrow/copy semantics) or <-x (move) for named bindings.
  • Linear-copy errors should suggest <-x (move) or borrow forms (x/&x).
  • Map-key errors should state that keys must be copyable and linear keys are invalid.

This section defines what is enforced statically versus at runtime in v1.

Static in v1

  • Parse and validate ownership mode annotations and required return types in function signatures.
  • Parse and validate local ownership markers (let x, let &x, let @x).
  • Validate call-site marker compatibility:
    • plain arg allowed for T (read-only view) and @T (owned copy).
    • &arg allowed only for &T parameters.
    • <-arg allowed only for @T parameters, and only when arg is a named place.
    • plain rvalue expressions allowed for T and @T (borrow for T, owned temporary for @T).
  • Treat plain xs[i] and xs[a..b] as read-only access forms; allow copying indexed elements into owned contexts, but reject implicit owned copies of slice views.
  • Reject obvious invalid borrow targets (& on non-place expressions).
  • Reject &*expr; mutable pointee borrows must use &name where name: *T.
  • Reject overlapping mutable/shared uses of the same place while local borrows are live.
  • Allow disjoint struct-field borrows in the same scope.
  • Treat index/slice borrow overlap conservatively (same root collection overlaps).
  • Reject <- on non-place expressions.
  • Reject illegal assignment forms, including <- from rvalues/non-place sources.
  • Reject null and pointer-binding condition forms (if (p) |x| { ... }, if (p) |&x| { ... }).
  • Reject = when operand value is linear.
  • Reject linear values in copyable containers (Array, slices, Map).
  • Reject linear values as map keys.
  • Validate Some(...) and None() constructor arity and enforce match exhaustiveness for Option[T].
  • Validate *expr only when expr: *T.
  • Validate for binders (|x|, |&x|) and reject invalid binders like |<-x|.
  • Allow consuming traversal only as for (<-collection) |x| with named-place move semantics.
  • if without else is valid only when the then branch is unit.
  • Enforce that functions declare intended ownership behavior in signatures.

Conformance Examples (v1)

// Assume signatures:
// fn view(x: T) -> T
// fn log(x: T) -> Unit
// fn edit(x: &T) -> Unit
// fn consume(x: @T) -> Unit

let @a = [1, 2]
let @b = a

view(a)        // valid: plain arg to T
view(&a)       // invalid: &arg requires &T parameter
view(<-a)      // invalid: <-arg requires @T parameter

edit(&a)       // valid: mutable borrow to &T
edit(a)        // invalid: missing & for &T parameter
edit(&[1, 2])  // invalid: & requires place expression, not temporary

consume(a)     // valid: copy into @T
consume(<-a)   // valid: explicit move into @T; a invalid after this call
consume(&a)    // invalid: &arg cannot bind to @T
consume([3, 4]) // valid: rvalue to @T moves directly
consume(<-[3,4]) // invalid: <- requires a named source to invalidate

// Assume: linear Handle
let @s = make_handle()
let s2 = s      // invalid: linear values cannot be copied
consume(<-s)    // valid: move; s invalid after call

set_map_key(s, 1) // invalid: map keys must be copyable; linear values are not valid keys

let @c <- b      // valid move assignment; b invalid after move
let d = c        // valid read-only view local
let e <- d       // valid move into a read-only view local; d invalid after move
let @f <- [5, 6] // invalid: <- cannot be used with rvalue source
consume(<-e)      // invalid: e is read-only even though it now owns the moved value

if a { log(d) }           // valid: no else and then-branch is unit
if a { view(d) }          // invalid: no else requires unit then-branch
if a { view(d); }         // valid: expression statement discards return value

view(d);                  // valid: return value discarded
let _ <- view(d)          // valid: ordinary binding using move assignment
let _ = view(d)           // valid: ordinary binding using copy assignment

for (d) |item| { log(item); }      // valid plain element binding
for (d) |&item| { log(item); }     // valid mutable-borrow element binding
for (<-d) |item| { log(item); }    // valid consuming traversal; d invalidated
for (d) |<-item| { log(item); }    // invalid: cannot move elements out via loop binding

let @node = Some(@box(Node { value: 1, next: None() }))
match (node) {
  Some(n) => log(int_to_string(n.value)),
  None() => log("empty"),
}

let @p = @box(1)
edit(&p)               // valid: &p borrows pointee as &i64 when p: *i64
edit(&*p)              // invalid: &*expr is rejected

Expected diagnostics for invalid lines should mention:

  • required parameter mode (T, &T, @T),
  • provided argument form (x, &x, <-x, or rvalue),
  • and one concrete fix suggestion.
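
The call-site rules and the diagnostic contract above can be sketched as a small compatibility table. This is a minimal Rust sketch, not the checker's actual implementation: `ParamMode`, `ArgForm`, `check_arg`, and the message wording are all hypothetical names chosen for illustration.

```rust
// Hypothetical sketch of call-site marker checking; names are illustrative.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ParamMode {
    View,      // T
    MutBorrow, // &T
    Owned,     // @T
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ArgForm {
    PlainPlace, // x
    AmpPlace,   // &x
    MovePlace,  // <-x
    Rvalue,     // [1, 2]
    AmpRvalue,  // &[1, 2]
    MoveRvalue, // <-[3, 4]
}

fn mode_text(mode: ParamMode) -> &'static str {
    match mode {
        ParamMode::View => "T",
        ParamMode::MutBorrow => "&T",
        ParamMode::Owned => "@T",
    }
}

fn form_text(form: ArgForm) -> &'static str {
    match form {
        ArgForm::PlainPlace => "x",
        ArgForm::AmpPlace => "&x",
        ArgForm::MovePlace => "<-x",
        ArgForm::Rvalue => "rvalue",
        ArgForm::AmpRvalue => "& on rvalue",
        ArgForm::MoveRvalue => "<- on rvalue",
    }
}

/// Returns Ok for a compatible pairing, otherwise a message naming the
/// required mode, the provided form, and one concrete fix.
pub fn check_arg(param: ParamMode, arg: ArgForm) -> Result<(), String> {
    use ArgForm::*;
    use ParamMode::*;
    let fix = match (param, arg) {
        // Plain places and plain rvalues bind to T and @T.
        (View | Owned, PlainPlace | Rvalue) => return Ok(()),
        // &x binds only to &T; <-x binds only to @T.
        (MutBorrow, AmpPlace) => return Ok(()),
        (Owned, MovePlace) => return Ok(()),
        // &T needs a place, so every rvalue form is rejected.
        (MutBorrow, Rvalue | AmpRvalue | MoveRvalue) => "bind the value to a name and pass `&name`",
        (_, AmpRvalue) => "pass the rvalue directly without `&`",
        (_, MoveRvalue) => "pass the rvalue directly without `<-`",
        (MutBorrow, _) => "pass `&x`",
        (View, _) => "pass plain `x`",
        (Owned, _) => "pass plain `x` (copy) or `<-x` (move)",
    };
    Err(format!(
        "parameter mode {} cannot accept argument form {}; {}",
        mode_text(param),
        form_text(arg),
        fix
    ))
}
```

For example, `check_arg(ParamMode::MutBorrow, ArgForm::PlainPlace)` corresponds to the `edit(a)` line in the conformance examples and yields a message that names &T, the plain form, and the `&x` fix.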

Conventions and Roadmap

Standard Library API Conventions

  • Read-only APIs use plain T parameters.
  • In-place mutation APIs use &T parameters.
  • Ownership-taking APIs use @T parameters.

Example Semantics

let @a = [1, 2, 3] // array literal: [i64; 3]
let @b = a         // copy assignment: both valid, eager copy

push(&b, 4)       // mutable borrow
len(a)            // read-only borrow

let @c <- b       // move assignment: b becomes invalid

consume(c)       // copy into owned parameter
consume(<-c)     // explicit move into owned parameter; c invalid after call
consume([9, 9, 9]) // rvalue array moves directly to matching @ type

let @head = Some(@box(Node { value: 7, next: None() }))
match (head) {
  Some(node) => log(int_to_string(node.value)),
  None() => log("empty"),
}

if (cond) {
  log(c);
}

make_value();

Behavioral guarantee: mutating one logical value never causes visible mutation of another logical value (no visible aliasing).

Implementation Phases

Phase 1: MVP Runtime

  • Type modes T, &T, @T for function parameters.
  • Required function return types.
  • Operators = and <- with defined validity rules.
  • Deterministic drop for owned values.
  • Eager copy on aggregate copy boundaries.
  • Non-null *T boxes with optionality represented as Option[*T].
  • Runtime checks for borrow and move violations.

Phase 2: Ergonomics and Performance

  • Better diagnostics.
  • Escape analysis for temporary borrow elision.
  • Small-string optimization and vector growth optimizations.
  • Reduce unnecessary temporary heap traffic.

Phase 3: Static Validation

  • Ahead-of-time validation for common move/borrow errors.
  • Earlier detection of alias/lifetime violations.
  • Optional strict mode with minimal runtime borrow checks in verified code.

Open Design Questions

  • Whether local bindings are immutable by default.
  • Whether <- is allowed in destructuring/pattern assignment for v1.
  • Which diagnostics are mandatory for MVP versus best-effort.
  • How/whether to support explicit aliases.
  • How to implement custom iterators.

CLI Flags

Primary invocation pattern:

cargo run -- [FLAGS] <input.rk>

Common flags currently used in examples:

  • --dump-ast
  • --dump-hir
  • --dump-tokens
  • --dump-pass-timings
  • --dump-pass-snapshots
  • --dump-pass-snapshots-json
  • --emit-rust=PATH
  • --emit-wat=PATH
  • --emit-wasm=PATH
  • --run

For now, use cargo run -- --help as the canonical source for flag behavior.

For machine-readable pass snapshots, see Pass Snapshot Schema.

Pass Snapshot Schema

This page defines the stable JSON Lines (JSONL) schema emitted by:

  • --dump-pass-snapshots-json

JSON Envelope (schema v1)

Each line is one JSON object:

{
  "kind": "pass_snapshot",
  "schema_version": 1,
  "snapshot_kind": "hir_program",
  "name": "hir.lower_program",
  "detail": "functions=3 exprs=42 stmts=17",
  "fields": {
    "functions": 3,
    "exprs": 42,
    "stmts": 17
  }
}

Stable top-level keys:

  • kind: always pass_snapshot
  • schema_version: currently 1
  • snapshot_kind: stable semantic kind enum value
  • name: concrete pass name
  • detail: human-readable summary
  • fields: machine-readable key/value object

snapshot_kind Values

Stable values in schema v1:

  • meta_program
  • elab_program
  • hir_program
  • check_program
  • mir_program
  • codegen_rust
  • codegen_wasm

Field Contracts (schema v1)

meta_program (meta.expand_program):

  • functions (u64)
  • structs (u64)
  • enums (u64)

elab_program (elab.elaborate_program):

  • functions (u64)
  • structs (u64)
  • enums (u64)

hir_program (hir.lower_program):

  • functions (u64)
  • exprs (u64)
  • stmts (u64)

check_program (check.check_program):

  • signatures (u64)
  • local_symbols (u64)
  • occurrences (u64)

mir_program (mir.lower_program):

  • functions (u64)
  • locals (u64)
  • instrs (u64)

codegen_rust (codegen.rust.emit_program):

  • lines (u64)
  • bytes (u64)

codegen_wasm (codegen.wasm.emit_program):

  • wat_bytes (u64)
  • wasm_bytes (u64)
  • diagnostics (u64)
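
A consumer of the snapshot stream might encode the field contracts above as a lookup table. The Rust sketch below does that with std only; `FIELD_CONTRACTS`, `contract_for`, and `missing_fields` are hypothetical helper names, and the sketch checks key presence only, not JSON parsing or value types.

```rust
// Hypothetical consumer-side table; the helper names are illustrative.

/// (snapshot_kind, pass name, required `fields` keys) for schema v1.
pub const FIELD_CONTRACTS: &[(&str, &str, &[&str])] = &[
    ("meta_program", "meta.expand_program", &["functions", "structs", "enums"]),
    ("elab_program", "elab.elaborate_program", &["functions", "structs", "enums"]),
    ("hir_program", "hir.lower_program", &["functions", "exprs", "stmts"]),
    ("check_program", "check.check_program", &["signatures", "local_symbols", "occurrences"]),
    ("mir_program", "mir.lower_program", &["functions", "locals", "instrs"]),
    ("codegen_rust", "codegen.rust.emit_program", &["lines", "bytes"]),
    ("codegen_wasm", "codegen.wasm.emit_program", &["wat_bytes", "wasm_bytes", "diagnostics"]),
];

/// Looks up the pass name and required keys for a snapshot_kind.
pub fn contract_for(kind: &str) -> Option<(&'static str, &'static [&'static str])> {
    FIELD_CONTRACTS
        .iter()
        .find(|(k, _, _)| *k == kind)
        .map(|(_, pass, fields)| (*pass, *fields))
}

/// Returns the required keys that are absent from `present`, or None for an
/// unknown snapshot_kind. Extra keys are tolerated.
pub fn missing_fields(kind: &str, present: &[&str]) -> Option<Vec<&'static str>> {
    let (_, required) = contract_for(kind)?;
    Some(
        required
            .iter()
            .copied()
            .filter(|key| !present.contains(key))
            .collect(),
    )
}
```

Tolerating extra keys keeps such a consumer compatible with fields added later within schema v1.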

Compatibility Rules

  • Existing keys and snapshot_kind values are stable within schema v1.
  • New keys may be added to fields in v1, but existing keys keep their meaning.
  • Any breaking change requires incrementing schema_version.

Validation Script

Validate captured JSONL output with:

python3 scripts/validate-pass-snapshot-jsonl.py snapshots.jsonl

Example capture + validation:

cargo run -- --dump-pass-snapshots-json examples/basics.rk > snapshots.jsonl
python3 scripts/validate-pass-snapshot-jsonl.py --strict-pass-name snapshots.jsonl

CI integration note:

  • ./scripts/ci.sh validates any *.jsonl fixtures under tests/fixtures/pass-snapshots/.
  • If no fixtures exist, this check is skipped.

Browser WASM API

The browser wrapper crate is crates/rukalang_wasm.

It currently exposes these wasm-bindgen APIs:

  • compile_for_browser_json(source_name, source_text)
  • analyze_for_browser_json(source_name, source_text)
  • lex_for_browser_json(source_text)

Success payload fields:

  • ast_graph
  • hir_graph
  • mir_graph
  • rust_source
  • wat_source
  • wasm_bytes (optional u8 array; present only when binary emission succeeds)
  • wasm_diagnostics (non-fatal backend diagnostics)

Build prerequisite for direct WASM emission:

  • run ./scripts/build-runtime-wasm.sh before Rust or web WASM builds/tests that consume browser artifacts

Error payload shape:

  • object with diagnostics array
  • each diagnostic includes phase and message
  • syntax diagnostics may include line and column
  • phase values currently include module, syntax, meta, check, mir_lower, and codegen

compile_for_browser_json behavior notes:

  • Rust/AST/HIR/MIR artifacts are emitted when frontend + MIR + Rust codegen pass.
  • AST/HIR/MIR graph payloads are browser-friendly Cytoscape data rather than DOT text.
  • WASM backend diagnostics are reported in wasm_diagnostics without failing the whole compile payload.
  • wasm_bytes is omitted when the current source uses unsupported WASM backend features.
  • wat_source is generated from emitted wasm_bytes via wasmprinter, with synthesized names enabled to improve readability; when WASM emission fails, wat_source is empty.
  • Emitted wasm_bytes are validated with wasmparser before they are returned.
  • Emitted browser WASM exports run_main, which calls runtime assert_no_leaks by default after invoking user main.

analyze_for_browser_json payload shape:

  • ok: boolean (true when no diagnostics were produced)
  • diagnostics: same diagnostic entry shape as compile errors
  • highlight_spans: lexical/semantic token spans used by the editor

Validate the wrapper crate:

cargo test --manifest-path crates/rukalang_wasm/Cargo.toml

Metaprogramming

This chapter defines the current staging model used by RukaLang.

Goals

  • Keep runtime and compile-time behavior clearly separated.
  • Keep local type inference and require explicit type annotations only at boundaries.
  • Support explicit dynamic behavior through tagged unions, not implicit dynamic typing.
  • Enable future self-hosting/compiler-in-language workflows with typed staged code.

Phase Model

  • Runtime code executes normally.
  • Compile-time code executes only in explicit staged contexts.
  • Types are compile-time values (type) and do not flow as runtime values.

Context-Sensitive Surface Forms

RukaLang intentionally reuses several syntactic forms across runtime and staged contexts.

Syntax notes:

  • Name[...] is used in type/meta contexts for type application and constructor forms.
  • name(...) is used in expression contexts for function/meta-function calls.
  • Resolution is context-driven and validated in semantic passes.

Ownership markers appear in both spaces:

  • type-level: T, &T, @T
  • expression-level: &x, <-x

Supported First-Pass Syntax

Current parser/evaluator support covers:

  • meta fn declarations for compile-time-only functions.
  • meta { ... } blocks as explicit compile-time statement forms.
  • expr { ... } typed runtime-expression builders that produce Expr[T].
  • %{ ... } quote expressions.
  • $expr splice expressions in staged contexts.
  • $(...) inline runtime splices that evaluate a meta expression and require Code[...].
  • pattern match over staged values:
    • direct type patterns for type values (for example struct { x: i64 })
    • quote patterns %{ ... } for code values
  • code type constructor usage in type position: Code[T].
  • typed expression constructor usage in type position: Expr[T].
  • quoted/runtime struct operations in expressions:
    • construction: Name { field: value, ... }
    • field read: value.field
    • field update statement: value.field = expr;

Common aliases by convention:

  • Expr[T] for typed runtime expression generation.
  • Unit aliases the empty tuple type Tuple[].

Example

meta fn choose(flag: Bool, yes: Expr[i64], no: Expr[i64]) -> Expr[i64] {
  match flag {
    true => yes,
    false => no,
  };
}

fn main() -> Unit {
  meta {
    choose(true, expr { 4 }, expr { 9 });
  };
  0;
}

Inline runtime splice example:

meta fn make_message() -> Expr[String] {
  expr { "hello" };
}

fn main() -> String {
  $(make_message())
}

Current Limitations

  • Parser support comes first; some elaboration/type rules remain partial.
  • $expr is parsed in expression positions and must be resolved before runtime lowering.
  • Runtime cannot consume type values directly.
  • Type-structure matching supports direct type patterns in match arms.

Validation Rules

  • Runtime values cannot be used in compile-time-only contexts.
  • Compile-time-only values cannot escape into runtime expressions.
  • Splice insertion must be type-correct at the quote site.

Phase Boundary Behavior

Meta evaluation is strict: it can only read values introduced in meta contexts (meta-function parameters, meta let bindings, and meta call results). Runtime bindings are unavailable in the meta phase.

Valid meta-phase access:

meta fn use_int(k: i64) -> Expr[i64] {
  expr { k };
}

fn main() -> Unit {
  meta {
    use_int(4);
  };
  0;
}

Invalid runtime-to-meta access:

meta fn use_int(k: i64) -> Expr[i64] {
  expr { k };
}

fn main() -> Unit {
  let runtime_k = 4;
  meta {
    use_int(runtime_k); // error: runtime binding unavailable in meta phase
  };
  0;
}

MIR

This page is the high-level entry point for MIR documentation.

Detailed MIR internals (instruction set, lowering behavior, local slot representation, backend mapping, and naming) live in crate docs so there is one source of truth.

Where to read MIR docs

If you are looking for naming details like p_*, v_*, and t_* locals (for example v_1), use the crate docs above; that is where the full explanation is maintained.

Internals

This section documents implementation details that contributors use when working on compiler architecture.

Compiler Pipeline

At a high level, the compiler pipeline is:

  1. Parse source into syntax structures.
  2. Expand runtime meta { ... } forms into runtime AST.
  3. Elaborate/concretize runtime types (instantiate generic struct templates into concrete runtime structs).
  4. Lower into typed HIR.
  5. Run checker passes.
  6. Lower into MIR.
  7. Emit Rust code (and WAT/WASM artifacts), and optionally execute the result.

Checker invariant: runtime generic types are not supported past elaboration. If a generic runtime type survives into checking, it is treated as an invariant violation and rejected.

See README.md, Language Design (MVP), and the Metaprogramming reference.

Nanopass Roadmap

This document proposes a nanopass-inspired compiler architecture for RukaLang. It is a migration plan, not an all-at-once rewrite.

Assumptions

  • The main priorities are maintainability and clarity of invariants.
  • Small compile-time regressions are acceptable if we gain much better architecture hygiene.
  • We keep the current top-level stage order (syntax -> meta -> elab -> hir -> check -> mir -> codegen) and split inside stages first.
  • We preserve current language behavior while we refactor pass structure.

If these assumptions change, we should revise this plan before implementation.

Assumptions confirmed by maintainer (2026-03-18).

Design Goals

  1. Minimize boilerplate when adding passes and intermediate forms, using macros or other metaprogramming constructs where appropriate.
  2. Carry source information through the whole pipeline with one shared representation.
  3. Maximize allocation performance with arena-based storage (cranelift_entity first choice).
  4. Make pass contracts explicit: each pass declares required input invariants and guaranteed output invariants.
  5. Keep the framework in its own crate so that it can later be published and reused in other compilers; supporting RukaLang remains the top priority.

Core Architecture

1) Pass Interface With Low Boilerplate

Introduce a shared pass API with a small descriptor and one run method.

pub trait Pass {
    type In;
    type Out;
    type Error;

    const NAME: &'static str;

    fn run(&mut self, input: Self::In, cx: &mut PassContext) -> Result<Self::Out, Self::Error>;
}

PassContext carries shared facilities:

  • interners,
  • source/provenance tables,
  • diagnostics sink,
  • reusable scratch arenas,
  • stats/timing hooks.

To reduce repeated code, add a small helper macro for pass declaration metadata and a pass runner that wires logging/timing/diagnostic framing once.
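
A pass runner of the kind described here could be sketched as follows. The `Pass` trait mirrors the one shown above; the `PassContext` fields, `run_pass`, and the toy `CountLines` pass are hypothetical and reduced to what the example needs (timing and a diagnostics sink).

```rust
use std::time::{Duration, Instant};

/// Hypothetical, reduced context: only timing hooks and a diagnostics sink.
#[derive(Default)]
pub struct PassContext {
    pub timings: Vec<(&'static str, Duration)>,
    pub diagnostics: Vec<String>,
}

pub trait Pass {
    type In;
    type Out;
    type Error;

    const NAME: &'static str;

    fn run(&mut self, input: Self::In, cx: &mut PassContext) -> Result<Self::Out, Self::Error>;
}

/// Runs one pass and records its wall-clock time under the pass name,
/// so individual passes do not repeat this wiring.
pub fn run_pass<P: Pass>(
    pass: &mut P,
    input: P::In,
    cx: &mut PassContext,
) -> Result<P::Out, P::Error> {
    let start = Instant::now();
    let result = pass.run(input, cx);
    cx.timings.push((P::NAME, start.elapsed()));
    result
}

/// Toy pass used only for illustration: counts lines of its input.
pub struct CountLines;

impl Pass for CountLines {
    type In = String;
    type Out = usize;
    type Error = String;

    const NAME: &'static str = "demo.count_lines";

    fn run(&mut self, input: String, cx: &mut PassContext) -> Result<usize, String> {
        if input.is_empty() {
            cx.diagnostics.push("empty input".to_string());
            return Err("empty input".to_string());
        }
        Ok(input.lines().count())
    }
}
```

Because `run_pass` is generic over `P`, calls are statically dispatched, and timing is recorded whether the pass succeeds or fails.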

2) Shared Source + Provenance Representation

Use one representation for all source and expansion provenance:

  • SourceFileId (entity_impl! newtype)
  • SpanId pointing into a global span arena
  • OriginId for synthetic/generated nodes

Each IR node stores either:

  • a direct compact SpanId, or
  • an OriginId when generated from other nodes.

OriginId resolves through a provenance graph:

  • Origin::Parsed { span }
  • Origin::Expanded { from: OriginId, phase: PassId }
  • Origin::Lowered { from: OriginId, phase: PassId }
  • Origin::Synthesized { reason, parent: Option<OriginId> }

This gives one diagnostics path from any late IR entity back to user source, including meta expansion and lowering.
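
To make the origin-chain idea concrete, here is a minimal Rust sketch of a provenance table that resolves any OriginId back to a parsed span. The `Origin` variants follow the list above; `OriginTable`, `push`, and `resolve_span` are hypothetical names, and the ID newtypes are plain `u32` wrappers rather than entity_impl! types.

```rust
// Hypothetical provenance table; only the Origin variants come from the text.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SpanId(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OriginId(pub u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PassId(pub u32);

pub enum Origin {
    Parsed { span: SpanId },
    Expanded { from: OriginId, phase: PassId },
    Lowered { from: OriginId, phase: PassId },
    Synthesized { reason: &'static str, parent: Option<OriginId> },
}

#[derive(Default)]
pub struct OriginTable {
    origins: Vec<Origin>,
}

impl OriginTable {
    /// Appends an origin and returns its dense ID.
    pub fn push(&mut self, origin: Origin) -> OriginId {
        let id = OriginId(self.origins.len() as u32);
        self.origins.push(origin);
        id
    }

    /// Walks the chain toward its root and returns the user-written span,
    /// or None when the chain ends in a parentless synthesized node.
    /// Assumes each entry only refers to IDs pushed before it (no cycles).
    pub fn resolve_span(&self, mut id: OriginId) -> Option<SpanId> {
        loop {
            match &self.origins[id.0 as usize] {
                Origin::Parsed { span } => return Some(*span),
                Origin::Expanded { from, .. } | Origin::Lowered { from, .. } => id = *from,
                Origin::Synthesized { parent: Some(parent), .. } => id = *parent,
                Origin::Synthesized { parent: None, .. } => return None,
            }
        }
    }
}
```

A diagnostic renderer could call `resolve_span` on the failing entity's origin for the primary span and use the intermediate `Expanded`/`Lowered` entries for secondary notes.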

3) Allocation Model

Use PrimaryMap for arena-owned entities and SecondaryMap/side tables for annotations:

  • PrimaryMap<EntityId, Node> for nodes,
  • SecondaryMap<EntityId, SpanId> for direct source,
  • SecondaryMap<EntityId, OriginId> for provenance,
  • compact index-backed vectors for analysis facts.

Guidelines:

  • pre-size maps from cheap counts when possible,
  • avoid cloning large subtrees across passes; prefer id remapping tables,
  • keep per-pass temporary state in scratch arenas owned by PassContext, then clear/reuse.

Pipeline Refactor Plan

Current implementation reference:

  • See Pass Inventory for the current typed pass list, execution order, and implementation links.

Implementation Status (2026-03-19)

Completed so far:

  1. Pass framework landed in src/pass with typed pass execution, pass ids, timing capture, and shared provenance tables (SourceFileId, SpanId, OriginId).
  2. Top-level production pipeline now runs through typed pass wrappers for all current stages: meta, elab, hir, check, mir, codegen.rust, codegen.wasm.
  3. Elaboration split-in-progress: core runtime call/template concerns are now explicit subpasses:
    • elab.normalize_runtime_calls_and_spreads
    • elab.validate_runtime_call_args
    • elab.bind_template_call_args
    • elab.instantiate_runtime_function
  4. Per-pass observability landed:
    • pass timings (--dump-pass-timings)
    • pass snapshots (--dump-pass-snapshots)
    • JSONL snapshots with schema/version (--dump-pass-snapshots-json)
  5. Provenance side-table implementation started:
    • HIR expression origin side tables
    • MIR local origin side tables
    • origin chains include Parsed -> Expanded(meta) -> Lowered(elab) -> Lowered(hir[/mir])
  6. Browser/WASM analysis path migrated onto driver-based pipeline hooks.
  7. CI now includes a browser WASM API smoke check to catch runtime regressions in compile/analyze behavior.

Remaining major work:

  1. Continue decomposition of elab until major mixed-responsibility blocks are isolated behind stable pass contracts.
  2. Start check phase split (collect_decls, resolve_signatures, etc.).
  3. Extend provenance mapping to more node/entity kinds and tighten diagnostic source reconstruction quality.
  4. Add stronger fixture/snapshot coverage for pass contracts and structured snapshot schema stability.

Phase 0: Infrastructure First

Deliverables:

  1. pass crate/module with Pass, PassContext, PassId, pass runner.
  2. Shared provenance tables and IDs (SourceFileId, SpanId, OriginId).
  3. Compiler driver updates to run a pass list and emit pass timing stats.

Exit criteria:

  • current behavior unchanged,
  • existing tests pass,
  • source spans still appear in diagnostics.

Status:

  • Complete.

Phase 1: Split elab Into Explicit Subpasses

The current elab stage mixes many concerns. Candidate first split:

  1. collect_runtime_templates
  2. resolve_type_names
  3. instantiate_runtime_templates
  4. infer_runtime_expr_types
  5. normalize_runtime_calls_and_spreads
  6. runtime_type_validation

Each subpass operates over one arena-backed runtime AST form with side tables, not deep cloning.

Exit criteria:

  • golden tests for per-pass output snapshots,
  • invariants documented for each pass,
  • no language behavior drift.

Status:

  • In progress. Core runtime call/template concerns now run as explicit elab subpasses (normalize_runtime_calls_and_spreads, validate_runtime_call_args, bind_template_call_args, instantiate_runtime_function).

Phase 2: Split check Into Independent Analyses

Suggested decomposition:

  1. collect_decls
  2. resolve_signatures
  3. check_expr_and_stmt_types
  4. check_loans_and_moves
  5. finalize_checked_program

Store analysis outputs in compact side tables keyed by expression/statement ids.

Exit criteria:

  • diagnostics parity for current fixtures,
  • checker internals no longer require one giant mutable state object.

Phase 3: Split mir_lower

Suggested decomposition:

  1. build_function_skeletons
  2. lower_cfg
  3. lower_types_and_layout
  4. insert_runtime_intrinsics
  5. mir_sanity_validation

Exit criteria:

  • MIR graph parity on fixture corpus,
  • no codegen regressions in Rust/WASM outputs.

Phase 4: Optional Full Nanopass Expansion

After phases 1-3, we can choose finer granularity pass-by-pass.

Decision gate:

  • if a pass still has mixed responsibilities or weak invariants, split again,
  • if not, keep current granularity.

This keeps a path to full nanopass architecture without forcing every split immediately.

Boilerplate Reduction Strategy

  1. Use generated EntityId newtypes (entity_impl!) and common arena wrappers.
  2. Keep one pass registration table:
    • pass name,
    • input/output type ids,
    • optional debug dump hook.
  3. Auto-wire pass logging, timing, and panic context in one runner.
  4. Reuse traversal helpers for common AST/HIR/MIR walk patterns.

Source/Diagnostics Strategy

  1. Every emitted diagnostic must carry an OriginId.
  2. Diagnostics rendering resolves origin chain to best user-facing span.
  3. If multiple source candidates exist (for generated nodes), render:
    • primary span,
    • one secondary note with expansion/lowering origin.

This keeps diagnostics robust as pass count grows.

Performance Strategy

  1. Prefer arena ids over owned recursive trees in inner passes.
  2. Keep hot tables in flat vectors keyed by entity index.
  3. Batch allocate nodes and annotations per pass; avoid per-node heap allocations.
  4. Collect and track pass timing/allocation counters from day one of migration.

Validation Plan

At each phase:

  1. Run cargo test.
  2. Run ./scripts/ci.sh before PR.
  3. Add fixture tests for any new diagnostics surface.
  4. Add pass contract tests:
    • checks for required input invariants,
    • checks for guaranteed output invariants.

Decisions (Confirmed)

  1. Pass errors use per-pass error enums wrapped by one top-level compiler error type.
  2. Provenance uses one canonical OriginId path; parsed nodes are Origin::Parsed { span }.
    • Storage policy: keep OriginId in side tables keyed by arena entity ids, not as direct IR node fields.
    • Rationale: lower node-size overhead, less constructor/pattern-match churn, one shared provenance representation.
  3. Subpasses prefer in-place mutation of the arena-backed IR and its side tables, and emit a new IR only when structure must change.
  4. Pass registration starts as a static compile-time pass list (typed, no dynamic dispatch).
  5. Expose pass-level debug dumps through CLI flags in phase 0.
  6. Persist provenance graph in browser artifacts and revisit graph presentation as pass count grows.
  7. Use one shared IR node id namespace per stage (not per-module) for maintainability.
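
Decision 1 (per-pass error enums wrapped by one top-level error type) maps onto a common Rust pattern. The sketch below is illustrative only: the enum variants and the `elaborate`, `check`, and `compile` functions are hypothetical stand-ins, not the compiler's real API.

```rust
// Hypothetical error types; variants and functions are illustrative only.

#[derive(Debug, PartialEq)]
pub enum ElabError {
    UnknownTemplate(String),
}

#[derive(Debug, PartialEq)]
pub enum CheckError {
    UseAfterMove(String),
}

/// One top-level error type that wraps every per-pass enum.
#[derive(Debug, PartialEq)]
pub enum CompilerError {
    Elab(ElabError),
    Check(CheckError),
}

impl From<ElabError> for CompilerError {
    fn from(err: ElabError) -> Self {
        CompilerError::Elab(err)
    }
}

impl From<CheckError> for CompilerError {
    fn from(err: CheckError) -> Self {
        CompilerError::Check(err)
    }
}

// Toy stand-ins for two passes, each with its own error enum.
fn elaborate(src: &str) -> Result<&str, ElabError> {
    if src.contains("Unknown[") {
        return Err(ElabError::UnknownTemplate("Unknown".to_string()));
    }
    Ok(src)
}

fn check(src: &str) -> Result<(), CheckError> {
    if src.contains("moved") {
        return Err(CheckError::UseAfterMove("moved".to_string()));
    }
    Ok(())
}

/// The driver uses `?`; the `From` impls lift each pass error automatically.
pub fn compile(src: &str) -> Result<(), CompilerError> {
    let elaborated = elaborate(src)?;
    check(elaborated)?;
    Ok(())
}
```

Each pass keeps a small, precise error enum, while the driver and CLI only ever handle `CompilerError`.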

Suggested First Implementation Slice

Keep this first slice small and reversible:

  1. Add PassContext + provenance ids/tables.
  2. Wrap existing elab::elaborate_program as a single pass under new runner.
  3. Split only one elab concern (normalize_runtime_calls_and_spreads) into its own pass.
  4. Verify diagnostics parity and benchmark compile time on fixture corpus.

If this slice lands cleanly, continue with the rest of phase 1.

Pass Inventory

Current passes that run through the typed pass mechanism (Pass + PassContext). Rustdoc links use mdBook-relative paths (../../rustdoc/...).

Top-Level Pipeline (execution order)

  1. meta.expand_program - Expand meta constructs into runtime-facing AST.
  2. elab.elaborate_program - Elaborate types/templates in AST.
  3. hir.lower_program - Lower elaborated AST to HIR.
  4. check.check_program - Semantic/type check HIR.
  5. mir.lower_program - Lower HIR + checker facts to MIR.
  6. codegen.rust.emit_program - Emit Rust source from MIR.
  7. codegen.wasm.emit_program - Emit WAT/WASM artifacts from MIR.

Elaboration Subpasses

  1. elab.normalize_runtime_calls_and_spreads - Rewrite ...tuple call args.
  2. elab.validate_runtime_call_args - Validate normalized runtime call args.
  3. elab.bind_template_call_args - Bind template call args and build specialization key.
  4. elab.instantiate_runtime_function - Instantiate/cache concrete runtime template function.

Calling Conventions

RukaLang has two internal call boundaries that matter for compiler work:

  1. MIR-level ownership and representation conventions.
  2. Backend ABI conventions (Rust emission and direct WASM emission).

This page documents the current rules and points to the rustdoc pages where those rules are encoded.

MIR-Level Contract

MIR stores both ownership mode and runtime/storage representation. Together, these define how a value is passed at call boundaries.

Current boundary semantics:

  • View parameters are source-level read-only access; MutBorrow parameters are mutable borrow access; Owned parameters are value transfer parameters.
  • MirParamBinding::source_repr and MirParamBinding::local_repr define source boundary shape vs lowered local shape.
  • MirParamBinding::requires_materialization marks parameters where the lowered local representation differs from the source boundary representation.
  • MirParamBinding::materializes_view_from_owned marks the current materialized view case where a source owned value is projected into a view-local boundary.
  • MirCallArgBinding::requires_deref_read marks call arguments that need a load/read from a place local before value passing.
  • Boundary coercions that require runtime checks currently lower at MIR call sites with explicit CollectionLen comparisons and CallExtern("std::panic") on failure.

Core MIR container docs:

Rust Backend Convention

Rust codegen follows Rust-level references/values directly:

Behavior by argument mode:

  • Borrowed (view call arg mode) passes &T (or &*place when the local is place-shaped).
  • MutableBorrow passes &mut T (or &mut *place for mutable place locals).
  • OwnedMove passes by move; place reads are cloned from dereference.
  • OwnedCopy passes a cloned value; place reads are cloned from dereference.

For slice place reads copied into owned values, Rust emission uses .to_vec() instead of (*place).clone().

Entry points:

WASM Backend Convention

The direct WASM backend uses a strict, explicit ABI with normalized value types and out-slot returns for aggregate return values.

Signature shaping is defined by:

Current value mapping:

  • i64 lowers to i64.
  • Most other runtime values lower to pointer-sized i32 handles (including strings, pointers, arrays, tuples, structs, enums, slices, and references).

Return conventions:

  • Non-aggregate mutable-borrow params use an inout convention: argument value in, updated value returned as an extra WASM result.
  • Scalar-like returns use normal WASM result values.
  • Aggregate returns (currently tuple/struct/slice) use an out-slot pointer parameter inserted at parameter index 0 and no WASM result.
  • Aggregate temporaries are placed on the runtime shadow stack when required.

For the full shadow-stack lifecycle and memory layout, see WASM Shadow Stack.

Call lowering is defined by:

The call-argument strategy is selected from local representation and arg mode:

  • Pass-through by value for normal value locals and mutable-borrow pointer ABI args.
  • Dereference-load for mutable-borrow inout args when the source local is a non-passthrough place.
  • Dereference-load to match callee ABI when reading from place-shaped locals.

Backend entry points:

Runtime WASM ABI Surface

Runtime-call ABI metadata is centralized in ruka_runtime:

The coercion runtime trap path is exposed as std::panic in this descriptor table.

This descriptor table is what the WASM backend linker and call lowering use for symbol resolution and runtime signature checks.

Borrow Checking

RukaLang tracks local borrows with a place-based checker. It is much smaller than Rust's borrow checker but enforces the same core rule for overlapping access: mutable and shared access to the same storage must not coexist.

Scope and Goals

  • Keep reference semantics simple for MVP.
  • Support temporary local references to named bindings, struct fields, tuple fields, indexed elements, and slice ranges.
  • Prevent overlapping mutable and shared access to the same storage region.
  • Avoid lifetime inference complexity by keeping references local-only (no storing in user data structures and no returning references).

Surface Forms Covered

  • let x = place_expr creates a shared local reference when place_expr is a place expression.
  • let &x = place_expr creates a mutable local reference.
  • Existing call-argument borrow forms remain available (&arg for &T parameters).

For non-place initializers, plain let x = expr continues to behave like a normal value initialization.

Place Model

The checker resolves borrowable expressions into a canonical place path:

  • root binding name (x)
  • zero or more projections:
    • field projection (.field / tuple index like .0)
    • index-like projection ([i] and [a..b] both normalize to one index-like projection)

Examples:

  • pair.left -> pair .field(left)
  • xs[3] -> xs .index_like
  • xs[1..3] -> xs .index_like
  • pair.left.value -> pair .field(left) .field(value)
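A minimal Rust sketch of how such a canonical place path could be modeled. The names Place and Projection and the builder methods are hypothetical and chosen for illustration; they are not taken from the checker's source.

```rust
/// One step in a place path. Hypothetical names, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Projection {
    /// `.field`, or a tuple index such as `.0`.
    Field(String),
    /// `[i]` and `[a..b]` both normalize to this single variant.
    IndexLike,
}

/// A root binding name followed by zero or more projections.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Place {
    pub root: String,
    pub projections: Vec<Projection>,
}

impl Place {
    /// Start a path at a root binding.
    pub fn new(root: &str) -> Self {
        Place { root: root.to_string(), projections: Vec::new() }
    }

    /// Append a field (or tuple-index) projection.
    pub fn field(mut self, name: &str) -> Self {
        self.projections.push(Projection::Field(name.to_string()));
        self
    }

    /// Append an index-like projection; the index or range itself is discarded.
    pub fn index_like(mut self) -> Self {
        self.projections.push(Projection::IndexLike);
        self
    }
}
```

Under this model, xs[3] and xs[1..3] both become Place::new("xs").index_like() and compare equal, which is what makes the later overlap rule conservative for indexing.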

Active Loans

Each scope keeps a list of active loans:

  • loan kind: shared or mutable
  • place path
  • owner local (the local binding that introduced the loan)

Loans are introduced by local borrow declarations and removed when the owning scope exits.
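The per-scope loan list can be pictured with the following Rust sketch. The LoanScopes type and its methods are assumptions made for this example, and the place path is kept as a plain string for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoanKind {
    Shared,
    Mutable,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Loan {
    pub kind: LoanKind,
    /// Canonical place path, simplified to a string in this sketch.
    pub place: String,
    /// The local binding that introduced the loan.
    pub owner: String,
}

/// A stack of scopes, each holding the loans it introduced (hypothetical type).
#[derive(Debug, Default)]
pub struct LoanScopes {
    scopes: Vec<Vec<Loan>>,
}

impl LoanScopes {
    pub fn enter_scope(&mut self) {
        self.scopes.push(Vec::new());
    }

    /// Leaving a scope drops every loan that scope introduced.
    pub fn exit_scope(&mut self) {
        self.scopes.pop();
    }

    /// Record a loan in the innermost open scope.
    pub fn add_loan(&mut self, kind: LoanKind, place: &str, owner: &str) {
        let scope = self.scopes.last_mut().expect("no open scope");
        scope.push(Loan { kind, place: place.to_string(), owner: owner.to_string() });
    }

    /// All loans visible from the current scope, outermost first.
    pub fn active(&self) -> impl Iterator<Item = &Loan> + '_ {
        self.scopes.iter().flatten()
    }
}
```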

Overlap Rule

Two places overlap when:

  • they have the same root binding, and
  • their projections are not proven disjoint.

Disjointness rule used today:

  • field-vs-field at the same depth with different field names is disjoint (pair.left vs pair.right)
  • any case involving index-like projection is treated as overlapping (conservative)
  • prefix/ancestor-descendant place relations overlap

This matches Rust's conservative behavior for array/slice indexing while still allowing independent struct-field borrows.
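The overlap and disjointness rules above can be sketched in Rust as follows. The Proj and Place types, the place helper, and the places_overlap function are hypothetical stand-ins for illustration, not the checker's actual API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Proj {
    Field(&'static str),
    IndexLike,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Place {
    pub root: &'static str,
    pub projs: Vec<Proj>,
}

/// Small constructor helper for the example.
pub fn place(root: &'static str, projs: &[Proj]) -> Place {
    Place { root, projs: projs.to_vec() }
}

/// Returns true unless the two places are proven disjoint.
pub fn places_overlap(a: &Place, b: &Place) -> bool {
    // Different root bindings never overlap.
    if a.root != b.root {
        return false;
    }
    for (pa, pb) in a.projs.iter().zip(b.projs.iter()) {
        match (pa, pb) {
            // Different field names at the same depth: proven disjoint.
            (Proj::Field(x), Proj::Field(y)) if x != y => return false,
            // Same field name: keep comparing deeper projections.
            (Proj::Field(_), Proj::Field(_)) => continue,
            // Anything involving an index-like projection: assume overlap.
            _ => return true,
        }
    }
    // Equal paths, or one path is a prefix (ancestor) of the other.
    true
}
```

For example, pair.left and pair.right are reported disjoint, while xs[0] and xs[1] both normalize to the same index-like path and are reported as overlapping.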

Enforced Access Rules

  • read of a place is rejected if an overlapping mutable loan is active
  • write/move of a place is rejected if any overlapping loan is active
  • creating a shared loan is rejected if an overlapping mutable loan is active
  • creating a mutable loan is rejected if any overlapping loan is active
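These four rules reduce to a small decision function. The Rust sketch below assumes the caller has already collected the kinds of all active loans that overlap the accessed place; the Access enum and access_allowed function are hypothetical names, and the sketch does not model access through the loan's own owner binding.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LoanKind {
    Shared,
    Mutable,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Access {
    Read,
    WriteOrMove,
    BorrowShared,
    BorrowMutable,
}

/// Decide whether `access` is allowed, given the kinds of the active loans
/// that overlap the accessed place.
pub fn access_allowed(access: Access, overlapping: &[LoanKind]) -> bool {
    match access {
        // Reads and new shared loans conflict only with mutable loans.
        Access::Read | Access::BorrowShared => !overlapping.contains(&LoanKind::Mutable),
        // Writes, moves, and new mutable loans conflict with any loan.
        Access::WriteOrMove | Access::BorrowMutable => overlapping.is_empty(),
    }
}
```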

Current Limits

  • No index disjointness proof (xs[0] vs xs[1] is still overlapping).
  • No borrow-splitting API yet (there is no equivalent of Rust's split_at_mut).
  • The checker is lexical-scope based: a loan lasts until its owning scope exits, with no Rust-style non-lexical lifetime shortening.

These limits are intentional for MVP simplicity.

WASM Shadow Stack

This page explains exactly when the direct WASM backend uses the shadow stack, how frame layout is computed, and how out-slot returns interact with it.

What It Is

RukaLang's direct WASM backend uses a per-call shadow stack frame for aggregate values that should not be heap-allocated for temporary use.

The compile-time decisions are encoded in:

The runtime ABI symbols used to reserve/release the frame are:

Exactly When It Is Used

A local is assigned shadow-stack storage when all of the following are true:

  1. The local is not a function parameter.
  2. Either:
    • the local is a value local whose type is one of Tuple, Struct, or Slice, or
    • the local is a slice place local (RefRo<Slice<_>> or RefMut<Slice<_>>).

This selection logic is implemented by should_shadow_stack_local and is_shadow_stack_aggregate_ty.
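The selection conditions can be restated as a small predicate. This Rust sketch uses simplified, hypothetical Ty and LocalRepr enums and a made-up function name (selects_shadow_stack); the real should_shadow_stack_local and is_shadow_stack_aggregate_ty work on the compiler's own types and may differ in detail.

```rust
/// Simplified type kinds, just enough for the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ty {
    Int,
    Tuple,
    Struct,
    Slice,
}

/// Simplified local representation: a value, or a read-only/mutable place.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LocalRepr {
    Value(Ty),
    RefRo(Ty),
    RefMut(Ty),
}

/// Tuple, Struct, and Slice are the aggregate kinds named in the text.
pub fn is_aggregate(ty: Ty) -> bool {
    matches!(ty, Ty::Tuple | Ty::Struct | Ty::Slice)
}

/// Mirrors the documented conditions for shadow-stack storage.
pub fn selects_shadow_stack(is_param: bool, repr: LocalRepr) -> bool {
    // Condition 1: function parameters are never selected.
    if is_param {
        return false;
    }
    match repr {
        // Condition 2a: value locals of aggregate type.
        LocalRepr::Value(ty) => is_aggregate(ty),
        // Condition 2b: place locals, but only for slices.
        LocalRepr::RefRo(ty) | LocalRepr::RefMut(ty) => ty == Ty::Slice,
    }
}
```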

Frame Layout and Prologue

Frame construction happens in lower_function.

For each selected local:

  1. Payload bytes are computed via aggregate_payload_bytes.
  2. Slot size is align_up(ARRAY_DATA_OFFSET + payload_bytes, 8).
  3. Offsets are assigned in declaration order, each starting at 8-byte alignment.

After all slots are sized:

  • If frame size is zero, no runtime shadow-stack calls are emitted.
  • If frame size is non-zero, function entry emits one reserve call to __ruka_rt::shadow_stack_reserve(frame_bytes).
  • The returned base pointer is kept in a scratch local.
  • Each shadow local is initialized to frame_base + local_offset.
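The slot sizing and offset assignment can be worked through with a short Rust sketch. The constant value below is an assumption: the actual value of ARRAY_DATA_OFFSET is not stated on this page, so 8 is used purely to make the arithmetic concrete. The layout_frame function is likewise a hypothetical name.

```rust
/// ASSUMED value for illustration; the real ARRAY_DATA_OFFSET may differ.
const ASSUMED_ARRAY_DATA_OFFSET: u32 = 8;

/// Round `value` up to the next multiple of `align` (align must be non-zero).
pub fn align_up(value: u32, align: u32) -> u32 {
    (value + align - 1) / align * align
}

/// Given payload sizes in declaration order, return each local's offset
/// within the frame and the total frame size in bytes.
pub fn layout_frame(payload_bytes: &[u32]) -> (Vec<u32>, u32) {
    let mut offsets = Vec::with_capacity(payload_bytes.len());
    let mut cursor = 0u32;
    for &payload in payload_bytes {
        // Every slot size is a multiple of 8, so each offset stays 8-byte aligned.
        offsets.push(cursor);
        cursor += align_up(ASSUMED_ARRAY_DATA_OFFSET + payload, 8);
    }
    (offsets, cursor)
}
```

With the assumed offset of 8, payloads of 12, 4, and 16 bytes give slot sizes 24, 16, and 24, offsets 0, 24, and 40, and a 64-byte frame. An empty selection gives a zero-size frame, which is the case where no reserve call is emitted.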

How Instructions Use Shadow-Stack Locals

Aggregate-producing instructions check whether dst is shadow-backed:

  • lower_aggregate_instr skips heap allocation for tuple/struct/slice destinations and requires those destinations to be shadow-backed.
  • The instruction then writes aggregate fields directly through the local pointer.

For call destinations:

  • lower_call_family_instr checks whether the destination local is shadow-backed and requires out-slot destinations to be shadow-backed.

Out-Slot Returns and Caller/Callee Behavior

Return-type decision:

  • function_returns_via_out_slot currently returns true for tuple/struct/slice return types.
  • Borrowed/reference returns are rejected before signature planning.

Signature shaping:

  • signature_types inserts an i32 out-slot pointer parameter at index 0 when return-via-out-slot is required.
  • In that case, the WASM result list is empty.
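A minimal Rust sketch of this signature shaping, using simplified stand-in types. ValType, RetTy, and shape_signature are hypothetical names; the real function_returns_via_out_slot and signature_types operate on the compiler's own type representation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValType {
    I32,
    I64,
}

/// Simplified return-type kinds for the example.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RetTy {
    Unit,
    Scalar(ValType),
    Tuple,
    Struct,
    Slice,
}

/// Aggregates are returned through a caller-provided out-slot.
pub fn returns_via_out_slot(ret: RetTy) -> bool {
    matches!(ret, RetTy::Tuple | RetTy::Struct | RetTy::Slice)
}

/// Returns (WASM parameter types, WASM result types).
pub fn shape_signature(params: &[ValType], ret: RetTy) -> (Vec<ValType>, Vec<ValType>) {
    let mut wasm_params = Vec::new();
    let mut results = Vec::new();
    if returns_via_out_slot(ret) {
        // Out-slot pointer goes first; the result list stays empty.
        wasm_params.push(ValType::I32);
    } else if let RetTy::Scalar(v) = ret {
        results.push(v);
    }
    wasm_params.extend_from_slice(params);
    (wasm_params, results)
}
```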

Call-site behavior:

  • Caller passes destination pointer as arg 0.
  • If destination local is shadow-backed, that pointer is reused.
  • If destination local is not shadow-backed, lowering fails instead of heap-allocating implicit out-slot storage.

Return behavior:

  • lower_terminator handles Return.
  • For out-slot returns, it copies return bytes from the local storage pointer to the out-slot pointer.
  • For non-out-slot returns, it pushes the value as a normal WASM result.

Release and Lifetime Rules

At every emitted Return path in lower_terminator:

  • If the function reserved a non-zero frame, the backend emits one call to __ruka_rt::shadow_stack_release(frame_bytes) before return.

This gives function-scoped shadow-stack lifetimes:

  • Reserve once in function entry.
  • Reuse slots for all selected locals in that function.
  • Release once on each return path.
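The reserve-once, release-per-return pattern can be illustrated on a toy instruction list. The Instr enum and add_frame_calls function below are invented for this sketch and do not correspond to the backend's real instruction types.

```rust
/// Toy instruction set, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Instr {
    Reserve(u32),
    Release(u32),
    Op(&'static str),
    Return,
}

/// Wrap a function body with shadow-stack frame calls.
pub fn add_frame_calls(frame_bytes: u32, body: &[Instr]) -> Vec<Instr> {
    // A zero-size frame needs no runtime calls at all.
    if frame_bytes == 0 {
        return body.to_vec();
    }
    // One reserve at function entry.
    let mut out = vec![Instr::Reserve(frame_bytes)];
    for &instr in body {
        // One release immediately before every return path.
        if instr == Instr::Return {
            out.push(Instr::Release(frame_bytes));
        }
        out.push(instr);
    }
    out
}
```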

Runtime Side Notes

The runtime reserve/release behavior itself is implemented in the wasm32-only runtime module source:

That module currently:

  • lazily allocates one backing region,
  • bumps a shadow-stack pointer on reserve,
  • checks overflow/underflow with assertions,
  • and rewinds the pointer on release.
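The behavior listed above resembles a simple bump allocator. The following Rust sketch is a host-side model under stated assumptions: the ShadowStack type is hypothetical, it hands out offsets rather than raw pointers, and it assumes the stack grows upward from offset zero, none of which is confirmed by the runtime source.

```rust
/// Host-side model of a bump-style shadow stack (hypothetical type).
pub struct ShadowStack {
    region: Option<Vec<u8>>,
    capacity: usize,
    sp: usize,
}

impl ShadowStack {
    pub fn new(capacity: usize) -> Self {
        ShadowStack { region: None, capacity, sp: 0 }
    }

    /// Reserve `bytes` and return the base offset of the new frame.
    pub fn reserve(&mut self, bytes: usize) -> usize {
        // Lazily allocate the single backing region on first use.
        if self.region.is_none() {
            self.region = Some(vec![0u8; self.capacity]);
        }
        assert!(bytes <= self.capacity - self.sp, "shadow stack overflow");
        let base = self.sp;
        self.sp += bytes;
        base
    }

    /// Rewind the stack pointer by `bytes`.
    pub fn release(&mut self, bytes: usize) {
        assert!(bytes <= self.sp, "shadow stack underflow");
        self.sp -= bytes;
    }

    pub fn used(&self) -> usize {
        self.sp
    }

    pub fn is_allocated(&self) -> bool {
        self.region.is_some()
    }
}
```

Because frames are released in reverse order of reservation, rewinding by the same frame_bytes that was reserved is enough; no per-frame bookkeeping is needed in this model.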

Ownership Representation

This page describes the ownership representation currently used by the checker, MIR lowering, and both backends.

The canonical model separates:

  • type identity (BaseTy)
  • access intent at a boundary (AccessMode)
  • runtime/local storage representation (MirLocalRepr, MirHeapOwnership)

That split keeps compatibility decisions in one place while preserving backend-specific lowering details where they belong.

Canonical Ownership Model

Shared ownership modeling lives in ruka_types:

Ty remains the semantic type carried across the compiler, but compatibility and boundary logic are expressed in terms of normalized ownership data.

Compatibility Decisions

Compatibility/coercion decisions are centralized in ruka_types:

Policy fields are independent so decisions can express combinations:

  • check (CheckPolicy) answers whether runtime validation is required
  • materialization (MaterializationPolicy) answers whether representation bridging is required

Current runtime-check categories:

Current concrete materialization categories:

Checker Usage

Checker call compatibility uses shared boundary coercion decisions. Ownership mode is mapped to normalized access mode at call boundaries, then validated by boundary_coercion_decision.

Primary implementation entry points live in:

  • crates/ruka_check/src/checker_calls.rs

MIR Boundary Usage

MIR lowering uses one boundary plan path for call arguments and normalized projection for parameter locals.

  • Parameter local projection and ownership-mode mapping: crates/ruka_mir_lower/src/lowerer/helpers.rs
  • Call argument planning and compatibility usage: crates/ruka_mir_lower/src/lowerer/call_args.rs

MIR itself exposes boundary helpers so consumers do not duplicate branching:

  • MirParamBinding
    • expects_view, expects_mut_borrow, materializes_view_from_owned, requires_materialization
  • MirCallArgBinding
    • is_borrowed, is_mutable_borrow, is_owned_move, is_owned_copy, requires_deref_read
  • MirAggregateArg
    • is_owned_move, is_owned_copy

Backend ABI Usage

Rust and WASM backends both consume MIR binding helpers rather than re-deriving ownership semantics independently.

Rust:

WASM:

Notes for Contributors

  • Add ownership compatibility behavior in ruka_types coercion APIs first.
  • Prefer MIR binding helper predicates over open-coded mode matching.
  • Keep mdBook pages as overview and workflow guidance; put API detail in rustdoc.