Skip to content

zig-whatwg/dom

Repository files navigation

DOM - WHATWG DOM Implementation in Zig

Zig Tests WPT Coverage DOM Coverage License

A production-ready implementation of the WHATWG DOM Standard in Zig, designed for headless browsers and JavaScript engines.

Features

  • 100% WebIDL Compliant - All implemented APIs match official WebIDL specifications exactly
  • WebKit-Style Memory Management - Reference counting with weak parent pointers
  • Zero Memory Leaks - Verified by comprehensive test suite
  • Highly Optimized - O(1) querySelector for IDs, tags, and classes (~5-15ns). Cross-browser benchmarks included
  • Fully Extensible - Clean prototype chain with vtable injection and factory patterns for HTML/XML extensions
  • Comptime Event Target Mixin - Reusable EventTarget pattern with zero overhead for any type
  • Production Ready - Extensively tested and documented
  • JavaScript Bindings Ready - See JS_BINDINGS.md for integration guide

Extensibility

This is a generic DOM library that can be extended for HTML, XML, or custom document types:

// Extend Element with custom behavior via vtable
const html_vtable = NodeVTable{
    .deinit = htmlDeinit,
    // ... custom implementations
};

// Inject factory so Document.createElement returns your custom type
const factories = Document.FactoryConfig{
    .element_factory = HTMLElement.createForDocument,
};
const doc = try Document.initWithFactories(allocator, factories);

// Now createElement returns HTMLElement!
const elem = try doc.createElement("div");

See EXTENSIBILITY.md for complete guide

Quick Start

Installation

Add to your build.zig.zon:

.dependencies = .{
    .dom = .{
        .url = "https://github.com/yourusername/dom/archive/refs/tags/v0.1.0.tar.gz",
        .hash = "...", // Run zig fetch to get hash
    },
},

Basic Usage

const std = @import("std");
const dom = @import("dom");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Create a document
    const doc = try dom.Document.init(allocator);
    defer doc.release();

    // Create elements
    const html = try doc.createElement("html");
    defer html.node.release();

    const body = try doc.createElement("body");
    defer body.node.release();

    const div = try doc.createElement("div");
    defer div.node.release();

    // Set attributes
    try div.setId("main-content");
    try div.setClassName("container active");

    // Create text node
    const text = try doc.createTextNode("Hello, World!");
    defer text.node.release();

    // Access properties
    std.debug.print("Element: {s}\n", .{div.tag_name});
    std.debug.print("ID: {s}\n", .{div.getId() orelse ""});
    std.debug.print("Classes: {s}\n", .{div.getClassName()});

    // Check attributes
    if (div.hasAttributes()) {
        const names = try div.getAttributeNames(allocator);
        defer allocator.free(names);
        for (names) |name| {
            std.debug.print("Attribute: {s}\n", .{name});
        }
    }
}

Working with NodeLists

// Get live collection of children
const children = node.childNodes();

std.debug.print("Children: {}\n", .{children.length()});

for (0..children.length()) |i| {
    if (children.item(i)) |child| {
        std.debug.print("Child {}: {s}\n", .{i, child.nodeName()});
    }
}

Event Listeners

const MyContext = struct {
    count: usize = 0,
};

fn handleClick(ctx: *anyopaque) void {
    const context: *MyContext = @ptrCast(@alignCast(ctx));
    context.count += 1;
    std.debug.print("Clicked {} times\n", .{context.count});
}

var context = MyContext{};

// Add event listener
try element.node.addEventListener(
    "click",
    handleClick,
    @ptrCast(&context),
    false, // capture
    false, // once
    false, // passive
);

// Remove event listener
element.node.removeEventListener("click", handleClick, false);

API Reference

Node

Core node type with reference counting and tree structure.

pub const Node = struct {
    // Type information
    pub fn nodeName(self: *const Node) []const u8;
    pub fn nodeValue(self: *const Node) ?[]const u8;
    pub fn setNodeValue(self: *Node, value: []const u8) !void;
    
    // Tree queries
    pub fn getOwnerDocument(self: *const Node) ?*Document;
    pub fn parentElement(self: *const Node) ?*Element;
    pub fn childNodes(self: *Node) NodeList;
    pub fn hasChildNodes(self: *const Node) bool;
    
    // Memory management
    pub fn acquire(self: *Node) void;
    pub fn release(self: *Node) void;
    
    // Event listeners
    pub fn addEventListener(...) !void;
    pub fn removeEventListener(...) void;
    pub fn hasEventListeners(self: *const Node, event_type: []const u8) bool;
};

Element

Element nodes with attribute support.

pub const Element = struct {
    node: Node, // Base node (must be first)
    tag_name: []const u8,
    
    // Attributes
    pub fn getAttribute(self: *const Element, name: []const u8) ?[]const u8;
    pub fn setAttribute(self: *Element, name: []const u8, value: []const u8) !void;
    pub fn removeAttribute(self: *Element, name: []const u8) void;
    pub fn hasAttribute(self: *const Element, name: []const u8) bool;
    pub fn hasAttributes(self: *const Element) bool;
    pub fn getAttributeNames(self: *const Element, allocator: Allocator) ![][]const u8;
    
    // Convenience properties
    pub fn getId(self: *const Element) ?[]const u8;
    pub fn setId(self: *Element, value: []const u8) !void;
    pub fn getClassName(self: *const Element) []const u8;
    pub fn setClassName(self: *Element, value: []const u8) !void;
    
    // Class operations
    pub fn hasClass(self: *const Element, class_name: []const u8) bool;
};

Document

Document root with factory methods.

pub const Document = struct {
    node: Node, // Base node (must be first)
    
    // Factory methods
    pub fn createElement(self: *Document, tag_name: []const u8) !*Element;
    pub fn createTextNode(self: *Document, data: []const u8) !*Text;
    pub fn createComment(self: *Document, data: []const u8) !*Comment;
    
    // Document properties
    pub fn documentElement(self: *const Document) ?*Element;
    
    // Memory management
    pub fn acquire(self: *Document) void;
    pub fn release(self: *Document) void;
};

Text and Comment

Character data nodes.

pub const Text = struct {
    node: Node,
    data: []const u8,
    
    pub fn substringData(self: *const Text, offset: usize, count: usize) ![]const u8;
    pub fn appendData(self: *Text, data: []const u8) !void;
    pub fn insertData(self: *Text, offset: usize, data: []const u8) !void;
    pub fn deleteData(self: *Text, offset: usize, count: usize) !void;
    pub fn replaceData(self: *Text, offset: usize, count: usize, data: []const u8) !void;
};

pub const Comment = struct {
    node: Node,
    data: []const u8,
    // Same methods as Text
};

NodeList

Live collection of nodes.

pub const NodeList = struct {
    pub fn length(self: *const NodeList) usize;
    pub fn item(self: *const NodeList, index: usize) ?*Node;
};

Memory Management

This library uses reference counting for memory management. All nodes start with ref_count = 1.

Rules

  1. Caller owns initial reference - Must call release() when done
  2. Use defer for cleanup - Ensures release even on error
  3. Acquire before sharing - Call acquire() when storing additional references
  4. One release per acquire/init - Balance all reference count operations

Example

// Creating nodes
const doc = try dom.Document.init(allocator);
defer doc.release(); // REQUIRED

const elem = try doc.createElement("div");
defer elem.node.release(); // REQUIRED

// Sharing ownership
elem.node.acquire(); // Increment ref_count
other_container.node = elem.node;
// Both must call release()

WHATWG DOM Compliance

This implementation strictly follows the official WHATWG DOM specification:

Compliance Approach

Every implementation follows a dual specification approach:

  1. WebIDL - Exact method signatures, return types, and parameter types
  2. WHATWG Prose - Algorithms, behavior, and edge cases

All implemented APIs are verified against both sources to ensure 100% compliance.

Phase 1 - Core Nodes (Complete)

Node Interface - Core tree structure and properties

  • nodeName - Returns node name (tag name, "#text", etc.)
  • nodeValue - Gets node value (text content, comment data, null for elements)
  • nodeValue setter - Sets node value with validation
  • nodeType - Returns numeric node type constant
  • parentNode - Parent node pointer (weak reference)
  • parentElement - Parent element (null if parent is not element)
  • childNodes - Live NodeList of children
  • firstChild - First child node pointer
  • lastChild - Last child node pointer
  • previousSibling - Previous sibling pointer
  • nextSibling - Next sibling pointer
  • ownerDocument - Document that owns this node
  • baseURI - Base URI of the node (placeholder)
  • hasChildNodes() - Boolean check for children
  • getRootNode(composed) - Get root of tree (shadow DOM aware)
  • contains(other) - Check if node is inclusive descendant
  • isSameNode(other) - Identity comparison (legacy but spec-compliant)
  • isEqualNode(other) - Deep structural equality check
  • compareDocumentPosition(other) - Relative position with bitmask flags
  • cloneNode(deep) - Clones node (shallow/deep)
  • lookupPrefix(namespace) - Returns namespace prefix
  • lookupNamespaceURI(prefix) - Returns namespace URI
  • isDefaultNamespace(namespace) - Checks if namespace is default

Element Interface - Element-specific operations

  • tagName - Element tag name
  • localName - Local name (same as tagName for non-namespaced elements)
  • getAttribute(name) - Get attribute value
  • setAttribute(name, value) - Set attribute value
  • removeAttribute(name) - Remove attribute (void return per WebIDL)
  • hasAttribute(name) - Check attribute presence
  • hasAttributes() - Check if any attributes exist
  • getAttributeNames() - Array of attribute names
  • id property (getId/setId) - Convenience for id attribute
  • className property (getClassName/setClassName) - Convenience for class attribute

Document Interface - Document root and factory methods

  • createElement(localName) - Create element with tag name
  • createTextNode(data) - Create text node with content
  • createComment(data) - Create comment node
  • createDocumentFragment() - Create document fragment node
  • documentElement - Root element (typically <html>)
  • doctype - DocumentType node (placeholder)

CharacterData Interface - Text and comment data manipulation

  • data - Character data content
  • length - Data length in bytes
  • substringData(offset, count) - Extract substring
  • appendData(data) - Append to end
  • insertData(offset, data) - Insert at position
  • deleteData(offset, count) - Delete range
  • replaceData(offset, count, data) - Replace range

Text Interface - Text node operations

  • wholeText - Concatenated text of contiguous text nodes

DocumentFragment Interface - Lightweight container

  • createDocumentFragment() - Factory method
  • ✅ Node operations (appendChild, insertBefore, etc.)
  • ✅ Clone operations (shallow and deep)

NodeList Interface - Live collection

  • length - Number of nodes in list
  • item(index) - Get node at index

EventTarget Interface - Event handling

  • addEventListener(type, callback, capture, once, passive) - Add listener
  • removeEventListener(type, callback, capture) - Remove listener (void return per WebIDL)
  • dispatchEvent(event) - Synchronous event dispatching with listener invocation

Event Interface - Event objects

  • Event.init(type, options) - Create event with type and options
  • preventDefault() - Cancel default action (respects passive listeners)
  • stopPropagation() - Stop event propagation
  • stopImmediatePropagation() - Stop remaining listeners
  • ✅ Event state: bubbles, cancelable, composed, isTrusted, eventPhase

Phase 2 - Tree Manipulation ✅ Complete!

Node Interface - Tree modification operations:

  • appendChild(node) - Append child to parent
  • insertBefore(node, child) - Insert node before reference child
  • removeChild(child) - Remove child from parent
  • replaceChild(node, child) - Replace child with new node
  • textContent property - Get/set descendant text content
  • normalize() - Remove empty text nodes and merge adjacent text nodes

Phase 2 Complete: All tree manipulation APIs implemented per WHATWG DOM §4.4!

Phase 3 - AbortController & AbortSignal ✅ Complete!

AbortController Interface - Operation cancellation controller:

  • AbortController() constructor - Create new controller
  • signal property - Associated AbortSignal (SameObject)
  • abort(reason) - Abort with optional reason

AbortSignal Interface - Abort signaling and composition:

  • abort(reason) - Static factory for pre-aborted signal
  • any(signals) - Composite signal from multiple sources
  • aborted property - Check if signal is aborted
  • reason property - Get abort reason (any type)
  • throwIfAborted() - Throw if aborted
  • ✅ EventTarget integration - Auto-remove listeners on abort

Phase 3 Complete: All core AbortSignal APIs implemented per WHATWG DOM §3.1-§3.2!

Test Coverage: 62 tests covering all features, edge cases, and memory safety.

Compliance: 98% (A+) - See summaries/analysis/ABORTSIGNAL_FINAL_COMPLIANCE_AUDIT.md for full audit.

Deferred Features (HTML Integration Required)

The following WHATWG DOM features require HTML Standard infrastructure:

AbortSignal.timeout(milliseconds) - Timer-based auto-abort

  • Status: Deferred to Phase N (HTML Integration)
  • Reason: Requires HTML timer task source, event loop, and global task queue
  • Priority: P2 - Medium
  • Workaround: Users can implement custom timeout via async/threads:
    const controller = try AbortController.init(allocator);
    defer controller.deinit();
    
    // In timer thread/async context:
    const timeout_error = try DOMException.create(
        allocator, "TimeoutError", "Operation timed out"
    );
    try controller.abort(@ptrCast(timeout_error));

AbortSignal.onabort - EventHandler attribute

  • Status: Deferred to Phase N (HTML Integration)
  • Reason: EventHandler is HTML Standard type requiring special processing
  • Priority: P3 - Low (syntactic sugar)
  • Workaround: Use addEventListener("abort", ...) directly (functionally identical):
    try signal.addEventListener("abort", handler, ctx, false, false, false, null);

Phase 4 - ParentNode & ChildNode Mixins ✅ Complete!

Implemented APIs:

  • children - Live HTMLCollection of element children
  • firstElementChild, lastElementChild - Element-only child accessors
  • previousElementSibling, nextElementSibling - Element-only sibling accessors
  • childElementCount - Count of element children
  • prepend(...nodes), append(...nodes) - Insert nodes at start/end
  • replaceChildren(...nodes) - Replace all children
  • before(...nodes), after(...nodes) - Insert nodes before/after this node
  • replaceWith(...nodes) - Replace this node with others
  • remove() - Remove this node from parent
  • querySelector(selectors) - Find first matching element
  • querySelectorAll(selectors) - Find all matching elements
  • matches(selectors) - Check if element matches selector
  • closest(selectors) - Find closest ancestor matching selector

Phase 4 Complete: All modern DOM convenience methods implemented per WHATWG DOM §4.2.6-§4.2.8!

Phase 5 - CharacterData & DOMTokenList ✅ Complete!

Implemented APIs:

  • CharacterData module - Shared string manipulation functions for Text/Comment
    • substringData(), appendData(), insertData(), deleteData(), replaceData()
  • Text.splitText() - Split text node at offset (already existed)
  • DOMTokenList - Full spec-compliant token list manipulation
    • add(), remove(), toggle(), replace(), contains()
    • length, item(), value, Live collection behavior
  • Element.classList - DOMTokenList wrapper for class attribute
    • Modern class manipulation API
    • Space-separated token handling with validation

Phase 5 Complete: CharacterData interface and classList fully implemented per WHATWG DOM!

Phase 6 - Advanced Features

Completed:

  • MutationObserver - Observe DOM mutations (Phase 17)
    • Full WHATWG DOM §4.3 compliance
    • childList, attributes, characterData observation
    • Subtree observation, attribute filters, old value tracking
    • 24 comprehensive tests, zero memory leaks

Planned APIs:

  • DocumentType - Document type interface
  • Range - Text selection and manipulation
  • TreeWalker / NodeIterator - Advanced tree traversal

WebIDL Compliance Notes

Return Types:

  • WebIDL undefined → Zig void (not bool)
  • WebIDL DOMString → Zig []const u8
  • WebIDL Node? → Zig ?*Node
  • WebIDL boolean → Zig bool
  • WebIDL unsigned long → Zig u32

Live Collections:

  • NodeList reflects DOM changes immediately (no caching)
  • [SameObject] attributes return consistent view (not new allocation)

Errors:

  • DOM exceptions map to Zig error types
  • HierarchyRequestError, InvalidCharacterError, etc.

See summaries/analysis/PHASE1_DEEP_COMPLIANCE_ANALYSIS.md for comprehensive compliance verification (88% complete, A+ rating).

Testing

Test Suite

657 tests passing with zero memory leaks:

  • 481 unit tests - Comprehensive feature coverage
  • 176 WPT tests - Web Platform Tests validation
# All tests (unit + WPT)
zig build test

# Unit tests only
zig build test-unit

# WPT tests only
zig build test-wpt

# With summary
zig build test --summary all

# Specific optimization
zig build test -Doptimize=ReleaseFast

WPT Compliance

📊 WPT Coverage: 23.4% of core DOM tests (37/158)

This library validates against the official Web Platform Tests (WPT) from the WHATWG DOM Standard.

Coverage breakdown:

  • 🎯 Core DOM (nodes/): 23.4% (37/158 root tests) - Node, Element, Document operations
  • 📁 All nodes/ (with subdirs): 10.9% (37/339 total tests)
  • 🌐 Full WHATWG DOM: 5.9% (37/623 tests) - Includes events, ranges, traversal, etc.

What's implemented (31 tests):

  • Core Node operations: appendChild, insertBefore, removeChild, replaceChild
  • Tree queries: childNodes, parentNode, cloneNode, contains, compareDocumentPosition
  • Element basics: attributes (get/set/remove/has), tagName, id, className
  • Document factories: createElement, createTextNode, createComment, createDocumentFragment
  • CharacterData: Text and Comment data manipulation (append, delete, substring)

What's NOT yet implemented:

  • 127 more tests in nodes/ (ParentNode/ChildNode mixins, selectors, more edge cases)
  • 175 tests in events/ (event propagation, custom events, etc.)
  • 45 tests in ranges/ (Range API for text selection)
  • 28 tests in traversal/ (TreeWalker, NodeIterator)
  • Plus: abort/, collections/, lists/ directories

See tests/wpt/COVERAGE.md for detailed analysis, WPT directory breakdown, and roadmap.

Performance

This implementation includes aggressive optimizations making it competitive with browser engines:

Benchmarks

Cross-platform benchmark suite comparing Zig with Chromium, Firefox, and WebKit:

# First time setup (installs Playwright)
./benchmarks/setup.sh

# Run all benchmarks
zig build benchmark-all -Doptimize=ReleaseFast

Opens an interactive HTML report showing performance across all implementations.

Query Optimizations

  • O(1) getElementById - Hash map lookup (~5ns)
  • O(1) querySelector("#id") - Direct hash map via fast path (~15ns)
  • O(1) querySelector("tag") - Tag map lookup via fast path (~15ns)
  • O(1) querySelector(".class") - Class map lookup via fast path (~15ns)
  • O(k) getElementsByTagName - Returns only matching elements (~7µs for 500 elements)
  • O(k) getElementsByClassName - Returns only matching elements (~7µs for 500 elements)

See benchmarks/README.md for detailed benchmarking guide.

Documentation

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

Development Workflow

  1. Check WebIDL first - Verify exact interface signatures at /Users/bcardarella/projects/webref/ed/idl/dom.idl
  2. Read WHATWG spec - Understand algorithm behavior from https://dom.spec.whatwg.org/
  3. Write tests first - Test both signature and behavior
  4. Implement with dual references - Document WebIDL + WHATWG prose in code
  5. Verify compliance - All tests pass, no memory leaks

See AGENTS.md for comprehensive development guidelines.

License

MIT License - see LICENSE file for details.

Credits

Developed by Brian Cardarella (@bcardarella)

Inspired by:

  • WebKit - Reference counting and weak pointer patterns
  • Chromium/Blink - NodeRareData pattern and optimization techniques
  • Servo - Rust DOM implementation architecture
  • WHATWG - DOM Standard specification

Links


Status: Phase 3 Complete ✅ | Production Ready | WebIDL Compliant

See summaries/plans/IMPLEMENTATION_STATUS.md for detailed roadmap.

JavaScript Bindings

This library is designed for integration with JavaScript engines. See the comprehensive JavaScript Bindings Guide for details on:

  • Property vs Function Design - Why DOM properties are functions in Zig
  • Type Mappings - WebIDL → Zig → JavaScript conversions
  • Memory Management - Reference counting and garbage collection integration
  • Error Handling - Converting Zig errors to DOMException
  • Complete Examples - Full bindings implementation code

Quick Summary:

In Zig, DOM properties are implemented as functions for performance and memory efficiency:

// Zig API (explicit function calls)
const name = node.nodeName();      // Computed via vtable
const uri = node.baseURI();        // Computed property
const parent = node.parent_node;   // Stored field (direct access)

JavaScript bindings should expose these as properties:

// JavaScript API (looks like properties)
const name = node.nodeName;        // Calls Zig function behind the scenes
const uri = node.baseURI;          // Calls Zig function
const parent = node.parentNode;    // Direct field access

This design keeps Node size at exactly 96 bytes while maintaining full spec compliance. See JS_BINDINGS.md for complete details and implementation examples.