zig-whatwg/url

WHATWG URL Standard - Zig Implementation

A complete, spec-compliant implementation of the WHATWG URL Standard in Zig.

What's New in v0.2.0

  • WebIDL Type Migration - Full spec compliance with WebIDL USVString (UTF-16) types
  • 🚀 Performance Improvements - URL setters refactored using state override (17% faster for protocol)
  • 🐛 Critical Bug Fixes - Fixed memory corruption in parseWithStateOverride
  • 📦 Updated Dependencies - encoding v0.1.3 with transitive dependency resolution
  • 🔧 Test Helpers - New test_helpers module for easier UTF-8 ↔ UTF-16 conversion

See CHANGELOG.md for complete details.

Features

  • Full URL Parsing - Parse URLs into components following WHATWG spec
  • URL Serialization - Convert URL objects back to strings
  • Host Parsing - Domains, IPv4, IPv6, opaque, empty hosts
  • IDNA Support - Unicode domain names (UTS46)
  • Percent Encoding - All URL-specific encode sets
  • URLSearchParams - Query string manipulation with live binding
  • URL Setters - Modify URL components (protocol, host, port, path, etc.)
  • Origin Calculation - Security-critical origin computation
  • Public Suffix List - Domain security boundaries
  • Blob URL Support - External store integration
  • Relative URL Resolution - Resolve URLs against base URLs
  • 40+ Validation Errors - Comprehensive error reporting

Installation

Using Zig Package Manager

Add to your build.zig.zon:

.dependencies = .{
    .url = .{
        .url = "https://github.com/zig-whatwg/url/archive/refs/tags/v0.2.0.tar.gz",
        .hash = "1220...", // Run `zig fetch --save <url>` to get the hash
    },
},

Note: The hash will be computed when you run zig fetch --save <url>. Replace the hash above with the actual hash after fetching.

Then in your build.zig:

const url = b.dependency("url", .{
    .target = target,
    .optimize = optimize,
});

exe.root_module.addImport("url", url.module("url"));

Local Development

git clone https://github.com/zig-whatwg/url.git
cd url
zig build

Usage

⚠️ Breaking Changes in v0.2.0: The API now uses WebIDL types (UTF-16 strings) for full spec compliance, and all method names use snake_case (e.g., get_href() instead of getHref()).

For migration from v0.1.0:

Note: Examples below show the v0.2.0 WebIDL API. For v0.1.0 usage, see the v0.1.0 tag.

Basic URL Parsing

const std = @import("std");
const url_mod = @import("url");
const URL = url_mod.URL;
const infra = @import("infra");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();
    
    // Convert UTF-8 to UTF-16 (WebIDL USVString)
    const url_str = try infra.string.utf8ToUtf16(
        allocator,
        "https://user:pass@example.com:8080/path?query=value#fragment",
    );
    defer allocator.free(url_str);
    
    // Parse URL
    var url = try URL.init(allocator, url_str, null);
    defer url.deinit();
    
    // Access components (returns UTF-16, convert to UTF-8)
    const protocol = try url.get_protocol();
    defer allocator.free(protocol);
    const protocol_utf8 = try infra.string.utf16ToUtf8(allocator, protocol);
    defer allocator.free(protocol_utf8);  // "https:"
    
    const username = try url.get_username();
    defer allocator.free(username);
    // username is UTF-16, convert to UTF-8 for display
    
    const hostname = try url.get_hostname();
    defer allocator.free(hostname);
    // hostname is UTF-16, convert to UTF-8 for display
    
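    // Serialize back to a string (UTF-16), then convert to UTF-8 for printing
    const href = try url.get_href();
    defer allocator.free(href);
    const href_utf8 = try infra.string.utf16ToUtf8(allocator, href);
    defer allocator.free(href_utf8);

    std.debug.print("URL: {s}\n", .{href_utf8});
}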
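
The UTF-8 to UTF-16 round trip in the example above can be tried without this library. The following is a minimal sketch that uses only `std.unicode` and assumes Zig 0.12 or later. It is not how `infra.string.utf8ToUtf16` is implemented, and the helper name `roundTrip` is hypothetical.

```zig
const std = @import("std");

/// Hypothetical helper: converts UTF-8 to UTF-16 code units and back again.
/// The caller owns the returned UTF-8 slice.
fn roundTrip(allocator: std.mem.Allocator, utf8: []const u8) ![]u8 {
    // UTF-8 -> UTF-16 (little-endian code units)
    const utf16 = try std.unicode.utf8ToUtf16LeAlloc(allocator, utf8);
    defer allocator.free(utf16);

    // UTF-16 -> UTF-8
    return std.unicode.utf16LeToUtf8Alloc(allocator, utf16);
}

test "round trip preserves the URL text" {
    const allocator = std.testing.allocator;
    const out = try roundTrip(allocator, "https://example.com/path");
    defer allocator.free(out);
    try std.testing.expectEqualStrings("https://example.com/path", out);
}
```

Both conversions allocate, which is why every converted string in the examples is paired with a `defer allocator.free(...)`.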

Relative URL Resolution

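// Parse with a base URL (both arguments are UTF-16 in v0.2.0)
const input = try infra.string.utf8ToUtf16(allocator, "../other/path");
defer allocator.free(input);
const base = try infra.string.utf8ToUtf16(allocator, "https://example.com/some/path");
defer allocator.free(base);

var url = try URL.init(allocator, input, base);
defer url.deinit();

const href = try url.get_href();
defer allocator.free(href);
// Result (as UTF-16): "https://example.com/other/path"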

Modifying URLs

const url_str = try infra.string.utf8ToUtf16(allocator, "http://example.com/path");
defer allocator.free(url_str);

var url = try URL.init(allocator, url_str, null);
defer url.deinit();

// Change protocol (pass UTF-16 string)
const proto = try infra.string.utf8ToUtf16(allocator, "https");
defer allocator.free(proto);
try url.set_protocol(proto);

// Update host
const host = try infra.string.utf8ToUtf16(allocator, "newhost.com:9000");
defer allocator.free(host);
try url.set_host(host);

// Change path
const pathname = try infra.string.utf8ToUtf16(allocator, "/new/path");
defer allocator.free(pathname);
try url.set_pathname(pathname);

// Add fragment
const hash = try infra.string.utf8ToUtf16(allocator, "section");
defer allocator.free(hash);
try url.set_hash(hash);

const href = try url.get_href();
defer allocator.free(href);
const href_utf8 = try infra.string.utf16ToUtf8(allocator, href);
defer allocator.free(href_utf8);
// Result: "https://newhost.com:9000/new/path#section"

Tip: For testing, use the test_helpers module for simpler UTF-8 APIs:

const helpers = @import("url").test_helpers;

var url = try helpers.initURL(allocator, "http://example.com/path", null);
defer url.deinit();

try helpers.setProtocol(&url, allocator, "https");
const href = try helpers.getHref(&url, allocator);
defer allocator.free(href);

URLSearchParams

const url_str = try infra.string.utf8ToUtf16(allocator, "https://example.com/?a=1&b=2");
defer allocator.free(url_str);

var url = try URL.init(allocator, url_str, null);
defer url.deinit();

// Access search params
const params = url.getSearchParams();

// Get values
if (try params.get(allocator, "a")) |value| {
    defer allocator.free(value);
    std.debug.print("a = {s}\n", .{value});  // "1"
}

// Add new param
try params.append(allocator, "c", "3");

// Modify existing
try params.set(allocator, "b", "new_value");

// Delete param
try params.delete(allocator, "a", null);

const href = try url.get_href();
defer allocator.free(href);
// Result (as UTF-16): "https://example.com/?b=new_value&c=3"
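
To illustrate what a lookup such as `params.get` has to do with the query string, here is a simplified sketch that scans `name=value` pairs separated by `&`. The function name `queryGet` is hypothetical, and the sketch deliberately skips percent-decoding and `+` handling, both of which the real URLSearchParams parser performs.

```zig
const std = @import("std");

/// Hypothetical helper: returns the value of the first pair whose name equals `name`.
/// The result borrows from `query`; no percent-decoding is applied.
fn queryGet(query: []const u8, name: []const u8) ?[]const u8 {
    var pairs = std.mem.splitScalar(u8, query, '&');
    while (pairs.next()) |pair| {
        if (pair.len == 0) continue; // skip empty segments such as "a=1&&b=2"
        const eq = std.mem.indexOfScalar(u8, pair, '=') orelse pair.len;
        const key = pair[0..eq];
        // A pair without '=' has an empty value.
        const value: []const u8 = if (eq < pair.len) pair[eq + 1 ..] else "";
        if (std.mem.eql(u8, key, name)) return value;
    }
    return null;
}

test "first matching pair wins" {
    try std.testing.expectEqualStrings("2", queryGet("a=1&b=2", "b").?);
    try std.testing.expect(queryGet("a=1&b=2", "c") == null);
}
```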

Static Methods

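const input = try infra.string.utf8ToUtf16(allocator, "https://example.com/");
defer allocator.free(input);

// Check if URL can be parsed
const can_parse = URL.call_canParse(allocator, input, null);
if (can_parse) {
    std.debug.print("Valid URL!\n", .{});
}

// Parse without returning an error; null means the input is not a valid URL
const maybe_input = try infra.string.utf8ToUtf16(allocator, "maybe-invalid");
defer allocator.free(maybe_input);

const maybe_url = URL.call_parse(allocator, maybe_input, null);
if (maybe_url) |url_ptr| {
    // defers run in reverse order: deinit first, then destroy
    defer allocator.destroy(url_ptr);
    defer url_ptr.deinit();

    const href = try url_ptr.get_href();
    defer allocator.free(href);
    const href_utf8 = try infra.string.utf16ToUtf8(allocator, href);
    defer allocator.free(href_utf8);
    std.debug.print("Parsed: {s}\n", .{href_utf8});
} else {
    std.debug.print("Invalid URL\n", .{});
}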

Origin Calculation

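const url_str = try infra.string.utf8ToUtf16(allocator, "https://example.com:443/path");
defer allocator.free(url_str);

var url = try URL.init(allocator, url_str, null);
defer url.deinit();

const origin = try url.get_origin();
defer allocator.free(origin);
const origin_utf8 = try infra.string.utf16ToUtf8(allocator, origin);
defer allocator.free(origin_utf8);

// The default port 443 is dropped from the origin
std.debug.print("Origin: {s}\n", .{origin_utf8});  // "https://example.com"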
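
The port disappears from the origin above because 443 is the default port for https. A minimal sketch of that rule follows, covering only the special schemes that have a default port. The names `defaultPort` and `serializeOrigin` are hypothetical and are not part of this library's API.

```zig
const std = @import("std");

/// Default ports of the special schemes, as listed in the WHATWG URL Standard.
fn defaultPort(scheme: []const u8) ?u16 {
    if (std.mem.eql(u8, scheme, "http") or std.mem.eql(u8, scheme, "ws")) return 80;
    if (std.mem.eql(u8, scheme, "https") or std.mem.eql(u8, scheme, "wss")) return 443;
    if (std.mem.eql(u8, scheme, "ftp")) return 21;
    return null;
}

/// Writes "scheme://host[:port]" into `buf`, omitting the port when it is the scheme default.
fn serializeOrigin(buf: []u8, scheme: []const u8, host: []const u8, port: ?u16) ![]u8 {
    if (port) |p| {
        const is_default = if (defaultPort(scheme)) |d| d == p else false;
        if (!is_default) return std.fmt.bufPrint(buf, "{s}://{s}:{d}", .{ scheme, host, p });
    }
    return std.fmt.bufPrint(buf, "{s}://{s}", .{ scheme, host });
}

test "default port is omitted" {
    var buf: [64]u8 = undefined;
    const origin = try serializeOrigin(&buf, "https", "example.com", 443);
    try std.testing.expectEqualStrings("https://example.com", origin);
}
```

URLs with other schemes generally have opaque origins, which this sketch does not model.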

Spec Compliance

This implementation follows the WHATWG URL Standard specification precisely:

  • ✅ 100% WHATWG URL Standard compliance
  • ✅ 240/240 tests passing (100% pass rate)
  • ✅ 100% IDNA conformance (6,391/6,391 UTS46 tests)
  • ✅ Zero memory leaks (verified with std.testing.allocator and stress testing)
  • ✅ Optimized implementation: State override + browser patterns
  • ✅ Complete spec documentation in specs/url.md and specs/url.idl

Validation Rules

All URL parsing follows spec validation rules:

  • Scheme validation (an ASCII alpha followed by ASCII alphanumerics, +, -, or .)
  • Special scheme transitions (http ↔ https allowed, http ↔ mailto disallowed)
  • Host validation (domains, IPv4, IPv6, empty, opaque)
  • Port validation (0-65535)
  • Path normalization (./ and ../ resolution)
  • Percent encoding (8 different encode sets)
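
Two of the rules above, scheme validation and the port range, are small enough to sketch directly. The functions `isValidScheme` and `parsePort` are hypothetical illustrations under the rules as listed, not the library's internal validators, and the sketch treats an empty port string as invalid for simplicity.

```zig
const std = @import("std");

/// True if `scheme` is an ASCII alpha followed by ASCII alphanumerics, '+', '-', or '.'.
fn isValidScheme(scheme: []const u8) bool {
    if (scheme.len == 0 or !std.ascii.isAlphabetic(scheme[0])) return false;
    for (scheme[1..]) |c| {
        const ok = std.ascii.isAlphanumeric(c) or c == '+' or c == '-' or c == '.';
        if (!ok) return false;
    }
    return true;
}

/// Parses a decimal port; returns null for non-digits, an empty string, or values above 65535.
fn parsePort(text: []const u8) ?u16 {
    if (text.len == 0) return null;
    var value: u32 = 0;
    for (text) |c| {
        if (!std.ascii.isDigit(c)) return null;
        value = value * 10 + (c - '0');
        if (value > 65535) return null; // checked every step, so `value` cannot overflow
    }
    const port: u16 = @intCast(value);
    return port;
}

test "scheme and port rules" {
    try std.testing.expect(isValidScheme("https"));
    try std.testing.expect(!isValidScheme("1http"));
    try std.testing.expectEqual(@as(?u16, 8080), parsePort("8080"));
    try std.testing.expectEqual(@as(?u16, null), parsePort("65536"));
}
```

The port is accumulated digit by digit rather than with `std.fmt.parseInt`, which also accepts a sign and `_` separators that are not valid in a URL port.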

Dependencies

This library depends on other WHATWG Zig implementations:

  • infra - WHATWG Infra Standard primitives
  • encoding - WHATWG Encoding Standard
  • webidl - WebIDL types for Zig

All dependencies are fetched automatically via Zig's package manager.

API Reference

See FEATURE_CATALOG.md for complete API documentation.

URL Class

pub const URL = struct {
    // Constructor (v0.2.0: uses webidl.USVString = UTF-16)
    pub fn init(allocator: Allocator, url: webidl.USVString, base: ?webidl.USVString) !URL
    pub fn deinit(self: *URL) void
    
    // Static methods
    pub fn call_parse(allocator: Allocator, url: webidl.USVString, base: ?webidl.USVString) ?*URL
    pub fn call_canParse(allocator: Allocator, url: webidl.USVString, base: ?webidl.USVString) bool
    
    // Getters (return owned UTF-16 strings - must be freed)
    pub fn get_href(self: *const URL) !webidl.USVString
    pub fn get_origin(self: *const URL) !webidl.USVString
    pub fn get_protocol(self: *const URL) !webidl.USVString
    pub fn get_host(self: *const URL) !webidl.USVString
    pub fn get_hostname(self: *const URL) !webidl.USVString
    pub fn getPort(self: *const URL) ![]const u8
    pub fn getPathname(self: *const URL) ![]const u8
    pub fn getSearch(self: *const URL) ![]const u8
    pub fn getHash(self: *const URL) ![]const u8
    
    // Getters (return borrowed strings - no free needed)
    pub fn getUsername(self: *const URL) []const u8
    pub fn getPassword(self: *const URL) []const u8
    pub fn getSearchParams(self: *const URL) *URLSearchParams
    
    // Setters
    pub fn set_href(self: *URL, href: webidl.USVString) !void
    pub fn set_protocol(self: *URL, protocol: webidl.USVString) !void
    pub fn setUsername(self: *URL, username: []const u8) !void
    pub fn setPassword(self: *URL, password: []const u8) !void
    pub fn set_host(self: *URL, host: webidl.USVString) !void
    pub fn setHostname(self: *URL, hostname: []const u8) !void
    pub fn setPort(self: *URL, port: []const u8) !void
    pub fn set_pathname(self: *URL, pathname: webidl.USVString) !void
    pub fn setSearch(self: *URL, search: []const u8) !void
    pub fn set_hash(self: *URL, hash: webidl.USVString) !void
    
    // Serialization
    pub fn call_toJSON(self: *const URL) ![]const u8
};

URLSearchParams Class

pub const URLSearchParams = struct {
    pub fn append(self: *URLSearchParams, allocator: Allocator, name: []const u8, value: []const u8) !void
    pub fn delete(self: *URLSearchParams, allocator: Allocator, name: []const u8, value: ?[]const u8) !void
    pub fn get(self: *const URLSearchParams, allocator: Allocator, name: []const u8) !?[]const u8
    pub fn getAll(self: *const URLSearchParams, allocator: Allocator, name: []const u8) ![][]const u8
    pub fn has(self: *const URLSearchParams, name: []const u8, value: ?[]const u8) bool
    pub fn set(self: *URLSearchParams, allocator: Allocator, name: []const u8, value: []const u8) !void
    pub fn sort(self: *URLSearchParams) void
    pub fn size(self: *const URLSearchParams) usize
    // ... iterator support
};

Memory Management

This library uses standard Zig memory management patterns:

// URLs must be deinitialized
const url_str = try infra.string.utf8ToUtf16(allocator, "https://example.com/");
defer allocator.free(url_str);
var url = try URL.init(allocator, url_str, null);
defer url.deinit();  // IMPORTANT: Always deinit

// Getters that return webidl.USVString allocate - must be freed
const href = try url.get_href();
defer allocator.free(href);  // IMPORTANT: Free returned strings

// Some getters return borrowed strings - no free needed
const username = url.getUsername();  // Borrows from URL's internal buffer
// No free needed for username

Memory Safety

  • All tests pass with std.testing.allocator (leak detection)
  • Comprehensive 2-minute memory leak stress test
  • No global state - everything takes an allocator
  • Proper defer usage throughout
  • Zero tolerance for memory leaks
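
As an illustration of how `std.testing.allocator` catches leaks, the sketch below defines a stand-in for an owning getter and frees its result in a test. `ownedProtocol` is a hypothetical function written for this example. It only mirrors the ownership pattern of the library's getters and does not call the library.

```zig
const std = @import("std");

/// Hypothetical stand-in for an owning getter: returns a newly allocated
/// copy of `scheme` with ':' appended. The caller owns the result.
fn ownedProtocol(allocator: std.mem.Allocator, scheme: []const u8) ![]u8 {
    const out = try allocator.alloc(u8, scheme.len + 1);
    @memcpy(out[0..scheme.len], scheme);
    out[scheme.len] = ':';
    return out;
}

test "owned results must be freed" {
    const allocator = std.testing.allocator;
    const protocol = try ownedProtocol(allocator, "https");
    // Without this free, std.testing.allocator reports a leak and the test fails.
    defer allocator.free(protocol);
    try std.testing.expectEqualStrings("https:", protocol);
}
```

Run it with `zig test` on the file; removing the `defer` line is a quick way to see the leak report.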

Documentation

Comprehensive documentation is available:

  • FEATURE_CATALOG.md - Complete API reference with examples
  • CHANGELOG.md - Version history and release notes
  • CONTRIBUTING.md - Development and contribution guidelines
  • benchmarks/README.md - Benchmarking guide and results
  • docs/development/ - Implementation guides and plans
    • MEMORY_LEAK_TEST.md - Memory testing documentation
    • IDNA_IMPLEMENTATION_GUIDE.md - IDNA implementation details
    • PERFORMANCE_OPTIMIZATION_PLAN.md - Optimization strategies
    • And more...
  • docs/archive/ - Historical completion reports

Testing

Run the test suite:

# Run all tests
zig build test

# Run memory leak test (2-minute stress test)
zig build memory-test

Current test coverage:

  • 232/232 tests passing (100% pass rate)
    • 213 URL parsing and setter tests
    • 9 PSL integration tests
    • 10 additional validation tests
  • URL parsing (all schemes, relative URLs, edge cases)
  • URL setters (all 8 setters with validation)
  • URLSearchParams (all methods)
  • Public Suffix List operations
  • Static methods (parse, canParse)
  • Error handling and validation

Memory Leak Testing

Comprehensive memory leak test that validates memory safety under long-lived process conditions:

  • Duration: 2 minutes continuous operation
  • Iterations: ~120,000+ create/destroy cycles
  • Coverage: All public APIs (URL, URLSearchParams, host parsing, percent encoding, etc.)
  • Allocator: GeneralPurposeAllocator (mimics production usage)
  • Measurement: OS-level RSS tracking (macOS/Linux)
  • Detection: Automatic leak detection via GPA

See docs/development/MEMORY_LEAK_TEST.md for details.

Performance

This implementation uses browser-proven patterns:

  • Offset-based storage - Single allocation for URL string, offsets for components
  • Lazy parsing - Components extracted on-demand
  • Minimal allocations - Reuses buffers where possible
  • SIMD-optimized - Uses infra library's SIMD string operations
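
The offset-based storage idea can be sketched as a single string plus byte ranges, with components produced as slices on demand. `Span` and `OffsetUrl` are hypothetical types written for this example and do not reflect the library's actual layout. The indexer handles only the simple form `scheme://host/path`.

```zig
const std = @import("std");

/// Half-open byte range [start, end) into the serialized URL.
const Span = struct { start: usize, end: usize };

/// Hypothetical layout: one string plus offsets, no per-component allocation.
const OffsetUrl = struct {
    href: []const u8,
    scheme_span: Span,
    host_span: Span,
    path_span: Span,

    /// Records component offsets for "scheme://host/path" (no userinfo, port, query, or fragment).
    fn index(href: []const u8) ?OffsetUrl {
        const sep = std.mem.indexOf(u8, href, "://") orelse return null;
        const host_start = sep + 3;
        const path_start = std.mem.indexOfScalarPos(u8, href, host_start, '/') orelse href.len;
        return OffsetUrl{
            .href = href,
            .scheme_span = .{ .start = 0, .end = sep },
            .host_span = .{ .start = host_start, .end = path_start },
            .path_span = .{ .start = path_start, .end = href.len },
        };
    }

    // Each accessor returns a slice that borrows from `href`.
    fn scheme(self: OffsetUrl) []const u8 {
        return self.href[self.scheme_span.start..self.scheme_span.end];
    }
    fn host(self: OffsetUrl) []const u8 {
        return self.href[self.host_span.start..self.host_span.end];
    }
    fn path(self: OffsetUrl) []const u8 {
        return self.href[self.path_span.start..self.path_span.end];
    }
};

test "components are slices of one buffer" {
    const u = OffsetUrl.index("https://example.com/a/b").?;
    try std.testing.expectEqualStrings("https", u.scheme());
    try std.testing.expectEqualStrings("example.com", u.host());
    try std.testing.expectEqualStrings("/a/b", u.path());
}
```

Because the accessors borrow from `href`, reading a component costs no allocation, which is the property the bullet list above attributes to this layout.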

Benchmarks

Run performance benchmarks:

zig build bench

Comprehensive benchmark suite covering:

  • URL parsing (simple, complex, IPv6, Unicode)
  • Host parsing (domain, IPv4, IPv6, IDNA)
  • Percent encoding/decoding
  • URLSearchParams operations
  • URL serialization
  • URL setters

See benchmarks/README.md for detailed results and analysis.

Browser Compatibility

The hybrid implementation approach matches browser implementations:

  • Chrome Blink - KURL class uses offset-based storage
  • Firefox Gecko - nsStandardURL uses URL segments
  • WebKit - URL class uses component offsets

Contributing

See CONTRIBUTING.md for development guidelines.

License

MIT License - see LICENSE file for details.

Acknowledgments

Links


Built with Zig - https://ziglang.org/
