A complete, spec-compliant implementation of the WHATWG URL Standard in Zig.
- ✨ WebIDL Type Migration - Full spec compliance with WebIDL USVString (UTF-16) types
- 🚀 Performance Improvements - URL setters refactored using state override (17% faster for the protocol setter)
- 🐛 Critical Bug Fixes - Fixed memory corruption in parseWithStateOverride
- 📦 Updated Dependencies - encoding v0.1.3 with transitive dependency resolution
- 🔧 Test Helpers - New test_helpers module for easier UTF-8 ↔ UTF-16 conversion
See CHANGELOG.md for complete details.
- ✅ Full URL Parsing - Parse URLs into components following WHATWG spec
- ✅ URL Serialization - Convert URL objects back to strings
- ✅ Host Parsing - Domains, IPv4, IPv6, opaque, empty hosts
- ✅ IDNA Support - Unicode domain names (UTS46)
- ✅ Percent Encoding - All URL-specific encode sets
- ✅ URLSearchParams - Query string manipulation with live binding
- ✅ URL Setters - Modify URL components (protocol, host, port, path, etc.)
- ✅ Origin Calculation - Security-critical origin computation
- ✅ Public Suffix List - Domain security boundaries
- ✅ Blob URL Support - External store integration
- ✅ Relative URL Resolution - Resolve URLs against base URLs
- ✅ 40+ Validation Errors - Comprehensive error reporting
Add to your build.zig.zon:
.dependencies = .{
    .url = .{
        .url = "https://github.com/zig-whatwg/url/archive/refs/tags/v0.2.0.tar.gz",
        .hash = "1220...", // Run `zig fetch --save <url>` to get the hash
    },
},

Note: The hash is computed when you run zig fetch --save <url>. Replace the placeholder hash above with the actual hash after fetching.
Then in your build.zig:
const url = b.dependency("url", .{
    .target = target,
    .optimize = optimize,
});
exe.root_module.addImport("url", url.module("url"));

git clone https://github.com/zig-whatwg/url.git
cd url
zig build
⚠️ Breaking Changes in v0.2.0: The API now uses WebIDL types (UTF-16 strings) for full spec compliance. All method names use underscore case (e.g., get_href() instead of getHref()).
For migration from v0.1.0:
- See CHANGELOG.md for complete list of changes
- See MIGRATION_GUIDE_v0.2.0.md for step-by-step migration
- Use the test_helpers module for easier UTF-8 ↔ UTF-16 conversion in tests

Note: Examples below show the v0.2.0 WebIDL API. For v0.1.0 usage, see the v0.1.0 tag.
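The UTF-8 ↔ UTF-16 round trip that the v0.2.0 API requires can also be illustrated with the standard library alone. The following is a minimal sketch and not this library's helper: `roundTrip` is a hypothetical name, and the `std.unicode` function names assume Zig 0.12 or later (earlier releases may spell them differently).

```zig
const std = @import("std");

/// Hypothetical helper: converts UTF-8 to UTF-16 and back again.
/// The caller owns the returned UTF-8 slice and must free it.
fn roundTrip(allocator: std.mem.Allocator, utf8: []const u8) ![]u8 {
    // UTF-8 -> UTF-16 code units (little-endian); allocates a new []u16.
    const utf16 = try std.unicode.utf8ToUtf16LeAlloc(allocator, utf8);
    defer allocator.free(utf16);

    // UTF-16 -> UTF-8; allocates a new []u8 that is handed to the caller.
    return std.unicode.utf16LeToUtf8Alloc(allocator, utf16);
}

test "UTF-8 survives a round trip through UTF-16" {
    const allocator = std.testing.allocator;
    const out = try roundTrip(allocator, "https://example.com/path");
    defer allocator.free(out);
    try std.testing.expectEqualStrings("https://example.com/path", out);
}
```

Every conversion allocates, which is why the examples in this README pair each converted string with a `defer allocator.free(...)`.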
const std = @import("std");
const url_mod = @import("url");
const URL = url_mod.URL;
const infra = @import("infra");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Convert UTF-8 to UTF-16 (WebIDL USVString)
    const url_str = try infra.string.utf8ToUtf16(
        allocator,
        "https://user:pass@example.com:8080/path?query=value#fragment",
    );
    defer allocator.free(url_str);

    // Parse URL
    var url = try URL.init(allocator, url_str, null);
    defer url.deinit();

    // Access components (returns UTF-16, convert to UTF-8)
    const protocol = try url.get_protocol();
    defer allocator.free(protocol);
    const protocol_utf8 = try infra.string.utf16ToUtf8(allocator, protocol);
    defer allocator.free(protocol_utf8); // "https:"

    const username = try url.get_username();
    defer allocator.free(username);
    // username is UTF-16, convert to UTF-8 for display

    const hostname = try url.get_hostname();
    defer allocator.free(hostname);
    // hostname is UTF-16, convert to UTF-8 for display

    // Serialize back to a string (UTF-16), then convert to UTF-8 for printing
    const href = try url.get_href();
    defer allocator.free(href);
    const href_utf8 = try infra.string.utf16ToUtf8(allocator, href);
    defer allocator.free(href_utf8);
    std.debug.print("URL: {s}\n", .{href_utf8});
}

// Parse with base URL
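// Both the input and the base are UTF-16 (WebIDL USVString)
const relative = try infra.string.utf8ToUtf16(allocator, "../other/path");
defer allocator.free(relative);
const base = try infra.string.utf8ToUtf16(allocator, "https://example.com/some/path");
defer allocator.free(base);
var url = try URL.init(allocator, relative, base);
defer url.deinit();
const href = try url.get_href();
defer allocator.free(href);
// Result (as UTF-16): "https://example.com/other/path"

const url_str = try infra.string.utf8ToUtf16(allocator, "http://example.com/path");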
defer allocator.free(url_str);
var url = try URL.init(allocator, url_str, null);
defer url.deinit();
// Change protocol (pass UTF-16 string)
const proto = try infra.string.utf8ToUtf16(allocator, "https");
defer allocator.free(proto);
try url.set_protocol(proto);
// Update host
const host = try infra.string.utf8ToUtf16(allocator, "newhost.com:9000");
defer allocator.free(host);
try url.set_host(host);
// Change path
const pathname = try infra.string.utf8ToUtf16(allocator, "/new/path");
defer allocator.free(pathname);
try url.set_pathname(pathname);
// Add fragment
const hash = try infra.string.utf8ToUtf16(allocator, "section");
defer allocator.free(hash);
try url.set_hash(hash);
const href = try url.get_href();
defer allocator.free(href);
const href_utf8 = try infra.string.utf16ToUtf8(allocator, href);
defer allocator.free(href_utf8);
// Result: "https://newhost.com:9000/new/path#section"

Tip: For testing, use the test_helpers module for simpler UTF-8 APIs:
const helpers = @import("url").test_helpers;
var url = try helpers.initURL(allocator, "http://example.com/path", null);
defer url.deinit();
try helpers.setProtocol(&url, allocator, "https");
const href = try helpers.getHref(&url, allocator);
defer allocator.free(href);

const url_str = try infra.string.utf8ToUtf16(allocator, "https://example.com/?a=1&b=2");
defer allocator.free(url_str);
var url = try URL.init(allocator, url_str, null);
defer url.deinit();
// Access search params
const params = url.getSearchParams();
// Get values
if (try params.get(allocator, "a")) |value| {
    defer allocator.free(value);
    std.debug.print("a = {s}\n", .{value}); // "1"
}
// Add new param
try params.append(allocator, "c", "3");
// Modify existing
try params.set(allocator, "b", "new_value");
// Delete param
try params.delete(allocator, "a", null);
const href = try url.get_href();
defer allocator.free(href);
// Result (as UTF-16): "https://example.com/?b=new_value&c=3"

// Check if URL can be parsed
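const candidate = try infra.string.utf8ToUtf16(allocator, "https://example.com/");
defer allocator.free(candidate);
const can_parse = URL.call_canParse(allocator, candidate, null);
if (can_parse) {
    std.debug.print("Valid URL!\n", .{});
}
// Parse without returning an error (null on failure)
const input = try infra.string.utf8ToUtf16(allocator, "maybe-invalid");
defer allocator.free(input);
const maybe_url = URL.call_parse(allocator, input, null);
if (maybe_url) |url_ptr| {
    defer allocator.destroy(url_ptr);
    defer url_ptr.deinit();
    const href = try url_ptr.get_href();
    defer allocator.free(href);
    const href_utf8 = try infra.string.utf16ToUtf8(allocator, href);
    defer allocator.free(href_utf8);
    std.debug.print("Parsed: {s}\n", .{href_utf8});
} else {
    std.debug.print("Invalid URL\n", .{});
}

const url_str = try infra.string.utf8ToUtf16(allocator, "https://example.com:443/path");
defer allocator.free(url_str);
var url = try URL.init(allocator, url_str, null);
defer url.deinit();
const origin = try url.get_origin();
defer allocator.free(origin);
const origin_utf8 = try infra.string.utf16ToUtf8(allocator, origin);
defer allocator.free(origin_utf8);
std.debug.print("Origin: {s}\n", .{origin_utf8}); // "https://example.com"

This implementation follows the WHATWG URL Standard specification precisely: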
- ✅ 100% WHATWG URL Standard compliance
- ✅ 240/240 tests passing (100% pass rate)
- ✅ 100% IDNA conformance (6,391/6,391 UTS46 tests)
- ✅ Zero memory leaks (verified with std.testing.allocator and stress testing)
- ✅ Optimized implementation: State override + browser patterns
- ✅ Complete spec documentation in specs/url.md and specs/url.idl
All URL parsing follows spec validation rules:
- Scheme validation (ASCII alpha + alphanumeric/+/-/.)
- Special scheme transitions (http ↔ https allowed, http ↔ mailto disallowed)
- Host validation (domains, IPv4, IPv6, empty, opaque)
- Port validation (0-65535)
- Path normalization (./ and ../ resolution)
- Percent encoding (8 different encode sets)
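A few of the rules above are small enough to sketch directly. The code below is an illustration under stated assumptions, not this library's implementation: the function names are hypothetical, and `canChangeScheme` checks only the special versus non-special condition (the URL Standard adds further conditions, for example around the file scheme and URLs that carry credentials or a port).

```zig
const std = @import("std");

/// Scheme syntax: one ASCII letter, then ASCII alphanumerics, '+', '-' or '.'.
fn isValidScheme(scheme: []const u8) bool {
    if (scheme.len == 0 or !std.ascii.isAlphabetic(scheme[0])) return false;
    for (scheme[1..]) |c| {
        const ok = std.ascii.isAlphanumeric(c) or c == '+' or c == '-' or c == '.';
        if (!ok) return false;
    }
    return true;
}

/// The special schemes named by the URL Standard.
fn isSpecialScheme(scheme: []const u8) bool {
    const special = [_][]const u8{ "ftp", "file", "http", "https", "ws", "wss" };
    for (special) |s| {
        if (std.ascii.eqlIgnoreCase(s, scheme)) return true;
    }
    return false;
}

/// Simplified transition rule: a setter may not move a URL between the
/// special and non-special groups (http -> https is fine, http -> mailto is not).
fn canChangeScheme(from: []const u8, to: []const u8) bool {
    return isSpecialScheme(from) == isSpecialScheme(to);
}

/// Port: ASCII digits only, and the value must fit in 16 bits (0-65535).
fn parsePort(text: []const u8) ?u16 {
    if (text.len == 0) return null;
    for (text) |c| {
        if (!std.ascii.isDigit(c)) return null;
    }
    return std.fmt.parseInt(u16, text, 10) catch null;
}

test "validation sketches" {
    try std.testing.expect(isValidScheme("git+ssh"));
    try std.testing.expect(!isValidScheme("1http"));
    try std.testing.expect(canChangeScheme("http", "https"));
    try std.testing.expect(!canChangeScheme("http", "mailto"));
    try std.testing.expectEqual(@as(?u16, 8080), parsePort("8080"));
    try std.testing.expectEqual(@as(?u16, null), parsePort("65536"));
}
```

The explicit digit check in `parsePort` runs before `std.fmt.parseInt` because `parseInt` on its own also accepts a leading sign and underscore separators, which are not valid in a URL port.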
This library depends on other WHATWG Zig implementations:
- infra - WHATWG Infra Standard primitives
- encoding - WHATWG Encoding Standard
- webidl - WebIDL types for Zig
All dependencies are fetched automatically via Zig's package manager.
See FEATURE_CATALOG.md for complete API documentation.
pub const URL = struct {
    // Constructor (v0.2.0: uses webidl.USVString = UTF-16)
    pub fn init(allocator: Allocator, url: webidl.USVString, base: ?webidl.USVString) !URL
    pub fn deinit(self: *URL) void

    // Static methods
    pub fn call_parse(allocator: Allocator, url: webidl.USVString, base: ?webidl.USVString) ?*URL
    pub fn call_canParse(allocator: Allocator, url: webidl.USVString, base: ?webidl.USVString) bool

    // Getters (return owned UTF-16 strings - must be freed)
    pub fn get_href(self: *const URL) !webidl.USVString
    pub fn get_origin(self: *const URL) !webidl.USVString
    pub fn get_protocol(self: *const URL) !webidl.USVString
    pub fn get_host(self: *const URL) !webidl.USVString
    pub fn getHostname(self: *const URL) ![]const u8
    pub fn getPort(self: *const URL) ![]const u8
    pub fn getPathname(self: *const URL) ![]const u8
    pub fn getSearch(self: *const URL) ![]const u8
    pub fn getHash(self: *const URL) ![]const u8

    // Getters (return borrowed strings - no free needed)
    pub fn getUsername(self: *const URL) []const u8
    pub fn getPassword(self: *const URL) []const u8
    pub fn getSearchParams(self: *const URL) *URLSearchParams

    // Setters
    pub fn set_href(self: *URL, href: webidl.USVString) !void
    pub fn set_protocol(self: *URL, protocol: webidl.USVString) !void
    pub fn setUsername(self: *URL, username: []const u8) !void
    pub fn setPassword(self: *URL, password: []const u8) !void
    pub fn setHost(self: *URL, host: []const u8) !void
    pub fn setHostname(self: *URL, hostname: []const u8) !void
    pub fn setPort(self: *URL, port: []const u8) !void
    pub fn setPathname(self: *URL, pathname: []const u8) !void
    pub fn setSearch(self: *URL, search: []const u8) !void
    pub fn setHash(self: *URL, hash: []const u8) !void

    // Serialization
    pub fn call_toJSON(self: *const URL) ![]const u8
};

pub const URLSearchParams = struct {
    pub fn append(self: *URLSearchParams, allocator: Allocator, name: []const u8, value: []const u8) !void
    pub fn delete(self: *URLSearchParams, allocator: Allocator, name: []const u8, value: ?[]const u8) !void
    pub fn get(self: *const URLSearchParams, allocator: Allocator, name: []const u8) !?[]const u8
    pub fn getAll(self: *const URLSearchParams, allocator: Allocator, name: []const u8) ![][]const u8
    pub fn has(self: *const URLSearchParams, name: []const u8, value: ?[]const u8) bool
    pub fn set(self: *URLSearchParams, allocator: Allocator, name: []const u8, value: []const u8) !void
    pub fn sort(self: *URLSearchParams) void
    pub fn size(self: *const URLSearchParams) usize
    // ... iterator support
};

This library uses standard Zig memory management patterns:
// URLs must be deinitialized
const url_str = try infra.string.utf8ToUtf16(allocator, "https://example.com/");
defer allocator.free(url_str);
var url = try URL.init(allocator, url_str, null);
defer url.deinit(); // IMPORTANT: Always deinit
// Getters that return owned strings allocate - must be freed
const href = try url.get_href();
defer allocator.free(href); // IMPORTANT: Free returned strings
// Some getters return borrowed strings - no free needed
const username = url.getUsername(); // Borrows from URL's internal buffer
// No free needed for username

- All tests pass with std.testing.allocator (leak detection)
- Comprehensive 2-minute memory leak stress test
- No global state - everything takes an allocator
- Proper defer usage throughout
- Zero tolerance for memory leaks
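The leak detection mentioned above comes from `std.testing.allocator`, which fails a test when any allocation is still live at the end. A minimal sketch of the pattern, assuming a hypothetical `ownedCopy` helper that stands in for any getter returning an owned string:

```zig
const std = @import("std");

/// Hypothetical stand-in for an "owned getter": returns a copy of `text`
/// that the caller must free.
fn ownedCopy(allocator: std.mem.Allocator, text: []const u8) ![]u8 {
    return allocator.dupe(u8, text);
}

test "owned strings are freed" {
    const allocator = std.testing.allocator;
    const href = try ownedCopy(allocator, "https://example.com/");
    // Removing this defer makes the test fail with a reported leak.
    defer allocator.free(href);
    try std.testing.expectEqualStrings("https://example.com/", href);
}
```

Running this with `zig test` passes only because the `defer` releases the copy before the test returns.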
Comprehensive documentation is available:
- FEATURE_CATALOG.md - Complete API reference with examples
- CHANGELOG.md - Version history and release notes
- CONTRIBUTING.md - Development and contribution guidelines
- benchmarks/README.md - Benchmarking guide and results
- docs/development/ - Implementation guides and plans
  - MEMORY_LEAK_TEST.md - Memory testing documentation
  - IDNA_IMPLEMENTATION_GUIDE.md - IDNA implementation details
  - PERFORMANCE_OPTIMIZATION_PLAN.md - Optimization strategies
  - And more...
- docs/archive/ - Historical completion reports
Run the test suite:
# Run all tests
zig build test
# Run memory leak test (2-minute stress test)
zig build memory-test

Current test coverage:
- 232/232 tests passing (100% pass rate) ✅
  - 213 URL parsing and setter tests
  - 9 PSL integration tests
  - 10 additional validation tests
- URL parsing (all schemes, relative URLs, edge cases)
- URL setters (all 8 setters with validation)
- URLSearchParams (all methods)
- Public Suffix List operations
- Static methods (parse, canParse)
- Error handling and validation
Comprehensive memory leak test that validates memory safety under long-lived process conditions:
- Duration: 2 minutes continuous operation
- Iterations: ~120,000+ create/destroy cycles
- Coverage: All public APIs (URL, URLSearchParams, host parsing, percent encoding, etc.)
- Allocator: GeneralPurposeAllocator (mimics production usage)
- Measurement: OS-level RSS tracking (macOS/Linux)
- Detection: Automatic leak detection via GPA
See docs/development/MEMORY_LEAK_TEST.md for details.
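The core of such a stress test is a loop of create/destroy cycles followed by a check of the allocator's leak report. The sketch below is a simplified illustration, not the project's actual test: `Resource` and `runCycles` are hypothetical names, `Resource` stands in for a URL object, and the OS-level RSS measurement is left out.

```zig
const std = @import("std");

/// Hypothetical stand-in for a URL-like object that owns one heap buffer.
const Resource = struct {
    buf: []u8,

    fn init(allocator: std.mem.Allocator, text: []const u8) !Resource {
        return Resource{ .buf = try allocator.dupe(u8, text) };
    }

    fn deinit(self: *Resource, allocator: std.mem.Allocator) void {
        allocator.free(self.buf);
    }
};

/// Runs `iterations` create/destroy cycles and returns true when the
/// allocator reports no leaked allocations at teardown.
fn runCycles(iterations: usize) !bool {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        var r = try Resource.init(allocator, "https://example.com/");
        r.deinit(allocator);
    }

    // deinit() reports .leak if any allocation is still outstanding.
    return gpa.deinit() == .ok;
}

test "create/destroy cycles leave nothing behind" {
    try std.testing.expect(try runCycles(1000));
}
```

A time-bounded variant would replace the fixed iteration count with a deadline check, which is closer to the two-minute run described above.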
This implementation uses browser-proven patterns:
- Offset-based storage - Single allocation for URL string, offsets for components
- Lazy parsing - Components extracted on-demand
- Minimal allocations - Reuses buffers where possible
- SIMD-optimized - Uses the infra library's SIMD string operations
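Offset-based storage can be pictured as one serialized string plus a handful of indices, with each component returned as a slice rather than a copy. The following is a toy sketch of that idea only: `OffsetUrl` and `index` are hypothetical names, the indexer handles nothing beyond already-normalized `scheme://host/path` strings, and none of it reflects this library's internal layout.

```zig
const std = @import("std");

/// Hypothetical layout: a single buffer plus offsets marking each component.
const OffsetUrl = struct {
    href: []const u8,
    scheme_end: usize, // index of the ':' that follows the scheme
    host_start: usize,
    host_end: usize,
    path_end: usize,

    // Each accessor returns a slice into `href`, so nothing is allocated.
    fn scheme(self: OffsetUrl) []const u8 {
        return self.href[0..self.scheme_end];
    }
    fn host(self: OffsetUrl) []const u8 {
        return self.href[self.host_start..self.host_end];
    }
    fn path(self: OffsetUrl) []const u8 {
        return self.href[self.host_end..self.path_end];
    }
};

/// Toy indexer: records offsets for "scheme://host/path[?query][#fragment]".
fn index(href: []const u8) ?OffsetUrl {
    const scheme_end = std.mem.indexOf(u8, href, "://") orelse return null;
    const host_start = scheme_end + 3;
    const host_end = std.mem.indexOfScalarPos(u8, href, host_start, '/') orelse href.len;
    const path_end = std.mem.indexOfAnyPos(u8, href, host_end, "?#") orelse href.len;
    return OffsetUrl{
        .href = href,
        .scheme_end = scheme_end,
        .host_start = host_start,
        .host_end = host_end,
        .path_end = path_end,
    };
}

test "components are slices of one buffer" {
    const u = index("https://example.com/a/b?x=1") orelse return error.TestUnexpectedResult;
    try std.testing.expectEqualStrings("https", u.scheme());
    try std.testing.expectEqualStrings("example.com", u.host());
    try std.testing.expectEqualStrings("/a/b", u.path());
}
```

The trade-off is that setters must rewrite the buffer and shift the later offsets, in exchange for a single allocation per URL and allocation-free reads.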
Run performance benchmarks:
zig build bench

Comprehensive benchmark suite covering:
- URL parsing (simple, complex, IPv6, Unicode)
- Host parsing (domain, IPv4, IPv6, IDNA)
- Percent encoding/decoding
- URLSearchParams operations
- URL serialization
- URL setters
See benchmarks/README.md for detailed results and analysis.
The hybrid implementation approach matches browser implementations:
- Chrome Blink - KURL class uses offset-based storage
- Firefox Gecko - nsStandardURL uses URL segments
- WebKit - URL class uses component offsets
See CONTRIBUTING.md for development guidelines.
MIT License - see LICENSE file for details.
- WHATWG URL Standard specification
- Web Platform Tests test suite
- Browser implementations (Chrome, Firefox, WebKit) for reference
- Specification: https://url.spec.whatwg.org/
- Repository: https://github.com/zig-whatwg/url
- Issues: https://github.com/zig-whatwg/url/issues
- WHATWG: https://whatwg.org/
Built with Zig - https://ziglang.org/