WHATWG Specification Manager

A single script that downloads and manages all 22 WHATWG specifications, optimized for LLM token consumption.

Quick Start

# Install dependencies
npm install

# Set your Anthropic API key
export ANTHROPIC_API_KEY="your-api-key-here"

# Download and optimize all 22 specs
node specs.js download

# Or use npm scripts
npm run download

# Check what's downloaded
npm run status

# Remove all specs
npm run clean

Commands

Command	Description
`node specs.js download`	Download and optimize all 22 WHATWG specs
`node specs.js status`	Show which specs are downloaded with sizes and token counts
`node specs.js list`	List all 22 available specs with URLs
`node specs.js clean`	Remove all downloaded specs from working directory
`node specs.js help`	Show help message

Or use npm scripts: npm run download, npm run status, npm run list, npm run clean

What It Does

When you run node specs.js download:

✅ Creates a temporary working directory in /tmp
✅ Downloads each spec's HTML to /tmp (not your working directory)
✅ Converts the HTML to markdown in /tmp using pandoc
✅ Uses the opencode SDK (Claude) to optimize each spec while preserving ALL technical content:
- Removes table of contents
- Removes references sections
- Removes acknowledgments / acknowledgements
- Removes intellectual property sections
- Removes licensing / copyright sections
- Removes index sections
- Removes all {#anchor-id} and {.class} metadata
- Removes section numbering [1.2.3]
- Removes base64 images and SVG diagrams
- Removes decorative elements
- Preserves 100% of technical specifications, algorithms, examples, and definitions
✅ Saves optimized <spec>.md to working directory
✅ Automatically removes the temporary directory from /tmp

Result: Only clean, optimized .md files with complete technical content in your working directory!

All 22 Specifications (Alphabetical Order)

#	Spec	URL	Description
1	compat	https://compat.spec.whatwg.org/	Compatibility Standard
2	compression	https://compression.spec.whatwg.org/	Compression Standard
3	console	https://console.spec.whatwg.org/	Console Standard
4	cookiestore	https://cookiestore.spec.whatwg.org/	Cookie Store API
5	dom	https://dom.spec.whatwg.org/	DOM Standard
6	encoding	https://encoding.spec.whatwg.org/	Encoding Standard
7	fetch	https://fetch.spec.whatwg.org/	Fetch Standard
8	fs	https://fs.spec.whatwg.org/	File System Standard
9	fullscreen	https://fullscreen.spec.whatwg.org/	Fullscreen API
10	html	https://html.spec.whatwg.org/	HTML Living Standard
11	infra	https://infra.spec.whatwg.org/	Infra Standard
12	mimesniff	https://mimesniff.spec.whatwg.org/	MIME Sniffing Standard
13	notifications	https://notifications.spec.whatwg.org/	Notifications API
14	quirks	https://quirks.spec.whatwg.org/	Quirks Mode Standard
15	storage	https://storage.spec.whatwg.org/	Storage Standard
16	streams	https://streams.spec.whatwg.org/	Streams Standard
17	testutils	https://testutils.spec.whatwg.org/	Test Utils Standard
18	url	https://url.spec.whatwg.org/	URL Standard
19	urlpattern	https://urlpattern.spec.whatwg.org/	URL Pattern Standard
20	webidl	https://webidl.spec.whatwg.org/	Web IDL Standard
21	websockets	https://websockets.spec.whatwg.org/	WebSockets Standard
22	xhr	https://xhr.spec.whatwg.org/	XMLHttpRequest Standard

Token Optimization Results

Individual Specifications (Estimated)

Spec	Original HTML	Optimized MD	Reduction	Tokens (est)
html	~14.0 MB	~5.4 MB	~61%	~1,800,000
webidl	~2.5 MB	~400 KB	~84%	~133,000
streams	~1.8 MB	~375 KB	~79%	~125,000
dom	~2.9 MB	~340 KB	~88%	~113,000
fetch	~1.9 MB	~250 KB	~87%	~83,000
encoding	~450 KB	~105 KB	~77%	~35,000
url	~710 KB	~105 KB	~85%	~35,000
urlpattern	~350 KB	~80 KB	~77%	~27,000
infra	~280 KB	~72 KB	~74%	~24,000
xhr	~340 KB	~70 KB	~79%	~23,000
mimesniff	~260 KB	~53 KB	~80%	~18,000
cookiestore	~240 KB	~50 KB	~79%	~17,000
websockets	~180 KB	~33 KB	~82%	~11,000
storage	~110 KB	~27 KB	~75%	~9,000
quirks	~90 KB	~22 KB	~76%	~7,000
fullscreen	~95 KB	~22 KB	~77%	~7,000
notifications	~90 KB	~21 KB	~77%	~7,000
console	~90 KB	~12 KB	~87%	~4,000
compat	~50 KB	~10 KB	~80%	~3,000
compression	~40 KB	~8 KB	~80%	~3,000
fs	~150 KB	~5 KB	~97%	~2,000
testutils	~10 KB	~1 KB	~90%	~300

Combined Totals

Metric	Before	After	Saved
Total Size	~27.4 MB	~8.5 MB	~18.9 MB (69%)
Total Tokens	~9.1M	~2.8M	~6.3M tokens (69%)

Overall reduction: ~69% of total size across all specifications (individual specs range from ~61% to ~97%).
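The token figures in these tables appear to follow a simple size-based rule of roughly 3 bytes per token (for example, ~5.4 MB maps to ~1,800,000 tokens). The sketch below reproduces that arithmetic. The ratio is an assumption inferred from the tables, not a documented setting of specs.js, and the helper names are hypothetical. Real tokenizer counts will differ.

```javascript
// Assumption: ~3 bytes per token, inferred from the tables above
// (e.g. ~5.4 MB -> ~1,800,000 tokens). Real tokenizers will differ.
const BYTES_PER_TOKEN = 3;

// Estimate the token count for a file of the given size in bytes.
function estimateTokens(bytes) {
  return Math.round(bytes / BYTES_PER_TOKEN);
}

// Percentage reduction from the original size to the optimized size.
function reductionPercent(originalBytes, optimizedBytes) {
  return Math.round((1 - optimizedBytes / originalBytes) * 100);
}

// Figures for the dom spec from the table above (decimal units assumed).
const dom = { original: 2.9e6, optimized: 340e3 };
console.log(estimateTokens(dom.optimized));                 // 113333
console.log(reductionPercent(dom.original, dom.optimized)); // 88
```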

Output Files

After running node specs.js download, you'll have these 22 files:

compat.md          ~10 KB    ~3K tokens
compression.md     ~8 KB     ~3K tokens
console.md         ~12 KB    ~4K tokens
cookiestore.md     ~50 KB    ~17K tokens
dom.md             ~340 KB   ~113K tokens
encoding.md        ~105 KB   ~35K tokens
fetch.md           ~250 KB   ~83K tokens
fs.md              ~5 KB     ~2K tokens
fullscreen.md      ~22 KB    ~7K tokens
html.md            ~5.4 MB   ~1.8M tokens
infra.md           ~72 KB    ~24K tokens
mimesniff.md       ~53 KB    ~18K tokens
notifications.md   ~21 KB    ~7K tokens
quirks.md          ~22 KB    ~7K tokens
storage.md         ~27 KB    ~9K tokens
streams.md         ~375 KB   ~125K tokens
testutils.md       ~1 KB     ~300 tokens
url.md             ~105 KB   ~35K tokens
urlpattern.md      ~80 KB    ~27K tokens
webidl.md          ~400 KB   ~133K tokens
websockets.md      ~33 KB    ~11K tokens
xhr.md             ~70 KB    ~23K tokens

Total: ~8.5 MB, ~2.8M tokens (down from ~27.4 MB, ~9.1M tokens)

What's Preserved

✅ All specification content:

All technical prose and definitions
All algorithms and processing models
All normative requirements
All code examples and IDL interfaces
Complete section hierarchy
External reference links

✅ 100% specification quality - zero loss of technical content

What's Removed

❌ Non-specification sections:

Table of contents
References sections
Acknowledgments / Acknowledgements
Intellectual property sections
Licensing / Copyright sections
Index sections

❌ Metadata and formatting:

All {#anchor-id} patterns
All {.css-class} attributes
All {x-internal="..."} metadata
Section numbering [1.2.3]
Base64-encoded images
SVG diagrams
Decorative separator lines
Excessive whitespace
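The metadata patterns listed above are mechanical enough to illustrate with regular expressions. The sketch below is not how specs.js works (the tool uses AI-based optimization, not pattern matching). It only shows what the attribute patterns look like in pandoc output. The function name stripPandocMetadata is hypothetical.

```javascript
// Hypothetical helper: removes pandoc attribute blocks such as
// {#anchor-id}, {.css-class}, and {x-internal="..."} from markdown text,
// then collapses runs of blank lines.
function stripPandocMetadata(markdown) {
  return markdown
    // Attribute blocks start with "{" followed by "#", ".", or key=value.
    .replace(/\{(?:[#.][^{}\n]*|[\w-]+=[^{}\n]*)\}/g, "")
    // Drop trailing spaces left behind by the removals.
    .replace(/[ \t]+$/gm, "")
    // Collapse three or more newlines into a single blank line.
    .replace(/\n{3,}/g, "\n\n");
}

const sample =
  '# Nodes {#nodes .heading}\n\n\n\nA [tree]{x-internal="tree"} is ordered.\n';
console.log(stripPandocMetadata(sample));
```

A naive pass like this would also strip matching text inside code samples, which is one reason a context-aware approach can be preferable.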

Use Cases

For LLM Processing

70% token reduction - fit more specs in context
Lower API costs - pay for fewer tokens
Faster processing - less data to parse
Better context utilization - pure technical content

For Development

Clean references - no metadata clutter
Easy searching - pure specification prose
Version control - efficient diffs
Fast loading - smaller file sizes

For Documentation

Readable markdown - clean formatting
Complete content - all technical details
Portable - standard markdown format
Focused - specification content only

Requirements

curl - for downloading specs
pandoc - for HTML to markdown conversion
node - for running the script
Anthropic API key - for Claude Sonnet 4.5 (1M context window)

Install Dependencies

macOS:

brew install pandoc node
npm install

Ubuntu/Debian:

sudo apt install curl pandoc nodejs npm
npm install

Example Workflow

# List all 22 available specs
node specs.js list

# Download and optimize all specs (takes 5-15 minutes)
node specs.js download

# Your working directory stays clean during the process!
# All temporary work happens in /tmp

# Check what was downloaded with sizes and token counts
node specs.js status

# Use the optimized specs
cat dom.md | head -n 50

# Clean up when done
node specs.js clean

Technical Details

Temporary File Handling

Creates unique temp directory: /tmp/tmp.XXXXXX
All HTML downloads go to /tmp (never your working directory)
All intermediate markdown files stay in /tmp
Automatic cleanup when the script exits
Your working directory only receives final optimized .md files
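As a minimal sketch of this pattern in Node.js, the helper below creates a unique temporary directory and removes it when the work finishes, even on error. The name withTempDir and the "specs-" prefix are assumptions for illustration; specs.js may handle this differently. Note that os.tmpdir() is not always /tmp (on macOS it is usually a per-user directory).

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Runs `work(tmpDir)` inside a fresh temporary directory and removes the
// directory afterwards, even if `work` throws.
function withTempDir(work) {
  // mkdtempSync appends six random characters to the prefix.
  const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), "specs-"));
  try {
    return work(tmpDir);
  } finally {
    fs.rmSync(tmpDir, { recursive: true, force: true });
  }
}

// Usage: intermediate files live only inside tmpDir.
const size = withTempDir((tmpDir) => {
  const file = path.join(tmpDir, "dom.html");
  fs.writeFileSync(file, "<h1>DOM</h1>");
  return fs.statSync(file).size;
});
console.log(size); // 12
```

A try/finally block does not run if the process is killed by a signal, so a real script might also register a process exit or SIGINT handler.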

Optimization Pipeline

Download HTML to /tmp
Convert with pandoc in /tmp
Optimize with opencode SDK:
- Uses Claude AI to intelligently analyze the specification
- Removes boilerplate (TOC, references, acknowledgments, metadata)
- Preserves ALL technical content (specs, algorithms, examples, definitions)
- Smart detection of what's essential vs. removable
- Context-aware optimization (not just pattern matching)
Save optimized .md to working directory
Remove the temporary directory from /tmp automatically
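The first two pipeline steps can be sketched with curl and pandoc invoked from Node.js. The function names, file naming, and exact flags below are assumptions, not the actual contents of specs.js. The optimization step is left out because the opencode SDK calls are not shown in this README.

```javascript
const { execFileSync } = require("child_process");
const path = require("path");

// Arguments for curl: fail on HTTP errors (-f), stay quiet but show errors
// (-sS), follow redirects (-L), and write to the given file (-o).
function curlArgs(url, htmlPath) {
  return ["-fsSL", url, "-o", htmlPath];
}

// Arguments for pandoc: read HTML, write markdown to the given file.
function pandocArgs(htmlPath, mdPath) {
  return ["-f", "html", "-t", "markdown", htmlPath, "-o", mdPath];
}

// Steps 1 and 2 for one spec; returns the path of the intermediate markdown.
// The AI optimization step is omitted because its API is not shown here.
function downloadAndConvert(name, tmpDir) {
  const url = `https://${name}.spec.whatwg.org/`;
  const htmlPath = path.join(tmpDir, `${name}.html`);
  const mdPath = path.join(tmpDir, `${name}.raw.md`);
  execFileSync("curl", curlArgs(url, htmlPath), { stdio: "inherit" });
  execFileSync("pandoc", pandocArgs(htmlPath, mdPath), { stdio: "inherit" });
  return mdPath;
}

// Show the pandoc invocation without running anything.
console.log(["pandoc", ...pandocArgs("dom.html", "dom.raw.md")].join(" "));
```

Using execFileSync with an argument array avoids shell quoting problems, and keeping the calls sequential matches the one-at-a-time download behavior described in Troubleshooting.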

Context Window Capacity

With 200K token context:

Any single spec except HTML
OR: DOM + Fetch together (~196K tokens)

With 1M token context:

All 21 specs other than HTML together
HTML (~1.8M tokens) does not fit, even alone

With 2M+ token context:

HTML alone (~1.8M tokens)
All 22 specs together exceed 2M tokens and must be split across requests
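To check whether a given combination fits a context window, you can sum the per-spec token estimates. The sketch below uses a subset of the estimates from the table in this README. The function name checkBudget is hypothetical, and the budget ignores tokens needed for the prompt and the response.

```javascript
// Token estimates copied from the per-spec table above (a subset).
const TOKENS = {
  html: 1800000, webidl: 133000, streams: 125000,
  dom: 113000, fetch: 83000, url: 35000, infra: 24000,
};

// Returns { fits, total, remaining } for the chosen specs and context budget.
function checkBudget(names, budget) {
  const total = names.reduce((sum, name) => {
    if (!(name in TOKENS)) throw new Error(`Unknown spec: ${name}`);
    return sum + TOKENS[name];
  }, 0);
  return { fits: total <= budget, total, remaining: budget - total };
}

console.log(checkBudget(["dom", "fetch"], 200000));
// { fits: true, total: 196000, remaining: 4000 }
console.log(checkBudget(["html"], 1000000).fits); // false
```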

Notes

Processing all 22 specs takes 5-15 minutes (depending on connection)
Requires active internet connection
All temporary files automatically deleted from /tmp
Safe to re-run download anytime to update specs
clean command removes specs from working directory only
No temporary files ever appear in your working directory
Version: 2.0.0 (pure Node.js using opencode SDK for intelligent optimization)

Performance Tips

Download Individual Specs

The script downloads all 22 specs, but you can modify the SPECS array in specs.js to download only specific ones:

// Edit specs.js and modify the SPECS array
const SPECS = ["dom", "fetch", "url"]; // Only these 3
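If you would rather not edit the array by hand each time, one option is a small selection helper that validates names against the full list. This is a sketch under assumptions: ALL_SPECS, selectSpecs, and specUrl are hypothetical names, and the README does not say whether specs.js accepts spec names on the command line.

```javascript
// The 22 spec names from the table in this README.
const ALL_SPECS = [
  "compat", "compression", "console", "cookiestore", "dom", "encoding",
  "fetch", "fs", "fullscreen", "html", "infra", "mimesniff",
  "notifications", "quirks", "storage", "streams", "testutils", "url",
  "urlpattern", "webidl", "websockets", "xhr",
];

// Hypothetical helper: returns the requested names, rejecting unknown ones.
// An empty request means "all specs".
function selectSpecs(requested) {
  const unknown = requested.filter((name) => !ALL_SPECS.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown spec(s): ${unknown.join(", ")}`);
  }
  return requested.length > 0 ? requested : ALL_SPECS;
}

// Each spec lives at https://<name>.spec.whatwg.org/ per the table above.
function specUrl(name) {
  return `https://${name}.spec.whatwg.org/`;
}

console.log(selectSpecs(["dom", "fetch", "url"]).map(specUrl));
```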

Check Before/After

Use the status command to see exactly what you have:

node specs.js status

Shows each spec with file size and estimated token count.
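A report like this could be produced along the following lines. This is a rough sketch, not the actual status implementation: the function name specStatus is hypothetical, and the ~3 bytes per token ratio is an assumption based on the tables in this README.

```javascript
const fs = require("fs");
const path = require("path");

// Lists *.md files in `dir` (skipping README.md) with size and a rough
// token estimate. Assumes ~3 bytes per token, as the tables here imply.
function specStatus(dir) {
  return fs.readdirSync(dir)
    .filter((f) => f.endsWith(".md") && f.toLowerCase() !== "readme.md")
    .sort()
    .map((f) => {
      const bytes = fs.statSync(path.join(dir, f)).size;
      return { file: f, bytes, tokens: Math.round(bytes / 3) };
    });
}

// Print one line per spec found in the current directory.
for (const { file, bytes, tokens } of specStatus(process.cwd())) {
  const kb = (bytes / 1024).toFixed(1);
  console.log(`${file.padEnd(20)} ${kb} KB  ~${tokens} tokens`);
}
```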

Troubleshooting

"pandoc: command not found"

brew install pandoc  # macOS
sudo apt install pandoc  # Ubuntu/Debian

"curl: command not found"

# curl is pre-installed on most systems
# If missing, install it via your package manager
sudo apt install curl  # Ubuntu/Debian

Failed download

Check internet connection
Verify spec.whatwg.org is accessible
Check firewall settings
Try again (network issues are often transient)
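If downloads fail intermittently, a retry wrapper with increasing delays can help. The sketch below is an illustration only. The README does not say whether specs.js retries on its own, and the names retrySync and sleepSync are hypothetical. It is synchronous so it can wrap blocking calls such as execFileSync.

```javascript
// Blocks the current thread for `ms` milliseconds (fine for a CLI script).
function sleepSync(ms) {
  if (ms > 0) Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Calls `fn` up to `attempts` times, doubling the delay after each failure.
// Rethrows the last error if every attempt fails.
function retrySync(fn, attempts = 3, delayMs = 1000) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return fn(i + 1);
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) sleepSync(delayMs * 2 ** i);
    }
  }
  throw lastError;
}

// Demo: a function that fails twice, then succeeds on the third attempt.
let calls = 0;
const result = retrySync(() => {
  calls += 1;
  if (calls < 3) throw new Error("simulated network error");
  return "ok";
}, 3, 10);
console.log(result, calls); // ok 3
```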

Not enough disk space in /tmp

The script needs ~300MB free in /tmp
Clean up /tmp manually if needed
Downloads happen one at a time to minimize space usage

License

This tool processes publicly available WHATWG specifications.

Processed specifications retain their original WHATWG licenses:

Most specs: Creative Commons Attribution 4.0 International License
Code portions: BSD 3-Clause License

This tool itself:

Use freely for any purpose
No warranty provided
Provided as-is

Ready to optimize? Run npm run download to get started!

Download all 22 WHATWG specifications, optimized and ready for LLM processing with 70% fewer tokens.

Complete list from https://spec.whatwg.org/ in alphabetical order.
