Arker is a web archiving server written in Go that captures web pages using multiple strategies and provides a short URL interface for accessing archived content.
- Multiple Archive Types:
  - MHTML (complete webpage with resources)
  - Full-page screenshots
  - Git repository cloning
  - YouTube video downloads via yt-dlp
- Admin Interface: Web-based admin panel for managing archived URLs and viewing capture history
- Short URLs: Clean, short URLs for accessing archived content (e.g., arker.hackclub.com/hc139d)
- Display Page: Archive viewer with tabs for each archive type and metadata bar
- Git Clone Support: Archived git repositories can be cloned directly using standard git commands
- Streaming & Compression: All archives are compressed using zstd and streamed to/from storage
- Queue System: Configurable worker pool for processing archive requests
- Modular Storage: Interface-based storage system (filesystem now, S3 ready)
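The queue system mentioned above can be pictured as a fixed number of goroutines draining a channel of requests. The following is a minimal sketch under that assumption, not Arker's actual implementation: the `job` type, `runPool`, and `errCaptureFailed` are hypothetical names introduced only for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// job is a hypothetical archive request: one URL to capture.
type job struct {
	URL string
}

// errCaptureFailed is an illustrative error value, not part of Arker.
var errCaptureFailed = errors.New("capture failed")

// runPool starts the given number of worker goroutines, feeds them every job,
// waits for all of them to finish, and returns how many jobs failed.
func runPool(workers int, jobs []job, handle func(job) error) int {
	if workers < 1 {
		workers = 1 // avoid blocking forever when no worker would read the queue
	}
	queue := make(chan job)
	var (
		wg     sync.WaitGroup
		mu     sync.Mutex
		failed int
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range queue {
				if err := handle(j); err != nil {
					mu.Lock()
					failed++
					mu.Unlock()
				}
			}
		}()
	}
	for _, j := range jobs {
		queue <- j
	}
	close(queue) // lets the workers exit their range loops
	wg.Wait()
	return failed
}

func main() {
	jobs := []job{{URL: "https://example.com"}, {URL: ""}}
	failed := runPool(5, jobs, func(j job) error {
		if j.URL == "" {
			return errCaptureFailed
		}
		fmt.Println("archiving", j.URL)
		return nil
	})
	fmt.Println("failed jobs:", failed)
}
```

The worker count passed to `runPool` plays the role that `MAX_WORKERS` plays in the real server.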
- Start with Docker Compose: docker compose up --build
- Access the application:
  - Admin interface: http://localhost:8080/login
  - Default credentials: admin/admin
- Archive a URL:
  - Use the admin interface to add a new URL
  - Or use the API: POST /api/v1/archive with {"url": "https://example.com"}
- View archives:
  - Click on any short ID in the admin interface
  - Or visit directly: http://localhost:8080/{shortid}
curl -X POST http://localhost:8080/api/v1/archive \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com", "types": ["mhtml", "screenshot"]}'
git clone http://localhost:8080/git/{shortid}
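The curl request above can also be issued from Go with only the standard library. This is a sketch, assuming the endpoint accepts the same JSON body and needs no extra authentication header, as the curl example implies. `archiveRequest` and `newArchiveRequest` are illustrative names, and the response body is not decoded because its format is not described here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// archiveRequest mirrors the JSON body used in the curl example.
type archiveRequest struct {
	URL   string   `json:"url"`
	Types []string `json:"types,omitempty"`
}

// newArchiveRequest builds the POST request without sending it.
func newArchiveRequest(baseURL, target string, types []string) (*http.Request, error) {
	body, err := json.Marshal(archiveRequest{URL: target, Types: types})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/archive", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newArchiveRequest("http://localhost:8080", "https://example.com",
		[]string{"mhtml", "screenshot"})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("send request:", err)
		return
	}
	defer resp.Body.Close()
	// Only the status is printed; the response schema is not assumed.
	fmt.Println("status:", resp.Status)
}
```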
Environment variables:
- DB_URL: PostgreSQL connection string
- STORAGE_PATH: Path for archive storage (default: ./storage)
- CACHE_PATH: Path for git clone cache (default: ./cache)
- MAX_WORKERS: Number of archive workers (default: 5)
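The variables and defaults listed above could be read along the following lines. This is a sketch rather than the server's real startup code: the `config` struct, `loadConfig`, and `orDefault` are hypothetical, and rejecting non-positive `MAX_WORKERS` values is an assumption about sensible behavior.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// config holds the documented settings. Field names are illustrative.
type config struct {
	DBURL       string
	StoragePath string
	CachePath   string
	MaxWorkers  int
}

// orDefault returns fallback when value is empty.
func orDefault(value, fallback string) string {
	if value == "" {
		return fallback
	}
	return value
}

// loadConfig reads settings through lookup (os.Getenv in normal use)
// and applies the documented defaults.
func loadConfig(lookup func(string) string) (config, error) {
	cfg := config{
		DBURL:       lookup("DB_URL"),
		StoragePath: orDefault(lookup("STORAGE_PATH"), "./storage"),
		CachePath:   orDefault(lookup("CACHE_PATH"), "./cache"),
		MaxWorkers:  5,
	}
	if raw := lookup("MAX_WORKERS"); raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n < 1 {
			return config{}, fmt.Errorf("MAX_WORKERS must be a positive integer, got %q", raw)
		}
		cfg.MaxWorkers = n
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// DB_URL is deliberately not printed because it may contain a password.
	fmt.Println("storage:", cfg.StoragePath, "cache:", cfg.CachePath, "workers:", cfg.MaxWorkers)
}
```

Passing the lookup function as a parameter keeps the defaults testable without touching the process environment.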
- Go 1.22+
- PostgreSQL
- Git
- Python 3 with yt-dlp
- Playwright dependencies
# Install dependencies
go mod tidy
# Start PostgreSQL (or use Docker)
docker run -d --name postgres \
-e POSTGRES_USER=user \
-e POSTGRES_PASSWORD=pass \
-e POSTGRES_DB=arker \
-p 5432:5432 postgres:15
# Install Playwright
go install github.com/playwright-community/playwright-go/cmd/playwright@latest
playwright install chromium
# Install yt-dlp
pip install yt-dlp
# Run the application
go run .
go test -v
The application uses:
- Gin for HTTP routing and middleware
- GORM for database operations with PostgreSQL
- Playwright for browser automation (MHTML, screenshots)
- go-git for Git repository operations
- zstd for streaming compression
- Sessions for admin authentication
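To illustrate what streaming compression means in practice, the sketch below pipes data through a compressor with `io.Copy`, so the payload is never held in memory as a whole. Note the substitution: it uses `compress/gzip` from the standard library as a stand-in, because zstd is not part of the standard library. Arker itself uses zstd, and the function names here are hypothetical.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// compressStream copies src into dst through a compressor, chunk by chunk.
// gzip is a stand-in here; the real server uses zstd.
func compressStream(dst io.Writer, src io.Reader) error {
	zw := gzip.NewWriter(dst)
	if _, err := io.Copy(zw, src); err != nil {
		zw.Close()
		return err
	}
	return zw.Close() // Close flushes the remaining compressed data
}

// decompressStream is the inverse: it streams decompressed bytes into dst.
func decompressStream(dst io.Writer, src io.Reader) error {
	zr, err := gzip.NewReader(src)
	if err != nil {
		return err
	}
	defer zr.Close()
	_, err = io.Copy(dst, zr)
	return err
}

// roundTrip compresses and then decompresses s, returning the result.
func roundTrip(s string) (string, error) {
	var packed, out bytes.Buffer
	if err := compressStream(&packed, strings.NewReader(s)); err != nil {
		return "", err
	}
	if err := decompressStream(&out, &packed); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	got, err := roundTrip("<html>archived page</html>")
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(got)
}
```

Because both functions take plain `io.Reader` and `io.Writer` values, the same shape works whether the destination is a file, a network connection, or an object store upload.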
Archive types are implemented using the Archiver interface, making it easy to add new archive strategies.
Storage uses the Storage interface, currently implemented for filesystem but designed for easy S3 integration.
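The exact method sets of these interfaces are not shown in this README, so the following is only a guess at their general shape. Every signature below (`Writer`, `Reader`, `Name`, `Archive`) and the `fsStorage`, `echoArchiver`, `capture`, and `demo` names are assumptions made for illustration. Check the source for the real definitions.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// Storage is a hypothetical streaming storage abstraction keyed by name.
type Storage interface {
	Writer(key string) (io.WriteCloser, error)
	Reader(key string) (io.ReadCloser, error)
}

// fsStorage stores each key as a file under root. Keys are assumed trusted.
type fsStorage struct{ root string }

func (s fsStorage) Writer(key string) (io.WriteCloser, error) {
	path := filepath.Join(s.root, key)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return nil, err
	}
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	return f, nil
}

func (s fsStorage) Reader(key string) (io.ReadCloser, error) {
	f, err := os.Open(filepath.Join(s.root, key))
	if err != nil {
		return nil, err
	}
	return f, nil
}

// Archiver is a hypothetical capture strategy that streams its output to w.
type Archiver interface {
	Name() string
	Archive(url string, w io.Writer) error
}

// echoArchiver is a placeholder strategy that records only the URL.
type echoArchiver struct{}

func (echoArchiver) Name() string { return "echo" }
func (echoArchiver) Archive(url string, w io.Writer) error {
	_, err := io.WriteString(w, url)
	return err
}

// capture runs one archiver and stores its output under "<shortID>/<name>".
func capture(st Storage, a Archiver, shortID, url string) error {
	w, err := st.Writer(shortID + "/" + a.Name())
	if err != nil {
		return err
	}
	if err := a.Archive(url, w); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}

// demo captures into a temporary directory and reads the result back.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "arker-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	st := fsStorage{root: dir}
	if err := capture(st, echoArchiver{}, "abc123", "https://example.com"); err != nil {
		return "", err
	}
	r, err := st.Reader("abc123/echo")
	if err != nil {
		return "", err
	}
	defer r.Close()
	data, err := io.ReadAll(r)
	return string(data), err
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```

Under this shape, an S3 backend would be another type satisfying `Storage`, and a new archive strategy would be another type satisfying `Archiver`, with `capture` unchanged.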
The included docker-compose.yml provides a complete deployment with PostgreSQL and proper resource limits for Playwright.
- Change the default admin credentials in production
- Use a secure session key (update the hardcoded "secret-key-change-in-production")
- Consider adding rate limiting for the API endpoints
- Ensure proper network security for database access
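One way to act on the session key advice is to load the key from the environment instead of hardcoding it. The sketch below assumes a hex-encoded variable named `SESSION_KEY`, which is a hypothetical name: this README does not define such a variable, and wiring the key into the session middleware is left out.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
)

// sessionKey reads a hex-encoded key through lookup (os.Getenv in normal use).
// SESSION_KEY is a hypothetical variable name chosen for this sketch.
func sessionKey(lookup func(string) string) ([]byte, error) {
	raw := lookup("SESSION_KEY")
	if raw == "" {
		return nil, errors.New("SESSION_KEY is not set")
	}
	key, err := hex.DecodeString(raw)
	if err != nil {
		return nil, fmt.Errorf("SESSION_KEY must be hex encoded: %w", err)
	}
	if len(key) < 32 {
		return nil, errors.New("SESSION_KEY must decode to at least 32 bytes")
	}
	return key, nil
}

// newSessionKey generates 32 random bytes and returns them hex encoded.
func newSessionKey() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

func main() {
	if _, err := sessionKey(os.Getenv); err != nil {
		generated, genErr := newSessionKey()
		if genErr != nil {
			fmt.Println("could not generate a key:", genErr)
			return
		}
		fmt.Println("no usable key configured (" + err.Error() + ")")
		fmt.Println("example value: SESSION_KEY=" + generated)
		return
	}
	fmt.Println("session key loaded")
}
```

Refusing to start without a valid key, rather than falling back to a default, prevents a placeholder secret from reaching production unnoticed.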
MIT License - See LICENSE file for details.