AI Agent Framework for the Real World
Build smarter automations, even for legacy software with no APIs.
Note: This project is currently under active development and will be released as an open-source project soon. You can already star this repository to stay updated, and follow the build-in-public journey on YouTube. For more details, visit blindclicks.dev.
Blind Clicks is a self-hosted AI agent development framework that gives you full control over your automation workflows. Inspired by OpenAI's Operator and Anthropic's Computer Use, it goes one step further:
It works blindly, by simulating actual user clicks and keystrokes at the OS level.
It's like macros on steroids, but with an LLM-powered fallback vision layer for when things don't go as planned.
Blind Clicks uses:
- RobotJS to simulate OS-level mouse and keyboard actions
- Dockerized Linux (via Webtop) to run applications in a browser-accessible container
- LLMs (your choice: bring your own) for prompt understanding
- Vision fallback (via OmniParser and others) to analyze screen metadata and adapt intelligently when clicks fail
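To make the "blind" part concrete, here is a minimal sketch of replaying a recorded workflow through an input driver. Every name in it (`Step`, `InputDriver`, `RecordingDriver`, `runBlind`) and the coordinates are hypothetical and not taken from Blind Clicks. The driver is a recording stand-in so the example runs without touching the OS; a real driver could plausibly wrap RobotJS calls such as `robot.moveMouse`, `robot.mouseClick`, and `robot.typeString`.

```typescript
// Hypothetical types for illustration; not the Blind Clicks API.
type Step =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string };

// Minimal input abstraction. A real implementation could wrap RobotJS
// (robot.moveMouse(x, y), robot.mouseClick(), robot.typeString(text)).
interface InputDriver {
  click(x: number, y: number): void;
  typeText(text: string): void;
}

// Stand-in driver that records actions instead of moving the real mouse.
class RecordingDriver implements InputDriver {
  readonly log: string[] = [];

  click(x: number, y: number): void {
    this.log.push(`click(${x},${y})`);
  }

  typeText(text: string): void {
    this.log.push(`type(${text})`);
  }
}

// Replays the steps in order without inspecting the screen at all.
function runBlind(steps: Step[], driver: InputDriver): void {
  for (const step of steps) {
    if (step.kind === "click") {
      driver.click(step.x, step.y);
    } else {
      driver.typeText(step.text);
    }
  }
}

// Assumed coordinates for a made-up login form.
const loginSteps: Step[] = [
  { kind: "click", x: 420, y: 310 },
  { kind: "type", text: "demo-user" },
  { kind: "click", x: 420, y: 380 },
];

const recorder = new RecordingDriver();
runBlind(loginSteps, recorder);
console.log(recorder.log.join(" -> "));
```

Keeping the driver behind an interface means the same step list can be dry-run against a recorder, as here, before it is pointed at a real desktop.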
Automate boring, complex workflows, even in ancient software that still runs on Windows XP.
This isn't another browser automation tool.
No Puppeteer. No Playwright. No DOM required.
Just raw, deterministic automation, designed for:
- Legacy systems with no APIs
- On-prem software where you can't use cloud agents
- Privacy-sensitive use cases (100% self-hosted)
- Developers & integrators building custom AI agents for real-world, messy UIs
When clicks fail or a UI changes unexpectedly, Blind Clicks captures the screen, analyzes it with tools like Microsoft's OmniParser, and adapts the workflow intelligently.
Fallback logic is promptable, explainable, and customizable.
Human-in-the-loop? Absolutely.
Even your AI agent can call you if it gets stuck.
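The escalation order described above (blind click first, then vision, then a human) can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: `FallbackHooks`, `clickWithFallback`, and the simulated screen are all made up, and `locateOnScreen` merely stands in for whatever a vision layer such as OmniParser would return.

```typescript
// Hypothetical names throughout; this sketches the idea, not the Blind Clicks API.
interface ScreenPoint {
  x: number;
  y: number;
}

interface FallbackHooks {
  // Performs the click and reports whether the expected result appeared.
  clickAndVerify(target: ScreenPoint): boolean;
  // Stand-in for a vision layer: returns fresh coordinates for a labelled
  // element, or null if it cannot be found on the captured screen.
  locateOnScreen(label: string): ScreenPoint | null;
  // Human-in-the-loop escape hatch.
  askHuman(message: string): void;
}

type Outcome = "blind" | "vision" | "human";

function clickWithFallback(
  label: string,
  recorded: ScreenPoint,
  hooks: FallbackHooks,
): Outcome {
  // 1. Fast path: replay the recorded coordinates.
  if (hooks.clickAndVerify(recorded)) {
    return "blind";
  }
  // 2. Fallback: ask the vision layer where the element is now and retry.
  const found = hooks.locateOnScreen(label);
  if (found !== null && hooks.clickAndVerify(found)) {
    return "vision";
  }
  // 3. Last resort: hand the step over to a person.
  hooks.askHuman(`Could not click "${label}"; please take over.`);
  return "human";
}

// Simulated UI in which the "Upload" button has moved to (140, 260).
const actualButton: ScreenPoint = { x: 140, y: 260 };
const humanMessages: string[] = [];

const simulatedHooks: FallbackHooks = {
  clickAndVerify: (p) => p.x === actualButton.x && p.y === actualButton.y,
  locateOnScreen: (label) => (label === "Upload" ? actualButton : null),
  askHuman: (message) => {
    humanMessages.push(message);
  },
};

const movedResult = clickWithFallback("Upload", { x: 100, y: 200 }, simulatedHooks);
const missingResult = clickWithFallback("Export", { x: 10, y: 10 }, simulatedHooks);
console.log(movedResult, missingResult, humanMessages.length);
```

In this simulation the moved button is recovered through the vision stand-in, while the element that cannot be located ends up with the human. Returning the outcome for each step is one possible way to keep the fallback explainable after the fact.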
Here's a real demo of an AI agent built using Blind Clicks, automating the accounting system Dativ (widely used in Germany).
This agent logs in, uploads invoices, parses and reconciles OCR data, and notifies the human when done. It even handles two-factor authentication with a human-in-the-loop fallback.
"If you've ever had to upload hundreds of invoices manually, you'll understand why this matters."
-- Aemal
Blind Clicks is being built live, every Sunday, in public.
Every bug. Every breakthrough. Every idea. All documented and shared.
Episodes include live coding, real debugging sessions, design choices, and honest reflections.
"It's messy, fun, and very real. I lock myself in a room, hit record, and build this agent layer by layer."
Subscribe, follow the journey, and learn how AI agents are made for real-world tasks.
Blind Clicks is a developer-first tool.
It's open-source, hackable, and still early in its journey. Every Sunday, I build this live on my YouTube channel.
"This project started as a way to help my wife automate a boring task. Now it's my playground for rethinking automation from the ground up."
-- Aemal Sayer, Creator of Blind Clicks
Run it offline. Use your own LLM. Lock it behind a firewall.
Blind Clicks is privacy-first by design, perfect for European governments, hospitals, or any company that says "no" to OpenAI.
Aemal Sayer is a software engineer and AI enthusiast passionate about building practical solutions to real-world problems. With a background in automation and AI, he created Blind Clicks to solve a personal pain point that turned into a broader mission to make automation accessible for legacy systems.
- Personal Website
- LinkedIn
This is not a product pitch. It's a project, a creative challenge, an open invitation.
If you're excited by deterministic AI agents, OS-level automation, or voice-based workflows that just work…
Reach out or
Subscribe on YouTube, and let's build this together.