A Swift-based toolkit for macOS UI automation that provides both:
- MCP Server - Model Context Protocol server for AI agents
- Swift Library - Direct library for embedding in your applications
It interacts with macOS applications through the accessibility APIs using an element-based approach: UI elements are exposed as a tree and addressed by ID.
- Swift 6.1 or later
- macOS 14.0 or later
- Accessibility permissions - The server requires accessibility permissions to interact with applications
This MCP server exposes the following tools:
Retrieves UI elements for an application in a tree structure with limited depth (2-3 levels). Automatically opens the application if not running, brings it to focus, and provides a comprehensive overview of the application state.
Parameters:
- bundle_identifier (string, required): Bundle identifier of the application (e.g., "com.apple.Safari" for Safari, "com.apple.TextEdit" for TextEdit)
Returns: JSON tree structure with UI elements containing:
- element_id: Unique identifier for the element
- description: Human-readable description of the element
- children: Array of child elements
Use case: Get an overview of the application state. If you need more details about specific elements, use update_ui_element_tree.
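As a rough sketch of how a client might consume this result, the snippet below decodes a tree with the three fields listed above and flattens it depth-first. The type name `UIElementNode`, the helper `flatten`, and the sample payload are illustrative assumptions, not taken from the project; the exact JSON the server emits may differ.

```swift
import Foundation

// Illustrative model of one tree node; only the three documented fields are assumed.
struct UIElementNode: Decodable {
    let elementID: String
    let description: String
    let children: [UIElementNode]

    enum CodingKeys: String, CodingKey {
        case elementID = "element_id"
        case description
        case children
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        elementID = try container.decode(String.self, forKey: .elementID)
        description = try container.decode(String.self, forKey: .description)
        // Assumption: a missing "children" key means the node is a leaf.
        children = try container.decodeIfPresent([UIElementNode].self, forKey: .children) ?? []
    }
}

/// Depth-first flattening of the tree into (depth, node) pairs.
func flatten(_ node: UIElementNode, depth: Int = 0) -> [(depth: Int, node: UIElementNode)] {
    var result: [(depth: Int, node: UIElementNode)] = [(depth: depth, node: node)]
    for child in node.children {
        result += flatten(child, depth: depth + 1)
    }
    return result
}

// Hypothetical payload shaped like the documented return value.
let sampleJSON = """
{
  "element_id": "e1",
  "description": "Window: Untitled",
  "children": [
    { "element_id": "e2", "description": "Button: Close", "children": [] },
    { "element_id": "e3", "description": "TextArea" }
  ]
}
"""

let root = try JSONDecoder().decode(UIElementNode.self, from: Data(sampleJSON.utf8))
for (depth, node) in flatten(root) {
    print(String(repeating: "  ", count: depth) + "\(node.elementID): \(node.description)")
}
```

Keeping the flattened list around makes it easy to search by description and then pass the matching element_id to the other tools.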
Clicks a UI element by its ID using direct AXUIElement reference for maximum performance and reliability.
Parameters:
- bundle_identifier (string, required): Bundle identifier of the application
- element_id (string, required): Element ID obtained from get_ui_elements
Returns: Confirmation message when the element is successfully clicked.
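For clients that talk to the server over MCP rather than through the library, a call to this tool is a JSON-RPC `tools/call` request. The following is a minimal sketch of building one; the struct and function names are made up for illustration, and only the tool name and the two parameter names come from this README.

```swift
import Foundation

// Illustrative envelope for an MCP "tools/call" request (JSON-RPC 2.0).
struct ToolCallRequest: Encodable {
    struct Params: Encodable {
        let name: String
        let arguments: [String: String]
    }
    let jsonrpc = "2.0"
    let id: Int
    let method = "tools/call"
    let params: Params
}

/// Builds a click request; the helper name is hypothetical.
func clickRequest(id: Int, bundleIdentifier: String, elementID: String) -> ToolCallRequest {
    ToolCallRequest(
        id: id,
        params: .init(
            name: "click_element_by_id",
            arguments: [
                "bundle_identifier": bundleIdentifier,
                "element_id": elementID,
            ]
        )
    )
}

let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys, .withoutEscapingSlashes]

let request = clickRequest(id: 1, bundleIdentifier: "com.apple.TextEdit", elementID: "e2")
let requestData = try encoder.encode(request)
let requestLine = String(decoding: requestData, as: UTF8.self)
print(requestLine)
```

The encoded request is a single line of JSON, which suits a stdio transport where each message is written on its own line.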
Updates and returns the UI element tree for a specific element by its ID. Call this function when you need more information about the children of a particular UI element.
Parameters:
- bundle_identifier (string, required): Bundle identifier of the application
- element_id (string, required): Element ID to update and return tree from (obtained from get_ui_elements)
Returns: JSON tree structure with updated UI elements and their children.
Use case: When you need to explore deeper into the UI hierarchy of a specific element.
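A client that caches the shallow tree may want to splice the deeper subtree back into its cache. The sketch below shows one way to do that, assuming the returned subtree's root keeps the same element ID that was requested (this README does not confirm that). `UINode` and `replacing(_:in:)` are hypothetical names.

```swift
// Illustrative in-memory node; not the library's own type.
struct UINode: Equatable {
    var elementID: String
    var description: String
    var children: [UINode] = []
}

/// Returns a copy of `tree` in which the node with `subtree.elementID`
/// is replaced by `subtree`. The tree is unchanged if no ID matches.
func replacing(_ subtree: UINode, in tree: UINode) -> UINode {
    if tree.elementID == subtree.elementID {
        return subtree
    }
    var copy = tree
    copy.children = tree.children.map { replacing(subtree, in: $0) }
    return copy
}

// Shallow tree as it might look after the initial overview call.
let cachedTree = UINode(elementID: "e1", description: "Window", children: [
    UINode(elementID: "e2", description: "Toolbar"),
    UINode(elementID: "e3", description: "Content"),
])

// Deeper result for "e2", as it might look after an update call.
let updatedToolbar = UINode(elementID: "e2", description: "Toolbar", children: [
    UINode(elementID: "e4", description: "Button: Back"),
    UINode(elementID: "e5", description: "Button: Forward"),
])

let mergedTree = replacing(updatedToolbar, in: cachedTree)
print(mergedTree.children[0].children.map(\.description))
```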
Sets text in a UI text field element by its ID. Works with text fields, search bars, URL fields, text areas, mail composition areas, code editors, and more.
Parameters:
- bundle_identifier (string, required): Bundle identifier of the application
- element_id (string, required): Element ID of the text field (obtained from get_ui_elements)
- text (string, required): Text to set in the text field
Returns: Confirmation message and complete updated UI tree of the application.
Use case: Write text into any text field across macOS applications - from code editors to email composition, URL bars to search fields.
swift build
swift run NudgeServer
swift test
dependencies: [
    .package(url: "https://github.com/haarshitgarg/Nudge-Server.git", from: "1.0.0")
]
targets: [
    .target(
        name: "YourTarget",
        dependencies: [
            .product(name: "NudgeLibrary", package: "Nudge-Server")
        ]
    )
]
- File → Add Package Dependencies...
- Enter: https://github.com/haarshitgarg/Nudge-Server.git
- Select NudgeLibrary product
- Add to your target
- Go to System Settings → Privacy & Security → Accessibility
- Add your terminal application or the built executable to the list of allowed applications
- Ensure the checkbox is checked for the application
Without these permissions, the server will throw accessibilityPermissionDenied errors.
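To catch a missing permission before starting the server, a host process can query the trust state itself. This is a small sketch using `AXIsProcessTrusted()` from ApplicationServices; the helper names and the message wording are illustrative, and it does not show how the server performs its own check.

```swift
#if canImport(ApplicationServices)
import ApplicationServices
#endif

/// Human-readable status for a given trust state (wording is illustrative).
func permissionStatusMessage(trusted: Bool) -> String {
    trusted
        ? "Accessibility access granted."
        : "Accessibility access missing: add this process to the Accessibility list."
}

/// Returns the current trust state, or nil where ApplicationServices is unavailable.
func isAccessibilityTrusted() -> Bool? {
    #if canImport(ApplicationServices)
    return AXIsProcessTrusted()
    #else
    return nil
    #endif
}

if let trusted = isAccessibilityTrusted() {
    print(permissionStatusMessage(trusted: trusted))
} else {
    print("Accessibility APIs are not available on this platform.")
}
```

Note that the trust state belongs to the process that makes the call, so a server launched from a terminal is checked against the terminal application's entry.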
Nudge-Server/
├── Package.swift  # Swift package manifest
├── Sources/
│   ├── main_server.swift  # Main server entry point
│   ├── servers/
│   │   ├── NavServer.swift  # Main MCP server implementation
│   │   └── TestServer.swift  # Test server (development)
│   ├── managers/
│   │   └── StateManager.swift  # UI state management and element registry
│   ├── utility/
│   │   ├── utility.swift  # Utility functions
│   │   └── StateManagerStructs.swift  # Data structures for UI elements
│   └── error/
│       └── NudgeError.swift  # Custom error types
├── Tests/
│   └── NudgeServerTests/
│       ├── WorkflowIntegrationTests.swift  # Complete workflow tests
│       ├── EnhancedStateManagerTests.swift  # Enhanced state manager tests
│       ├── ComprehensiveErrorHandlingTests.swift  # Comprehensive error handling tests
│       └── ComprehensiveStateManagerTests.swift  # Comprehensive state manager tests
└── Documentation/
    ├── ENHANCED_SERVER_GUIDE.md  # Enhanced server guide
    ├── GEMINI.md  # Gemini integration guide
    └── REFACTORING_SUMMARY.md  # Refactoring summary
- Element Registry: Maintains a registry of UI elements with unique IDs for reliable interaction
- Tree-based Discovery: Provides hierarchical UI structure for comprehensive application understanding
- Direct AXUIElement References: Uses direct accessibility API references for maximum performance
- Multi-Application Support: Handles multiple applications simultaneously with proper state management
- Auto-opening: Automatically opens applications if not running
- Focus Management: Brings applications to focus before interaction
- Window Detection: Focuses on frontmost windows and menu bars
- State Consistency: Maintains consistent UI state across operations
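The element registry idea above can be pictured as a map from generated IDs to live element references, kept separately per application. The sketch below is a simplified illustration and not the project's StateManager: the type `ElementRegistry`, its ID format, and the use of plain strings in place of AXUIElement references are all assumptions.

```swift
// Illustrative registry; `Reference` stands in for an AXUIElement handle.
struct ElementRegistry<Reference> {
    private var references: [String: Reference] = [:]
    private var nextNumber = 1

    /// Stores a reference and returns the ID under which it can be found later.
    mutating func register(_ reference: Reference) -> String {
        let id = "element_\(nextNumber)"
        nextNumber += 1
        references[id] = reference
        return id
    }

    /// Looks up a previously registered reference; nil for unknown or stale IDs.
    func lookup(_ id: String) -> Reference? {
        references[id]
    }

    /// Drops all references, for example when the UI tree is rebuilt.
    mutating func removeAll() {
        references.removeAll()
    }
}

// One registry per bundle identifier keeps applications from sharing IDs.
var registries: [String: ElementRegistry<String>] = [:]

var textEditRegistry = ElementRegistry<String>()
let closeButtonID = textEditRegistry.register("AXButton: Close")
registries["com.apple.TextEdit"] = textEditRegistry

if let reference = registries["com.apple.TextEdit"]?.lookup(closeButtonID) {
    print("\(closeButtonID) -> \(reference)")
}
```

Resolving an ID with a single dictionary lookup avoids walking the accessibility tree again for every interaction.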
- Workflow Integration Tests: Tests complete workflows across multiple applications
- Error Handling Tests: Comprehensive error scenarios and recovery testing
- Performance Tests: Ensures operations complete within reasonable time limits
- Multi-Application Tests: Tests interaction with multiple applications simultaneously
import NudgeLibrary
class MyAutomator {
    private let nudge = NudgeLibrary.shared
    func automateApplication() async throws {
        // Get UI elements for Safari
        let elements = try await nudge.getUIElements(for: "com.apple.Safari")
        print("Found \(elements.count) elements")
        // Find and click a button
        if let searchButton = elements.first(where: { $0.description.contains("Search") }) {
            try await nudge.clickElement(
                bundleIdentifier: "com.apple.Safari",
                elementId: searchButton.element_id
            )
            print("Clicked search button")
        }
        // Update tree after interaction
        if let firstElement = elements.first {
            let updatedTree = try await nudge.updateUIElementTree(
                bundleIdentifier: "com.apple.Safari",
                elementId: firstElement.element_id
            )
            print("Updated tree has \(updatedTree.count) elements")
        }
    }
}
import SwiftUI
import NudgeLibrary
struct ContentView: View {
    @State private var elements: [UIElementInfo] = []
    @State private var isLoading = false
    var body: some View {
        VStack {
            Button("Get Safari Elements") {
                Task { await loadElements() }
            }
            .disabled(isLoading)
            List(elements, id: \.element_id) { element in
                Button(element.description) {
                    Task { await clickElement(element) }
                }
            }
        }
    }
    func loadElements() async {
        isLoading = true
        defer { isLoading = false }
        do {
            elements = try await NudgeLibrary.shared.getUIElements(for: "com.apple.Safari")
        } catch {
            print("Error: \(error)")
        }
    }
    func clickElement(_ element: UIElementInfo) async {
        do {
            try await NudgeLibrary.shared.clickElement(
                bundleIdentifier: "com.apple.Safari",
                elementId: element.element_id
            )
        } catch {
            print("Click error: \(error)")
        }
    }
}
import NudgeLibrary
func listTools() async throws {
    // The pagination cursor is not needed here, so it is discarded.
    let (tools, _) = try await NudgeLibrary.shared.getNavTools()
    for tool in tools {
        print("Tool: \(tool.name)")
        print("Description: \(tool.description)")
        print("---")
    }
}
This server runs in stdio mode and can be integrated with MCP-compatible clients. The server will:
- Accept MCP protocol messages via stdin
- Process tool calls for the tools listed above
- Return results via stdout
- Handle errors gracefully with appropriate error messages
- Use get_ui_elements to discover UI elements and get their IDs
- Use click_element_by_id to interact with specific elements
- Use update_ui_element_tree to explore deeper into specific UI areas
The server enables AI agents to:
- Navigate complex application interfaces using element IDs
- Understand application state through hierarchical UI trees
- Perform reliable interactions with persistent element references
- Handle multi-step workflows across different applications
- Recover from errors and maintain state consistency
Tested with:
- TextEdit: Text editing and document manipulation
- Calculator: Mathematical operations and button interactions
- Safari: Web browsing and navigation
- And many more macOS applications
Feature | Library | MCP Server |
---|---|---|
Performance | Direct function calls | JSON serialization overhead |
Integration | Swift Package Manager / Xcode | MCP protocol clients |
Type Safety | Native Swift types | String-based JSON |
Use Case | Embed in Swift apps | AI agent integration |
Dependencies | Minimal (just MCP for tools) | Full MCP + Service Lifecycle |
- Building native Swift/macOS applications
- Need maximum performance for UI automation
- Want type safety and direct debugging
- Integrating into existing Swift projects
- Building custom automation tools
- Working with AI agents (Claude, GPT, etc.)
- Need language-agnostic integration
- Want to expose tools to external systems
- Building distributed automation systems
- Using MCP-compatible clients
The server provides comprehensive error handling for:
- Missing accessibility permissions
- Application not found or not running
- Invalid UI elements or element IDs
- Element registry inconsistencies
- Network and protocol errors
- Invalid arguments and requests
- Multi-application state conflicts
All errors are returned as structured MCP error responses with descriptive messages and proper error recovery mechanisms.
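On the client side, a structured error can be decoded into a typed value instead of being matched as a string. The sketch below assumes the standard JSON-RPC 2.0 error shape (`error.code`, `error.message`); the sample response, including its code and message text, is invented for illustration and is not output captured from the server.

```swift
import Foundation

// Standard JSON-RPC 2.0 error object.
struct RPCError: Decodable {
    let code: Int
    let message: String
}

// Minimal response envelope; only the fields needed here are modelled.
struct RPCResponse: Decodable {
    let id: Int?
    let error: RPCError?
}

/// Returns a short log line for a response, or nil if it carries no error.
func errorSummary(for response: RPCResponse) -> String? {
    guard let error = response.error else { return nil }
    return "request \(response.id.map(String.init) ?? "?") failed (\(error.code)): \(error.message)"
}

// Hypothetical error response; -32602 is JSON-RPC's "Invalid params" code.
let errorJSON = """
{"jsonrpc": "2.0", "id": 7, "error": {"code": -32602, "message": "Unknown element_id"}}
"""

let errorResponse = try JSONDecoder().decode(RPCResponse.self, from: Data(errorJSON.utf8))
if let summary = errorSummary(for: errorResponse) {
    print(summary)
}
```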
The project includes comprehensive tests covering:
- Workflow Integration: Complete end-to-end workflows
- State Management: UI element registry and state consistency
- Error Handling: All error scenarios and recovery paths
- Performance: Timing and efficiency of operations
- Multi-Application: Concurrent application handling
Run all tests with:
swift test
When started, the server provides:
- 🚀 Auto-opening applications
- 📊 Tree-based UI structure discovery
- ⚡ Direct AXUIElement performance
- 🎯 Element ID-based interactions
- 🔄 UI tree updates and exploration
- 🛠️ Comprehensive error handling
- 📱 Multi-application support
Ready for advanced macOS UI automation tasks!