This Cloudflare Worker searches vector embeddings using Cloudflare Vectorize and Workers AI.
- Search for similar embeddings across multiple query terms
- Return results sorted by similarity score
- Support for parallel processing of multiple queries
- Deduplication of results
POST /
Content-Type: application/json
{
  "queries": ["Your search query", "Another search query"],
  "collection_id": "your-collection-id",
  "topK": 5
}

- queries: A string or array of strings to search for
- collection_id: The ID of the Vectorize collection to search in
- topK (optional): Number of results to return (default: 5)
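
A request like the one above could be sent from a TypeScript client roughly as follows. This is a minimal sketch: `SearchRequest`, `SearchResponse`, `buildSearchInit`, `search`, and the `workerUrl` parameter are illustrative names rather than part of the Worker, and it assumes failures come back with a non-2xx HTTP status.

```typescript
// Request body accepted by POST /, per the parameter list above.
interface SearchRequest {
  queries: string | string[];
  collection_id: string;
  topK?: number; // the server defaults this to 5
}

interface SearchMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

interface SearchResponse {
  status: string;
  matches: SearchMatch[];
}

// Build the fetch options separately so they can be inspected without a network call.
function buildSearchInit(body: SearchRequest) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  };
}

// workerUrl is a placeholder for wherever the Worker is deployed.
async function search(workerUrl: string, body: SearchRequest): Promise<SearchResponse> {
  const res = await fetch(workerUrl, buildSearchInit(body));
  if (!res.ok) {
    // Assumption: errors are reported with a non-2xx status and a readable body.
    throw new Error(`Search failed with HTTP ${res.status}: ${await res.text()}`);
  }
  return (await res.json()) as SearchResponse;
}
```

Passing a single string for `queries` works as well as an array, since the endpoint accepts both.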
{
  "status": "success",
  "matches": [
    {
      "id": "chunk-123",
      "score": 0.95,
      "metadata": {
        "text": "The matching text content",
        "additional_field": "Any additional metadata"
      }
    },
    {
      "id": "chunk-456",
      "score": 0.92,
      "metadata": {
        "text": "Another relevant text match",
        "additional_field": "Any additional metadata"
      }
    }
  ]
}

Make sure to bind the following:
- Cloudflare Workers AI with the binding name "AI"
- Cloudflare Vectorize with the binding name "VECTORIZE"
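
Inside the Worker, the two bindings might be used together along these lines. This is a sketch under stated assumptions, not the Worker's actual source: the interfaces are minimal stand-ins for the `Ai` and `VectorizeIndex` types from `@cloudflare/workers-types`, and `searchOne` is a hypothetical helper name.

```typescript
// Minimal structural types for the two bindings (stand-ins for the official types).
interface EmbeddingOutput {
  data: number[][]; // one embedding vector per input text
}

interface AiBinding {
  run(model: string, input: { text: string[] }): Promise<EmbeddingOutput>;
}

interface VectorMatch {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

interface VectorizeBinding {
  query(
    vector: number[],
    options: { topK: number; returnMetadata?: "none" | "indexed" | "all" }
  ): Promise<{ matches: VectorMatch[] }>;
}

// Binding names match the ones listed above.
interface Env {
  AI: AiBinding;
  VECTORIZE: VectorizeBinding;
}

const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

// Embed one query with Workers AI, then look up its nearest neighbours in Vectorize.
async function searchOne(env: Env, query: string, topK = 5): Promise<VectorMatch[]> {
  const embedding = await env.AI.run(EMBEDDING_MODEL, { text: [query] });
  const vector = embedding.data[0];
  if (!vector) {
    throw new Error("Embedding generation returned no vector");
  }
  const result = await env.VECTORIZE.query(vector, { topK, returnMetadata: "all" });
  return result.matches;
}
```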
Example wrangler.jsonc configuration (the full file appears at the end of this document):
The API returns an error response with a descriptive message when:
- Required parameters are missing
- Embedding generation fails
- Vector database queries fail
- Any other errors occur during processing
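
The first case, missing required parameters, can be caught before any embedding work starts. The sketch below is one possible approach, not the Worker's actual code: `validateSearchBody` and `errorBody` are hypothetical names, and the `{ "status": "error", "message": ... }` body shape is an assumption, since this README only documents the success shape.

```typescript
interface SearchParams {
  queries: string[];
  collection_id: string;
  topK: number;
}

type Validation =
  | { ok: true; params: SearchParams }
  | { ok: false; message: string };

// Check the parsed JSON body and normalise it (a single string becomes a one-element array).
function validateSearchBody(body: unknown): Validation {
  if (typeof body !== "object" || body === null) {
    return { ok: false, message: "Request body must be a JSON object" };
  }
  const { queries, collection_id, topK } = body as Record<string, unknown>;

  const list = typeof queries === "string" ? [queries] : queries;
  if (
    !Array.isArray(list) ||
    list.length === 0 ||
    !list.every((q) => typeof q === "string" && q.trim() !== "")
  ) {
    return { ok: false, message: "Missing required parameter: queries" };
  }
  if (typeof collection_id !== "string" || collection_id === "") {
    return { ok: false, message: "Missing required parameter: collection_id" };
  }
  if (topK !== undefined && (typeof topK !== "number" || !Number.isInteger(topK) || topK < 1)) {
    return { ok: false, message: "topK must be a positive integer" };
  }
  return {
    ok: true,
    params: {
      queries: list as string[],
      collection_id,
      topK: (topK as number | undefined) ?? 5, // documented default
    },
  };
}

// Assumed error body shape; the README does not specify it.
function errorBody(message: string) {
  return { status: "error", message };
}
```

Embedding and Vectorize failures would typically be caught with a `try`/`catch` around those calls and reported through the same error body.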
- Uses the @cf/baai/bge-base-en-v1.5 embedding model for query embedding generation
- Processes multiple query terms in parallel for efficiency
- Results are combined and sorted by similarity score
- Duplicate results are filtered out by ID
- Returns the top K most similar results across all queries
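
The parallel search and merge steps above can be sketched as follows. The names `mergeMatches`, `searchAll`, and the `searchOne` callback are hypothetical, and keeping the highest-scoring copy of a duplicate ID is an assumption; the list above only says duplicates are filtered by ID.

```typescript
interface Match {
  id: string;
  score: number;
  metadata?: Record<string, unknown>;
}

// Keep the best-scoring copy of each ID, sort by score descending, return the top K.
function mergeMatches(resultSets: Match[][], topK = 5): Match[] {
  const bestById = new Map<string, Match>();
  for (const match of resultSets.flat()) {
    const existing = bestById.get(match.id);
    if (!existing || match.score > existing.score) {
      bestById.set(match.id, match);
    }
  }
  return Array.from(bestById.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// searchOne stands in for the embed-and-query step for a single query string.
async function searchAll(
  queries: string[],
  searchOne: (query: string) => Promise<Match[]>,
  topK = 5
): Promise<Match[]> {
  // Promise.all runs the per-query searches concurrently and rejects if any one fails.
  const resultSets = await Promise.all(queries.map((q) => searchOne(q)));
  return mergeMatches(resultSets, topK);
}
```

Deduplicating before slicing matters: slicing first could let duplicates crowd out distinct results from the final top K.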
A Cloudflare Worker for finding and retrieving chunks of text from embedded documents.
POST /
Finds chunks of text similar to the provided query text.
| Parameter | Type | Description |
|---|---|---|
| queries | string or array of strings | The query text to search for similar chunks |
| collection_id | string | The ID of the collection to search in |
| topK | number (optional) | Number of results to return, default is 5 |
| corpus | string (optional) | Filter by corpus (e.g., 'cia', 'frus', 'clinton') |
| doc_id | string (optional) | Filter by specific document ID |
| authored_start | string (optional) | Start date for filtering (YYYY-MM-DD format) |
| authored_end | string (optional) | End date for filtering (YYYY-MM-DD format) |
Available values for corpus:

- cfpf: Central Federal Policy Files (1.67M docs)
- cia: CIA documents (440K docs)
- frus: Foreign Relations of the United States (159K docs)
- un: United Nations documents (93K docs)
- worldbank: World Bank reports (68K docs)
- clinton: Clinton administration documents (30K docs)
- nato: NATO documents (23K docs)
- cabinet: U.S. Cabinet meeting records (20K docs)
- cpdoc: Brazilian historical archives (6K docs)
- kissinger: Henry Kissinger's diplomatic work (2K docs)
- briefing: Government briefing documents (924 docs)
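
The optional filter parameters could be translated into a Vectorize metadata filter roughly like this. It is a sketch resting on several assumptions: that `authored` is stored as a Unix timestamp in seconds (the numeric value in the example response suggests this), that the index has metadata indexes on `corpus`, `doc_id`, and `authored`, and that the end date is meant to be inclusive. `buildFilter` and `toUnixSeconds` are hypothetical names.

```typescript
interface FilterParams {
  corpus?: string;
  doc_id?: string;
  authored_start?: string; // YYYY-MM-DD
  authored_end?: string; // YYYY-MM-DD
}

// Filter object using equality on strings and range operators on the timestamp.
interface VectorFilter {
  corpus?: string;
  doc_id?: string;
  authored?: { $gte?: number; $lte?: number };
}

// Convert a YYYY-MM-DD date to Unix seconds at midnight UTC.
function toUnixSeconds(date: string): number {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error(`Expected YYYY-MM-DD, got "${date}"`);
  }
  const ms = Date.parse(`${date}T00:00:00Z`);
  if (Number.isNaN(ms)) {
    throw new Error(`Invalid date: "${date}"`);
  }
  return ms / 1000;
}

function buildFilter(params: FilterParams): VectorFilter {
  const filter: VectorFilter = {};
  if (params.corpus) filter.corpus = params.corpus;
  if (params.doc_id) filter.doc_id = params.doc_id;

  const range: { $gte?: number; $lte?: number } = {};
  if (params.authored_start) {
    range.$gte = toUnixSeconds(params.authored_start);
  }
  if (params.authored_end) {
    // Add 86399 seconds so the whole end day is included.
    range.$lte = toUnixSeconds(params.authored_end) + 86399;
  }
  if (range.$gte !== undefined || range.$lte !== undefined) {
    filter.authored = range;
  }
  return filter;
}
```

The resulting object would be passed as the `filter` option of the Vectorize query; dates before 1970 come out as negative timestamps.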
{
  "queries": "Cold War diplomacy in Eastern Europe",
  "collection_id": "history-lab-1",
  "topK": 10,
  "corpus": "cia",
  "authored_start": "1965-01-01",
  "authored_end": "1975-12-31"
}

{
  "status": "success",
  "matches": [
    {
      "id": "chunk-123",
      "text": "The Soviet Union's influence in Eastern Europe remained strong throughout the Cold War period...",
      "score": 0.92,
      "metadata": {
        "corpus": "cia",
        "doc_id": "doc-456",
        "authored": 157680000,
        "title": "Analysis of Eastern Bloc Politics"
      }
    },
    // Additional matches...
  ]
}

- Node.js 18 or later
- Wrangler CLI (npm install -g wrangler)

- Clone the repository
- Install dependencies: npm install
- Configure your environment variables in wrangler.jsonc
wrangler deploy
{
  "name": "vector-search-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-02-11",
  "compatibility_flags": ["nodejs_compat"],
  "ai": {
    "binding": "AI"
  },
  "vectorize": [
    {
      "binding": "VECTORIZE",
      "index_name": "files-1"
    }
  ],
  "observability": {
    "enabled": true
  }
}