Wine‑AI is an open‑source semantic search and recommendation engine that lets you discover the perfect wine pairing, using vector search powered by Vespa.ai and dense embeddings from the SentenceTransformers model `paraphrase-multilingual-MiniLM-L12-v2`.
It indexes the 150K‑review Wine Enthusiast dataset and serves instant results via a lightweight FastAPI micro‑service.
Ask questions like “budget‑friendly Napa Cabernet for steak” or “wines that go with spicy Thai food” and get context‑aware matches ranked by both semantic similarity and BM25 relevance.
- Hybrid ranking: vector closeness + Vespa BM25 / nativeRank
- Approximate Nearest Neighbor (ANN) search for sub‑100 ms latency
- End‑to‑end Docker workflow: one script to spin up the Vespa cluster, ML model server, and ETL pipeline
- Scales to million‑plus vectors thanks to Vespa’s streaming HNSW indexes
- Tensor Server embeds user queries and document descriptions with SentenceTransformers.
- Vespa app stores each wine review plus its `description_vector` (384‑d).
- Hybrid rank profiles fuse vector similarity and text relevance.
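As a toy illustration of what "fusing" vector similarity and text relevance means (this is not the repo's actual rank expression; the weighting and normalization below are illustrative assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_score(query_vec, doc_vec, bm25_score, alpha=0.7):
    """Fuse semantic closeness with a lexical (BM25-style) score.

    alpha weights the vector component; 0.7 is an illustrative
    default, not a tuned value from this project.
    """
    semantic = cosine_similarity(query_vec, doc_vec)  # in [-1, 1]
    lexical = bm25_score / (1.0 + bm25_score)         # squash to [0, 1)
    return alpha * semantic + (1 - alpha) * lexical

print(round(hybrid_score([1.0, 0.0], [1.0, 0.0], 5.0), 3))  # → 0.95
```

In Vespa itself, this fusion is expressed declaratively in the rank profiles described below rather than computed client-side.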
Prerequisites: Docker Engine ≥ 20.10, curl
From the project root, run `./run.sh`
What it does
# Launch Vespa & model‑server & streamlit containers
bin/deploy_servers.sh
# Verify Vespa is live
curl -s --head http://localhost:19071/ApplicationStatus
# Build Vespa application
bin/build_vespa_app.sh
# Deploy Vespa application
bin/deploy_vespa_app.sh
# Check application is up
curl -s --head http://localhost:8080/ApplicationStatus
# Transform CSV → Vespa JSON with embeddings
bin/transform_data.sh
# Feed documents into Vespa
bin/load_data.sh
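Each transformed review becomes a Vespa "put" feed operation in JSON. A minimal sketch of the document shape (the `wine` namespace/document type and exact field set are assumptions; confirm against the schema and the transform scripts):

```python
def to_feed_operation(doc_id, row, embedding):
    """Wrap one wine review as a Vespa 'put' feed operation."""
    return {
        "put": f"id:wine:wine::{doc_id}",
        "fields": {
            "variety": row["variety"],
            "winery": row["winery"],
            "description": row["description"],
            "points": int(row["points"]),
            "price": float(row["price"]),
            # Dense tensor fed as a flat list of 384 values
            "description_vector": {"values": embedding},
        },
    }

op = to_feed_operation(
    1,
    {"variety": "Cabernet Sauvignon", "winery": "Example Winery",
     "description": "Bold tannins", "points": "91", "price": "35"},
    [0.0] * 384,
)
print(op["put"])  # → id:wine:wine::1
```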
Run a semantic query
bin/search_wines.sh "goes with seafood" vector
Switch to classic BM25 ranking:
bin/search_wines.sh "goes with seafood" default
Switch to Vespa NativeRank ranking:
bin/search_wines.sh "goes with seafood" default_2
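Under the hood, each invocation amounts to a POST against Vespa's search API. A hedged sketch of what the request body might look like for the `vector` profile (the exact YQL, field, and parameter names are assumptions to verify against `bin/search_wines.sh`):

```python
import json

def build_query(query_text, query_vector, profile="vector", hits=10):
    """Build a Vespa search request that retrieves candidates with
    approximate nearestNeighbor search and ranks them with the
    chosen rank profile."""
    return {
        "yql": (
            "select * from sources * where "
            "{targetHits: 100}nearestNeighbor(description_vector, query_vector)"
        ),
        "ranking.profile": profile,
        "input.query(query_vector)": query_vector,  # 384-d embedding of the query
        "query": query_text,                        # used by text features like bm25
        "hits": hits,
    }

payload = build_query("goes with seafood", [0.0] * 384)
print(json.dumps(payload)[:60])
```

In this project the query embedding would come from the tensor server rather than being supplied by hand.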
Open the Streamlit UI: http://localhost:8501/
bin/ End‑to‑end scripts: deploy, transform, feed, search
data/ Raw Kaggle CSV files
load/ Load transformed data into Vespa app
tensor_server/ FastAPI + SentenceTransformers model service
transform/ CSV → Vespa JSON ETL utilities
ui/ Streamlit UI (served at http://localhost:8501/)
vespa_app/ Vespa application package (schema, services, query‑profiles)
run.sh One-shot script that runs the full demo
Schema file: vespa_app/src/main/application/schemas/wine.sd
Field type | Example fields |
---|---|
Text | province, variety, description, winery, region |
Numeric | points, price |
Vector | description_vector (tensor(x[384])) |
Rank Profiles
Profile | Expression |
---|---|
default | bm25(description) |
default_2 | nativeRank(description) |
vector | closeness(description_vector, query_vector) |
vector_2 | first-phase: closeness(description_vector, query_vector), second-phase: attribute(points) |
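An illustrative (not verbatim) fragment showing how such fields and profiles can be expressed in Vespa's `.sd` syntax; the exact indexing options, distance metric, and rank expressions are assumptions to verify against `vespa_app/src/main/application/schemas/wine.sd`:

```
schema wine {
    document wine {
        field description type string {
            indexing: index | summary
            index: enable-bm25
        }
        field points type int {
            indexing: attribute | summary
        }
        field description_vector type tensor<float>(x[384]) {
            indexing: attribute | index
            attribute {
                distance-metric: angular
            }
        }
    }
    rank-profile vector_2 {
        first-phase {
            expression: closeness(field, description_vector)
        }
        second-phase {
            expression: attribute(points)
        }
    }
}
```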
- `csv_to_vespa_json.py` reads the CSV, removes duplicates, embeds descriptions, and writes feed files.
- `transform.sh` wraps the above in a Python Docker container.
- Feeds are loaded into Vespa via `load_data.sh`.
- SentenceTransformers inference can be accelerated with ONNX or quantization if desired.
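A condensed sketch of the dedup-and-embed steps (with a stand-in embedder; the real script uses the SentenceTransformers model, and the exact column names are assumptions):

```python
import csv
import io

def embed(text):
    # Stand-in: the real pipeline calls the SentenceTransformers model
    # paraphrase-multilingual-MiniLM-L12-v2 to produce a 384-d vector.
    return [0.0] * 384

def clean_rows(csv_text):
    """Drop rows with duplicate descriptions and attach an embedding,
    mirroring the dedup + embed steps of csv_to_vespa_json.py."""
    seen, out = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["description"] in seen:
            continue
        seen.add(row["description"])
        row["description_vector"] = embed(row["description"])
        out.append(row)
    return out

sample = ("description,variety\n"
          "Crisp and citrusy,Riesling\n"
          "Crisp and citrusy,Riesling\n"
          "Earthy with soft tannins,Pinot Noir\n")
rows = clean_rows(sample)
print(len(rows))  # → 2 (duplicate review removed)
```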
Pull requests are welcome! If you use Wine‑AI in research or production, please ★ star the repo and open an issue to share your story. For significant changes, discuss your proposal in an issue first.
Apache 2.0
Citation: If you build upon this work in an academic context, please cite the Kaggle Wine Enthusiast dataset and link back to this repository.