Wine‑AI is an open‑source semantic search and recommendation engine that lets you discover the perfect wine pairing, using vector search powered by Vespa.ai and dense embeddings from the SentenceTransformers model `paraphrase-multilingual-MiniLM-L12-v2`.
It indexes the 150K‑review Wine Enthusiast dataset and serves instant results via a lightweight FastAPI micro‑service.
Ask questions like “budget‑friendly Napa Cabernet for steak” or “wines that go with spicy Thai food” and get context‑aware matches ranked by both semantic similarity and BM25 relevance.
- Hybrid ranking: vector closeness + Vespa BM25 / nativeRank
- Approximate Nearest Neighbor (ANN) search for sub‑100 ms latency
- End‑to‑end Docker workflow: one script to spin up the Vespa cluster, ML model server, and ETL pipeline
- Scales to million‑plus vectors thanks to Vespa’s streaming HNSW indexes
- Tensor Server embeds user queries and document descriptions with SentenceTransformers.
- Vespa app stores each wine review plus its `description_vector` (384‑d).
- Hybrid rank profiles fuse vector similarity and text relevance.
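As a toy illustration of what "fusing" vector similarity and text relevance means (this is not the repo's actual rank expression; the weighting and normalization below are illustrative assumptions):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def hybrid_score(query_vec, doc_vec, bm25_score, alpha=0.7):
    """Fuse semantic closeness with a lexical (BM25-style) score.

    alpha weights the vector component; 0.7 is an illustrative
    default, not a tuned value from this project.
    """
    semantic = cosine_similarity(query_vec, doc_vec)  # in [-1, 1]
    lexical = bm25_score / (1.0 + bm25_score)         # squash to [0, 1)
    return alpha * semantic + (1 - alpha) * lexical

print(round(hybrid_score([1.0, 0.0], [1.0, 0.0], 5.0), 3))  # → 0.95
```

In Vespa itself, this fusion is expressed declaratively in the rank profiles described below rather than computed client-side.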
Prerequisites: Docker Engine ≥ 20.10, curl
From the project root, run `./run.sh`
What it does
# Launch Vespa & model‑server & streamlit containers
bin/deploy_servers.sh
# Verify Vespa is live
curl -s --head http://localhost:19071/ApplicationStatus
# Build Vespa application
bin/build_vespa_app.sh
# Deploy Vespa application
bin/deploy_vespa_app.sh
# Check application is up
curl -s --head http://localhost:8080/ApplicationStatus
# Transform CSV → Vespa JSON with embeddings
bin/transform_data.sh
# Feed documents into Vespa
bin/load_data.sh
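Each transformed review becomes a Vespa "put" feed operation in JSON. A minimal sketch of the document shape (the `wine` namespace/document type and exact field set are assumptions; confirm against the schema and the transform scripts):

```python
def to_feed_operation(doc_id, row, embedding):
    """Wrap one wine review as a Vespa 'put' feed operation."""
    return {
        "put": f"id:wine:wine::{doc_id}",
        "fields": {
            "variety": row["variety"],
            "winery": row["winery"],
            "description": row["description"],
            "points": int(row["points"]),
            "price": float(row["price"]),
            # Dense tensor fed as a flat list of 384 values
            "description_vector": {"values": embedding},
        },
    }

op = to_feed_operation(
    1,
    {"variety": "Cabernet Sauvignon", "winery": "Example Winery",
     "description": "Bold tannins", "points": "91", "price": "35"},
    [0.0] * 384,
)
print(op["put"])  # → id:wine:wine::1
```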
Run a semantic query
bin/search_wines.sh "goes with seafood" vector
Switch to classic BM25 ranking:
bin/search_wines.sh "goes with seafood" default
Switch to Vespa NativeRank ranking:
bin/search_wines.sh "goes with seafood" default_2
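Under the hood, each invocation amounts to a POST against Vespa's search API. A hedged sketch of what the request body might look like for the `vector` profile (the exact YQL, field, and parameter names are assumptions to verify against `bin/search_wines.sh`):

```python
import json

def build_query(query_text, query_vector, profile="vector", hits=10):
    """Build a Vespa search request that retrieves candidates with
    approximate nearestNeighbor search and ranks them with the
    chosen rank profile."""
    return {
        "yql": (
            "select * from sources * where "
            "{targetHits: 100}nearestNeighbor(description_vector, query_vector)"
        ),
        "ranking.profile": profile,
        "input.query(query_vector)": query_vector,  # 384-d embedding of the query
        "query": query_text,                        # used by text features like bm25
        "hits": hits,
    }

payload = build_query("goes with seafood", [0.0] * 384)
print(json.dumps(payload)[:60])
```

In this project the query embedding would come from the tensor server rather than being supplied by hand.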
Open the Streamlit UI: http://localhost:8501/
bin/ End‑to‑end scripts: deploy, transform, feed, search
data/ Raw Kaggle CSV files
load/ Load transformed data into Vespa app
tensor_server/ FastAPI + SentenceTransformers model service
transform/ CSV → Vespa JSON ETL utilities
ui/ Streamlit UI (served at http://localhost:8501/)
vespa_app/ Vespa application package (schema, services, query‑profiles)
run.sh One-shot script that runs the full demo
Schema file: vespa_app/src/main/application/schemas/wine.sd
Field type | Example fields |
---|---|
Text | province, variety, description, winery, region |
Numeric | points, price |
Vector | description_vector (tensor(x[384])) |
Rank Profiles
Profile | Expression |
---|---|
default | bm25(description) |
default_2 | nativeRank(description) |
vector | closeness(description_vector, query_vector) |
vector_2 | first-phase: closeness(description_vector, query_vector), second-phase: attribute(points) |
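An illustrative (not verbatim) fragment showing how such fields and profiles can be expressed in Vespa's `.sd` syntax; the exact indexing options, distance metric, and rank expressions are assumptions to verify against `vespa_app/src/main/application/schemas/wine.sd`:

```
schema wine {
    document wine {
        field description type string {
            indexing: index | summary
            index: enable-bm25
        }
        field points type int {
            indexing: attribute | summary
        }
        field description_vector type tensor<float>(x[384]) {
            indexing: attribute | index
            attribute {
                distance-metric: angular
            }
        }
    }
    rank-profile vector_2 {
        first-phase {
            expression: closeness(field, description_vector)
        }
        second-phase {
            expression: attribute(points)
        }
    }
}
```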
- `csv_to_vespa_json.py` reads the CSV, removes duplicates, embeds descriptions, and writes feed files.
- `transform.sh` wraps the above in a Python Docker container.
- Feeds are loaded into Vespa via `load_data.sh`.
- SentenceTransformers inference can be accelerated with ONNX or quantization if desired.
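A condensed sketch of the dedup-and-embed steps (with a stand-in embedder; the real script uses the SentenceTransformers model, and the exact column names are assumptions):

```python
import csv
import io

def embed(text):
    # Stand-in: the real pipeline calls the SentenceTransformers model
    # paraphrase-multilingual-MiniLM-L12-v2 to produce a 384-d vector.
    return [0.0] * 384

def clean_rows(csv_text):
    """Drop rows with duplicate descriptions and attach an embedding,
    mirroring the dedup + embed steps of csv_to_vespa_json.py."""
    seen, out = set(), []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["description"] in seen:
            continue
        seen.add(row["description"])
        row["description_vector"] = embed(row["description"])
        out.append(row)
    return out

sample = ("description,variety\n"
          "Crisp and citrusy,Riesling\n"
          "Crisp and citrusy,Riesling\n"
          "Earthy with soft tannins,Pinot Noir\n")
rows = clean_rows(sample)
print(len(rows))  # → 2 (duplicate review removed)
```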
Pull requests are welcome! If you use Wine‑AI in research or production, please ★ star the repo and open an issue to share your story. For significant changes, discuss your proposal in an issue first.
Apache 2.0
Citation: If you build upon this work in an academic context, please cite the Kaggle Wine Enthusiast dataset and link back to this repository.