dmcqueen/wine-ai

Wine‑AI — Semantic Wine Recommendation & Vector Search with Vespa

Wine‑AI is an open‑source semantic search and recommendation engine for finding wine pairings. It uses Vespa.ai vector search with dense embeddings from the SentenceTransformers model paraphrase-multilingual-MiniLM-L12-v2, indexes the 150K‑review Wine Enthusiast dataset, and serves results through a lightweight FastAPI micro‑service.

Ask questions like “budget‑friendly Napa Cabernet for steak” or “wines that go with spicy Thai food” and get context‑aware matches ranked by both semantic similarity and BM25 relevance.


Features

  • Hybrid ranking: vector closeness + Vespa BM25 / nativeRank
  • Approximate Nearest Neighbor (ANN) search for sub‑100 ms latency
  • End‑to‑end Docker workflow: one script to spin up the Vespa cluster, ML model server, and ETL pipeline
  • Scales to over a million vectors using Vespa's HNSW indexes

Architecture

  1. Tensor Server embeds user queries and document descriptions with SentenceTransformers.
  2. Vespa app stores each wine review plus its description_vector (384‑d).
  3. Hybrid rank profiles fuse vector similarity and text relevance.
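
As a rough illustration of step 3, the sketch below computes a Vespa-style closeness score, which Vespa defines as 1 / (1 + distance), and blends it with a text relevance score. The Euclidean distance metric, the `hybrid_score` helper, and the 0.7 weight are assumptions made for illustration and are not taken from this repository's schema.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closeness(a, b):
    """Vespa-style closeness: 1 / (1 + distance), so identical vectors score 1.0."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def hybrid_score(vector_closeness, text_score, alpha=0.7):
    """Hypothetical linear blend of vector closeness and a text score.

    alpha is an assumed weight. Raw BM25 scores are unbounded, so a real
    blend would usually normalize text_score first.
    """
    return alpha * vector_closeness + (1.0 - alpha) * text_score


# Toy 3-d vectors; the real description_vector has 384 dimensions.
query_vec = [0.0, 0.0, 0.0]
doc_vec = [3.0, 4.0, 0.0]  # Euclidean distance 5 from the query
score = hybrid_score(closeness(query_vec, doc_vec), text_score=0.5)
```

Closeness decreases monotonically as distance grows, so ranking by closeness puts the nearest vectors first.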

Quick Start

Prerequisites: Docker Engine ≥ 20.10 and curl

From the project root, run ./run.sh in a terminal.

What run.sh does

# Launch the Vespa, model‑server, and Streamlit containers
bin/deploy_servers.sh

# Verify Vespa is live
curl -s --head http://localhost:19071/ApplicationStatus

# Build Vespa application
bin/build_vespa_app.sh

# Deploy Vespa application
bin/deploy_vespa_app.sh

# Check application is up
curl -s --head http://localhost:8080/ApplicationStatus

# Transform CSV → Vespa JSON with embeddings
bin/transform_data.sh

# Feed documents into Vespa
bin/load_data.sh
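
The two curl checks above succeed only once the services have finished starting, so a script usually needs to retry them. The following is a minimal Python sketch of such a wait loop, assuming an HTTP 200 on the status URL means the service is up. The `http_ok` and `wait_until_up` helpers and their retry defaults are hypothetical and are not part of this repository.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a HEAD request to url answers with HTTP 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a dropped connection.
        return False


def wait_until_up(url, probe=http_ok, attempts=30, delay=2.0, sleep=time.sleep):
    """Call probe(url) up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example usage (requires the containers to be running):
# wait_until_up("http://localhost:19071/ApplicationStatus")
# wait_until_up("http://localhost:8080/ApplicationStatus")
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a live Vespa instance.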

Run a semantic query

bin/search_wines.sh "goes with seafood" vector

Switch to classic BM25 ranking:

bin/search_wines.sh "goes with seafood" default

Switch to Vespa nativeRank ranking:

bin/search_wines.sh "goes with seafood" default_2
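
For orientation, here is a hedged Python sketch of the kind of Vespa query body each mode could translate to. It assumes the schema is named wine (from wine.sd), that the vector profiles declare a query tensor named query_vector, and that the query embedding has already been obtained from the model server. The `build_query` helper is hypothetical, and the actual contents of bin/search_wines.sh may differ.

```python
import json


def build_query(text, rank_profile, query_vector=None, hits=10):
    """Build a Vespa query body for a text or a vector rank profile."""
    if rank_profile.startswith("vector"):
        if query_vector is None:
            raise ValueError("vector profiles need a query embedding")
        return {
            # Approximate nearest-neighbor search over the document vectors.
            "yql": (
                f"select * from wine where {{targetHits: {hits}}}"
                "nearestNeighbor(description_vector, query_vector)"
            ),
            # Query tensor passed as a string in indexed-tensor short form.
            "input.query(query_vector)": json.dumps(list(query_vector)),
            "ranking": rank_profile,
            "hits": hits,
        }
    return {
        # Plain text matching; the profile decides between bm25 and nativeRank.
        "yql": "select * from wine where userQuery()",
        "query": text,
        "ranking": rank_profile,
        "hits": hits,
    }


# Toy 2-d vector for readability; the real embedding has 384 dimensions.
vector_body = build_query("goes with seafood", "vector", query_vector=[0.1, 0.2])
text_body = build_query("goes with seafood", "default")
```

Either body could then be sent as JSON in a POST to the Vespa search endpoint, typically http://localhost:8080/search/.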

Use the Streamlit UI

open http://localhost:8501/

Streamlit UI screenshot


Repository Layout

bin/              End‑to‑end scripts: deploy, transform, feed, search
data/             Raw Kaggle CSV files
load/             Loads transformed data into the Vespa app
tensor_server/    FastAPI + SentenceTransformers model service
transform/        CSV → Vespa JSON ETL utilities
ui/               Streamlit UI (http://localhost:8501/)
vespa_app/        Vespa application package (schema, services, query‑profiles)
run.sh            Runs the full demo end to end

Vespa Schema & Ranking

Schema file: vespa_app/src/main/application/schemas/wine.sd

Field type    Example fields
Text          province, variety, description, winery, region
Numeric       points, price
Vector        description_vector (tensor(x[384]))

Rank Profiles

Profile      Expression
default      bm25(description)
default_2    nativeRank(description)
vector       closeness(description_vector, query_vector)
vector_2     first-phase: closeness(description_vector, query_vector); second-phase: attribute(points)
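
To make the vector_2 profile concrete, the sketch below simulates two-phase ranking in plain Python: every candidate is ordered by closeness first, then only the top few are re-ordered by points. The `two_phase_rank` helper, the sample wines, and the rerank count of 3 are illustrative assumptions. In Vespa the rerank count is configured in the rank profile.

```python
def two_phase_rank(candidates, rerank_count=3):
    """Order by closeness, then re-rank only the top `rerank_count` by points."""
    # First phase: cheap score computed for every matched document.
    first_phase = sorted(candidates, key=lambda d: d["closeness"], reverse=True)
    # Second phase: re-order just the head of the list by review points.
    head = sorted(first_phase[:rerank_count], key=lambda d: d["points"], reverse=True)
    # Documents outside the re-ranked window stay below it, in first-phase order.
    return head + first_phase[rerank_count:]


# Made-up candidates for illustration only.
wines = [
    {"id": "a", "closeness": 0.9, "points": 85},
    {"id": "b", "closeness": 0.8, "points": 95},
    {"id": "c", "closeness": 0.7, "points": 90},
    {"id": "d", "closeness": 0.6, "points": 99},
]
ranked_ids = [w["id"] for w in two_phase_rank(wines)]
# ranked_ids == ["b", "c", "a", "d"]
```

Wine d has the most points but falls outside the re-rank window, so it stays last. That is the trade-off of a second phase: it only reshuffles the documents the first phase already considered most relevant.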

Data Pipeline

  • csv_to_vespa_json.py — reads CSV, removes duplicates, embeds descriptions, writes feed files.
  • transform.sh wraps the above in a Python Docker container.
  • Feeds are loaded into Vespa via load_data.sh.
  • SentenceTransformers can be accelerated with ONNX or quantization if desired.
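
The transform step described above can be sketched roughly as follows. This is not the repository's csv_to_vespa_json.py. The `rows_to_feed` helper, the wine namespace in the document id, the choice of columns, and the `fake_embed` stand-in for the model server are all assumptions made so that the example runs without any model download.

```python
import csv
import io


def rows_to_feed(csv_text, embed, namespace="wine", doctype="wine"):
    """Turn CSV rows into Vespa put operations, skipping duplicate descriptions."""
    seen = set()
    operations = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        description = (row.get("description") or "").strip()
        if not description or description in seen:
            continue  # drop empty and duplicate reviews
        seen.add(description)
        operations.append({
            # Vespa document id format: id:<namespace>:<doctype>::<local id>
            "put": f"id:{namespace}:{doctype}::{len(operations)}",
            "fields": {
                "description": description,
                "variety": row.get("variety", ""),
                # Indexed tensors can be fed as {"values": [...]}.
                "description_vector": {"values": embed(description)},
            },
        })
    return operations


def fake_embed(text):
    """Stand-in for the SentenceTransformers model: a constant 384-d vector."""
    return [0.0] * 384


sample_csv = (
    "description,variety\n"
    "Crisp and citrusy,Riesling\n"
    "Crisp and citrusy,Riesling\n"
    "Dark fruit and oak,Cabernet Sauvignon\n"
)
feed = rows_to_feed(sample_csv, fake_embed)
```

Swapping `fake_embed` for a call to the real embedding service would produce feed operations carrying actual 384-dimensional vectors.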

Contributing

Pull requests are welcome! If you use Wine‑AI in research or production, please ★ star the repo and open an issue to share your story. For significant changes, discuss your proposal in an issue first.


License

Apache 2.0

Citation: If you build upon this work in an academic context, please cite the Kaggle Wine Enthusiast dataset and link back to this repository.
