An open-source semantic search engine for arXiv.
- FastAPI
- Qdrant
- light-embed (sentence-transformers/all-MiniLM-L12-v2)
- Kaggle API
The project is currently hosted on Google Cloud Compute Engine. The arXiv metadata dataset is downloaded through the Kaggle API and embedded in batches with light-embed. The resulting vectors are then stored in Qdrant.
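
The batch embedding step described above can be sketched roughly as follows. This is an illustrative outline, not the project's actual code: the function names (`read_records`, `batched`, `index_in_batches`), the `embed` and `store` callables, the batch size, and the assumption that the metadata is a JSON-lines file with `title` and `abstract` fields are all hypothetical.

```python
import json
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence


def read_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON-lines metadata (assumed format), skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def batched(items: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_batches(
    records: Iterable[dict],
    embed: Callable[[List[str]], Sequence[Sequence[float]]],
    store: Callable[[List[dict], Sequence[Sequence[float]]], None],
    batch_size: int = 256,  # hypothetical default, tune to available memory
) -> int:
    """Embed records batch by batch and hand each batch to `store`.

    `embed` stands in for the embedding model and `store` for the
    vector database write; both are placeholders supplied by the caller.
    Returns the number of records processed.
    """
    total = 0
    for batch in batched(records, batch_size):
        # Assumed fields: each record has a "title" and an "abstract".
        texts = [f"{r['title']}. {r['abstract']}" for r in batch]
        vectors = embed(texts)
        if len(vectors) != len(batch):
            raise ValueError("embed() must return one vector per input text")
        store(batch, vectors)
        total += len(batch)
    return total
```

In a setup like the one described here, `embed` would presumably wrap the light-embed model and `store` would write points to a Qdrant collection. Keeping both as plain callables lets the batching logic be tested without a model download or a running database.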
Used in densAIr (see the densAIr repository).