Semantica is a lightweight semantic search engine for PDF documents. It processes PDF files, converts them into vectorized text chunks, and enables intelligent retrieval using embedding-based similarity search — all without using an LLM.
- 📄 Upload any PDF file
- ✂️ Automatic chunking of document content
- 🔢 Embedding with HuggingFace (MiniLM)
- 🧠 Vector search with Qdrant
- ⚡ Fast and local — no OpenAI API required
- 📆 Built with FastAPI, LangChain, and Qdrant
| Layer | Tool |
|---|---|
| Backend | FastAPI |
| Parsing | pymupdf4llm |
| Chunking | LangChain MarkdownTextSplitter |
| Embedding | sentence-transformers/all-MiniLM-L6-v2 |
| Vector DB | Qdrant via Docker |
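End to end, the flow is: parse the PDF to Markdown with pymupdf4llm, split it with the Markdown splitter, embed each chunk with MiniLM, and upsert the vectors into Qdrant. Below is a minimal sketch of how these pieces fit together; the collection name, file name, and chunk sizes are illustrative assumptions rather than the project's actual configuration.

```python
# Minimal pipeline sketch: PDF -> Markdown -> chunks -> embeddings -> Qdrant.
# "semantica_chunks", "sample.pdf", and the chunk sizes are assumptions for illustration.
import pymupdf4llm
from langchain_text_splitters import MarkdownTextSplitter  # older LangChain: langchain.text_splitter
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

# 1. Parse the PDF into Markdown text.
markdown_text = pymupdf4llm.to_markdown("sample.pdf")

# 2. Split the Markdown into overlapping chunks.
splitter = MarkdownTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(markdown_text)

# 3. Embed each chunk with MiniLM (384-dimensional vectors).
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vectors = model.encode(chunks)

# 4. Store the vectors in Qdrant, keeping the text and source file as payload.
client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="semantica_chunks",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
client.upsert(
    collection_name="semantica_chunks",
    points=[
        PointStruct(
            id=i,
            vector=vec.tolist(),
            payload={"text": chunk, "source_file": "sample.pdf", "chunk_id": i},
        )
        for i, (chunk, vec) in enumerate(zip(chunks, vectors))
    ],
)
```

The 384-dimensional vector size and cosine distance match what all-MiniLM-L6-v2 produces.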
```bash
git clone https://github.com/yourname/semantica.git
cd semantica
pip install -r requirements.txt
docker run -p 6333:6333 qdrant/qdrant
fastapi dev main.py
```

Then open the Swagger UI at:
📍 http://localhost:8000/docs
Uploads and parses a PDF file, then chunks it and stores the chunks in Qdrant along with their embeddings.
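Once the server is running, the endpoint can be exercised with a small client script like the one below; the `/upload` path and field name are assumptions based on the description above, so check the Swagger UI for the actual route.

```python
# Hypothetical client call; the "/upload" path is an assumption, see
# http://localhost:8000/docs for the real route and parameters.
import requests

with open("sample.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8000/upload",
        files={"file": ("sample.pdf", f, "application/pdf")},
    )
print(response.json())
```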
Send a semantic query and receive relevant chunks. Example request:

```json
{
  "query": "Does this PDF mention 'fun' keyword?"
}
```

Example response:
```json
[
  {
    "score": 0.92,
    "text": "This is a simple PDF file. Fun fun fun.",
    "source_file": "sample.pdf",
    "chunk_id": 1
  }
]
```
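Under the hood, serving a query like the one above only needs to embed it with the same MiniLM model and ask Qdrant for the nearest stored chunks. The sketch below illustrates that step; the collection name, `search` helper, and result limit are illustrative assumptions rather than the project's actual code.

```python
# Hedged sketch of the search step; collection name and limit are assumptions.
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
qdrant = QdrantClient(url="http://localhost:6333")

def search(query: str, limit: int = 5):
    # Embed the query with the same model used at ingestion time.
    query_vector = model.encode(query).tolist()
    hits = qdrant.search(
        collection_name="semantica_chunks",
        query_vector=query_vector,
        limit=limit,
    )
    # Shape the hits like the example response above.
    return [
        {
            "score": hit.score,
            "text": hit.payload["text"],
            "source_file": hit.payload["source_file"],
            "chunk_id": hit.payload["chunk_id"],
        }
        for hit in hits
    ]

print(search("Does this PDF mention 'fun' keyword?"))
```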
- LLM-based answer generation
- Multi-document support
- Frontend interface for document search (possibly as a separate project)
Pull requests, ideas, and feedback are always welcome. If you use this project, feel free to ⭐️ the repo and share your experience.
MIT