The Semantic Data Navigator is a Jupyter Notebook designed to process and analyze textual data using advanced natural language processing (NLP) techniques. It leverages OpenAI's embeddings and LangChain to load, split, and semantically search documents while providing visualization capabilities with t-SNE and Plotly.
- Load and preprocess textual data from a directory.
- Generate embeddings using OpenAI's `text-embedding-ada-002` model.
- Store and retrieve document embeddings using ChromaDB.
- Implement interactive visualizations with Plotly and Matplotlib.
- Facilitate semantic search and similarity-based document retrieval.
- Utilize Gradio for interactive exploration of search results.
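The load, split, and search steps above can be sketched without any external services. This is a minimal, self-contained illustration, not the notebook's actual code. `load_documents`, `split_text`, `toy_embed`, `cosine`, and `search` are hypothetical names, and the `*.txt` pattern and chunk sizes are assumptions. `toy_embed` is a bag-of-words stand-in for OpenAI's embedding model, and ChromaDB's storage and nearest-neighbour lookup are replaced by a plain list and a sort.

```python
import math
import re
from collections import Counter
from pathlib import Path


def load_documents(directory: str, pattern: str = "*.txt") -> dict[str, str]:
    """Read every file matching `pattern` under `directory` (recursively) into a {path: text} dict."""
    return {
        str(path): path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).rglob(pattern))
        if path.is_file()
    }


def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Cut text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once a window already reaches the end of the text, so no chunk is pure overlap.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str) -> Counter:
    """HYPOTHETICAL stand-in for text-embedding-ada-002: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, highest score first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

For example, `search("where are embeddings stored", ["ChromaDB stores embeddings", "Plotly draws charts"], k=1)` ranks the ChromaDB chunk first because both texts share the token "embeddings". A bag-of-words vector only matches exact tokens ("stored" and "stores" count as different words); real embeddings are what make the search semantic, which is why the project uses an embedding model rather than term counts.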
- Load the Notebook: Open semantic-data-navigator.ipynb in Jupyter or VS Code.
- Set Up API Keys: Make sure your OpenAI API key is available to the notebook, for example through the `OPENAI_API_KEY` environment variable.
- Run the Notebook: Execute each cell sequentially to process and analyze textual data.
- Explore Visualizations: Use the provided interactive t-SNE and semantic search functionalities.
- Deploy with Gradio: Run the Gradio UI for an interactive document retrieval experience.
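A missing API key typically surfaces only when the first embedding request fails, partway through the notebook. A small check in the first cell can fail early instead. This is a sketch assuming the key is supplied through the `OPENAI_API_KEY` environment variable (the default the OpenAI Python client reads); `require_openai_key` is a hypothetical helper, not part of the notebook.

```python
import os


def require_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing early with a clear message if it is absent."""
    # Assumption: the key lives in an environment variable; surrounding whitespace is ignored.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it (or load it from a .env file) before running the notebook."
        )
    return key
```

Calling `require_openai_key()` at the top of the notebook turns a confusing mid-run authentication error into an immediate, explicit one.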
The following demo illustrates the working principle:
*Figure: A side-by-side comparison of ChatGPT’s out-of-the-box response (left) versus a Retrieval-Augmented Generation (RAG)–based approach (right). The RAG approach references an external knowledge source to provide more contextual and targeted answers, demonstrating how the Semantic Data Navigator can seamlessly integrate advanced NLP capabilities and knowledge retrieval to improve query responses.*
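The RAG side of the comparison differs from the plain ChatGPT answer mainly in what the model sees: the retrieved passages are placed in the prompt ahead of the question. A minimal sketch of that assembly step follows. It assumes the chunks have already been retrieved by a similarity search; `build_rag_prompt`, its instruction wording, and the `max_chars` budget are hypothetical choices, not the notebook's actual template.

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str], max_chars: int = 2000) -> str:
    """Assemble a prompt that grounds the model's answer in retrieved context (hypothetical template)."""
    context_parts = []
    used = 0
    for i, chunk in enumerate(retrieved_chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        # Stop adding chunks once the context budget would be exceeded.
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

The resulting string is what would be sent to the chat model. Numbering the chunks lets the answer point back to its sources, and passing the chunks in ranked order means the character budget drops the least relevant passages first.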
- If you'd like to contribute to the development of this project, feel free to submit a pull request or raise an issue.
This project is open-source and distributed under the MIT License.
