Skip to content

Suddhu018/fullTextElasticSearchEngine

Repository files navigation

Full-Text Search Engine

##NOTE: Read the documentation file for complete understanding.

Overview

This Full-Text Search Engine is designed to efficiently index and search large sets of documents. It supports powerful features such as inverted indexing, auto-suggestions, and text tokenization with stemming and stop-word removal. By implementing multithreaded document processing, the system is optimized for speed and scalability.

Key Features:

  • Inverted Indexing: Enables fast lookup of documents containing specific terms.
  • Tokenization: Converts documents into a list of searchable tokens.
  • Stop-Word Removal: Filters out common words (like "the", "and") to enhance search relevance.
  • Multithreading Support: Speeds up processing by concurrently handling multiple documents.

Getting Started

Prerequisites

  • C++17 or higher: Make sure your environment supports C++17 for filesystem operations and threading.
  • Boost Library: The project uses the Boost C++ library for string manipulation.

Install the necessary dependencies:

sudo apt-get install libboost-all-dev

Building the Project

Make sure you have .txt files in the Documents/ folder for indexing. The engine will read all text files and generate the inverted index.


Usage

Preprocessing Documents

The system tokenizes each document, removes stop words, and stems tokens to their root forms. This step is essential to create a more compact and efficient index.

void preProcessTheData();

Indexing

The inverted index structure allows for quick retrieval of documents containing specific terms.

void buildInvertedIndex(const vector<string>& tokens, int docNumber);

Searching

Once the documents are indexed, users can search for keywords. The system will retrieve and rank documents based on relevance.


Future Enhancements

  • Phrase Searching: Support for searching exact phrases instead of single keywords.
  • Synonym Support: Adding synonyms to improve search results.
  • Real-Time Updates: Dynamically update the index as documents are added or modified.

License

This project is licensed under the MIT License - see the LICENSE file for details.


Authors

  • Sudhanshu Shekhar - Full-text search engine developer

Feel free to contribute, submit pull requests, or report issues!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published