Full-Text Search Engine

NOTE: Read the documentation file (documentation.txt) for a complete understanding.

Overview

This Full-Text Search Engine indexes and searches large sets of documents. It supports inverted indexing, auto-suggestions, and text tokenization with stemming and stop-word removal. Documents are processed on multiple threads to speed up indexing.

Key Features:

Inverted Indexing: Enables fast lookup of documents containing specific terms.
Tokenization: Converts documents into a list of searchable tokens.
Stop-Word Removal: Filters out common words (like "the", "and") to enhance search relevance.
Multithreading Support: Speeds up processing by concurrently handling multiple documents.
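
As a rough illustration of the multithreading idea, the sketch below splits a batch of documents across worker threads. Each worker builds a private result and merges it into the shared result under a mutex. The function name countWordsConcurrently and the word-count task are assumptions chosen to keep the example small; the project's own threading code may be organized differently.

```cpp
#include <cstddef>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical helper: counts whitespace-separated words across all
// documents, dividing the documents between worker threads.
std::unordered_map<std::string, int> countWordsConcurrently(
    const std::vector<std::string>& documents, unsigned workerCount) {
    std::unordered_map<std::string, int> totals;
    std::mutex totalsMutex;  // guards `totals`

    if (workerCount == 0) workerCount = 1;

    auto worker = [&](std::size_t first) {
        // Each worker handles documents first, first + workerCount, ...
        std::unordered_map<std::string, int> local;
        for (std::size_t i = first; i < documents.size(); i += workerCount) {
            std::istringstream in(documents[i]);
            std::string word;
            while (in >> word) ++local[word];
        }
        // Merge once per worker so the lock is held only briefly.
        std::lock_guard<std::mutex> lock(totalsMutex);
        for (const auto& entry : local) totals[entry.first] += entry.second;
    };

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < workerCount; ++w) workers.emplace_back(worker, w);
    for (auto& t : workers) t.join();
    return totals;
}
```

Keeping a private map per worker means threads only contend for the lock during the final merge, not on every word. On some toolchains the program must be linked with -pthread.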

Getting Started

Prerequisites

C++17 or later: The project relies on C++17 for filesystem operations (std::filesystem) and uses the standard threading library.
Boost Library: The project uses the Boost C++ library for string manipulation.

On Debian or Ubuntu, install Boost with:

sudo apt-get install libboost-all-dev

Building the Project

Before running the engine, place the .txt files you want to index in the Documents/ folder. The engine reads every text file in that folder and generates the inverted index from them.

Usage

Preprocessing Documents

The system tokenizes each document, removes stop words, and stems tokens to their root forms. This keeps the index compact: variants of a word share a single entry, and common words are not stored at all.

void preProcessTheData();
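
The tokenization and stop-word steps can be sketched roughly as follows. This is an illustration, not the body of preProcessTheData(): the helper name tokenizeAndFilter is hypothetical, it uses only the standard library instead of Boost, and it leaves out stemming.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: lowercases the text, splits it on non-alphanumeric
// characters, and drops stop words. Stemming is intentionally left out.
std::vector<std::string> tokenizeAndFilter(
    const std::string& text,
    const std::unordered_set<std::string>& stopWords) {
    std::vector<std::string> tokens;
    std::string current;

    // Emits the token collected so far unless it is empty or a stop word.
    auto flush = [&]() {
        if (!current.empty() && stopWords.count(current) == 0) {
            tokens.push_back(current);
        }
        current.clear();
    };

    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current.push_back(static_cast<char>(std::tolower(ch)));
        } else {
            flush();  // any other character ends the current token
        }
    }
    flush();  // emit the final token, if any
    return tokens;
}
```

With the stop words "the" and "and", the input "The cat, and THE dog!" yields the tokens "cat" and "dog".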

Indexing

The inverted index structure allows for quick retrieval of documents containing specific terms.

void buildInvertedIndex(const vector<string>& tokens, int docNumber);
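
To make the structure concrete, here is a minimal sketch of an inverted index. It assumes the index maps each term to the set of document numbers that contain it. The type alias InvertedIndex and the functions addDocumentToIndex and lookup are hypothetical names, and the index is passed in as a parameter instead of being shared state, so the signature differs from buildInvertedIndex above.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Assumed layout: term -> sorted set of document numbers containing the term.
using InvertedIndex = std::map<std::string, std::set<int>>;

// Adds every token of one document to the index.
void addDocumentToIndex(InvertedIndex& index,
                        const std::vector<std::string>& tokens,
                        int docNumber) {
    for (const std::string& token : tokens) {
        index[token].insert(docNumber);  // std::set ignores duplicates
    }
}

// Returns the documents containing `term`, or an empty set if it is unknown.
std::set<int> lookup(const InvertedIndex& index, const std::string& term) {
    auto it = index.find(term);
    if (it == index.end()) return {};
    return it->second;
}
```

Looking up a term is a single map search, independent of how long the documents are, which is what makes retrieval fast.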

Searching

Once the documents are indexed, users can search for keywords. The system will retrieve and rank documents based on relevance.
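
This README does not spell out how relevance is computed. One common approach is TF-IDF, sketched below for a single query term. Treat it as an assumption about the ranking scheme, not a description of the project's code. The function name rankByTfIdf and the smoothed IDF formula log(1 + N / df) are choices made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Ranks documents for a single query term using TF-IDF.
// `docs[i]` holds the preprocessed tokens of document i.
// Returns (document number, score) pairs, best match first; documents that
// do not contain the term are omitted.
std::vector<std::pair<int, double>> rankByTfIdf(
    const std::vector<std::vector<std::string>>& docs,
    const std::string& term) {
    std::vector<std::size_t> counts(docs.size(), 0);
    std::size_t docsWithTerm = 0;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        counts[i] = static_cast<std::size_t>(
            std::count(docs[i].begin(), docs[i].end(), term));
        if (counts[i] > 0) ++docsWithTerm;
    }

    std::vector<std::pair<int, double>> ranked;
    if (docsWithTerm == 0) return ranked;

    // Smoothed inverse document frequency: log(1 + N / df), always positive.
    const double idf = std::log(
        1.0 + static_cast<double>(docs.size()) / static_cast<double>(docsWithTerm));

    for (std::size_t i = 0; i < docs.size(); ++i) {
        if (counts[i] == 0) continue;
        // Term frequency: share of the document's tokens equal to the term.
        const double tf = static_cast<double>(counts[i]) /
                          static_cast<double>(docs[i].size());
        ranked.emplace_back(static_cast<int>(i), tf * idf);
    }

    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<int, double>& a,
                        const std::pair<int, double>& b) {
                         return a.second > b.second;
                     });
    return ranked;
}
```

A document scores higher when the term makes up a larger share of its tokens, and terms that appear in fewer documents carry more weight overall.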

Future Enhancements

Phrase Searching: Support for searching exact phrases instead of single keywords.
Synonym Support: Adding synonyms to improve search results.
Real-Time Updates: Dynamically update the index as documents are added or modified.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Authors

Sudhanshu Shekhar - Full-text search engine developer

Feel free to contribute, submit pull requests, or report issues!
