Building a search engine from scratch. We plan to implement the three major components of a search engine - a crawler, a parser, and an indexer. We will begin by developing command-line tools for these components and then wrap them with an API service to be consumed by a frontend. This project is being done under IEEE-NITK.
To establish a VPN connection to NITK-NET:
- Log in at the Sophos portal - link.
- Download the SSL-VPN config file for your OS.
- Execute `sudo openvpn <path-to-config-file>` to initiate the connection sequence. Keep this terminal open.
- Execute `ssh <user>@<container-ip>` and enter the necessary details when prompted.
- Install Docker Engine by following this link.
```dockerfile
# Install Chrome
RUN curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add - \
    && echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list \
    && apt-get -y update \
    && apt-get -y install google-chrome-stable

# Install chromedriver
RUN wget -N https://chromedriver.storage.googleapis.com/108.0.5359.71/chromedriver_linux64.zip -P ~/ \
    && unzip ~/chromedriver_linux64.zip -d ~/ \
    && rm ~/chromedriver_linux64.zip \
    && mv -f ~/chromedriver /usr/local/bin/chromedriver
```

Warning

Take care to use compatible versions of `google-chrome` and `chromedriver`. Refer to this answer on StackOverflow.
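Compatibility generally means the major versions match. A quick sanity check can compare the two version strings; this is a minimal sketch (the helper names and sample version strings are illustrative, not part of the project):

```python
import re


def major_version(version_output: str) -> int:
    """Extract the major version number from a `--version` output string."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"No version found in: {version_output!r}")
    return int(match.group(1))


def versions_compatible(chrome_output: str, driver_output: str) -> bool:
    """Chrome and chromedriver are compatible when their major versions match."""
    return major_version(chrome_output) == major_version(driver_output)


# Example outputs of `google-chrome --version` and `chromedriver --version`
chrome = "Google Chrome 108.0.5359.124"
driver = "ChromeDriver 108.0.5359.71"
print(versions_compatible(chrome, driver))  # True
```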
- Create a `.env` file with the following contents:

  ```
  MONGO_USER=admin
  MONGO_PASSWORD=adminpw
  MONGO_DATABASE=test
  ```
- Create a virtual environment, activate it, and then install all the dependencies in `andromeda/requirements.txt`.
- Execute `docker-compose up -d` to bring up the MongoDB server.
- Execute `python3 andromeda/crawler.py start` to start the process of crawling.
Note
In the Docker network, the MongoDB server runs on port `27017`, and a Mongo-Express service runs on port `8081`, providing a GUI for accessing the database.
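The crawler can assemble its MongoDB connection string from the `.env` variables above and the port from this note. A minimal sketch, assuming the variables are already in the environment (the `mongo_uri` helper is illustrative, not the project's actual loading code):

```python
import os


def mongo_uri(host: str = "localhost", port: int = 27017) -> str:
    """Build a MongoDB connection URI from the .env-style variables."""
    user = os.environ["MONGO_USER"]
    password = os.environ["MONGO_PASSWORD"]
    database = os.environ["MONGO_DATABASE"]
    return f"mongodb://{user}:{password}@{host}:{port}/{database}"


# Simulate the variables defined in .env
os.environ.update(
    {"MONGO_USER": "admin", "MONGO_PASSWORD": "adminpw", "MONGO_DATABASE": "test"}
)
print(mongo_uri())  # mongodb://admin:adminpw@localhost:27017/test
```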
- Execute `pylint andromeda/` before making a PR and fix any lint errors.
- Python
- NextJS
- click
- Flask-RESTful