Web Scraper & Search Engine for Archived News Articles | Built with Python & BeautifulSoup
Project NASS is a self-initiated endeavor that showcases my proficiency in Python and web scraping.
It is a command line application with no UI.
- Scrapes all articles from The Hindu's 2010 archives.
- Stores the scraped data in a structured text file (`newsarticles.txt`).
- Implements a search engine to query articles by the names of notable individuals.
- Utilizes BeautifulSoup for efficient HTML parsing.
- Python 3.x
- BeautifulSoup
- `requests` library
- Python 3.x installed on your machine.
- Clone the repository: `git clone https://github.com/mishalalex/ProjectWebScrapper.git`
- Navigate to the project directory: `cd ProjectWebScrapper`
- Install the required dependencies: `pip install -r requirements.txt`
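With the dependencies installed, the parsing and storage steps from the feature list could look roughly like the sketch below. It is not the code in `ArticleScrapper.py`: the function names, the `TITLE:` prefix, the record separator, and the assumption that a headline sits in `<h1>` with body text in `<p>` tags are all hypothetical, and the real archive markup may differ. Fetching the page (for example with `requests.get`) is left out so the sketch runs offline.

```python
from bs4 import BeautifulSoup

# Hypothetical record separator; the real newsarticles.txt layout may differ.
RECORD_SEPARATOR = "=" * 40


def extract_article(html):
    """Return (title, body) for one article page.

    Assumes the headline is the first <h1> and the body is in <p> tags.
    """
    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    title = heading.get_text(strip=True) if heading else ""
    # Collapse runs of whitespace inside each paragraph and skip empty ones.
    paragraphs = [" ".join(p.get_text().split()) for p in soup.find_all("p")]
    body = "\n".join(text for text in paragraphs if text)
    return title, body


def format_record(title, body):
    """Render one article as a text block that ends with the separator."""
    return f"TITLE: {title}\n{body}\n{RECORD_SEPARATOR}\n"


def append_article(path, title, body):
    """Append one article record to the output file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(format_record(title, body))


if __name__ == "__main__":
    sample = "<h1>Sample headline</h1><p>First paragraph.</p><p>Second paragraph.</p>"
    print(format_record(*extract_article(sample)))
```

Writing one separator-delimited record per article keeps the output file readable in a text editor and easy to split again when searching.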
- `ArticleScrapper.py`: Script to scrape articles from The Hindu's 2010 archive.
- `SearchEngine.py`: Script to search for names within the scraped articles.
- `newsarticles.txt`: Output file containing all scraped articles.
- `requirements.txt`: List of Python dependencies.
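As an illustration of the search step, here is a minimal sketch of a name lookup over the stored articles. It is not the actual logic of `SearchEngine.py`: the record separator, the function names, and the choice of case-insensitive whole-word matching are assumptions made for the example.

```python
import re

# Hypothetical record separator; the real newsarticles.txt layout may differ.
RECORD_SEPARATOR = "=" * 40


def split_records(text):
    """Split the file contents into individual article records."""
    return [chunk.strip() for chunk in text.split(RECORD_SEPARATOR) if chunk.strip()]


def load_articles(path):
    """Read the output file and return a list of article records."""
    with open(path, encoding="utf-8") as handle:
        return split_records(handle.read())


def search_articles(articles, name):
    """Return the articles that mention `name` as whole words, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
    return [article for article in articles if pattern.search(article)]


if __name__ == "__main__":
    # Assumes newsarticles.txt exists in the current directory.
    query = input("Name to search for: ")
    for match in search_articles(load_articles("newsarticles.txt"), query):
        print(match, end="\n\n")
```

Matching on word boundaries avoids hits inside longer words, at the cost of missing spelling variants of a name; a linear scan like this is simple but rereads every article for each query.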
This project was conceived and developed independently, reflecting my dedication to learning and applying Python for real-world applications. It serves as a testament to my ability to self-learn and implement complex functionalities without external guidance.
For any queries or suggestions, feel free to reach out:
- Email: [[email protected]]