Web Scraper & Search Engine for Archived News Articles | Built with Python & BeautifulSoup
Project NASS is a self-initiated endeavor that showcases my proficiency in Python and web scraping.
It is a command-line application with no graphical user interface.
- Scrapes all articles from The Hindu's 2010 archives.
- Stores the scraped data in a structured text file (`newsarticles.txt`).
- Implements a search engine that queries articles by the names of notable individuals.
- Uses BeautifulSoup for efficient HTML parsing.
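The parsing step can be pictured with the minimal BeautifulSoup sketch below. It is not the project's actual scraper. The page structure it expects is an assumption: article links are recognized by a hypothetical `/article` marker in their `href`, the headline is the first `<h1>`, and the body is the text of the `<p>` tags. The tab-separated, one-line-per-article record is likewise only a guess at what a "structured text file" could look like. `extract_article_links`, `extract_article` and `to_line` are illustrative names. The functions take HTML strings, so the network request is left out.

```python
from __future__ import annotations

from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_article_links(archive_html: str, base_url: str, marker: str = "/article") -> list[str]:
    """Return absolute, de-duplicated article URLs found on an archive page.

    Assumption: article links contain `marker` in their href (hypothetical filter).
    """
    soup = BeautifulSoup(archive_html, "html.parser")
    links: list[str] = []
    for anchor in soup.find_all("a", href=True):
        href = anchor["href"]
        if marker in href:
            absolute = urljoin(base_url, href)  # resolve relative links
            if absolute not in links:  # keep the first occurrence only
                links.append(absolute)
    return links


def extract_article(article_html: str) -> dict[str, str]:
    """Pull the headline and body text out of an article page.

    Assumption: headline is the first <h1>, body is every non-empty <p>.
    """
    soup = BeautifulSoup(article_html, "html.parser")
    heading = soup.find("h1")
    title = heading.get_text(strip=True) if heading else ""
    paragraphs = [p.get_text(" ", strip=True) for p in soup.find_all("p")]
    body = " ".join(text for text in paragraphs if text)
    return {"title": title, "body": body}


def to_line(article: dict[str, str]) -> str:
    """Serialize one article as a single tab-separated line (hypothetical format)."""
    # Collapse all whitespace so each article occupies exactly one line.
    title = " ".join(article["title"].split())
    body = " ".join(article["body"].split())
    return f"{title}\t{body}"
```

In a full scraper, each page's HTML would come from an HTTP request (for example `requests.get(url, timeout=10).text`), and every line produced by `to_line` would be appended to `newsarticles.txt`.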
Built with:
- Python 3.x
- BeautifulSoup
- `requests` library
Prerequisite:
- Python 3.x installed on your machine.
Installation:
- Clone the repository: `git clone https://github.com/mishalalex/ProjectWebScrapper.git`
- Navigate to the project directory: `cd ProjectWebScrapper`
- Install the required dependencies: `pip install -r requirements.txt`
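After running `pip install`, a quick way to confirm that the dependencies are importable is the small check below. It assumes that `requirements.txt` lists the `beautifulsoup4` distribution (imported as `bs4`) and `requests`; the helper name `missing_modules` is illustrative and not part of the project.

```python
from __future__ import annotations

import importlib.util

# Import names of the packages listed in the README (assumed mapping:
# BeautifulSoup is imported as "bs4").
REQUIRED_MODULES = ("bs4", "requests")


def missing_modules(names: tuple[str, ...] = REQUIRED_MODULES) -> list[str]:
    """Return the top-level modules in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing dependencies:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```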
Project structure:
- `ArticleScrapper.py`: Script to scrape articles from The Hindu's 2010 archive.
- `SearchEngine.py`: Script to search for names within the scraped articles.
- `newsarticles.txt`: Output file containing all scraped articles.
- `requirements.txt`: List of Python dependencies.
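The name search performed by `SearchEngine.py` could work roughly like the sketch below. It is an illustration rather than the repository's code. It assumes, hypothetically, that `newsarticles.txt` is UTF-8 text with one article per line, and it uses a plain linear scan with a case-insensitive, word-boundary regular expression. `parse_articles`, `load_articles` and `search_name` are illustrative names.

```python
from __future__ import annotations

import re
from pathlib import Path


def parse_articles(text: str) -> list[str]:
    """Split file contents into articles, assuming one article per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_articles(path: str | Path) -> list[str]:
    """Read and parse the scraped-articles file (UTF-8 assumed)."""
    return parse_articles(Path(path).read_text(encoding="utf-8"))


def search_name(articles: list[str], name: str) -> list[int]:
    """Return the indices of articles that mention `name`.

    Matching is case-insensitive and anchored on word boundaries,
    so a query for "Rao" does not match "Raoul".
    """
    pattern = re.compile(r"\b" + re.escape(name.strip()) + r"\b", re.IGNORECASE)
    return [index for index, text in enumerate(articles) if pattern.search(text)]
```

For example, `search_name(load_articles("newsarticles.txt"), "Tendulkar")` would return the line indices of every stored article mentioning that name. A linear scan is simple and adequate for one year of articles; an inverted index would be the natural next step if queries became slow.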
This project was conceived and developed independently, reflecting my dedication to learning Python and applying it to real-world problems. It demonstrates my ability to teach myself and implement complex functionality without external guidance.
For any queries or suggestions, feel free to reach out:
- Email: [[email protected]]