Web Scraper | Search Engine for Scraped Results | Python | BeautifulSoup

mishalalex/ProjectWebScrapper

Project NASS – News Archives Scraper & Searcher

Web Scraper & Search Engine for Archived News Articles | Built with Python & BeautifulSoup


📌 Overview

Project NASS is a self-initiated endeavor that showcases my proficiency in Python and web scraping.

It is a command-line application with no graphical UI.


🔍 Features

  • Scrapes all articles from The Hindu's 2010 archives.
  • Stores the scraped data in a structured text file (`newsarticles.txt`).
  • Implements a search engine to query articles by the names of notable individuals.
  • Uses BeautifulSoup for efficient HTML parsing.
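
The parsing step listed above can be illustrated with a minimal sketch. It assumes an archive page that lists headlines as links inside a `<ul class="archive-list">` element; that markup, the sample headlines, and the function name `extract_headlines` are hypothetical and are not taken from `ArticleScrapper.py` or from The Hindu's actual pages.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (assumption): the real archive pages may be structured differently.
SAMPLE_HTML = """
<html><body>
  <ul class="archive-list">
    <li><a href="/2010/01/01/stories/a.htm">Jane Doe opens new school</a></li>
    <li><a href="/2010/01/01/stories/b.htm">Heavy rain expected this week</a></li>
  </ul>
</body></html>
"""


def extract_headlines(html):
    """Return (title, href) pairs for every link in the assumed archive list."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("ul", class_="archive-list")
    if container is None:
        # Page does not match the assumed layout, so there is nothing to extract.
        return []
    return [
        (link.get_text(strip=True), link["href"])
        for link in container.find_all("a", href=True)
    ]


if __name__ == "__main__":
    for title, href in extract_headlines(SAMPLE_HTML):
        print(f"{title}\t{href}")
```

The sketch uses Python's built-in `html.parser` backend so that it needs no parser package beyond BeautifulSoup itself.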

🛠️ Technologies Used

  • Python
  • BeautifulSoup


🚀 Getting Started

Prerequisites

  • Python 3.x installed on your machine.

Installation

  1. Clone the repository:

    git clone https://github.com/mishalalex/ProjectWebScrapper.git
  2. Navigate to the project directory:

    cd ProjectWebScrapper
  3. Install the required dependencies:

    pip install -r requirements.txt

📂 Project Structure

  • ArticleScrapper.py – Script to scrape articles from The Hindu's 2010 archive.
  • SearchEngine.py – Script to search for names within the scraped articles.
  • newsarticles.txt – Output file containing all scraped articles.
  • requirements.txt – List of Python dependencies.
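
How the output file and the search script fit together can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `SearchEngine.py`: the one-article-per-line, tab-separated layout and the helper names `save_articles` and `search_articles` are hypothetical, and the real format of `newsarticles.txt` may differ.

```python
import tempfile
from pathlib import Path


def save_articles(articles, path):
    """Write one article per line as 'title<TAB>body' (an assumed layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        for title, body in articles:
            # Collapse whitespace so each article stays on a single line.
            clean_title = " ".join(title.split())
            clean_body = " ".join(body.split())
            fh.write(f"{clean_title}\t{clean_body}\n")


def search_articles(path, name):
    """Return titles of articles that mention `name`, ignoring case."""
    needle = name.casefold()
    hits = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            title, _, body = line.rstrip("\n").partition("\t")
            if needle in title.casefold() or needle in body.casefold():
                hits.append(title)
    return hits


if __name__ == "__main__":
    # Made-up sample data written to a temporary file, not to newsarticles.txt.
    sample = [
        ("Jane Doe opens new school", "Jane Doe visited\nthe town on Monday."),
        ("Heavy rain expected", "The forecast predicts showers."),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = Path(tmp) / "demo_articles.txt"
        save_articles(sample, demo_path)
        print(search_articles(demo_path, "jane doe"))
```

A plain substring match keeps the sketch short; it would also match names embedded in longer words, so a real search would likely want word-boundary handling.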

🧠 Inspiration & Learning

This project was conceived and developed independently, reflecting my dedication to learning and applying Python for real-world applications. It serves as a testament to my ability to self-learn and implement complex functionality without external guidance.


📬 Contact

For any queries or suggestions, feel free to reach out.

