Project NASS – News Archives Scraper & Searcher

Web Scraper & Search Engine for Archived News Articles | Built with Python & BeautifulSoup

📌 Overview

Project NASS is a self-initiated endeavor that showcases my proficiency in Python and web scraping.

It is a command line application with no UI.

🔍 Features

Scrapes all articles from The Hindu's 2010 archives.
Stores the scraped data in a structured text file (`newsarticles.txt`).
Implements a search engine to query articles by the names of notable individuals.
Uses BeautifulSoup for HTML parsing.
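
As a rough illustration of the parsing step, the sketch below pulls headlines and links out of a small HTML snippet with BeautifulSoup. The markup, the `archive-list` class name, and the `extract_articles` helper are assumptions made for this example; they are not taken from ArticleScrapper.py or from the actual structure of The Hindu's archive pages.

```python
from bs4 import BeautifulSoup

# Hypothetical archive markup, invented for this example; the real pages will differ.
SAMPLE_HTML = """
<ul class="archive-list">
  <li><a href="/2010/01/01/story-one.html">Story one</a></li>
  <li><a href="/2010/01/01/story-two.html">Story two</a></li>
</ul>
"""


def extract_articles(html):
    """Return (title, url) pairs for each link inside the assumed list element."""
    soup = BeautifulSoup(html, "html.parser")
    listing = soup.find("ul", class_="archive-list")
    if listing is None:
        # The assumed container is missing, so there is nothing to extract.
        return []
    return [
        (link.get_text(strip=True), link["href"])
        for link in listing.find_all("a", href=True)
    ]


articles = extract_articles(SAMPLE_HTML)
for title, url in articles:
    print(f"{title}\t{url}")
```

In a real run, the HTML string would typically come from `requests.get(url).text` rather than from an inline sample.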

🛠️ Technologies Used

Python 3.x
BeautifulSoup
`requests` library

🚀 Getting Started

Prerequisites

Python 3.x installed on your machine.

Installation

Clone the repository:

git clone https://github.com/mishalalex/ProjectWebScrapper.git

Navigate to the project directory:
```
cd ProjectWebScrapper
```
Install the required dependencies:
```
pip install -r requirements.txt
```

📂 Project Structure

ArticleScrapper.py – Script to scrape articles from The Hindu's 2010 archive.
SearchEngine.py – Script to search for names within the scraped articles.
newsarticles.txt – Output file containing all scraped articles.
requirements.txt – List of Python dependencies.
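
To make the relationship between newsarticles.txt and SearchEngine.py more concrete, here is a minimal sketch of a name search over stored articles. It assumes one article per line and a case-insensitive substring match. The actual file layout and matching logic in this repository may differ, and `search_articles` is a hypothetical helper, not a function from SearchEngine.py.

```python
def search_articles(lines, name):
    """Return the lines that mention `name`, ignoring case."""
    needle = name.casefold()
    return [line.strip() for line in lines if needle in line.casefold()]


# Invented stand-in for newsarticles.txt (assumed format: one article per line).
sample_lines = [
    "Jane Doe opens new library wing\n",
    "Monsoon arrives early this year\n",
    "Tribute paid to JANE DOE at annual event\n",
]

matches = search_articles(sample_lines, "jane doe")
for article in matches:
    print(article)
```

Because the function accepts any iterable of strings, the same call should work on a file handle, for example inside `with open("newsarticles.txt", encoding="utf-8") as f:`.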

🧠 Inspiration & Learning

This project was conceived and developed independently, reflecting my dedication to learning Python and applying it to real-world problems. It demonstrates my ability to self-learn and implement non-trivial functionality without external guidance.

📬 Contact

For any queries or suggestions, feel free to reach out:

Email: [[email protected]]
