Online News Popularity Analysis

Project Overview

An analysis of the UCI Online News Popularity dataset using data mining and machine learning techniques. The project examines how article attributes such as content statistics, publication timing, channel, and sentiment relate to how widely an online article is shared.

Dataset Source

UCI Machine Learning Repository

About the Dataset

Content from articles published by Mashable
Acquisition date: January 8, 2015
The dataset contains statistics associated with articles rather than original content
Benchmark performance values for the dataset were estimated with a Random Forest classifier, using a rolling window as the assessment method
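
The rolling window assessment mentioned above can be illustrated with a short sketch. It is written in Python for brevity (the project's own code is in R). The function name `rolling_window_splits` and the window sizes are hypothetical, since the window lengths actually used for the benchmark are not stated here, and the sketch assumes the articles are already sorted by publication time.

```python
def rolling_window_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs for a rolling window.

    Each window trains on `train_size` consecutive rows and tests on the
    `test_size` rows that immediately follow, then slides forward.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # slide the window forward by one test block


# Hypothetical sizes, chosen only to keep the example readable.
splits = list(rolling_window_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print("train:", train, "test:", test)
```

A model would be refit on each training window and scored on the following test window, so that every prediction is made only from articles published earlier.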

Dataset Attributes

The dataset contains 61 attributes:

58 predictive features
2 non-predictive fields
1 target variable
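
One way to separate these three attribute groups before modeling is sketched below, in Python for illustration (the project's code is in R). The column names `url`, `timedelta`, and `shares` are assumptions drawn from the public UCI description, and the toy header stands in for the full 61-column file.

```python
# Assumed column names; check them against the actual CSV header.
NON_PREDICTIVE = ["url", "timedelta"]
TARGET = "shares"


def split_columns(header):
    """Split a header row into predictive, non-predictive, and target names."""
    # Strip whitespace in case the header cells carry padding.
    names = [name.strip() for name in header]
    non_predictive = [n for n in names if n in NON_PREDICTIVE]
    predictive = [n for n in names if n not in NON_PREDICTIVE and n != TARGET]
    target = TARGET if TARGET in names else None
    return predictive, non_predictive, target


# Toy header with two stand-in predictive features.
toy_header = ["url", " timedelta", " n_tokens_title", " num_imgs", " shares"]
predictive, non_predictive, target = split_columns(toy_header)
print(predictive, non_predictive, target)
```

Applied to the full header, the three groups should have 58, 2, and 1 entries respectively, which is a quick sanity check on the load step.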

Key Features Include:

Article Metadata: URL, publication timing
Content Statistics: Word counts, uniqueness metrics
Media Elements: Links, images, videos
Channel Categories: Lifestyle, Tech, Business, etc.
Keyword Performance: Minimum, maximum, and average shares per keyword
Temporal Features: Day of week indicators
Semantic Analysis: LDA topic proximity
Sentiment Analysis: Polarity, subjectivity metrics
Target Variable: Number of article shares

Challenges Encountered

Scale Management: Handling a large-scale dataset
Data Quality: Identifying and removing outliers
Computational Constraints: Resource limitations for complex models
Performance Optimization: Balancing accuracy with processing time
Business Perspective: Translating technical insights into business value
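
As an illustration of the outlier step, the sketch below applies the common interquartile range (IQR) rule to a list of share counts. This is an assumption: the README does not say which outlier method the project used, the function name `remove_iqr_outliers` is hypothetical, and the numbers are made up. It is written in Python for brevity, while the project's code is in R.

```python
import statistics


def remove_iqr_outliers(values, k=1.5):
    """Keep only values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Made-up share counts with one extreme value.
shares = [500, 800, 900, 1100, 1200, 1400, 1600, 2000, 50000]
kept = remove_iqr_outliers(shares)
print(kept)
```

The multiplier `k` controls how aggressive the filter is; heavily skewed targets such as share counts are sometimes log-transformed first so that fewer legitimate viral articles are discarded.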

Key Learnings

Technical Skills:
- Data preparation and modeling best practices
- Efficient implementation of machine learning algorithms
- Working with various R environments (RStudio, Jupyter, Colab)
Professional Development:
- Academic reporting in ACM format
- Research methodology for data science projects
- Leveraging data science communities (Kaggle, KDnuggets)
Business Application:
- Viewing data through a business-impact lens
- Narrative construction from analytical findings
- Balancing roles of business analyst and data scientist

Tools Used

RStudio
Jupyter with R kernel
Google Colab with R kernel


yogeshkumarpilli/Data_Mining_M1_Project
