yogeshkumarpilli/Data_Mining_M1_Project

Online News Popularity Analysis

Project Overview

An in-depth analysis of the UCI Online News Popularity dataset using data mining and machine learning techniques. The project explores how factors such as content statistics, publication timing, and sentiment relate to the number of shares an online article receives.

Dataset Source

About the Dataset

  • Content from articles published by Mashable
  • Acquisition date: January 8, 2015
  • The dataset contains statistics associated with articles rather than original content
  • The dataset authors estimated relative performance values using a Random Forest classifier, with rolling windows as the assessment method
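
The rolling-window assessment mentioned above can be illustrated with a short sketch. The project itself works in R; this Python version uses only the standard library. The function name `rolling_window_splits` and the window sizes are assumptions chosen for illustration, not values taken from the dataset authors.

```python
from typing import Iterator, List, Tuple


def rolling_window_splits(
    n_samples: int, train_size: int, test_size: int, step: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for time-ordered rows.

    Each window trains on `train_size` consecutive rows and tests on the
    `test_size` rows that immediately follow, then slides forward by `step`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step


if __name__ == "__main__":
    # Hypothetical sizes: 10 articles sorted by publication date.
    for train, test in rolling_window_splits(10, train_size=4, test_size=2, step=2):
        print(train, "->", test)
```

Because every test window lies strictly after its training window, a model is never evaluated on articles published before the ones it was trained on.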

Dataset Attributes

The dataset contains 61 attributes:

  • 58 predictive features
  • 2 non-predictive fields (url and timedelta)
  • 1 target variable
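
As a rough sketch of how this breakdown might be applied when loading the data, the helper below groups column names into the three roles. It is written in Python with the standard library only (the project itself uses R). It assumes the non-predictive columns are named `url` and `timedelta` and the target is named `shares`, and that header names may be padded with spaces. The function name `split_columns` is hypothetical.

```python
import csv
import io

# Assumed column names for the non-predictive fields and the target.
NON_PREDICTIVE = {"url", "timedelta"}
TARGET = "shares"


def split_columns(header):
    """Group column names into (predictive, non_predictive, target)."""
    # Strip whitespace in case the header pads names with spaces.
    names = [name.strip() for name in header]
    if TARGET not in names:
        raise ValueError(f"target column {TARGET!r} not found")
    predictive = [n for n in names if n not in NON_PREDICTIVE and n != TARGET]
    non_predictive = [n for n in names if n in NON_PREDICTIVE]
    return predictive, non_predictive, TARGET


if __name__ == "__main__":
    # Tiny stand-in header, not the full 61-column file.
    sample = "url, timedelta, n_tokens_title, n_tokens_content, shares\n"
    header = next(csv.reader(io.StringIO(sample)))
    predictive, non_predictive, target = split_columns(header)
    print(len(predictive), "predictive;", non_predictive, "excluded; target:", target)
```

Run against the full header, the three groups should have sizes 58, 2, and 1 if the assumed names match the file.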

Key Features Include:

  • Article Metadata: URL, publication timing
  • Content Statistics: Word counts, uniqueness metrics
  • Media Elements: Links, images, videos
  • Channel Categories: Lifestyle, Tech, Business, etc.
  • Keyword Performance: Minimum, maximum, and average shares of the worst, best, and average keywords
  • Temporal Features: Day of week indicators
  • Semantic Analysis: LDA topic proximity
  • Sentiment Analysis: Polarity, subjectivity metrics
  • Target Variable: Number of article shares

Challenges Encountered

  • Scale Management: Handling a dataset of roughly 39,600 articles with 61 attributes each
  • Data Quality: Identifying and removing outliers
  • Computational Constraints: Resource limitations for complex models
  • Performance Optimization: Balancing accuracy with processing time
  • Business Perspective: Translating technical insights into business value
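
The README does not say which outlier rule the project used. As one common option, the sketch below applies the interquartile-range (IQR) rule to a list of share counts. It is written in Python with the standard library only (the project itself uses R). The function names, the multiplier `k=1.5`, and the sample values are all assumptions for illustration.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


if __name__ == "__main__":
    # Hypothetical share counts with one extreme value.
    shares = [500, 800, 900, 1100, 1200, 1400, 1500, 2100, 50000]
    print(drop_outliers(shares))
```

Share counts are typically heavy-tailed, so a rule like this can discard the viral articles that matter most. Whether to drop, cap, or log-transform such values is a modeling decision rather than a pure data-cleaning step.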

Key Learnings

  • Technical Skills:

    • Data preparation and modeling best practices
    • Efficient implementation of machine learning algorithms
    • Working with various R environments (RStudio, Jupyter, Colab)
  • Professional Development:

    • Academic reporting in ACM format
    • Research methodology for data science projects
    • Leveraging data science communities (Kaggle, KDnuggets)
  • Business Application:

    • Viewing data through a business-impact lens
    • Constructing a narrative from analytical findings
    • Balancing the roles of business analyst and data scientist

Tools Used

  • RStudio
  • Jupyter with R kernel
  • Google Colab with R kernel
