
Quantum-Software-Development/6-DataMining_Pre-Processing






6- Data Mining / Data Cleaning, Preparation and Detection of Anomalies (Outlier Detection)



Institution: Pontifical Catholic University of São Paulo (PUC-SP)
School: Faculty of Interdisciplinary Studies
Program: Humanistic AI and Data Science
Semester: 2nd Semester 2025
Professor: Daniel Rodrigues da Silva, Doctor in Mathematics



















Tip

This repository is a review of the Statistics course from the undergraduate program Humanities, AI and Data Science at PUC-SP.

Access Data Mining Main Repository

If you'd like to explore the full materials from the 1st year (not only the review), you can visit the complete repository here.



Explore datasets from the University of California, Irvine (UCI) Machine Learning Repository, such as the Balloon, Bank Marketing, and Mammogram datasets, to practice these data pre-processing and mining concepts.






Introduction

Real-world data are almost always incomplete, inconsistent, and noisy. These problems must be addressed via pre-processing to ensure clean, reliable data, a prerequisite for successful data mining.

The pre-processing step transforms raw data into a form that supports more accurate knowledge extraction.


Common Problems in Raw Data

Incompleteness

Missing attribute values, records, or features.

Example: "?" in the credit card field or missing rows.
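As a rough illustration of detecting incompleteness, the sketch below counts "?" markers per attribute and keeps only complete records. The sample data, the column names, and the choice of "?" and the empty string as missing markers are assumptions made for this example, not part of the course material.

```python
import csv
import io

# Hypothetical sample: "?" marks a missing value, as in many raw files.
RAW = """age,credit_card,income
34,yes,5200
51,?,4100
?,no,?
28,yes,3900
"""

MISSING_MARKERS = {"?", ""}  # assumption: these strings mean "missing"


def load_rows(text):
    """Parse CSV text and replace missing markers with None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value.strip() in MISSING_MARKERS else value.strip())
         for key, value in row.items()}
        for row in reader
    ]


def missing_counts(rows):
    """Count missing values per attribute."""
    counts = {key: 0 for key in rows[0]}
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] += 1
    return counts


rows = load_rows(RAW)
counts = missing_counts(rows)

# Simplest possible treatment: drop every record with a missing value.
complete_rows = [row for row in rows if None not in row.values()]

print(counts)
print(len(complete_rows), "complete records out of", len(rows))
```

Dropping incomplete records is only one option; it discards information, so imputing a value (for example the attribute's mean or mode) may be preferable when many records are affected.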

Inconsistency

Contradictory or conflicting entries within the data, e.g., the same attribute recorded in kg in some records and in lbs in others.
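A unit mismatch of this kind can be resolved by converting every value to one reference unit. The following is a minimal sketch, assuming each record carries its unit as a string; the records and field names are hypothetical.

```python
LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound

# Hypothetical records where the unit was entered inconsistently.
records = [
    {"id": 1, "weight": 70.0, "unit": "kg"},
    {"id": 2, "weight": 154.0, "unit": "lbs"},
    {"id": 3, "weight": 180.0, "unit": "LB"},
]


def to_kilograms(value, unit):
    """Convert a weight to kilograms, tolerating case and plural variants."""
    unit = unit.strip().lower()
    if unit in ("kg", "kgs"):
        return value
    if unit in ("lb", "lbs"):
        return value * LB_TO_KG
    # Fail loudly instead of guessing when the unit is not recognized.
    raise ValueError(f"unknown unit: {unit!r}")


normalized = [
    {"id": r["id"], "weight_kg": round(to_kilograms(r["weight"], r["unit"]), 2)}
    for r in records
]

for row in normalized:
    print(row)
```

Raising an error on an unrecognized unit is a deliberate choice here: silently passing the value through would reintroduce the inconsistency the step is meant to remove.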

Noise

Random variations or measurement errors that obscure the real trends in the data.
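One common way to reduce noise is smoothing. The sketch below uses a centered moving average, chosen here only as a simple illustration; the source does not prescribe a specific smoothing method, and the sample values and window size are assumptions.

```python
from statistics import mean


def moving_average(values, window=3):
    """Centered moving average; positions near the edges use a smaller window."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - half)
        end = min(len(values), i + half + 1)
        smoothed.append(mean(values[start:end]))
    return smoothed


# Hypothetical series with one noisy spike (30) among values near 11.
noisy = [10, 12, 11, 30, 12, 13, 11]
smoothed = moving_average(noisy, window=3)

for original, value in zip(noisy, smoothed):
    print(original, round(value, 2))
```

A mean-based window spreads a spike over its neighbours rather than removing it, so a median-based window is often preferred when the noise consists of isolated extreme values.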


Garbage In, Garbage Out (GIGO)

Poor-quality input data produce poor-quality outputs and insights, so the data must be cleaned before mining.



Types of Data


| Type | Description | Examples |
|------|-------------|----------|
| Structured | Fixed fields, clear schema | CSV, SQL tables |
| Semi-Structured | Partial structure with markers | XML, JSON, Emails |
| Unstructured | No strict structure or schema | Text, images, video files |



Data Attributes and Their Types


| Attribute Type | Description | Example |
|----------------|-------------|---------|
| Binary | Two possible values | Yes/No, 0/1 |
| Nominal | Categorical, no order | Marital Status |
| Ordinal | Ordered categories | Education Level |
| Ratio | Numeric with meaningful zero | Age, Salary |
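The attribute types above usually call for different numeric encodings before mining. The sketch below shows one plausible mapping: 0/1 for binary, one indicator per category for nominal, rank integers for ordinal, and a plain float for ratio. The record, the category lists, and the field names are hypothetical and chosen only for illustration.

```python
# Hypothetical category lists; the ordinal list is ordered from lowest to highest.
EDUCATION_ORDER = ["primary", "secondary", "undergraduate", "graduate"]
MARITAL_STATUSES = ["single", "married", "divorced"]


def encode(record):
    """Encode one record according to the type of each attribute."""
    encoded = {}

    # Binary: map the two possible values to 0/1.
    encoded["has_loan"] = 1 if record["has_loan"] == "yes" else 0

    # Nominal: no order, so use one indicator column per category.
    for status in MARITAL_STATUSES:
        encoded[f"marital_{status}"] = int(record["marital_status"] == status)

    # Ordinal: the position in the ordered list preserves the ranking.
    encoded["education"] = EDUCATION_ORDER.index(record["education"])

    # Ratio: already numeric with a meaningful zero, so only convert the type.
    encoded["salary"] = float(record["salary"])
    return encoded


example = {
    "has_loan": "yes",
    "marital_status": "married",
    "education": "undergraduate",
    "salary": "4200",
}
encoded = encode(example)
print(encoded)
```

Encoding a nominal attribute as a single integer would impose an order that does not exist in the data, which is why it gets indicator columns while the ordinal attribute gets a rank.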

































Copyright 2025 Quantum Software Development. Code released under the MIT License.
