COMP3602-Project

Project for the course Data Analysis and Visualization with Python.

Part 1:

Dataset description
Attribute descriptions
Bar plots to display the frequency distribution of all attributes
Frequency distribution table for continuous numerical attributes
Research questions (2-3)
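
As an illustration of the frequency distribution table for a continuous attribute, here is a minimal sketch using pandas. The `ages` values, the `frequency_table` helper, and the choice of four equal-width bins are assumptions made for the example; they are not taken from the project's dataset.

```python
import pandas as pd


def frequency_table(values, bins=5):
    """Bin a continuous attribute into equal-width intervals and count rows per bin."""
    series = pd.Series(values, dtype="float64")
    # pd.cut assigns every value to one of `bins` equal-width intervals.
    binned = pd.cut(series, bins=bins, include_lowest=True)
    # sort_index keeps the intervals in ascending order rather than by count.
    counts = binned.value_counts().sort_index()
    table = pd.DataFrame({
        "frequency": counts,
        "relative_frequency": counts / counts.sum(),
    })
    table.index.name = "interval"
    return table


# Hypothetical sample values, for illustration only.
ages = [21, 22, 25, 31, 34, 35, 38, 42, 47, 58]
table = frequency_table(ages, bins=4)
print(table)
```

If matplotlib is installed, `table["frequency"].plot.bar()` should turn the same table into a bar plot.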

Part 2:

Fill missing values with the mean of that column
Use boxplots to identify features with outliers, and replace the outliers with the mean of that feature
Use LabelEncoder for categorical features
Scale the features with min-max normalization
Determine whether the dataset is in long (tidy) or wide format. If it is in long format, convert it (by selecting 2 or more variables) to wide format, and vice versa.
Implement the ANOVA method to identify irrelevant features: compute the F-statistics, draw a bar chart to visualize them, and provide a list of the identified irrelevant features
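
The Part 2 steps can be sketched roughly as follows with pandas and scikit-learn. This is only one possible reading of the task list: the toy DataFrame, the column names, the helper names `preprocess` and `irrelevant_features`, the 1.5 × IQR outlier rule (the rule boxplot whiskers conventionally use), and the `alpha = 0.05` cutoff for "irrelevant" are all assumptions, not details confirmed by the project.

```python
import pandas as pd
from sklearn.feature_selection import f_classif
from sklearn.preprocessing import LabelEncoder, MinMaxScaler


def preprocess(df, numeric_cols, categorical_cols):
    """Impute, replace outliers, encode, and scale a copy of df."""
    out = df.copy()
    for col in numeric_cols:
        out[col] = out[col].astype(float)
        # Fill missing values with the column mean.
        out[col] = out[col].fillna(out[col].mean())
        # Flag outliers with the 1.5 * IQR rule (assumed here),
        # then overwrite them with the column mean.
        q1, q3 = out[col].quantile(0.25), out[col].quantile(0.75)
        iqr = q3 - q1
        is_outlier = (out[col] < q1 - 1.5 * iqr) | (out[col] > q3 + 1.5 * iqr)
        out.loc[is_outlier, col] = out[col].mean()
    for col in categorical_cols:
        # Map each category to an integer code.
        out[col] = LabelEncoder().fit_transform(out[col])
    # Min-max normalization rescales each numeric column to [0, 1].
    out[numeric_cols] = MinMaxScaler().fit_transform(out[numeric_cols])
    return out


def irrelevant_features(X, y, alpha=0.05):
    """One-way ANOVA F-test per feature; p > alpha is treated as irrelevant."""
    f_stats, p_values = f_classif(X, y)
    scores = pd.DataFrame({"F": f_stats, "p": p_values}, index=X.columns)
    return scores, scores.index[scores["p"] > alpha].tolist()


# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "informative": [1, 2, 3, 1, 2, None, 11, 12, 13, 11, 12, 13],
    "noise": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 100],
    "colour": ["red", "blue"] * 6,
    "label": [0] * 6 + [1] * 6,
})
numeric_cols = ["informative", "noise"]
clean = preprocess(df, numeric_cols, categorical_cols=["colour"])

# Wide -> long (tidy) with melt, then back to wide with pivot.
long_df = clean.reset_index().melt(
    id_vars="index", value_vars=numeric_cols,
    var_name="feature", value_name="value",
)
wide_df = long_df.pivot(index="index", columns="feature", values="value")

scores, irrelevant = irrelevant_features(
    clean[["informative", "noise", "colour"]], clean["label"]
)
print(scores)
print("Irrelevant features:", irrelevant)
```

With matplotlib installed, `scores["F"].plot.bar()` is one way to draw the bar chart of F-statistics. Note that the outlier replacement here uses the mean of the whole column, outlier included; using the mean of the non-outlier values is an equally plausible reading of the task.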

Part 3:

Download the acute-inflammations dataset
Perform all necessary preprocessing steps to clean the dataset
Determine the optimal size of the training dataset using the K-NN algorithm
Implement the K-NN algorithm using Euclidean distance
Implement the K-NN algorithm using cosine similarity
Implement the decision tree algorithm
Provide a detailed analysis based on the box plots obtained for these three algorithms
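
A rough sketch of how the Part 3 comparison could be set up with scikit-learn is shown below. The synthetic data from `make_classification` is a stand-in for the acute-inflammations dataset, and the candidate training sizes, the 10 repeated splits, `n_neighbors=5`, and the helper `split_accuracy` are all assumptions chosen for illustration. scikit-learn exposes cosine *distance* (1 minus cosine similarity), so the nearest neighbours under that metric are the most similar ones.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption); replace with the cleaned dataset.
X, y = make_classification(
    n_samples=120, n_features=6, n_informative=4, random_state=0
)

TRAIN_SIZES = [0.5, 0.6, 0.7, 0.8, 0.9]  # assumed candidate training fractions
SEEDS = range(10)                         # assumed number of repeated splits


def split_accuracy(model, train_size, seed):
    """Fit on one stratified random split and return the test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, train_size=train_size, random_state=seed, stratify=y
    )
    return model.fit(X_tr, y_tr).score(X_te, y_te)


# Step 1: pick the training fraction with the best mean K-NN accuracy.
mean_acc = {
    size: np.mean([
        split_accuracy(KNeighborsClassifier(n_neighbors=5), size, s)
        for s in SEEDS
    ])
    for size in TRAIN_SIZES
}
best_size = max(mean_acc, key=mean_acc.get)

# Step 2: collect per-split accuracies for the three algorithms.
models = {
    "knn_euclidean": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "knn_cosine": KNeighborsClassifier(
        n_neighbors=5, metric="cosine", algorithm="brute"
    ),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}
results = {
    name: [split_accuracy(model, best_size, s) for s in SEEDS]
    for name, model in models.items()
}
for name, accs in results.items():
    print(f"{name}: mean accuracy {np.mean(accs):.3f}")
```

Each list in `results` holds one accuracy per split, which is the shape a box plot expects, for example `plt.boxplot(list(results.values()))` with matplotlib.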

Part 4:

Collect text data from the internet
Pre-process the web documents to prepare them for clustering
Apply term frequency-inverse document frequency to extract important keywords
Apply random projection to transform the n x d TF-IDF matrix into an n x p data matrix, with p = 500
Cluster the transformed dataset using the k-means algorithm with different values of k
Display each centroid of the best clustering result by plotting a graph with the features on the x-axis and the centroid values on the y-axis
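
The Part 4 pipeline might look roughly like the sketch below, using scikit-learn. Several things here are assumptions for the sake of a small runnable example: the eight toy sentences stand in for collected web documents, `GaussianRandomProjection` is one of the two random projection classes scikit-learn offers (the project may have used another), `p` is set to 5 rather than 500 because the toy vocabulary is tiny, and the silhouette score is just one possible criterion for choosing the best `k`.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score
from sklearn.random_projection import GaussianRandomProjection

# Hypothetical toy corpus standing in for the collected web documents.
docs = [
    "the striker scored a goal in the football match",
    "the football team won the league match",
    "the goalkeeper saved a penalty in the match",
    "fans cheered as the team scored again",
    "the stock market fell as interest rates rose",
    "investors sold shares after the market dropped",
    "the central bank raised interest rates again",
    "share prices and bond yields moved sharply",
]

# n x d TF-IDF matrix (sparse); English stop words are dropped.
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Project the d TF-IDF features down to p dimensions.
p = 5  # the project specifies p = 500; reduced here for the toy corpus
projected = GaussianRandomProjection(
    n_components=p, random_state=0
).fit_transform(tfidf)

# Try several k and keep the model with the highest silhouette score.
best_k, best_score, best_model = None, -1.0, None
for k in (2, 3, 4):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(projected)
    score = silhouette_score(projected, model.labels_)
    if score > best_score:
        best_k, best_score, best_model = k, score, model

print("best k:", best_k)
print("centroids shape:", best_model.cluster_centers_.shape)
```

Each row of `best_model.cluster_centers_` is one centroid with `p` coordinates, so plotting a row against the feature index `range(p)` gives the requested graph of feature (x-axis) versus centroid value (y-axis).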

aLmktr/COMP3602-Project

Folders and files

Latest commit

History

Repository files navigation

COMP3602-Project

Part 1:

Part 2:

Part 3

Part 4

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages