Columnist Author Identification Project

This project performs author identification on a dataset of Turkish newspaper columns written by 18 different authors.

Dataset

The dataset comprises 630 Turkish columns, each written by one of 18 different authors. The data is sourced from "news-paper".

Algorithms Used

The project employs the following machine learning algorithms for author identification:

BERT
LSTM
XGBoost
Decision Tree
Naive Bayes
Random Forest
KNN (K-Nearest Neighbors)
Gradient Boosting
SGD (Stochastic Gradient Descent)
SVM (Support Vector Machine)
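
To make the classical end of this list concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over bag-of-words token counts, written with the standard library only. It is not taken from the project's notebooks: the class name, the add-one smoothing choice, and the toy documents and author labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Bag-of-words Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, authors):
        # documents: list of token lists; authors: one label per document.
        self.class_doc_counts = Counter(authors)
        self.token_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, author in zip(documents, authors):
            self.token_counts[author].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(authors)
        return self

    def predict(self, tokens):
        best_author, best_score = None, -math.inf
        vocab_size = len(self.vocabulary)
        for author, doc_count in self.class_doc_counts.items():
            counts = self.token_counts[author]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / self.total_docs)
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore tokens never seen in training
                score += math.log((counts[token] + 1) / (total + vocab_size))
            if score > best_score:
                best_author, best_score = author, score
        return best_author


# Toy, made-up training data; real input would be tokenized columns.
train_docs = [
    ["ekonomi", "faiz", "enflasyon"],
    ["faiz", "dolar", "ekonomi"],
    ["futbol", "maç", "gol"],
    ["gol", "transfer", "futbol"],
]
train_authors = ["author_a", "author_a", "author_b", "author_b"]

model = MultinomialNaiveBayes().fit(train_docs, train_authors)
prediction = model.predict(["enflasyon", "faiz", "gol"])
```

Working in log space avoids numeric underflow when a column contains hundreds of tokens, and the smoothing keeps an unseen word for one author from zeroing out that author's score.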

Application Steps

Loading and Preprocessing the Dataset
Implementation and Training of Each Algorithm
Evaluation of Model Performance
Selection of the Best Performing Models
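
The first step above can be sketched as follows, assuming the preprocessing consists of Turkish-aware lowercasing and simple word tokenization, followed by a per-author (stratified) train/test split. The function names, the 20% test ratio, and the seed are hypothetical; the notebooks may do this differently.

```python
import random
import re
from collections import defaultdict


def turkish_lower(text):
    """Lowercase with Turkish casing rules: 'I' -> 'ı' and 'İ' -> 'i'."""
    # Python's default lower() maps 'I' to 'i', which is wrong for Turkish.
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text):
    """Split lowercased text into runs of word characters."""
    return re.findall(r"\w+", turkish_lower(text))


def stratified_split(labels, test_ratio=0.2, seed=42):
    """Return (train_idx, test_idx), sampling the test part author by author."""
    by_author = defaultdict(list)
    for index, author in enumerate(labels):
        by_author[author].append(index)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_author.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_ratio))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Made-up labels: two hypothetical authors with ten columns each.
labels = ["author_a"] * 10 + ["author_b"] * 10
train_idx, test_idx = stratified_split(labels)
tokens = tokenize("İstanbul'da IŞIK")
```

Splitting per author keeps every author represented in both parts, which matters when each class has only a few dozen documents.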

Best Performing Models

Of the algorithms tested, BERT and SVM achieved the highest accuracy scores. Below are the confusion matrices and accuracy values for these models:

BERT Model

Confusion Matrix:

Accuracy: 0.9793650793650793

SVM Model

Confusion Matrix:

Accuracy: 0.8492063492063492
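
For readers unfamiliar with the two metrics reported above, the sketch below shows how a confusion matrix and an accuracy score relate to each other. It uses made-up labels for three hypothetical authors and standard-library Python only; it does not reproduce the project's evaluation code or its results.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true authors, columns are predicted authors."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for three hypothetical authors, for illustration only.
classes = ["author_a", "author_b", "author_c"]
y_true = ["author_a", "author_a", "author_b", "author_b", "author_c", "author_c"]
y_pred = ["author_a", "author_b", "author_b", "author_b", "author_c", "author_a"]

matrix = confusion_matrix(y_true, y_pred, classes)
score = accuracy(matrix)
```

Off-diagonal cells show which authors are mistaken for one another, which a single accuracy number cannot reveal.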

Stopwords

The Turkish stopwords used in this project were obtained from countwordsfree.
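
A minimal sketch of stopword filtering, assuming the list is stored as plain text with one word per line. That format, the helper names, and the four sample stopwords are assumptions; the actual layout of the stopwordsTR folder is not described here.

```python
import io


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines, one word per line."""
    return {line.strip() for line in lines if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens that are not in the stopword set."""
    return [token for token in tokens if token not in stopwords]


# In-memory stand-in for a stopword file (hypothetical contents).
stopword_file = io.StringIO("ve\nbir\nbu\nde\n")
stopwords = load_stopwords(stopword_file)

tokens = ["bu", "yazı", "bir", "ekonomi", "ve", "siyaset", "yorumu"]
filtered = remove_stopwords(tokens, stopwords)
```

Storing the stopwords in a set keeps each membership check constant-time, which adds up over hundreds of columns.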

Contributors

Special thanks to my friend Levent Demirkaya for their valuable contributions to this project.

Folders and files

dataset
images
stopwordsTR
BERT.ipynb
LSTM.ipynb
README.md
XgBoost.ipynb
kararAğacı_NaiveBayes_RandomForest_KNN_GradientBoost.ipynb
sgd.ipynb
svm.ipynb

Repository: leventDemirkaya/TextMiningProject