indic-hate-classifier

A hate classifier for low-resource Indian languages.

Datasets

This project uses datasets collected from various research papers and AI workshops.

Dataset format

Column	Description	Format
UID	Unique identifier that traces each row back to its source dataset and serves as the dataset index.	<language_code><train/test/val>_<index_number>
text	The text content passed to the classifier.	UTF-8 encoded text
label_yn	Binary label indicating whether the source dataset marks the text as hate or non-hate.	1 = hate, 0 = non-hate
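
The column layout above can be checked programmatically before training. The following is a minimal sketch, not the project's own loader: it assumes the data is stored as CSV (the format is not stated here), that the separator between the language code and the split name may or may not be an underscore, and that language codes are two or three lowercase letters. The names `parse_uid` and `load_rows` and the sample codes `hi` and `te` are hypothetical.

```python
import csv
import io
import re

# Columns described in the dataset format table.
REQUIRED_COLUMNS = ("UID", "text", "label_yn")

# Assumed UID shape: <language_code><train/test/val>_<index_number>.
# The underscore after the language code is treated as optional because
# the format string does not make that separator explicit.
UID_PATTERN = re.compile(
    r"^(?P<lang>[a-z]{2,3})_?(?P<split>train|test|val)_(?P<index>\d+)$"
)


def parse_uid(uid):
    """Split a UID into (language_code, split, index) or raise ValueError."""
    match = UID_PATTERN.match(uid)
    if match is None:
        raise ValueError(f"UID does not match the expected format: {uid!r}")
    return match.group("lang"), match.group("split"), int(match.group("index"))


def load_rows(handle):
    """Read CSV rows from a file-like object and validate each one."""
    reader = csv.DictReader(handle)
    missing = set(REQUIRED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for record in reader:
        lang, split, index = parse_uid(record["UID"])
        if record["label_yn"] not in ("0", "1"):
            raise ValueError(f"label_yn must be 0 or 1 in row {record['UID']!r}")
        rows.append(
            {
                "lang": lang,
                "split": split,
                "index": index,
                "text": record["text"],
                "label": int(record["label_yn"]),  # 1 = hate, 0 = non-hate
            }
        )
    return rows


# Placeholder rows for illustration only; they are not taken from any dataset.
SAMPLE = (
    "UID,text,label_yn\n"
    "hi_train_0,placeholder text one,1\n"
    "te_test_1,placeholder text two,0\n"
)

rows = load_rows(io.StringIO(SAMPLE))
print(rows[0])
```

For a file on disk, opening it with `open(path, encoding="utf-8", newline="")` and passing the handle to `load_rows` should work under the same assumptions.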

Sources

The datasets used for each language are listed below.

Hindi :

Marathi :

Telugu :

https://github.com/ShareChatAI/MACD/tree/main/dataset

Development Setup

Install Poetry.
Run poetry install in the project root to install the required dependencies.
Run poetry shell to start a shell inside the project's virtual environment.
Run jupyter notebook to start the Jupyter Notebook server.
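
Before working through the steps above, it can help to confirm that the command-line tools they rely on are reachable. The snippet below is a small optional sketch rather than part of the project: `find_missing_tools` is a hypothetical helper, and it only checks whether `poetry` and `jupyter` are on the current `PATH` using the standard library's `shutil.which`.

```python
import shutil

# Executables that the setup steps invoke.
REQUIRED_TOOLS = ("poetry", "jupyter")


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH.

    The lookup function is injectable so the check can be exercised
    without depending on the local machine.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools are available.")
```

Note that `jupyter` is typically only visible after the virtual environment has been activated, so running this check outside that environment may report it as missing.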

