Commit 3b90e19

Merge pull request #45 from dayyass/develop
release v0.1.1
2 parents b083cf1 + 4646ecd commit 3b90e19

20 files changed with 422 additions and 124 deletions.

.coveragerc

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+[run]
+branch = True
+source = text_clf
+
+[report]
+exclude_lines =
+    pragma: no cover
+    if self\.debug
+    raise AssertionError
+    raise NotImplementedError
+    if __name__ == .__main__.:
+
+omit =
+    text_clf/__main__.py
+
+show_missing = True
+ignore_errors = False
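coverage.py treats each `exclude_lines` entry as a regular expression searched within source lines, which is why `.__main__.` uses `.` to match either quote character. The sketch below approximates that matching with Python's `re` module; the sample lines are illustrative and not taken from `text_clf`. (coverage.py additionally excludes the whole block introduced by a matching line, which this line-level check does not model.)

```python
import re

# Patterns copied from the [report] exclude_lines setting above.
EXCLUDE_PATTERNS = [
    r"pragma: no cover",
    r"if self\.debug",
    r"raise AssertionError",
    r"raise NotImplementedError",
    r"if __name__ == .__main__.:",
]


def is_excluded(source_line: str) -> bool:
    """Return True if any exclude pattern is found anywhere in the line."""
    return any(re.search(pattern, source_line) for pattern in EXCLUDE_PATTERNS)


if __name__ == "__main__":
    samples = [
        'if __name__ == "__main__":',
        "if __name__ == '__main__':",
        "    raise NotImplementedError",
        "x = compute()  # pragma: no cover",
        "return result",
    ]
    for line in samples:
        print(is_excluded(line), line)
```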

.github/workflows/codecov.yml

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# This workflow will install Python dependencies and run codecov
+# https://github.com/codecov/codecov-action#example-workflowyml-with-codecov-action
+
+name: codecov
+
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main, develop]
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+    steps:
+    - uses: actions/checkout@master
+    - name: Set up Python
+      uses: actions/setup-python@master
+      with:
+        python-version: 3.7
+    - name: Install dependencies
+      run: |
+        pip install --upgrade pip
+        pip install -r requirements.txt
+        pip install pytest pytest-cov
+    - name: Generate coverage report
+      run: |
+        pytest --cov=./ --cov-report=xml
+    - name: Upload coverage to Codecov
+      uses: codecov/codecov-action@v1
+      with:
+        flags: unittests
+        env_vars: OS,PYTHON
+        fail_ci_if_error: true
+        verbose: true
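The "Generate coverage report" step writes a Cobertura-style `coverage.xml`, which the next step uploads to Codecov. To inspect the same number locally, the root element's `line-rate` attribute (a fraction between 0 and 1) can be read with the standard library. A minimal sketch; the function name and the inline sample report are made up for illustration:

```python
import xml.etree.ElementTree as ET


def line_coverage_percent(xml_text: str) -> float:
    """Return overall line coverage (0-100) from a Cobertura-style report.

    coverage.py stores the overall ratio in the root <coverage> element's
    "line-rate" attribute as a fraction between 0 and 1.
    """
    root = ET.fromstring(xml_text)
    return float(root.attrib["line-rate"]) * 100


if __name__ == "__main__":
    # In practice the text would come from the coverage.xml written by
    # `pytest --cov=./ --cov-report=xml`; this sample is illustrative only.
    sample = '<coverage line-rate="0.9231" branch-rate="0.75"></coverage>'
    print(f"{line_coverage_percent(sample):.2f}%")
```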

.github/workflows/linter.yml

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# This workflow will install Python dependencies and run linter
+# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
+
+name: linter
+
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main, develop]
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: 3.7
+    - name: Install dependencies
+      run: |
+        pip install --upgrade pip
+        pip install isort black flake8 types-PyYAML mypy
+    - name: Code format check with isort
+      run: |
+        isort --check-only --profile black .
+    - name: Code format check with black
+      run: |
+        black --check .
+    - name: Lint with flake8
+      run: |
+        # stop the build if there are Python syntax errors or undefined names
+        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
+        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
+        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+    - name: Type check with mypy
+      run: mypy --ignore-missing-imports .
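The linter job can be reproduced before pushing by running the same commands in the same order, stopping at the first failure as the workflow does. A minimal sketch, assuming the tools from the "Install dependencies" step (`isort black flake8 types-PyYAML mypy`) are already installed; the `run_lint` helper is hypothetical and not part of the repository:

```python
import shlex
import subprocess
from typing import List

# Commands copied from the "linter" workflow steps above, in the same order.
LINT_COMMANDS = [
    "isort --check-only --profile black .",
    "black --check .",
    "flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics",
    "flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics",
    "mypy --ignore-missing-imports .",
]


def run_lint(dry_run: bool = False) -> List[List[str]]:
    """Run each lint command, stopping at the first failure like the CI job.

    With dry_run=True nothing is executed; the parsed argv lists are returned.
    """
    argvs = [shlex.split(command) for command in LINT_COMMANDS]
    if not dry_run:
        for argv in argvs:
            # check=True raises CalledProcessError on a non-zero exit code,
            # mirroring how a failing step fails the workflow.
            subprocess.run(argv, check=True)
    return argvs


if __name__ == "__main__":
    # Print the planned commands; call run_lint() to actually execute them.
    for argv in run_lint(dry_run=True):
        print(" ".join(argv))
```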

.github/workflows/tests.yml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# This workflow will install Python dependencies and run tests with a variety of Python versions
+# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
+
+name: tests
+
+on:
+  push:
+    branches: [main, develop]
+  pull_request:
+    branches: [main, develop]
+
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        python-version: ['3.6', '3.7', '3.8', '3.9']
+        os: [ubuntu-latest, macOS-latest, windows-latest]
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        pip install --upgrade pip
+        pip install -r requirements.txt
+    - name: Tests
+      run: |
+        python -m unittest discover
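A build matrix without `include`/`exclude` entries creates one job per combination of its values, so this workflow runs 4 Python versions on 3 operating systems, i.e. 12 jobs. The sketch below enumerates those combinations with `itertools.product`; it illustrates the expansion only, and GitHub may schedule the jobs in a different order:

```python
from itertools import product

# Values copied from the strategy.matrix block of the "tests" workflow.
PYTHON_VERSIONS = ["3.6", "3.7", "3.8", "3.9"]
OPERATING_SYSTEMS = ["ubuntu-latest", "macOS-latest", "windows-latest"]


def expand_matrix():
    """Return one {os, python-version} dict per job the matrix would create."""
    return [
        {"os": os_name, "python-version": version}
        for os_name, version in product(OPERATING_SYSTEMS, PYTHON_VERSIONS)
    ]


if __name__ == "__main__":
    jobs = expand_matrix()
    print(len(jobs), "jobs")  # 3 operating systems x 4 Python versions
    for job in jobs:
        print(job["os"], job["python-version"])
```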

.gitignore

Lines changed: 2 additions & 3 deletions
@@ -18,9 +18,8 @@ dist
 
 *.egg-info/
 
-data/*
-!data/README.md
-!data/fetch_20newsgroups.py
+data/train.csv
+data/valid.csv
 
 models/*
 !models/README.md
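The old rules ignored everything under `data/` except the two re-included files; the new rules ignore only the two CSV files, so any other file placed in `data/` is tracked. The sketch below approximates gitignore evaluation (last matching rule wins, a leading `!` re-includes) with `fnmatch.fnmatchcase`. This is a simplification: in `fnmatch`, `*` also matches `/`, which real gitignore patterns do not, but that difference does not matter for these flat `data/` paths.

```python
from fnmatch import fnmatchcase
from typing import List

# Old and new data/ rules from the .gitignore hunk above.
OLD_RULES = ["data/*", "!data/README.md", "!data/fetch_20newsgroups.py"]
NEW_RULES = ["data/train.csv", "data/valid.csv"]


def is_ignored(path: str, rules: List[str]) -> bool:
    """Approximate gitignore matching: the last matching rule decides,
    and a rule starting with "!" re-includes the path."""
    ignored = False
    for rule in rules:
        negate = rule.startswith("!")
        pattern = rule[1:] if negate else rule
        if fnmatchcase(path, pattern):
            ignored = not negate
    return ignored


if __name__ == "__main__":
    for path in ["data/train.csv", "data/valid.csv", "data/README.md", "data/__init__.py"]:
        print(path, "old:", is_ignored(path, OLD_RULES), "new:", is_ignored(path, NEW_RULES))
```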

Dockerfile

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+FROM python:3.7-slim-buster
+MAINTAINER Dani El-Ayyass <[email protected]>
+
+WORKDIR /workdir
+
+COPY config.yaml ./
+COPY data/train.csv data/valid.csv data/
+
+RUN pip install --upgrade pip && \
+    pip install --no-cache-dir text-classification-baseline
+
+CMD ["bash"]
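The two `COPY` instructions fail the image build if `config.yaml`, `data/train.csv` or `data/valid.csv` is missing from the build context, and these CSV files are now git-ignored, so a fresh clone will not contain them. A minimal pre-build check, assuming the build is run from the repository root; the helper name and the `text-clf` image tag in the message are hypothetical:

```python
from pathlib import Path
from typing import List

# Files the Dockerfile COPY instructions expect in the build context.
REQUIRED_INPUTS = ["config.yaml", "data/train.csv", "data/valid.csv"]


def missing_build_inputs(context_dir: str = ".") -> List[str]:
    """Return the COPY sources that are absent from the build context."""
    root = Path(context_dir)
    return [name for name in REQUIRED_INPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_build_inputs()
    if missing:
        print("docker build would fail, missing:", ", ".join(missing))
    else:
        print("build context OK, e.g.: docker build -t text-clf . && docker run -it text-clf")
```

Because the image's `CMD` is `bash`, `docker run -it` drops into a shell in `/workdir`, where the copied `config.yaml` and data files are available.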

README.md

Lines changed: 38 additions & 14 deletions
@@ -1,5 +1,19 @@
+[![tests](https://github.com/dayyass/text-classification-baseline/actions/workflows/tests.yml/badge.svg)](https://github.com/dayyass/text-classification-baseline/actions/workflows/tests.yml)
+[![linter](https://github.com/dayyass/text-classification-baseline/actions/workflows/linter.yml/badge.svg)](https://github.com/dayyass/text-classification-baseline/actions/workflows/linter.yml)
+[![codecov](https://codecov.io/gh/dayyass/text-classification-baseline/branch/main/graph/badge.svg?token=ABFF3YQBJV)](https://codecov.io/gh/dayyass/text-classification-baseline)
+
+[![python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)](https://github.com/dayyass/text-classification-baseline#requirements)
+[![release (latest by date)](https://img.shields.io/github/v/release/dayyass/text-classification-baseline)](https://github.com/dayyass/text-classification-baseline/releases/latest)
+[![license](https://img.shields.io/github/license/dayyass/text-classification-baseline?color=blue)](https://github.com/dayyass/text-classification-baseline/blob/main/LICENSE)
+
+[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-black)](https://github.com/dayyass/text-classification-baseline/blob/main/.pre-commit-config.yaml)
+[![code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
+
+[![pypi version](https://img.shields.io/pypi/v/text-classification-baseline)](https://pypi.org/project/text-classification-baseline)
+[![pypi downloads](https://img.shields.io/pypi/dm/text-classification-baseline)](https://pypi.org/project/text-classification-baseline)
+
 ### Text Classification Baseline
-Pipeline for building text classification **TF-IDF + LogReg** baselines using **sklearn**.
+Pipeline for fast building text classification **TF-IDF + LogReg** baselines.
 
 ### Usage
 Instead of writing custom code for specific text classification task, you just need:
Instead of writing custom code for specific text classification task, you just need:
@@ -8,18 +22,16 @@ Instead of writing custom code for specific text classification task, you just n
 pip install text-classification-baseline
 ```
 2. run pipeline:
+- either in **terminal**:
+```shell script
+text-clf-train
+```
+- or in **python**:
+```python3
+import text_clf
 
-- either in **terminal**:
-```shell script
-text-clf --config config.yaml
-```
-
-- or in **python**:
-```python3
-import text_clf
-
-text_clf.train(path_to_config="config.yaml")
-```
+text_clf.train()
+```
 
 No data preparation is needed, only a **csv** file with two raw columns (with arbitrary names):
 - `text`
- `text`
@@ -30,7 +42,17 @@ No data preparation is needed, only a **csv** file with two raw columns (with ar
 #### Config
 The user interface consists of only one file [**config.yaml**](https://github.com/dayyass/text-classification-baseline/blob/main/config.yaml).
 
-Change **config.yaml** to create the desired configuration and train text classification model.
+Change **config.yaml** to create the desired configuration and train text classification model with the following command:
+- **terminal**:
+```shell script
+text-clf-train --path_to_config config.yaml
+```
+- **python**:
+```python3
+import text_clf
+
+text_clf.train(path_to_config="config.yaml")
+```
 
 Default **config.yaml**:
 ```yaml
```yaml
@@ -63,6 +85,8 @@ logreg:
   n_jobs: -1
 ```
 
+**NOTE**: `tf-idf` and `logreg` are sklearn [**TfidfVectorizer**](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html?highlight=tfidf#sklearn.feature_extraction.text.TfidfVectorizer) and [**LogisticRegression**](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) parameters correspondingly, so you can parameterize instances of these classes however you want.
+
 #### Output
 After training the model, the pipeline will return the following files:
 - `model.joblib` - sklearn pipeline with TF-IDF and LogReg steps
- `model.joblib` - sklearn pipeline with TF-IDF and LogReg steps
@@ -71,7 +95,7 @@ After training the model, the pipeline will return the following files:
 - `logging.txt` - logging file
 
 ### Requirements
-Python >= 3.7
+Python >= 3.6
 
 ### Citation
 If you use **text-classification-baseline** in a scientific publication, we would appreciate references to the following BibTex entry:
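The README's NOTE says the `tf-idf` and `logreg` config sections are parameters of sklearn's `TfidfVectorizer` and `LogisticRegression`, and `model.joblib` is described as an sklearn pipeline with TF-IDF and LogReg steps. A minimal sketch of that idea, assuming the YAML has already been loaded into a dict: each section is forwarded as keyword arguments. The step names and every value except `n_jobs: -1` are illustrative assumptions, not the package's defaults or source code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Illustrative stand-in for a parsed config.yaml; only n_jobs: -1 appears
# in the diff above, the other values are made up for this sketch.
CONFIG = {
    "tf-idf": {"lowercase": True, "min_df": 1},
    "logreg": {"C": 1.0, "max_iter": 1000, "n_jobs": -1},
}


def build_pipeline(cfg: dict) -> Pipeline:
    """Forward each config section as keyword arguments to its sklearn class."""
    return Pipeline(
        [
            ("tf-idf", TfidfVectorizer(**cfg["tf-idf"])),
            ("logreg", LogisticRegression(**cfg["logreg"])),
        ]
    )


if __name__ == "__main__":
    texts = ["good movie", "great film", "bad movie", "awful film"]
    labels = ["pos", "pos", "neg", "neg"]
    model = build_pipeline(CONFIG).fit(texts, labels)
    print(model.predict(["good film"]))
```

Any keyword accepted by these sklearn classes can be added to the corresponding section in the same way, which is what makes the config "parameterize instances of these classes however you want".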

codecov.yml

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+codecov:
+  require_ci_to_pass: yes
+
+ignore:
+  - "text_clf/__main__.py"
+
+coverage:
+  status:
+    project:
+      default: false
+      source:
+        paths:
+          - "text_clf/"
+        target: 90%
+    patch: off
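The custom `source` project status compares coverage of files under `text_clf/` against a 90% target, with `text_clf/__main__.py` ignored (it is also omitted in `.coveragerc`). A simplified model of that check, aggregating covered and total lines per file; the file names and counts are hypothetical, and Codecov's own calculation also accounts for partially covered lines, which this sketch does not:

```python
from typing import Dict, Tuple

# Hypothetical per-file (covered_lines, total_lines) counts, e.g. taken from a
# coverage report; file names and numbers are made up for illustration.
FILE_COUNTS: Dict[str, Tuple[int, int]] = {
    "text_clf/train.py": (45, 50),
    "text_clf/utils.py": (28, 30),
    "text_clf/__main__.py": (0, 5),
    "tests/test_train.py": (40, 40),
}


def source_status(
    counts: Dict[str, Tuple[int, int]],
    prefix: str = "text_clf/",
    ignored: Tuple[str, ...] = ("text_clf/__main__.py",),
    target: float = 90.0,
) -> Tuple[float, bool]:
    """Aggregate line coverage over files under `prefix`, skipping `ignored`,
    and report whether it meets `target` percent."""
    covered = total = 0
    for path, (hit, lines) in counts.items():
        if path.startswith(prefix) and path not in ignored:
            covered += hit
            total += lines
    percent = 100.0 * covered / total if total else 0.0
    return percent, percent >= target


if __name__ == "__main__":
    percent, passed = source_status(FILE_COUNTS)
    print(f"{percent:.2f}% -> {'pass' if passed else 'fail'}")
```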

data/__init__.py

Whitespace-only changes.

data/fetch_20newsgroups.py

Lines changed: 0 additions & 45 deletions
This file was deleted.
