[](https://github.com/dayyass/text-classification-baseline/releases/latest)
-Pipeline for building text classification **TF-IDF + LogReg** baselines using **sklearn**.
+Pipeline for fast building text classification **TF-IDF + LogReg** baselines.
 
 ### Usage
 Instead of writing custom code for specific text classification task, you just need:
@@ -8,18 +22,16 @@ Instead of writing custom code for specific text classification task, you just n
 pip install text-classification-baseline
 ```
 2. run pipeline:
+- either in **terminal**:
+```shell script
+text-clf-train
+```
+- or in **python**:
+```python3
+import text_clf
 
-- either in **terminal**:
-```shell script
-text-clf --config config.yaml
-```
-
-- or in**python**:
-```python3
-import text_clf
-
-text_clf.train(path_to_config="config.yaml")
-```
+text_clf.train()
+```
 
 No data preparation is needed, only a **csv** file with two raw columns (with arbitrary names):
 -`text`
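
The hunk above says the pipeline only needs a **csv** file with two raw columns under arbitrary names. As a minimal sketch of what such a file could look like, the snippet below writes one with Python's standard `csv` module and reads it back. The file name `data.csv`, the column names `review` and `sentiment`, and the sample rows are all hypothetical and not taken from the repository. How the pipeline is told which column is which is not visible in this diff.

```python
import csv
import os
import tempfile

# Hypothetical rows: one raw text column and one raw label column.
rows = [
    {"review": "Great product, works as described", "sentiment": "positive"},
    {"review": "Stopped working after two days", "sentiment": "negative"},
]

# Hypothetical location and file name, chosen only for this illustration.
csv_path = os.path.join(tempfile.gettempdir(), "data.csv")

# Write a header row followed by the data rows.
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["review", "sentiment"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm it contains exactly two named columns.
with open(csv_path, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    columns = reader.fieldnames
    loaded = list(reader)

print(columns)
print(len(loaded))
```

Commas and quotes inside the raw text are escaped by the `csv` writer, so no manual cleaning is needed to produce a well-formed file.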
@@ -30,7 +42,17 @@ No data preparation is needed, only a **csv** file with two raw columns (with ar
 #### Config
 The user interface consists of only one file [**config.yaml**](https://github.com/dayyass/text-classification-baseline/blob/main/config.yaml).
 
-Change **config.yaml** to create the desired configuration and train text classification model.
+Change **config.yaml** to create the desired configuration and train text classification model with the following command:
+-**terminal**:
+```shell script
+text-clf-train --path_to_config config.yaml
+```
+-**python**:
+```python3
+import text_clf
+
+text_clf.train(path_to_config="config.yaml")
+```
 
 Default **config.yaml**:
 ```yaml
@@ -63,6 +85,8 @@ logreg:
 n_jobs: -1
 ```
 
+**NOTE**: `tf-idf` and `logreg` are sklearn [**TfidfVectorizer**](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html?highlight=tfidf#sklearn.feature_extraction.text.TfidfVectorizer) and [**LogisticRegression**](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) parameters correspondingly, so you can parameterize instances of these classes however you want.
+
 #### Output
 After training the model, the pipeline will return the following files:
 - `model.joblib`- sklearn pipeline with TF-IDF and LogReg steps
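
The NOTE added in this hunk says the `tf-idf` and `logreg` config sections hold sklearn `TfidfVectorizer` and `LogisticRegression` parameters, and the output list describes `model.joblib` as an sklearn pipeline with TF-IDF and LogReg steps. The snippet below is a rough sketch of that idea, not the package's actual implementation. The `config` dict and its values, the step names, and the toy data are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical config, shaped like the `tf-idf` and `logreg` sections
# of config.yaml; the keys are ordinary sklearn constructor parameters.
config = {
    "tf-idf": {"lowercase": True, "ngram_range": (1, 1)},
    "logreg": {"C": 1.0, "max_iter": 1000, "random_state": 42},
}

# Each section is unpacked as keyword arguments into the matching class.
pipe = Pipeline(
    [
        ("tf-idf", TfidfVectorizer(**config["tf-idf"])),
        ("logreg", LogisticRegression(**config["logreg"])),
    ]
)

# Toy data, made up for this example.
texts = ["good movie", "great film", "bad movie", "terrible film"]
labels = ["pos", "pos", "neg", "neg"]

pipe.fit(texts, labels)
pred = pipe.predict(["great movie"])
print(pred[0])
```

Because the sections are passed through as keyword arguments, any parameter accepted by the two sklearn classes can be set in the config without changing code, which appears to be the point of the NOTE.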
@@ -71,7 +95,7 @@ After training the model, the pipeline will return the following files:
 - `logging.txt`- logging file
 
 ### Requirements
-Python >= 3.7
+Python >= 3.6
 
 ### Citation
 If you use **text-classification-baseline** in a scientific publication, we would appreciate references to the following BibTex entry: