Commit c20d250

Merge pull request #242 from fastnlp/dev0.5.0
0.5.0 ready to release!
2 parents 04a54df + ddaf6ed commit c20d250

443 files changed, +24515 -14228 lines changed

.coverage

+1
Large diffs are not rendered by default.

.gitignore

+2
@@ -14,3 +14,5 @@ caches
 .fitlog
 logs/
 .fitconfig
+
+docs/build

.travis.yml

+3-1
@@ -4,11 +4,13 @@ python:
 # command to install dependencies
 install:
   - pip install --quiet -r requirements.txt
+  - pip install --quiet fitlog
   - pip install pytest>=3.6
   - pip install pytest-cov
 # command to run tests
 script:
-  - pytest --cov=./ test/
+  - python -m spacy download en
+  - pytest --cov=fastNLP test/

 after_success:
   - bash <(curl -s https://codecov.io/bash)

README.md

+25-18
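@@ -6,11 +6,12 @@
 ![Hex.pm](https://img.shields.io/hexpm/l/plug.svg)
 [![Documentation Status](https://readthedocs.org/projects/fastnlp/badge/?version=latest)](http://fastnlp.readthedocs.io/?badge=latest)

-fastNLP is a lightweight NLP processing suite. You can use it to quickly complete tasks such as sequence labeling ([NER](reproduction/seqence_labelling/ner), POS-Tagging, etc.), Chinese word segmentation, [text classification](reproduction/text_classification), [Matching](reproduction/matching), [coreference resolution](reproduction/coreference_resolution), and [summarization](reproduction/Summarization); you can also use it to build many complex network models for research. It has the following features:
+fastNLP is a lightweight NLP toolkit. You can use it to quickly complete tasks such as sequence labeling ([NER](reproduction/sequence_labelling/ner), POS-Tagging, etc.), Chinese word segmentation, [text classification](reproduction/text_classification), [Matching](reproduction/matching), [coreference resolution](reproduction/coreference_resolution), and [summarization](reproduction/Summarization); you can also use it to quickly build many complex network models for research. It has the following features:

-- A unified tabular data container that keeps data preprocessing simple and clear. Built-in DataSet Loaders for many datasets save you from writing preprocessing code;
+- A unified tabular data container that keeps data preprocessing simple and clear. Built-in Loaders and Pipes for many datasets save you from writing preprocessing code;
 - A variety of training and testing components, such as the Trainer, the Tester, and various evaluation metrics;
 - A variety of convenient NLP tools, such as loading pretrained embeddings (including ELMo and BERT) and caching intermediate data;
+- Automatic download of some [datasets and pretrained models](https://docs.qq.com/sheet/DVnpkTnF6VW9UeXdh?c=A1A0A0)
 - Detailed Chinese [documentation](https://fastnlp.readthedocs.io/) and [tutorials](https://fastnlp.readthedocs.io/zh/latest/user/tutorials.html) for reference;
 - Many advanced modules, such as Variational LSTM, Transformer, and CRF;
 - Ready-to-use models for sequence labeling, Chinese word segmentation, text classification, Matching, coreference resolution, summarization, and other tasks; see the [reproduction](reproduction) section for details;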
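@@ -27,6 +28,7 @@ fastNLP depends on the following packages:
 + nltk>=3.4.1
 + requests
 + spacy
++ prettytable>=0.7.2

 Installing torch may depend on your operating system and CUDA version; see the [PyTorch website](https://pytorch.org/)
 Once the dependencies are installed, you can run the following commands on the command line to complete the installation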
@@ -36,24 +38,30 @@ pip install fastNLP
 python -m spacy download en
 ```

-The version of fastNLP currently installed via pip is 0.4.1, which is missing many newer features; the master branch has the latest content.
-fastNLP 0.5.0 will be released soon, so stay tuned.
-

 ## fastNLP tutorials

+### Quick start
+
 - [0. Quick start](https://fastnlp.readthedocs.io/zh/latest/user/quickstart.html)
+
+### Detailed tutorials
+
 - [1. Preprocessing text with DataSet](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_1_data_preprocess.html)
-- [2. Loading datasets with DataSetLoader](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_2_load_dataset.html)
+- [2. Converting between text and index with Vocabulary](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_2_vocabulary.html)
 - [3. Turning text into vectors with the Embedding module](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_3_embedding.html)
-- [4. Build a text classifier I: fast training and testing with Trainer and Tester](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_4_loss_optimizer.html)
-- [5. Build a text classifier II: a custom training loop with DataSetIter](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_5_datasetiter.html)
-- [6. Quickly implementing a sequence labeling model](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_6_seq_labeling.html)
-- [7. Quickly building custom models with Modules and Models](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_7_modules_models.html)
-- [8. Quickly evaluating your model with Metric](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_8_metrics.html)
-- [9. Customizing your training process with Callback](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_9_callback.html)
-- [10. Using fitlog to assist research with fastNLP](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_10_fitlog.html)
+- [4. Loading and processing datasets with Loader and Pipe](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_4_load_dataset.html)
+- [5. Build a text classifier I: fast training and testing with Trainer and Tester](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_5_loss_optimizer.html)
+- [6. Build a text classifier II: a custom training loop with DataSetIter](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_6_datasetiter.html)
+- [7. Quickly evaluating your model with Metric](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_7_metrics.html)
+- [8. Quickly building custom models with Modules and Models](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_8_modules_models.html)
+- [9. Quickly implementing a sequence labeling model](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_9_seq_labeling.html)
+- [10. Customizing your training process with Callback](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_10_callback.html)
+
+### Extended tutorials

+- [Extend-1. Various uses of BertEmbedding](https://fastnlp.readthedocs.io/zh/latest/tutorials/extend_1_bert_embedding.html)
+- [Extend-2. Using fitlog to assist research with fastNLP](https://fastnlp.readthedocs.io/zh/latest/tutorials/extend_2_fitlog.html)


 ## Built-in components
@@ -79,19 +87,19 @@ fastNLP has several different embeddings built into the embeddings module: static embedd
 <tr>
 <td> encoder </td>
 <td> Encodes the input into a vector with representational power </td>
-<td> embedding, RNN, CNN, transformer
+<td> Embedding, RNN, CNN, Transformer, ...
 </tr>
 <tr>
 <td> decoder </td>
 <td> Decodes a vector carrying some representational meaning into the required output form </td>
-<td> MLP, CRF </td>
+<td> MLP, CRF, ... </td>
 </tr>
 </table>


 ## Project structure

-<img src="./docs/source/figures/workflow.png" width="60%" height="60%">
+![](./docs/source/figures/workflow.png)

 The figure above shows the general workflow of fastNLP; the project structure is as follows:

@@ -118,11 +126,10 @@ The figure above shows the general workflow of fastNLP; the project structure is as follows:
 </tr>
 <tr>
 <td><b> fastNLP.io </b></td>
-<td> Implements reading and writing, including data loading and model reading and writing </td>
+<td> Implements reading and writing, including data loading and preprocessing, model reading and writing, and automatic download of data and models </td>
 </tr>
 </table>

-
 <hr>

 *In memory of @FengZiYjun. May his soul rest in peace. We will miss you very very much!*

docs/Makefile

+2-2
@@ -14,13 +14,13 @@ help:
 	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

 apidoc:
-	$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ)
+	$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ) && python3 format.py

 server:
 	cd build/html && python -m http.server

 dev:
-	rm -rf build/html && make html && make server
+	rm -rf build && make html && make server

 .PHONY: help Makefile

docs/README.md

-1
@@ -32,7 +32,6 @@ Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...

 We list the reStructuredText syntax most often used in the fastNLP documentation [here](./source/user/example.rst) (use Raw mode when viewing it on the web),
 and you can read it to get started quickly. Most of the fastNLP documentation is written in the code and extracted with the Sphinx tool,
-and you can also refer to this [unfinished article](./source/user/docs_in_code.rst) to learn the conventions for writing in-code documentation.

 ## Documentation maintainers

docs/count.py

+158
@@ -0,0 +1,158 @@
+import inspect
+import os
+import sys
+
+
+def _colored_string(string: str, color: str or int) -> str:
+    """Display a colored string in the terminal.
+    :param string: the text to display in the terminal
+    :param color: the color of the text
+    :return:
+    """
+    if isinstance(color, str):
+        color = {
+            "black": 30, "Black": 30, "BLACK": 30,
+            "red": 31, "Red": 31, "RED": 31,
+            "green": 32, "Green": 32, "GREEN": 32,
+            "yellow": 33, "Yellow": 33, "YELLOW": 33,
+            "blue": 34, "Blue": 34, "BLUE": 34,
+            "purple": 35, "Purple": 35, "PURPLE": 35,
+            "cyan": 36, "Cyan": 36, "CYAN": 36,
+            "white": 37, "White": 37, "WHITE": 37
+        }[color]
+    return "\033[%dm%s\033[0m" % (color, string)
+
+
+def gr(string, flag):
+    if flag:
+        return _colored_string(string, "green")
+    else:
+        return _colored_string(string, "red")
+
+
+def find_all_modules():
+    modules = {}
+    children = {}
+    to_doc = set()
+    root = '../fastNLP'
+    for path, dirs, files in os.walk(root):
+        for file in files:
+            if file.endswith('.py'):
+                name = ".".join(path.split('/')[1:])
+                if file.split('.')[0] != "__init__":
+                    name = name + '.' + file.split('.')[0]
+                __import__(name)
+                m = sys.modules[name]
+                modules[name] = m
+                try:
+                    m.__all__
+                except:
+                    print(name, "__all__ missing")
+                    continue
+                if m.__doc__ is None:
+                    print(name, "__doc__ missing")
+                    continue
+                if "undocumented" not in m.__doc__:
+                    to_doc.add(name)
+    for module in to_doc:
+        t = ".".join(module.split('.')[:-1])
+        if t in to_doc:
+            if t not in children:
+                children[t] = set()
+            children[t].add(module)
+    for m in children:
+        children[m] = sorted(children[m])
+    return modules, to_doc, children
+
+
+def create_rst_file(modules, name, children):
+    m = modules[name]
+    with open("./source/" + name + ".rst", "w") as fout:
+        t = "=" * len(name)
+        fout.write(name + "\n")
+        fout.write(t + "\n")
+        fout.write("\n")
+        fout.write(".. automodule:: " + name + "\n")
+        if name != "fastNLP.core" and len(m.__all__) > 0:
+            fout.write("   :members: " + ", ".join(m.__all__) + "\n")
+            short = name[len("fastNLP."):]
+            if not (short.startswith('models') or short.startswith('modules') or short.startswith('embeddings')):
+                fout.write("   :inherited-members:\n")
+        fout.write("\n")
+        if name in children:
+            fout.write("子模块\n------\n\n.. toctree::\n   :maxdepth: 1\n\n")  # "子模块" = "Submodules"
+            for module in children[name]:
+                fout.write("   " + module + "\n")
+
+
+def check_file(m, name):
+    names = name.split('.')
+    test_name = "test." + ".".join(names[1:-1]) + ".test_" + names[-1]
+    try:
+        __import__(test_name)
+        tm = sys.modules[test_name]
+    except ModuleNotFoundError:
+        tm = None
+    tested = tm is not None
+    funcs = {}
+    classes = {}
+    for item, obj in inspect.getmembers(m):
+        if inspect.isclass(obj) and obj.__module__ == name and not obj.__name__.startswith('_'):
+            this = (obj.__doc__ is not None, tested and obj.__name__ in dir(tm), {})
+            for i in dir(obj):
+                func = getattr(obj, i)
+                if inspect.isfunction(func) and not i.startswith('_'):
+                    this[2][i] = (func.__doc__ is not None, False)
+            classes[obj.__name__] = this
+        if inspect.isfunction(obj) and obj.__module__ == name and not obj.__name__.startswith('_'):
+            this = (obj.__doc__ is not None, tested and obj.__name__ in dir(tm))  # docs
+            funcs[obj.__name__] = this
+    return funcs, classes
+
+
+def check_files(modules, out=None):
+    for name in sorted(modules.keys()):
+        print(name, file=out)
+        funcs, classes = check_file(modules[name], name)
+        if out is None:
+            for f in funcs:
+                print("%-30s \t %s \t %s" % (f, gr("文档", funcs[f][0]), gr("测试", funcs[f][1])))  # "文档" = docs, "测试" = tests
+            for c in classes:
+                print("%-30s \t %s \t %s" % (c, gr("文档", classes[c][0]), gr("测试", classes[c][1])))
+                methods = classes[c][2]
+                for f in methods:
+                    print("  %-28s \t %s" % (f, gr("文档", methods[f][0])))
+        else:
+            for f in funcs:
+                if not funcs[f][0]:
+                    print("缺少文档 %s" % (f), file=out)  # "缺少文档" = missing docs
+                if not funcs[f][1]:
+                    print("缺少测试 %s" % (f), file=out)  # "缺少测试" = missing tests
+            for c in classes:
+                if not classes[c][0]:
+                    print("缺少文档 %s" % (c), file=out)
+                if not classes[c][1]:
+                    print("缺少测试 %s" % (c), file=out)
+                methods = classes[c][2]
+                for f in methods:
+                    if not methods[f][0]:
+                        print("缺少文档 %s" % (c + "." + f), file=out)
+        print(file=out)
+
+
+def main():
+    sys.path.append("..")
+    print(_colored_string('Getting modules...', "Blue"))
+    modules, to_doc, children = find_all_modules()
+    print(_colored_string('Done!', "Green"))
+    print(_colored_string('Creating rst files...', "Blue"))
+    for name in to_doc:
+        create_rst_file(modules, name, children)
+    print(_colored_string('Done!', "Green"))
+    print(_colored_string('Checking all files...', "Blue"))
+    check_files(modules, out=open("results.txt", "w"))
+    print(_colored_string('Done!', "Green"))
+
+
+if __name__ == "__main__":
+    main()
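
The name handling in `docs/count.py` can be tried in isolation. What follows is a minimal sketch and not part of the commit: the helper names `dotted_name`, `expected_test_module`, and `group_children` are hypothetical, and the logic is copied from `find_all_modules` and `check_file`, assuming `/` as the path separator just as the script does.

```python
def dotted_name(path, filename):
    """Turn a walked directory and a file name into a dotted module name.

    As in find_all_modules, the first path component ('..') is dropped and
    '__init__.py' maps to the package itself. (Hypothetical helper.)
    """
    name = ".".join(path.split("/")[1:])
    stem = filename.split(".")[0]
    if stem != "__init__":
        name = name + "." + stem
    return name


def expected_test_module(name):
    """Build the test module name the way check_file does. (Hypothetical helper.)"""
    parts = name.split(".")
    return "test." + ".".join(parts[1:-1]) + ".test_" + parts[-1]


def group_children(to_doc):
    """Map each documented parent module to its sorted documented children."""
    children = {}
    for module in to_doc:
        parent = ".".join(module.split(".")[:-1])
        if parent in to_doc:
            children.setdefault(parent, set()).add(module)
    return {parent: sorted(mods) for parent, mods in children.items()}


if __name__ == "__main__":
    print(dotted_name("../fastNLP/core", "batch.py"))
    print(expected_test_module("fastNLP.core.batch"))
    print(group_children({"fastNLP", "fastNLP.core", "fastNLP.core.batch", "fastNLP.io"}))
```

One detail this makes visible: for a two-component name such as `fastNLP.core`, the middle segment is empty, so the derived string is `test..test_core` with a doubled dot.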

docs/source/conf.py

+11-7
@@ -24,9 +24,9 @@
 author = 'xpqiu'

 # The short X.Y version
-version = '0.4.5'
+version = '0.5.0'
 # The full version, including alpha/beta/rc tags
-release = '0.4.5'
+release = '0.5.0'

 # -- General configuration ---------------------------------------------------

@@ -48,12 +48,14 @@
 autodoc_default_options = {
     'member-order': 'bysource',
     'special-members': '__init__',
-    'undoc-members': True,
+    'undoc-members': False,
 }

+autoclass_content = "class"
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
-
+# template_bridge
 # The suffix(es) of source filenames.
 # You can specify multiple suffix as a list of string:
 #
@@ -113,7 +115,7 @@
 # -- Options for HTMLHelp output ---------------------------------------------

 # Output file base name for HTML help builder.
-htmlhelp_basename = 'fastNLPdoc'
+htmlhelp_basename = 'fastNLP doc'

 # -- Options for LaTeX output ------------------------------------------------

@@ -166,10 +168,12 @@

 # -- Extension configuration -------------------------------------------------
 def maybe_skip_member(app, what, name, obj, skip, options):
-    if name.startswith("_"):
-        return True
     if obj.__doc__ is None:
         return True
+    if name == "__init__":
+        return False
+    if name.startswith("_"):
+        return True
     return False

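
The reordering in `maybe_skip_member` matters because `__init__` itself starts with an underscore: under the old order it was always skipped, while the new order keeps it as long as it has a docstring. The sketch below replays both versions without Sphinx; the `Documented` class and the `decide` helper are made up for illustration, and the handler is presumably registered for Sphinx's `autodoc-skip-member` event elsewhere in `conf.py` (not shown in this hunk), where a return value of `True` means "skip".

```python
def old_maybe_skip_member(app, what, name, obj, skip, options):
    # Order before the commit: every underscore-prefixed name is skipped first.
    if name.startswith("_"):
        return True
    if obj.__doc__ is None:
        return True
    return False


def maybe_skip_member(app, what, name, obj, skip, options):
    # Order after the commit: undocumented members are skipped, then a
    # documented __init__ is kept, then other private names are skipped.
    if obj.__doc__ is None:
        return True
    if name == "__init__":
        return False
    if name.startswith("_"):
        return True
    return False


class Documented:
    """Hypothetical class used only to exercise the handler."""

    def __init__(self):
        """Documented constructor."""

    def _private(self):
        """Documented, but private."""

    def public(self):
        """Documented and public."""

    def bare(self):
        pass


def decide(handler, name):
    """Call a handler the way autodoc would, with placeholder arguments."""
    obj = getattr(Documented, name)
    return handler(None, "method", name, obj, False, {})


if __name__ == "__main__":
    for member in ("__init__", "_private", "public", "bare"):
        print(member, decide(old_maybe_skip_member, member), decide(maybe_skip_member, member))
```

Only the `__init__` row differs between the two versions, which is consistent with the `'special-members': '__init__'` entry in `autodoc_default_options`.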

docs/source/fastNLP.core.batch.rst

+3-3
@@ -2,6 +2,6 @@ fastNLP.core.batch
 ==================

 .. automodule:: fastNLP.core.batch
-   :members:
-   :undoc-members:
-   :show-inheritance:
+   :members: BatchIter, DataSetIter, TorchLoaderIter
+   :inherited-members:
+

docs/source/fastNLP.core.callback.rst

+3-3
@@ -2,6 +2,6 @@ fastNLP.core.callback
 =====================

 .. automodule:: fastNLP.core.callback
-   :members:
-   :undoc-members:
-   :show-inheritance:
+   :members: Callback, GradientClipCallback, EarlyStopCallback, FitlogCallback, EvaluateCallback, LRScheduler, ControlC, LRFinder, TensorboardCallback, WarmupCallback, SaveModelCallback, CallbackException, EarlyStopError
+   :inherited-members:
+

docs/source/fastNLP.core.const.rst

+3-3
@@ -2,6 +2,6 @@ fastNLP.core.const
 ==================

 .. automodule:: fastNLP.core.const
-   :members:
-   :undoc-members:
-   :show-inheritance:
+   :members: Const
+   :inherited-members:
+

docs/source/fastNLP.core.dataset.rst

+3-3
@@ -2,6 +2,6 @@ fastNLP.core.dataset
 ====================

 .. automodule:: fastNLP.core.dataset
-   :members:
-   :undoc-members:
-   :show-inheritance:
+   :members: DataSet
+   :inherited-members:
+
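
The `.rst` stubs changed here all follow one pattern: a title, an `=` underline of the same length, an `automodule` directive, an explicit `:members:` list, and `:inherited-members:`. That is the same shape `create_rst_file` in `docs/count.py` writes. A rough sketch of that rendering as a pure function follows; `render_automodule_stub` is a hypothetical name, and the three-space option indent is an assumption, since the page extraction collapsed the original whitespace.

```python
def render_automodule_stub(name, members, inherited=True):
    """Return the text of an automodule stub for one module.

    Hypothetical helper: it builds a string instead of writing a file, and
    the 3-space indent for directive options is an assumed convention.
    """
    lines = [name, "=" * len(name), "", ".. automodule:: " + name]
    if members:
        # An explicit member list documents only the listed names.
        lines.append("   :members: " + ", ".join(members))
        if inherited:
            lines.append("   :inherited-members:")
    lines.append("")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_automodule_stub("fastNLP.core.dataset", ["DataSet"]))
```

Listing members explicitly, rather than using a bare `:members:` with `:undoc-members:`, limits each page to the listed names; in `count.py` that list is taken from the module's `__all__`.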
