fastnlp
diff --git a/.coverage b/.coverage
Lines changed: 1 addition & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 2 additions & 0 deletions
diff --git a/.travis.yml b/.travis.yml
Lines changed: 3 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 25 additions & 18 deletions
diff --git a/docs/Makefile b/docs/Makefile
Lines changed: 2 additions & 2 deletions
diff --git a/docs/README.md b/docs/README.md
Lines changed: 0 additions & 1 deletion
diff --git a/docs/count.py b/docs/count.py
Lines changed: 158 additions & 0 deletions
diff --git a/docs/source/conf.py b/docs/source/conf.py
Lines changed: 11 additions & 7 deletions
diff --git a/docs/source/fastNLP.core.batch.rst b/docs/source/fastNLP.core.batch.rst
Lines changed: 3 additions & 3 deletions
diff --git a/docs/source/fastNLP.core.callback.rst b/docs/source/fastNLP.core.callback.rst
Lines changed: 3 additions & 3 deletions
diff --git a/docs/source/fastNLP.core.const.rst b/docs/source/fastNLP.core.const.rst
Lines changed: 3 additions & 3 deletions
diff --git a/docs/source/fastNLP.core.dataset.rst b/docs/source/fastNLP.core.dataset.rst
Lines changed: 3 additions & 3 deletions
@@ -14,3 +14,5 @@ caches
 .fitlog
 logs/
 .fitconfig
+
+docs/build
@@ -4,11 +4,13 @@ python:
 # command to install dependencies
 install:
   - pip install --quiet -r requirements.txt
+  - pip install --quiet fitlog
   - pip install pytest>=3.6
   - pip install pytest-cov
 # command to run tests
 script:
-  - pytest --cov=./ test/
+  - python -m spacy download en
+  - pytest --cov=fastNLP test/
 
 after_success:
   - bash <(curl -s https://codecov.io/bash)
@@ -6,11 +6,12 @@
 ![Hex.pm](https://img.shields.io/hexpm/l/plug.svg)
 [![Documentation Status](https://readthedocs.org/projects/fastnlp/badge/?version=latest)](http://fastnlp.readthedocs.io/?badge=latest)
 
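-fastNLP is a lightweight NLP processing suite. You can use it to quickly complete tasks such as sequence labeling ([NER](reproduction/seqence_labelling/ner), POS-Tagging, etc.), Chinese word segmentation, [text classification](reproduction/text_classification), [Matching](reproduction/matching), [coreference resolution](reproduction/coreference_resolution), and [summarization](reproduction/Summarization); you can also use it to build many complex network models for research. It has the following features:
+fastNLP is a lightweight NLP toolkit. You can use it to quickly complete tasks such as sequence labeling ([NER](reproduction/sequence_labelling/ner), POS-Tagging, etc.), Chinese word segmentation, [text classification](reproduction/text_classification), [Matching](reproduction/matching), [coreference resolution](reproduction/coreference_resolution), and [summarization](reproduction/Summarization); you can also use it to quickly build many complex network models for research. It has the following features: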
 
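-- A unified tabular data container that keeps data preprocessing simple and clear. Built-in DataSet Loaders for many datasets save you from writing preprocessing code;
+- A unified tabular data container that keeps data preprocessing simple and clear. Built-in Loaders and Pipes for many datasets save you from writing preprocessing code;
 - A variety of training and testing components, such as the Trainer, the Tester, and various evaluation metrics;
 - Various convenient NLP tools, such as embedding loading for preprocessing (including ELMo and BERT) and caching of intermediate data;
+- Automatic download of some [datasets and pretrained models](https://docs.qq.com/sheet/DVnpkTnF6VW9UeXdh?c=A1A0A0)
 - Detailed Chinese [documentation](https://fastnlp.readthedocs.io/) and [tutorials](https://fastnlp.readthedocs.io/zh/latest/user/tutorials.html) for reference;
 - Many advanced modules, such as Variational LSTM, Transformer, and CRF;
 - Ready-to-use models for sequence labeling, Chinese word segmentation, text classification, Matching, coreference resolution, summarization, and other tasks; see the [reproduction](reproduction) section for details;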
@@ -27,6 +28,7 @@ fastNLP depends on the following packages:
 + nltk>=3.4.1
 + requests
 + spacy
++ prettytable>=0.7.2
 
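 Installing torch may depend on your operating system and CUDA version; see the [PyTorch website](https://pytorch.org/).
 Once the dependencies are installed, you can run the following commands on the command line to complete the installation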
@@ -36,24 +38,30 @@ pip install fastNLP
 python -m spacy download en
 ```
 
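-The version of fastNLP currently installed via pip is 0.4.1, which still lacks many features; refer to the master branch for the latest content.
-fastNLP 0.5.0 will be released soon; stay tuned.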
-
 
 ## fastNLP Tutorials
 
+### Quick Start
+
 - [0. Quick start](https://fastnlp.readthedocs.io/zh/latest/user/quickstart.html)
+
+### Detailed Tutorials
+
 - [1. Preprocess text with DataSet](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_1_data_preprocess.html)
-- [2. Load datasets with DataSetLoader](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_2_load_dataset.html)
+- [2. Convert between text and indices with Vocabulary](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_2_vocabulary.html)
 - [3. Turn text into vectors with the Embedding module](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_3_embedding.html)
-- [4. Build a text classifier I: fast training and testing with Trainer and Tester](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_4_loss_optimizer.html)
-- [5. Build a text classifier II: a custom training loop with DataSetIter](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_5_datasetiter.html)
-- [6. Quickly implement a sequence labeling model](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_6_seq_labeling.html)
-- [7. Quickly build custom models with Modules and Models](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_7_modules_models.html)
-- [8. Quickly evaluate your model with Metric](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_8_metrics.html)
-- [9. Customize your training process with Callback](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_9_callback.html)
-- [10. Use fitlog to assist research with fastNLP](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_10_fitlog.html)
+- [4. Load and process datasets with Loader and Pipe](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_4_load_dataset.html)
+- [5. Build a text classifier I: fast training and testing with Trainer and Tester](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_5_loss_optimizer.html)
+- [6. Build a text classifier II: a custom training loop with DataSetIter](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_6_datasetiter.html)
+- [7. Quickly evaluate your model with Metric](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_7_metrics.html)
+- [8. Quickly build custom models with Modules and Models](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_8_modules_models.html)
+- [9. Quickly implement a sequence labeling model](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_9_seq_labeling.html)
+- [10. Customize your training process with Callback](https://fastnlp.readthedocs.io/zh/latest/tutorials/tutorial_10_callback.html)
+
+### Extended Tutorials
 
+- [Extend-1. Various uses of BertEmbedding](https://fastnlp.readthedocs.io/zh/latest/tutorials/extend_1_bert_embedding.html)
+- [Extend-2. Use fitlog to assist research with fastNLP](https://fastnlp.readthedocs.io/zh/latest/tutorials/extend_2_fitlog.html)
 
 
 ## Built-in Components
@@ -79,19 +87,19 @@ fastNLP ships several different embeddings in its embeddings module: static embedd
 <tr>
     <td> encoder </td>
     <td> Encodes the input into vectors with representational capacity </td>
-    <td> embedding, RNN, CNN, transformer
+    <td> Embedding, RNN, CNN, Transformer, ...
 </tr>
 <tr>
     <td> decoder </td>
     <td> Decodes vectors carrying some representational meaning into the required output form </td>
-    <td> MLP, CRF </td>
+    <td> MLP, CRF, ... </td>
 </tr>
 </table>
 
 
 ## Project Structure
 
-<img src="./docs/source/figures/workflow.png" width="60%" height="60%">
+![](./docs/source/figures/workflow.png)
 
 The general workflow of fastNLP is shown in the figure above, and the project structure is as follows:
 
@@ -118,11 +126,10 @@ The general workflow of fastNLP is shown in the figure above, and the project structure is as follows:
 </tr>
 <tr>
     <td><b> fastNLP.io </b></td>
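-    <td> Implements read/write functionality, including data loading, model reading and writing, etc. </td>
+    <td> Implements read/write functionality, including data loading and preprocessing, model reading and writing, automatic download of data and models, etc. </td>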
 </tr>
 </table>
 
-
 <hr>
 
 *In memory of @FengZiYjun.  May his soul rest in peace. We will miss you very very much!*
@@ -14,13 +14,13 @@ help:
 	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
 
 apidoc:
-	$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ)
+	$(SPHINXAPIDOC) -efM -o source ../$(SPHINXPROJ) && python3 format.py
 
 server:
 	cd build/html && python -m http.server
 
 dev:
-	rm -rf build/html && make html && make server
+	rm -rf build && make html && make server
 
 .PHONY: help Makefile
 
 
@@ -32,7 +32,6 @@ Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...
 
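 We list [here](./source/user/example.rst) the reStructuredText syntax commonly used in the fastNLP documentation (when viewing on the web, use Raw mode),
 and you can read it to get started quickly. Most of fastNLP's documentation is written in the code and extracted by the Sphinx tool,
-You can also refer to this [unfinished article](./source/user/docs_in_code.rst) to learn the conventions for writing in-code documentation.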
 
 ## Documentation Maintainers
 
 
@@ -0,0 +1,158 @@
+import inspect
+import os
+import sys
+
+
+def _colored_string(string: str, color: str or int) -> str:
+    """Display a string of colored text in the terminal.
+    :param string: the text to display in the terminal
+    :param color: the color of the text
+    :return:
+    """
+    if isinstance(color, str):
+        color = {
+            "black": 30, "Black": 30, "BLACK": 30,
+            "red": 31, "Red": 31, "RED": 31,
+            "green": 32, "Green": 32, "GREEN": 32,
+            "yellow": 33, "Yellow": 33, "YELLOW": 33,
+            "blue": 34, "Blue": 34, "BLUE": 34,
+            "purple": 35, "Purple": 35, "PURPLE": 35,
+            "cyan": 36, "Cyan": 36, "CYAN": 36,
+            "white": 37, "White": 37, "WHITE": 37
+        }[color]
+    return "\033[%dm%s\033[0m" % (color, string)
+
+
+def gr(string, flag):
+    if flag:
+        return _colored_string(string, "green")
+    else:
+        return _colored_string(string, "red")
+
+
+def find_all_modules():
+    modules = {}
+    children = {}
+    to_doc = set()
+    root = '../fastNLP'
+    for path, dirs, files in os.walk(root):
+        for file in files:
+            if file.endswith('.py'):
+                name = ".".join(path.split('/')[1:])
+                if file.split('.')[0] != "__init__":
+                    name = name + '.' + file.split('.')[0]
+                __import__(name)
+                m = sys.modules[name]
+                modules[name] = m
+                try:
+                    m.__all__
+                except:
+                    print(name, "__all__ missing")
+                    continue
+                if m.__doc__ is None:
+                    print(name, "__doc__ missing")
+                    continue
+                if "undocumented" not in m.__doc__:
+                    to_doc.add(name)
+    for module in to_doc:
+        t = ".".join(module.split('.')[:-1])
+        if t in to_doc:
+            if t not in children:
+                children[t] = set()
+            children[t].add(module)
+    for m in children:
+        children[m] = sorted(children[m])
+    return modules, to_doc, children
+
+
+def create_rst_file(modules, name, children):
+    m = modules[name]
+    with open("./source/" + name + ".rst", "w") as fout:
+        t = "=" * len(name)
+        fout.write(name + "\n")
+        fout.write(t + "\n")
+        fout.write("\n")
+        fout.write(".. automodule:: " + name + "\n")
+        if name != "fastNLP.core" and len(m.__all__) > 0:
+            fout.write("   :members: " + ", ".join(m.__all__) + "\n")
+            short = name[len("fastNLP."):]
+            if not (short.startswith('models') or short.startswith('modules') or short.startswith('embeddings')):
+                fout.write("   :inherited-members:\n")
+        fout.write("\n")
+        if name in children:
+            fout.write("子模块\n------\n\n.. toctree::\n   :maxdepth: 1\n\n")
+            for module in children[name]:
+                fout.write("   " + module + "\n")
+
+
+def check_file(m, name):
+    names = name.split('.')
+    test_name = "test." + ".".join(names[1:-1]) + ".test_" + names[-1]
+    try:
+        __import__(test_name)
+        tm = sys.modules[test_name]
+    except ModuleNotFoundError:
+        tm = None
+    tested = tm is not None
+    funcs = {}
+    classes = {}
+    for item, obj in inspect.getmembers(m):
+        if inspect.isclass(obj) and obj.__module__ == name and not obj.__name__.startswith('_'):
+            this = (obj.__doc__ is not None, tested and obj.__name__ in dir(tm), {})
+            for i in dir(obj):
+                func = getattr(obj, i)
+                if inspect.isfunction(func) and not i.startswith('_'):
+                    this[2][i] = (func.__doc__ is not None, False)
+            classes[obj.__name__] = this
+        if inspect.isfunction(obj) and obj.__module__ == name and not obj.__name__.startswith('_'):
+            this = (obj.__doc__ is not None, tested and obj.__name__ in dir(tm))  # docs
+            funcs[obj.__name__] = this
+    return funcs, classes
+
+
+def check_files(modules, out=None):
+    for name in sorted(modules.keys()):
+        print(name, file=out)
+        funcs, classes = check_file(modules[name], name)
+        if out is None:
+            for f in funcs:
+                print("%-30s \t %s \t %s" % (f, gr("文档", funcs[f][0]), gr("测试", funcs[f][1])))
+            for c in classes:
+                print("%-30s \t %s \t %s" % (c, gr("文档", classes[c][0]), gr("测试", classes[c][1])))
+                methods = classes[c][2]
+                for f in methods:
+                    print("  %-28s \t %s" % (f, gr("文档", methods[f][0])))
+        else:
+            for f in funcs:
+                if not funcs[f][0]:
+                    print("缺少文档 %s" % (f), file=out)
+                if not funcs[f][1]:
+                    print("缺少测试 %s" % (f), file=out)
+            for c in classes:
+                if not classes[c][0]:
+                    print("缺少文档 %s" % (c), file=out)
+                if not classes[c][1]:
+                    print("缺少测试 %s" % (c), file=out)
+                methods = classes[c][2]
+                for f in methods:
+                    if not methods[f][0]:
+                        print("缺少文档 %s" % (c + "." + f), file=out)
+            print(file=out)
+
+
+def main():
+    sys.path.append("..")
+    print(_colored_string('Getting modules...', "Blue"))
+    modules, to_doc, children = find_all_modules()
+    print(_colored_string('Done!', "Green"))
+    print(_colored_string('Creating rst files...', "Blue"))
+    for name in to_doc:
+        create_rst_file(modules, name, children)
+    print(_colored_string('Done!', "Green"))
+    print(_colored_string('Checking all files...', "Blue"))
+    check_files(modules, out=open("results.txt", "w"))
+    print(_colored_string('Done!', "Green"))
+
+
+if __name__ == "__main__":
+    main()
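
Review note on `docs/count.py`: `find_all_modules` builds dotted module names by splitting the walked path on `'/'`, and `check_file` maps each module name to a test module name. The sketch below isolates those two string conversions so they can be checked without importing fastNLP. The helper names `path_to_module` and `derive_test_module` are hypothetical and do not appear in the commit, and the use of `PurePosixPath` is an assumption that matches the `'/'`-separated paths in the diff.

```python
from pathlib import PurePosixPath


def path_to_module(path: str, file: str) -> str:
    """Turn a walked directory plus a file name into a dotted module name.

    Mirrors find_all_modules: drop the leading '..' component, join the
    remaining components with dots, and append the file stem unless the
    file is __init__.py.
    """
    parts = PurePosixPath(path).parts[1:]  # skip the leading '..'
    name = ".".join(parts)
    stem = file.split(".")[0]
    if stem != "__init__":
        name = name + "." + stem
    return name


def derive_test_module(name: str) -> str:
    """Map a module name to its test module name, as check_file does."""
    names = name.split(".")
    return "test." + ".".join(names[1:-1]) + ".test_" + names[-1]


print(path_to_module("../fastNLP/core", "batch.py"))
print(derive_test_module("fastNLP.core.batch"))
```

One consequence of the naming rule as written: a two-component name such as `fastNLP.core` has no middle components, so the derived test name contains two consecutive dots (`test..test_core`). Whether that matters depends on how packages are meant to be matched to tests.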
@@ -24,9 +24,9 @@
 author = 'xpqiu'
 
 # The short X.Y version
-version = '0.4.5'
+version = '0.5.0'
 # The full version, including alpha/beta/rc tags
-release = '0.4.5'
+release = '0.5.0'
 
 # -- General configuration ---------------------------------------------------
 
@@ -48,12 +48,14 @@
 autodoc_default_options = {
     'member-order': 'bysource',
     'special-members': '__init__',
-    'undoc-members': True,
+    'undoc-members': False,
 }
 
+autoclass_content = "class"
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
-
+# template_bridge
 # The suffix(es) of source filenames.
 # You can specify multiple suffix as a list of string:
 #
@@ -113,7 +115,7 @@
 # -- Options for HTMLHelp output ---------------------------------------------
 
 # Output file base name for HTML help builder.
-htmlhelp_basename = 'fastNLPdoc'
+htmlhelp_basename = 'fastNLP doc'
 
 # -- Options for LaTeX output ------------------------------------------------
 
@@ -166,10 +168,12 @@
 
 # -- Extension configuration -------------------------------------------------
 def maybe_skip_member(app, what, name, obj, skip, options):
-    if name.startswith("_"):
-        return True
     if obj.__doc__ is None:
         return True
+    if name == "__init__":
+        return False
+    if name.startswith("_"):
+        return True
     return False
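
Review note on the reordered checks in `maybe_skip_member`: moving the underscore test below an explicit `__init__` exception means a documented `__init__` is no longer skipped, while other private members still are. The sketch below compares both orderings on a toy class. It uses a simplified two-argument signature (`name`, `obj`) rather than the six-argument Sphinx callback shown in the diff; `skip_before`, `skip_after`, and `Demo` are hypothetical names used only for illustration.

```python
def skip_before(name, obj):
    """Check order before this change: private names are rejected first."""
    if name.startswith("_"):
        return True
    if obj.__doc__ is None:
        return True
    return False


def skip_after(name, obj):
    """Check order after this change: a documented __init__ is kept."""
    if obj.__doc__ is None:
        return True
    if name == "__init__":
        return False
    if name.startswith("_"):
        return True
    return False


class Demo:
    def __init__(self):
        """Documented constructor."""

    def _helper(self):
        """Documented private helper."""

    def undocumented(self):
        pass


# Print (member, skipped before, skipped after) for each case.
for member in ("__init__", "_helper", "undocumented"):
    obj = getattr(Demo, member)
    print(member, skip_before(member, obj), skip_after(member, obj))
```

Only the `__init__` row differs between the two orderings; undocumented members are skipped either way.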
 
 
 
@@ -2,6 +2,6 @@ fastNLP.core.batch
 ==================
 
 .. automodule:: fastNLP.core.batch
-   :members:
-   :undoc-members:
-   :show-inheritance:
+   :members: BatchIter, DataSetIter, TorchLoaderIter
+   :inherited-members:
+
@@ -2,6 +2,6 @@ fastNLP.core.callback
 =====================
 
 .. automodule:: fastNLP.core.callback
-   :members:
-   :undoc-members:
-   :show-inheritance:
+   :members: Callback, GradientClipCallback, EarlyStopCallback, FitlogCallback, EvaluateCallback, LRScheduler, ControlC, LRFinder, TensorboardCallback, WarmupCallback, SaveModelCallback, CallbackException, EarlyStopError
+   :inherited-members:
+
@@ -2,6 +2,6 @@ fastNLP.core.const
 ==================
 
 .. automodule:: fastNLP.core.const
-   :members:
-   :undoc-members:
-   :show-inheritance:
+   :members: Const
+   :inherited-members:
+
@@ -2,6 +2,6 @@ fastNLP.core.dataset
 ====================
 
 .. automodule:: fastNLP.core.dataset
-   :members:
-   :undoc-members:
-   :show-inheritance:
+   :members: DataSet
+   :inherited-members:
+
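
Review note on the regenerated `.rst` stubs: the explicit `:members:` lists above look consistent with what `create_rst_file` in `docs/count.py` writes from each module's `__all__`. The sketch below reproduces that layout for a stand-in module so the output can be compared with the hunks. `render_automodule` is a hypothetical helper, the stand-in module is created with `types.ModuleType` so fastNLP is not required, and the special cases in the original (the `fastNLP.core` exception, the `models`/`modules`/`embeddings` prefixes, and the submodule toctree) are reduced to a single `inherited` flag.

```python
import types


def render_automodule(module, inherited=True):
    """Build an automodule stub that lists the names in module.__all__."""
    name = module.__name__
    lines = [name, "=" * len(name), "", ".. automodule:: " + name]
    if module.__all__:
        lines.append("   :members: " + ", ".join(module.__all__))
        if inherited:
            lines.append("   :inherited-members:")
    lines.append("")
    return "\n".join(lines) + "\n"


# Stand-in for fastNLP.core.const so the sketch runs on its own.
fake = types.ModuleType("fastNLP.core.const")
fake.__all__ = ["Const"]
print(render_automodule(fake))
```

Listing members explicitly keeps the rendered API page in line with `__all__`, so names that are not exported are left out of the page.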