
Commit be9b3ee

Fix errors in the tutorial
1 parent 0b401e2 commit be9b3ee


1 file changed (+16, -74 lines)


docs/source/quickstart/文本分类.rst (文本分类 = "text classification")

+16 -74
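@@ -7,7 +7,7 @@

1, 商务大床房,房间很大,床有2M宽,整体感觉经济实惠不错!

- Here the leading 1 is the label of this review and indicates positive sentiment. The data we will use can be downloaded from `this link <http://dbcloud.irocn.cn:8989/api/public/dl/dataset/chn\_senti\_corp.zip>`_
+ Here the leading 1 is the label of this review and indicates positive sentiment. The data we will use can be downloaded from `this link <http://212.129.155.247/dataset/chn_senti_corp.zip>`_
and unzipped; fastNLP can also download this data automatically.

The contents of the data are shown in the figure below. Next, we will use fastNLP to train a classification network on this data.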
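Each record pairs a label with the review text, as in the "1, 商务大床房,..." sample line above. A minimal sketch of splitting one such line into the raw_chars and target fields used later in the tutorial, assuming (as the sample suggests) that the label is everything before the first comma. parse_line is a hypothetical helper, not fastNLP's loader.

```python
def parse_line(line: str) -> dict:
    """Split 'label, text' into raw_chars/target fields (target kept as a string).

    Assumption: the label is separated from the review by the first comma.
    Raises ValueError if the line contains no comma.
    """
    label, text = line.split(",", 1)
    return {"raw_chars": text.strip(), "target": label.strip()}


record = parse_line("1, 商务大床房,房间很大,床有2M宽,整体感觉经济实惠不错!")
print(record["target"])     # 1
print(record["raw_chars"])  # 商务大床房,房间很大,床有2M宽,整体感觉经济实惠不错!
```

Splitting only once keeps any commas inside the review text intact.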
@@ -73,11 +73,12 @@ For an introduction to DataBundle, see :class:`~fastNLP.io.DataBundle`. We

.. code-block:: text

- DataSet({'raw_chars': 选择珠江花园的原因就是方便,有电动扶梯直接到达海边,周围餐馆、食廊、商场、超市、摊位一应俱全。酒店装修一般,但还算整洁。 泳池在大堂的屋顶,因此很小,不过女儿倒是喜欢。 包的早餐是西式的,还算丰富。 服务吗,一般 type=str,
- 'target': 1 type=str},
- {'raw_chars': 15.4寸笔记本的键盘确实爽,基本跟台式机差不多了,蛮喜欢数字小键盘,输数字特方便,样子也很美观,做工也相当不错 type=str,
- 'target': 1 type=str})
-
+ +-----------------------------+--------+
+ | raw_chars | target |
+ +-----------------------------+--------+
+ | 选择珠江花园的原因就是方... | 1 |
+ | 15.4寸笔记本的键盘确实爽... | 1 |
+ +-----------------------------+--------+

(2) Preprocess the data
~~~~~~~~~~~~~~~~~~~~~~~
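In the new table output, long cells appear to be cut by display width rather than by character count: both raw_chars previews fill 24 terminal columns before the "...", if each Chinese character counts as two columns and each ASCII character as one (12 CJK characters in one row, 4 ASCII plus 10 CJK in the other). A minimal sketch of that rule, inferred from the output rather than taken from fastNLP's source; display_width and truncate_cell are hypothetical helpers.

```python
import unicodedata


def display_width(ch: str) -> int:
    """Terminal columns used by one character: 2 for East Asian wide/fullwidth."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def truncate_cell(text: str, max_width: int) -> str:
    """Keep as many leading characters as fit in max_width columns, then add '...'."""
    if sum(display_width(c) for c in text) <= max_width:
        return text  # short cells are shown unchanged
    kept, used = [], 0
    for ch in text:
        w = display_width(ch)
        if used + w > max_width:
            break
        kept.append(ch)
        used += w
    return "".join(kept) + "..."


print(truncate_cell("选择珠江花园的原因就是方便,有电动扶梯直接到达海边", 24))
# 选择珠江花园的原因就是方...
print(truncate_cell("15.4寸笔记本的键盘确实爽,基本跟台式机差不多了", 24))
# 15.4寸笔记本的键盘确实爽...
```

With a 12-column limit the same rule reproduces the shorter previews (选择珠江花园..., 15.4寸笔记本...) in the preprocessed table.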
@@ -121,14 +122,12 @@ fastNLP also provides processing classes for a variety of datasets; here we directly use fast

.. code-block:: text

- DataSet({'raw_chars': 选择珠江花园的原因就是方便,有电动扶梯直接到达海边,周围餐馆、食廊、商场、超市、摊位一应俱全。酒店装修一般,但还算整洁。 泳池在大堂的屋顶,因此很小,不过女儿倒是喜欢。 包的早餐是西式的,还算丰富。 服务吗,一般 type=str,
- 'target': 1 type=int,
- 'chars': [338, 464, 1400, 784, 468, 739, 3, 289, 151, 21, 5, 88, 143, 2, 9, 81, 134, 2573, 766, 233, 196, 23, 536, 342, 297, 2, 405, 698, 132, 281, 74, 744, 1048, 74, 420, 387, 74, 412, 433, 74, 2021, 180, 8, 219, 1929, 213, 4, 34, 31, 96, 363, 8, 230, 2, 66, 18, 229, 331, 768, 4, 11, 1094, 479, 17, 35, 593, 3, 1126, 967, 2, 151, 245, 12, 44, 2, 6, 52, 260, 263, 635, 5, 152, 162, 4, 11, 336, 3, 154, 132, 5, 236, 443, 3, 2, 18, 229, 761, 700, 4, 11, 48, 59, 653, 2, 8, 230] type=list,
- 'seq_len': 106 type=int},
- {'raw_chars': 15.4寸笔记本的键盘确实爽,基本跟台式机差不多了,蛮喜欢数字小键盘,输数字特方便,样子也很美观,做工也相当不错 type=str,
- 'target': 1 type=int,
- 'chars': [50, 133, 20, 135, 945, 520, 343, 24, 3, 301, 176, 350, 86, 785, 2, 456, 24, 461, 163, 443, 128, 109, 6, 47, 7, 2, 916, 152, 162, 524, 296, 44, 301, 176, 2, 1384, 524, 296, 259, 88, 143, 2, 92, 67, 26, 12, 277, 269, 2, 188, 223, 26, 228, 83, 6, 63] type=list,
- 'seq_len': 56 type=int})
+ +-----------------+--------+-----------------+---------+
+ | raw_chars | target | chars | seq_len |
+ +-----------------+--------+-----------------+---------+
+ | 选择珠江花园... | 0 | [338, 464, 1... | 106 |
+ | 15.4寸笔记本... | 0 | [50, 133, 20... | 56 |
+ +-----------------+--------+-----------------+---------+


A new column, chars, holding lists of numbers has been added, and the target column has been converted to numbers. Note that the names of these two columns are exactly the same as the names of the two Vocabulary objects in data\_bundle; we can print a Vocabulary to see what it contains.
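The chars and target columns come from mapping each character and each label to an integer index through a vocabulary. A minimal sketch of that idea in plain Python, assuming frequency-ordered indices with index 0 reserved for padding and 1 for unknown characters in the character vocabulary, and no special entries in the label vocabulary. build_vocab and index_row are hypothetical helpers, not fastNLP's Vocabulary class, so the indices fastNLP actually assigns may differ.

```python
from collections import Counter


def build_vocab(sequences, specials=("<pad>", "<unk>")):
    """Map tokens to indices: special tokens first, then by descending frequency."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {tok: i for i, tok in enumerate(specials)}
    # most_common() sorts stably, so tokens with equal counts keep first-seen order.
    for tok, _ in counts.most_common():
        vocab.setdefault(tok, len(vocab))
    return vocab


def index_row(row, char_vocab, target_vocab, unk="<unk>"):
    """Add chars/seq_len fields and replace the string target with its index."""
    chars = [char_vocab.get(c, char_vocab[unk]) for c in row["raw_chars"]]
    return {**row, "target": target_vocab[row["target"]],
            "chars": chars, "seq_len": len(chars)}


rows = [{"raw_chars": "好好看", "target": "1"},
        {"raw_chars": "不好", "target": "0"},
        {"raw_chars": "好", "target": "1"}]
char_vocab = build_vocab(r["raw_chars"] for r in rows)
target_vocab = build_vocab([[r["target"] for r in rows]], specials=())
print(char_vocab)    # {'<pad>': 0, '<unk>': 1, '好': 2, '看': 3, '不': 4}
print(target_vocab)  # {'1': 0, '0': 1}
print(index_row(rows[1], char_vocab, target_vocab))
# {'raw_chars': '不好', 'target': 1, 'chars': [4, 2], 'seq_len': 2}
```

A character never seen while building the vocabulary (for example 吗) maps to the unknown index 1.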
@@ -183,11 +182,6 @@ fastNLP supports Embeddings specified by name; for related documentation, see :mod:`
(4) Create the model
~~~~~~~~~~~~~~~~~~~~

- The model architecture we use here is shown below
-
- .. todo::
- Add a figure
-
.. code-block:: python

from torch import nn
@@ -261,64 +255,24 @@ fastNLP provides the Trainer object to organize the training process, including computing the loss (
Evaluate data in 0.01 seconds!
training epochs started 2019-09-03-23-57-10

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=3000), HTML(value='')), layout=Layout(display…
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.43 seconds!
Evaluation on dev at Epoch 1/10. Step:300/3000:
AccuracyMetric: acc=0.81

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.44 seconds!
Evaluation on dev at Epoch 2/10. Step:600/3000:
AccuracyMetric: acc=0.8675

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.44 seconds!
Evaluation on dev at Epoch 3/10. Step:900/3000:
AccuracyMetric: acc=0.878333

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.43 seconds!
- Evaluation on dev at Epoch 4/10. Step:1200/3000:
- AccuracyMetric: acc=0.873333
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.44 seconds!
- Evaluation on dev at Epoch 5/10. Step:1500/3000:
- AccuracyMetric: acc=0.878333
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.42 seconds!
- Evaluation on dev at Epoch 6/10. Step:1800/3000:
- AccuracyMetric: acc=0.895833
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.44 seconds!
- Evaluation on dev at Epoch 7/10. Step:2100/3000:
- AccuracyMetric: acc=0.8975
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.43 seconds!
- Evaluation on dev at Epoch 8/10. Step:2400/3000:
- AccuracyMetric: acc=0.894167
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
+ ....

Evaluate data in 0.48 seconds!
Evaluation on dev at Epoch 9/10. Step:2700/3000:
AccuracyMetric: acc=0.8875

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.43 seconds!
Evaluation on dev at Epoch 10/10. Step:3000/3000:
AccuracyMetric: acc=0.895833
@@ -327,8 +281,6 @@ fastNLP provides the Trainer object to organize the training process, including computing the loss (
AccuracyMetric: acc=0.8975
Reloaded the best model.

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=19), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.34 seconds!
[tester]
AccuracyMetric: acc=0.8975
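The log reports AccuracyMetric on the dev set after every epoch and then reloads the best model. A minimal sketch of that bookkeeping, assuming "best" means the highest dev accuracy with the earliest epoch winning ties; accuracy and best_epoch are hypothetical helpers, not fastNLP's Trainer code. The ten values are the dev accuracies from the first training log.

```python
def accuracy(preds, targets):
    """Fraction of predictions equal to the gold labels."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and the same length")
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def best_epoch(dev_scores):
    """Return (1-based epoch, score) of the highest score; the earliest epoch wins ties."""
    idx = max(range(len(dev_scores)), key=dev_scores.__getitem__)
    return idx + 1, dev_scores[idx]


# Dev accuracies for epochs 1-10, copied from the first training log.
dev_acc = [0.81, 0.8675, 0.878333, 0.873333, 0.878333,
           0.895833, 0.8975, 0.894167, 0.8875, 0.895833]
print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
print(best_epoch(dev_acc))                   # (7, 0.8975)
```

Epoch 7 (Step 2100) yields the 0.8975 that the log prints just before "Reloaded the best model."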
@@ -375,8 +327,8 @@ fastNLP provides the Trainer object to organize the training process, including computing the loss (

.. code-block:: text

- loading vocabulary file /home/yh/.fastNLP/embedding/bert-chinese-wwm/vocab.txt
- Load pre-trained BERT parameters from file /home/yh/.fastNLP/embedding/bert-chinese-wwm/chinese_wwm_pytorch.bin.
+ loading vocabulary file ~/.fastNLP/embedding/bert-chinese-wwm/vocab.txt
+ Load pre-trained BERT parameters from file ~/.fastNLP/embedding/bert-chinese-wwm/chinese_wwm_pytorch.bin.
Start to generating word pieces for word.
Found(Or segment into word pieces) 4286 words out of 4409.
input fields after batch(if batch size is 2):
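The line "Found(Or segment into word pieces) 4286 words out of 4409." counts how many words could be matched in BERT's vocabulary directly or split into word pieces (about 97% here). A minimal sketch of the greedy longest-match-first WordPiece segmentation used by the original BERT tokenizer, run on a toy vocabulary; fastNLP's BertEmbedding may perform this check differently, and wordpiece and coverage are hypothetical names.

```python
def wordpiece(word, vocab, prefix="##"):
    """Greedy longest-match-first split; returns None if some span has no match."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            # Non-initial pieces carry the continuation prefix, as in BERT vocabularies.
            piece = word[start:end] if start == 0 else prefix + word[start:end]
            if piece in vocab:
                match = piece
                break
            end -= 1  # shrink the candidate from the right
        if match is None:
            return None
        pieces.append(match)
        start = end
    return pieces


def coverage(words, vocab):
    """(number of words that can be segmented, total number of words)."""
    return sum(wordpiece(w, vocab) is not None for w in words), len(words)


toy_vocab = {"un", "##aff", "##able", "好"}
print(wordpiece("unaffable", toy_vocab))                # ['un', '##aff', '##able']
print(coverage(["unaffable", "好", "xyz"], toy_vocab))  # (2, 3)
```

The original BERT tokenizer maps an unsegmentable word to [UNK]; this sketch returns None instead so such words are easy to count.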
@@ -390,22 +342,14 @@ fastNLP provides the Trainer object to organize the training process, including computing the loss (
Evaluate data in 0.05 seconds!
training epochs started 2019-09-04-00-02-37

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=3600), HTML(value='')), layout=Layout(display…
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=150), HTML(value='')), layout=Layout(display=…
-
Evaluate data in 15.89 seconds!
Evaluation on dev at Epoch 1/3. Step:1200/3600:
AccuracyMetric: acc=0.9

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=150), HTML(value='')), layout=Layout(display=…
-
Evaluate data in 15.92 seconds!
Evaluation on dev at Epoch 2/3. Step:2400/3600:
AccuracyMetric: acc=0.904167

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=150), HTML(value='')), layout=Layout(display=…
-
Evaluate data in 15.91 seconds!
Evaluation on dev at Epoch 3/3. Step:3600/3600:
AccuracyMetric: acc=0.918333
@@ -415,8 +359,6 @@ fastNLP provides the Trainer object to organize the training process, including computing the loss (
Reloaded the best model.
Performance on test is:

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=19), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 29.24 seconds!
[tester]
AccuracyMetric: acc=0.919167
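The step counts in both logs are consistent with steps per epoch = ceil(training examples / batch size): the first run takes 300 steps per epoch (3000 over 10 epochs) and the BERT run takes 1200 per epoch (3600 over 3 epochs). A minimal check of that arithmetic, assuming a 9,600-review training split with batch sizes 32 and 8. Neither the split size nor the batch sizes are stated in this diff; they are simply values that reproduce the logged step counts.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches in one pass; a final partial batch still counts as a step."""
    return math.ceil(num_examples / batch_size)


# Assumed values: 9,600 training reviews; batch size 32 (first run) and 8 (BERT run).
N_TRAIN = 9600
for batch_size, epochs in [(32, 10), (8, 3)]:
    per_epoch = steps_per_epoch(N_TRAIN, batch_size)
    print(batch_size, per_epoch, per_epoch * epochs)
# 32 300 3000
# 8 1200 3600
```

The ceiling matters when the split size is not a multiple of the batch size: 1,200 examples at batch size 32 give 38 batches, not 37.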
