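1, 商务大床房,房间很大,床有2M宽,整体感觉经济实惠不错!

- Here the leading 1 is the label of this review and indicates positive sentiment. The data we will use can be downloaded from `this link <http://dbcloud.irocn.cn:8989/api/public/dl/dataset/chn\_senti\_corp.zip>`_
+ Here the leading 1 is the label of this review and indicates positive sentiment. The data we will use can be downloaded from `this link <http://212.129.155.247/dataset/chn_senti_corp.zip>`_
and extracted; alternatively, fastNLP can download the data automatically.

The contents of the data are shown below. Next, we will use fastNLP to train a classification network on this data.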
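As a concrete reading of the line format above, here is a minimal Python sketch that splits one raw line into its integer label and the review text. It assumes the label is followed by the first tab or comma, and that 0 marks negative reviews. `parse_line` is a hypothetical helper for illustration and is not part of fastNLP.

```python
def parse_line(line):
    """Split one 'label<sep>review' line into (label, review).

    Assumption: the separator is the first tab if the line has one,
    otherwise the first comma; later commas belong to the review text.
    """
    sep = "\t" if "\t" in line else ","
    label, review = line.split(sep, 1)
    return int(label.strip()), review.strip()


label, review = parse_line("1, 商务大床房,房间很大,床有2M宽,整体感觉经济实惠不错!")
# Assumption: binary labels, 1 = positive (as stated above), 0 = negative.
sentiment = "positive" if label == 1 else "negative"
print(label, sentiment, review)
```

Splitting only once keeps any commas inside the review text intact.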
@@ -73,11 +73,12 @@ DataBundle的相关介绍,可以参考 :class:`~fastNLP.io.DataBundle` 。我

.. code-block :: text

- DataSet({'raw_chars': 选择珠江花园的原因就是方便,有电动扶梯直接到达海边,周围餐馆、食廊、商场、超市、摊位一应俱全。酒店装修一般,但还算整洁。 泳池在大堂的屋顶,因此很小,不过女儿倒是喜欢。 包的早餐是西式的,还算丰富。 服务吗,一般 type=str,
- 'target': 1 type=str},
- {'raw_chars': 15.4寸笔记本的键盘确实爽,基本跟台式机差不多了,蛮喜欢数字小键盘,输数字特方便,样子也很美观,做工也相当不错 type=str,
- 'target': 1 type=str})
-
+ +-----------------------------+--------+
+ | raw_chars                   | target |
+ +-----------------------------+--------+
+ | 选择珠江花园的原因就是方... | 1      |
+ | 15.4寸笔记本的键盘确实爽... | 1      |
+ +-----------------------------+--------+

(2) Preprocessing the data
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -121,14 +122,12 @@ fastNLP中也提供了多种数据集的处理类,这里我们直接使用fast

.. code-block :: text

- DataSet({'raw_chars': 选择珠江花园的原因就是方便,有电动扶梯直接到达海边,周围餐馆、食廊、商场、超市、摊位一应俱全。酒店装修一般,但还算整洁。 泳池在大堂的屋顶,因此很小,不过女儿倒是喜欢。 包的早餐是西式的,还算丰富。 服务吗,一般 type=str,
- 'target': 1 type=int,
- 'chars': [338, 464, 1400, 784, 468, 739, 3, 289, 151, 21, 5, 88, 143, 2, 9, 81, 134, 2573, 766, 233, 196, 23, 536, 342, 297, 2, 405, 698, 132, 281, 74, 744, 1048, 74, 420, 387, 74, 412, 433, 74, 2021, 180, 8, 219, 1929, 213, 4, 34, 31, 96, 363, 8, 230, 2, 66, 18, 229, 331, 768, 4, 11, 1094, 479, 17, 35, 593, 3, 1126, 967, 2, 151, 245, 12, 44, 2, 6, 52, 260, 263, 635, 5, 152, 162, 4, 11, 336, 3, 154, 132, 5, 236, 443, 3, 2, 18, 229, 761, 700, 4, 11, 48, 59, 653, 2, 8, 230] type=list,
- 'seq_len': 106 type=int},
- {'raw_chars': 15.4寸笔记本的键盘确实爽,基本跟台式机差不多了,蛮喜欢数字小键盘,输数字特方便,样子也很美观,做工也相当不错 type=str,
- 'target': 1 type=int,
- 'chars': [50, 133, 20, 135, 945, 520, 343, 24, 3, 301, 176, 350, 86, 785, 2, 456, 24, 461, 163, 443, 128, 109, 6, 47, 7, 2, 916, 152, 162, 524, 296, 44, 301, 176, 2, 1384, 524, 296, 259, 88, 143, 2, 92, 67, 26, 12, 277, 269, 2, 188, 223, 26, 228, 83, 6, 63] type=list,
- 'seq_len': 56 type=int})
+ +-----------------+--------+-----------------+---------+
+ | raw_chars       | target | chars           | seq_len |
+ +-----------------+--------+-----------------+---------+
+ | 选择珠江花园... | 0      | [338, 464, 1... | 106     |
+ | 15.4寸笔记本... | 0      | [50, 133, 20... | 56      |
+ +-----------------+--------+-----------------+---------+


A new column chars, holding lists of numbers, has been added, and the target column has been converted to numbers. The names of these two columns match the names of the two Vocabulary objects in data\_bundle; we can print the Vocabulary objects to inspect their contents.
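To illustrate what the indexing step does to each row, here is a plain-Python toy version. It is a stand-in built on assumptions, not fastNLP's Vocabulary. The helpers `build_vocab` and `index_dataset`, the `<pad>`/`<unk>` entries at indices 0 and 1, and the frequency-based ordering are illustrative choices, so the indices it produces will not match the ones printed above.

```python
from collections import Counter


def build_vocab(sequences, specials=("<pad>", "<unk>")):
    """Assign indices: special tokens first, then tokens by descending
    frequency (equal counts keep first-seen order, per Counter.most_common)."""
    counts = Counter(tok for seq in sequences for tok in seq)
    vocab = {tok: i for i, tok in enumerate(specials)}
    for tok, _ in counts.most_common():
        vocab.setdefault(tok, len(vocab))
    return vocab


def index_dataset(rows):
    """Add 'chars' and 'seq_len' to each row and replace 'target' by its index."""
    char_vocab = build_vocab([r["raw_chars"] for r in rows])
    # Labels get no padding/unknown entries, so the most frequent label
    # (first seen on ties) maps to index 0.
    target_vocab = build_vocab([[r["target"]] for r in rows], specials=())
    unk = char_vocab["<unk>"]
    for r in rows:
        r["chars"] = [char_vocab.get(c, unk) for c in r["raw_chars"]]
        r["seq_len"] = len(r["chars"])
        r["target"] = target_vocab[r["target"]]
    return rows, char_vocab, target_vocab


rows = [{"raw_chars": "房间很大", "target": "1"},
        {"raw_chars": "房间很小", "target": "0"}]
rows, char_vocab, target_vocab = index_dataset(rows)
print(rows[0])  # {'raw_chars': '房间很大', 'target': 0, 'chars': [2, 3, 4, 5], 'seq_len': 4}
```

The sketch keeps separate vocabularies for the input characters and for the labels. This is why the labels get no padding or unknown entries.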
@@ -183,11 +182,6 @@ fastNLP支持使用名字指定的Embedding以及相关说明可以参见 :mod:`
(4) Creating the model
~~~~~~~~~~~~~~~~~~~~~~

- The structure of the model we use is shown below
-
- .. todo ::
- Add the figure
-
.. code-block :: python

from torch import nn
@@ -261,64 +255,24 @@ fastNLP提供了Trainer对象来组织训练过程,包括完成loss计算(所
Evaluate data in 0.01 seconds!
training epochs started 2019-09-03-23-57-10

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=3000), HTML(value='')), layout=Layout(display…
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.43 seconds!
Evaluation on dev at Epoch 1/10. Step:300/3000:
AccuracyMetric: acc=0.81

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.44 seconds!
Evaluation on dev at Epoch 2/10. Step:600/3000:
AccuracyMetric: acc=0.8675

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.44 seconds!
Evaluation on dev at Epoch 3/10. Step:900/3000:
AccuracyMetric: acc=0.878333

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.43 seconds!
- Evaluation on dev at Epoch 4/10. Step:1200/3000:
- AccuracyMetric: acc=0.873333
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.44 seconds!
- Evaluation on dev at Epoch 5/10. Step:1500/3000:
- AccuracyMetric: acc=0.878333
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.42 seconds!
- Evaluation on dev at Epoch 6/10. Step:1800/3000:
- AccuracyMetric: acc=0.895833
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.44 seconds!
- Evaluation on dev at Epoch 7/10. Step:2100/3000:
- AccuracyMetric: acc=0.8975
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
- Evaluate data in 0.43 seconds!
- Evaluation on dev at Epoch 8/10. Step:2400/3000:
- AccuracyMetric: acc=0.894167
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
+ ....

Evaluate data in 0.48 seconds!
Evaluation on dev at Epoch 9/10. Step:2700/3000:
AccuracyMetric: acc=0.8875

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=38), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.43 seconds!
Evaluation on dev at Epoch 10/10. Step:3000/3000:
AccuracyMetric: acc=0.895833
@@ -327,8 +281,6 @@ fastNLP提供了Trainer对象来组织训练过程,包括完成loss计算(所
AccuracyMetric: acc=0.8975
Reloaded the best model.

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=19), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 0.34 seconds!
[tester]
AccuracyMetric: acc=0.8975
@@ -375,8 +327,8 @@ fastNLP提供了Trainer对象来组织训练过程,包括完成loss计算(所

.. code-block :: text

- loading vocabulary file /home/yh/.fastNLP/embedding/bert-chinese-wwm/vocab.txt
- Load pre-trained BERT parameters from file /home/yh/.fastNLP/embedding/bert-chinese-wwm/chinese_wwm_pytorch.bin.
+ loading vocabulary file ~/.fastNLP/embedding/bert-chinese-wwm/vocab.txt
+ Load pre-trained BERT parameters from file ~/.fastNLP/embedding/bert-chinese-wwm/chinese_wwm_pytorch.bin.
Start to generating word pieces for word.
Found(Or segment into word pieces) 4286 words out of 4409.
input fields after batch(if batch size is 2):
@@ -390,22 +342,14 @@ fastNLP提供了Trainer对象来组织训练过程,包括完成loss计算(所
Evaluate data in 0.05 seconds!
training epochs started 2019-09-04-00-02-37

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=3600), HTML(value='')), layout=Layout(display…
-
- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=150), HTML(value='')), layout=Layout(display=…
-
Evaluate data in 15.89 seconds!
Evaluation on dev at Epoch 1/3. Step:1200/3600:
AccuracyMetric: acc=0.9

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=150), HTML(value='')), layout=Layout(display=…
-
Evaluate data in 15.92 seconds!
Evaluation on dev at Epoch 2/3. Step:2400/3600:
AccuracyMetric: acc=0.904167

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=150), HTML(value='')), layout=Layout(display=…
-
Evaluate data in 15.91 seconds!
Evaluation on dev at Epoch 3/3. Step:3600/3600:
AccuracyMetric: acc=0.918333
@@ -415,8 +359,6 @@ fastNLP提供了Trainer对象来组织训练过程,包括完成loss计算(所
Reloaded the best model.
Performance on test is:

- HBox(children=(IntProgress(value=0, layout=Layout(flex='2'), max=19), HTML(value='')), layout=Layout(display='…
-
Evaluate data in 29.24 seconds!
[tester]
AccuracyMetric: acc=0.919167