Comparative Study on Classification Performance of Machine Learning Models for Academic Full Texts
Contents
Chinese Abstract
Keywords
English Abstract
Introduction
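I. Preliminaries
(I) Data Source and Corpus Preprocessing
(II) Traditional Machine Learning Models
1. NB
2. SVM
3. CRF
(III) Deep Learning Models
1. RNN
2. BiLSTM
3. IDCNN
4. BERT
(IV) Evaluation Metrics for Classification Performance
II. Classification Experiments Based on Machine Learning
(I) NB
(II) SVM
(III) CRF
(IV) RNN
(V) BiLSTM
(VI) IDCNN
(VII) BERT
(VIII) Comparison of Classification Performance across Machine Learning Models
III. Conclusion
Acknowledgements
References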
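Figure 1 Overall workflow of this study
Figure 2 Schematic diagram of the SVM classification principle
Figure 3 Topology of the linear-chain CRF model
Figure 4 Main architecture of the BiLSTM model
Figure 5 Main architecture of the BiLSTM-CRF model
Figure 6 Main architecture of the IDCNN model
Figure 7 Main architecture of the BERT model
Table 1 Statistics of chapter structure information
Table 2 NB classification performance with feature vectors built from TF vs. TF-IDF
Table 3 SVM classification performance with feature vectors built from TF vs. TF-IDF
Table 4 CRF classification performance
Table 5 Classification performance comparison of the RNN model group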
Table 6 Classification performance comparison of the BiLSTM model group
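Table 7 Classification performance comparison of the IDCNN model group
Table 8 Classification performance comparison of the BERT model group
Table 9 Comparison of classification performance across machine learning models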
Comparative Study on Classification Performance of Machine Learning Models for Academic Full Texts
Student: Hu Haotian (major: Information Management and Information Systems)
Supervisor: Wang Dongbo
Abstract: [Objective] To compare the classification performance of various machine learning models, to examine how traditional machine learning models and deep learning models differ in classification effectiveness, and to offer guidance on efficiently selecting a machine learning model for a specific classification task. [Method] A total of 31,888 academic articles from the journal "PLOS ONE" were selected. After data cleaning, segmentation, and manual labeling, a text classification corpus containing 313,952 instances labeled with chapter structure categories was constructed. On this corpus, chapter structure classification experiments were carried out with 17 machine learning models in total: the traditional models NB, SVM, and CRF, and the deep learning RNN, BiLSTM, IDCNN, and BERT model groups. [Results] The BERT-BiLSTM-CRF model achieved the best classification performance, with an average F value of 80.07%, which is 9.4% and 17.84% higher than those of the second-ranked CRF and the third-ranked BiLSTM-CRF, respectively. Among the deep learning models, using BERT for text representation outperformed word2vec, and both adding an Attention mechanism and replacing the Softmax layer with a CRF layer yielded better classification results.
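The abstract ranks models by an "average F value". The exact averaging scheme is not stated in this section, so the sketch below assumes a macro average, i.e. the unweighted mean of per-category F1 scores. The category names (Introduction, Methods, Results, Discussion) and the functions `per_class_prf` and `macro_f1` are hypothetical illustrations, not the paper's actual label set or code.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted `pred`, but the gold label differs
            fn[gold] += 1  # the gold label `gold` was missed
    scores = {}
    for lab in labels:
        p = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        r = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[lab] = (p, r, f)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-category F1 (assumed meaning of 'average F value')."""
    scores = per_class_prf(y_true, y_pred)
    if not scores:
        return 0.0
    return sum(f for _, _, f in scores.values()) / len(scores)


if __name__ == "__main__":
    # Hypothetical gold and predicted section labels for six text units.
    gold = ["Introduction", "Methods", "Methods", "Results", "Discussion", "Results"]
    pred = ["Introduction", "Methods", "Results", "Results", "Discussion", "Methods"]
    for label, (p, r, f) in per_class_prf(gold, pred).items():
        print(f"{label:<12} P={p:.2f} R={r:.2f} F1={f:.2f}")
    print(f"macro-F1 = {macro_f1(gold, pred):.4f}")  # macro-F1 = 0.7500
```

In this toy example, Methods and Results are each confused once, so both get F1 = 0.5 while the other two categories score 1.0, giving a macro average of 0.75. If the paper instead weights each category by its number of gold instances, scikit-learn's `f1_score(y_true, y_pred, average="weighted")` computes that variant, and `average="macro"` matches this sketch.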