Comparative Study on Classification Performance of Machine Learning Models for Academic Full Texts [Word count: 13872]

2024-02-25 17:07 Editor: www.jxszl.com 景先生毕设

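[Objective] To compare the classification performance of various machine learning models, examine the relative strengths of traditional machine learning models and deep learning models, and address how to efficiently select a machine learning model suited to the classification task at hand. [Method] We selected 31888 academic articles from the journal PLOS ONE. After data cleaning, segmentation and manual annotation, we built a text classification corpus containing 313952 items of document-structure category information. On this corpus we ran document-structure classification experiments with 17 machine learning models in total: the traditional machine learning models NB, SVM and CRF, and the deep learning RNN, Bi-LSTM, IDCNN and BERT model groups. [Results] In the classification task, the BERT-Bi-LSTM-CRF model performed best, with an average F value of 80.07%, which is 9.4% and 17.84% higher than the second-ranked CRF and the third-ranked Bi-LSTM-CRF, respectively. Among the deep learning models, text representation with BERT outperformed word2vec, and adding an Attention mechanism and replacing the Softmax layer with a CRF layer both yielded better classification results.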
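Contents
Abstract (Chinese)
Keywords
Abstract (English)
Introduction
I. Preliminaries
(1) Data source and corpus preprocessing
(2) Traditional machine learning models
1. NB
2. SVM
3. CRF
(3) Deep learning models
1. RNN
2. Bi-LSTM
3. IDCNN
4. BERT
(4) Classification performance evaluation metrics
II. Classification experiments based on machine learning
(1) NB
(2) SVM
(3) CRF
(4) RNN
(5) Bi-LSTM
(6) IDCNN
(7) BERT
(8) Comparison of classification performance across machine learning models
III. Conclusion
Acknowledgements
References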
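Figure 1 Overall workflow of this study
Figure 2 Schematic of the SVM classification principle
Figure 3 Topology of the linear-chain CRF model
Figure 4 Main architecture of the Bi-LSTM model
Figure 5 Main architecture of the Bi-LSTM-CRF model
Figure 6 Main architecture of the IDCNN model
Figure 7 Main architecture of the BERT model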
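Table 1 Statistics of document-structure information
Table 2 Comparison of NB classification results with TF and TF-IDF feature vectors
Table 3 Comparison of SVM classification results with TF and TF-IDF feature vectors
Table 4 CRF classification results
Table 5 Comparison of classification results for the RNN model group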
Table 6 Comparison of classification results for the Bi-LSTM model group
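Table 7 Comparison of classification results for the IDCNN model group
Table 8 Comparison of classification results for the BERT model group
Table 9 Comparison of classification performance across machine learning models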
Comparative Study on Classification Performance of Machine Learning Models for Academic Full Texts
Student majoring in Information Management and Information System: Hu Haotian
Tutor: Wang Dongbo
Abstract: [Objective] To compare the classification performance of various machine learning models, to explore the relative effectiveness of traditional machine learning models and deep learning models, and to address how to efficiently select a machine learning model for a specific classification task. [Method] A total of 31888 academic articles from the journal PLOS ONE were selected. After data cleaning, segmentation and manual labeling, a text classification corpus containing 313952 items of document-structure category information was constructed. Document-structure classification experiments were then carried out with 17 machine learning models in total: the traditional machine learning models NB, SVM and CRF, and the deep learning RNN, Bi-LSTM, IDCNN and BERT model groups. [Results] In the classification task, the BERT-Bi-LSTM-CRF model achieved the best classification performance, with an average F value of 80.07%, which is 9.4% and 17.84% higher than the second-ranked CRF and the third-ranked Bi-LSTM-CRF, respectively. For deep learning models, using BERT for text representation works better than word2vec, and adding the Attention mechanism and replacing the Softmax layer with a CRF layer both achieve better classification results.
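The abstract reports an "average F value" but this excerpt does not define it. As an illustration only, the sketch below assumes it is a macro-averaged F1, that is, the unweighted mean of the per-category F1 scores. The category labels, the toy gold and predicted sequences, and the function names per_class_prf and macro_f1 are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def per_class_prf(gold, pred):
    """Return {label: (precision, recall, f1)} for every label in gold or pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 (assumed meaning of 'average F value')."""
    scores = per_class_prf(gold, pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy data: I/M/R/D stand in for document-structure categories.
gold = ["I", "M", "M", "R", "D", "R"]
pred = ["I", "M", "R", "R", "D", "D"]

print(f"{macro_f1(gold, pred):.4f}")  # prints 0.7083
```

If the paper instead weights each category by its frequency, the final mean would need to be replaced by a support-weighted average. Macro averaging is used here only because it is the simpler assumption.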

Original link: http://www.jxszl.com/jsj/xxaq/564044.html