Text Classification in Web Information Extraction


Abstract

Support vector machines (SVM) hold an important place in machine learning theory and are widely applied to both classification and regression problems. This thesis briefly introduces the basic principles of SVM, discusses its application to text classification, and analyzes in detail how to build a text classifier with SVM. It describes the full text classification workflow and the key techniques involved, such as word segmentation, the vector space model (VSM), feature selection, and SVM cross-validation. Alongside this analysis, it outlines how the text classification system was built with Microsoft Visual C++ 6.0, covering the implementation and optimization of the important classes and key processing functions, and how a dynamic link library is used to migrate from C++ to Java. Finally, it presents the experimental data and conclusions obtained with the system.

Keywords: machine learning, text classification, support vector machine (SVM)

Contents

Chapter 1  Introduction
    1.1  Overall project background
        1.1.1  Web-based information integration system
        1.1.2  Requirements and system architecture of the Web-based information integration system
    1.2  Tasks and goals of the text classification system
    1.3  Main research content of this thesis
Chapter 2  Related Theory
    2.1  Automatic text classification
    2.3  Support vector machines (SVM)
    2.4  Principles of SVM
        2.4.1  Linear support vector machines
        2.4.2  Nonlinear support vector machines
    2.5  SVM text classification
Chapter 3  Requirements Analysis
    3.1  The two phases of SVM
    3.2  Training-phase goals
    3.3  Testing-phase goals
    3.4  External interfaces
Chapter 4  Overall Design and Choice of Implementation Tools
    4.1  Overall structure
    4.2  Training phase
        4.2.1  Word segmentation and term-frequency statistics
        4.2.2  Text vector space model (VSM) and text feature selection
        4.2.3  Text vectorization
        4.2.4  Text classifier
    4.3  Testing phase
        4.3.1  Word segmentation and term-frequency statistics
        4.3.2  Text vectorization
        4.3.3  Classification
    4.4  Choice of implementation tools and cross-language migration
Chapter 5  Detailed Design and Implementation
    5.1  Interface design
    5.2  Configuration file config.xml
    5.3  The LIST class
    5.4  The Frequency class
