Hunan University Graduation Thesis, Page I

HUNAN UNIVERSITY
Graduation Project (Thesis)

Thesis title: Full-Text Retrieval and Its Application in a Document Processing System
Student name: Xiangjian Zhao (赵湘舰)
Student ID: 20041610334
Major and class: Software Development, Class 2
School: School of Software
Advisor: Shaofei Lu (陆绍飞)
Dean: Yaping Lin (林亚平)
May 23, 2008

Full-Text Retrieval and Its Application in a Document Processing System

Abstract

With the development of information technology, people demand ever greater efficiency in information retrieval. The development and application of full-text retrieval meets most of these needs. Lucene is an open-source full-text retrieval engine that provides a complete query engine and indexing engine. Its goal is to give software developers a simple, easy-to-use toolkit with which to add full-text retrieval to a target system, or to serve as the basis for building a complete full-text search engine.

The practical requirement behind this project comes from a document processing system that I helped develop during an internship at a company. The system was required to include a site search engine able to search the content of official documents. The search engine in the project is implemented with Lucene, which greatly improved the efficiency of content retrieval in the document processing system. Site search makes the whole system more powerful and gives users a more convenient search function.

This thesis studies and analyzes in depth the principles, components, data structures, and workflow of search engines. It then uses Lucene to design and implement a full-text site search engine system, and finally explains how to improve Lucene's efficiency from two aspects: incremental indexing and index optimization.

Keywords: full-text retrieval, search engine, Lucene

Full-Text Search and Its Implementation in Document Processing System

Author: Xiangjian Zhao
Tutor: Shaofei Lu

With the development of information technology, the demand for efficient information searching has grown steadily, and the development and application of full-text retrieval satisfies most of that demand. Lucene is an open-source full-text search engine toolkit that provides a complete query engine and index engine. It is designed to give software developers a simple, easy-to-use toolkit that makes it convenient to implement full-text retrieval in a target system, or to serve as a basis for building a complete full-text search engine.

The actual requirements of this system originate from my development work on a "Document Processing System" during an internship at an enterprise. The system requires a site search engine, which I implemented with Lucene. The system now runs stably; site search makes the whole system more powerful and provides users with a more convenient search function.

I have carefully studied and analyzed the principles, composition, data structures, and workflow of search engines, and have designed and implemented a full-text site search engine by means of Lucene. Finally, I illustrate how to improve the efficiency of Lucene through two aspects: incremental indexing and index optimization.

Key Words: Full-Text Search, Search Engine, Lucene

Contents

1. Introduction ...... 1
1.1 Background ...... 1
1.2 Current Research Status and Open Problems ...... 1
1.3 Thesis Organization ...... 2
2. Full-Text Retrieval and Lucene ...... 3
2.1 Introduction to Full-Text Retrieval ...... 3
2.2 Full-Text Retrieval Systems Compared with Databases ...... 4
2.3 Introduction to Lucene ...... 6
2.4 Applications, Features, and Advantages of Lucene ...... 7
2.5 A Study of Internet Search Engines ...... 8
2.6 A Brief Introduction to Chinese Word Segmentation ...... 9
3. Lucene System Architecture ...... 11
3.1 Organization of the Lucene System ...... 11
3.2 Data Flow Analysis ...... 11
3.3 Analysis of the Lucene Index File Format ...... 13
3.3.1 Notes on the Analysis of the Lucene Source Code ...... 13
3.3.2 Lucene Index File Format ...... 13
3.4 Lucene's Inverted Index Principle ...... 16
3.5 Ranking of Lucene Search Results ...... 18
4. System Design and Implementation ...... 20
4.1 System Requirements ...... 20
4.2 Development Environment and Tools ...... 20
4.3 System Organization ...... 22
4.4 Implementation of the Full-Text Retrieval Workflow ...... 27
4.4.1 Building the Index ...... 28
4.4.2 Updating the Index ...... 31
4.4.3 Searching ...... 32
5. Analysis of Search Results ...... 35
5.1 Search Results in the System ...... 35
5.2 Test Summary ......