Research and Implementation of a Solr-Based Search Engine

Abstract

With the arrival of the information age, people's lives, study, work, and entertainment have become thoroughly integrated with information technology. As public participation in the Internet grows and enterprises and public institutions move further into informatization, the volume of information has multiplied. Finding the information one needs quickly and accurately in this vast sea of digital data has become an urgent need for people across the country. This is especially true for small and medium-sized enterprises that are still building up their information systems: being able to develop their own information retrieval systems quickly and cheaply is crucial to their growth. This thesis introduces the basic principles of search engines and presents an in-depth analysis of several of their core technologies. It describes the architecture and basic usage of the Lucene search engine toolkit, and studies the architecture, code, and configuration of Solr, a search framework built on the open-source search engine library Lucene. Finally, it designs and implements a simple, usable multi-database search engine based on Solr 1.3. Throughout the design, the emphasis is on ease of management and maintenance, and on extensibility.

Keywords: Lucene; Solr; search engine; crawler; Chinese word segmentation

Abstract (English)

With the advent of the information era, people's lives, study, work, and entertainment have been fully integrated with information technology. With the growing participation of the public in the Internet and the increasing number of information-based enterprises, the amount of information has multiplied. Obtaining useful information quickly and accurately has become important to everyone. For small and medium-sized enterprises, developing their own information retrieval systems quickly and cheaply is essential to their growth. In this thesis, we introduce the basic principles of search engines and analyze a number of their core technologies. We also introduce the Lucene search engine toolkit, its basic framework, and how to use it. We then analyze Solr, an open-source search engine based on Lucene, covering its architecture, code, and configuration. Finally, we design and implement a simple multi-database search engine based on Solr 1.3.

Keywords: Lucene; Solr; Search Engine; Spider; Chinese Word Segmentation

Contents

Chapter 1 Preface
1.1 Introduction
1.2 Significance and Current State of Open-Source Search Engine Research
Chapter 2 Key Technologies of Chinese Search Engines
2.1 Basic Structure of a Search Engine
2.2 Chinese Word Segmentation
2.3 Relevance Ranking
2.4 Search Engine Response Speed
2.5 Web Spiders
Chapter 3 The Open-Source Search Engine Solr
3.1 The Search Engine Library Lucene
3.1.1 Introduction to Lucene
3.1.2 The Relationship Between Lucene and Solr
3.1.3 The Structure of Lucene
3.1.4 Using Lucene
3.1.5 Lucene's Scoring Formula
3.1.6 Sorting Lucene Search Results
3.2 Introduction to Solr
3.2.1 Features and Advantages of Solr
3.2.2 New Features in Solr 1.3
3.3 Configuring and Using Solr
3.3.1 Deploying a Solr 1.3 Server
3.3.2 Solr 1.3 Architecture Diagram
3.3.3 The solr.xml Configuration File
3.3.4 The schema.xml Configuration File
3.3.5 The solrconfig.xml Configuration File
3.3.6 HTTP Query Interface Parameters
3.4 Core Mechanisms of Solr 1.3
3.4.1 Core Invocation Mechanism
3.4.2 Multi-Database Mechanism
3.4.3 Caching Mechanism
Chapter 4 Overall Design of the Solr-Based Search Engine
4.1 Design Background and Principles
4.2 Overall Structure and Module Relationships ...