Undergraduate Thesis

Title: A New Method of Detecting Query Intent for Search Engines (Chinese title: 一种搜索引擎的查询意图发现的新方法)
Name: Xu Guzi
Student ID: 00748337
School: School of Information Science and Technology
Major: Computer Science and Technology
Advisor: Peng Bo
January 15, 2020

Peking University Undergraduate Thesis: A New Method of Detecting Query Intent for Search Engines (Xu Guzi)

Abstract (Chinese)

Search engines receive an enormous number of queries every day, and the user intents behind these queries can differ. A search engine can use the query intent in two ways. It can provide different service features, for example a notebook feature for research-oriented queries, or an expanded view of a website's important content for navigational queries. It can also implement the same service in different ways, for example by using different ranking functions. Users' query intents fall mainly into two classes: informational and navigational.

Most existing methods for classifying query intent rely on text features of the query string itself and on features derived from users' click data. Both approaches face difficulties:
- Query strings are short and their text features are sparse, which makes accurate understanding difficult.
- Click features suffer from the long-tail distribution of submitted queries. Most queries are submitted only a few times, so their intent is hard to determine from clicks.

To overcome the unreliability of intent detection on long-tail queries, this thesis proposes using the distribution of the relevance scores of a query's results as features for determining its intent. The method draws on features of the query results, which are richer than those of the query string. It also does not depend on users' click data, so it avoids the difficulty posed by long-tail queries. The results show that using the distribution of result scores improves the accuracy of intent classification.

In addition, different users who submit the same query may have different intents. A user's query intent can be inferred from that user's history of submitted queries and clicks. However, the history of a single user is sparse. Therefore, we use the query and click histories of many users to infer the query intent of other users whose behavior patterns are similar.

Keywords: query intent, user clicks, query classification, personalization

Abstract (English)

Search engines receive large numbers of queries every day, but the user intents behind them may differ. Depending on the query type, a search engine can offer different services, such as a notebook function for research-oriented queries or a list of a site's important content for navigational queries. It can also implement the same service differently, for example by adopting different ranking functions. Users' query intent can be divided into two main kinds: informational and navigational.

Most current methods for classifying query intent rely on the text of the queries and on users' click information. Both have disadvantages:
- Queries are usually short and their text features are sparse, so it is hard to understand their intent correctly.
- For click-through information, the frequency of query submissions follows a long-tail distribution, so most queries have been submitted only a few times. This makes their intent difficult to determine.

To address the unreliability of intent judgments for long-tail queries, we propose using the distribution of relevance scores of query results to judge query intent. This method relies on the results of a query, whose features are richer than those of the query string itself. It also does not require users' click data, so it overcomes the difficulty of long-tail queries. The results show that using the distribution of result scores improves classification accuracy.

Furthermore, when different users submit the same query, their intents may differ. We can infer a user's query intent from that user's history of submitted queries and clicks. However, the data would be sparse if we used only one user's query and click history. Therefore, we use the query and click histories of many users to infer the query intent of users with similar behavior patterns.

Keywords: query intent, user click information, query classification, personalization

Contents

Abstract (Chinese)
Abstract (English)
Contents
1 Introduction
  1.1 Background
  1.2 Classification Framework
  1.3 Classification Features and Methods
  1.4 Thesis Organization
2 Detecting Query Intent from User Query Logs
  2.1 Overview of User Query Logs
  2.2 Classification Using Click Information
    2.2.1 The nCS Feature
    2.2.2 The nRS Feature
    2.2.3 The Click Distribution Feature
    2.2.4 Explanation and Analysis of Results
  2.3 Building the Decision Tree
  2.4 Results and Analysis
3 Detecting Query Intent from Search Results
  3.1 Crawling the Result Pages Returned for Queries