

Searching the Web by Constrained Spreading Activation

Fabio Crestani
Puay Leng Lee

Department of Computing Science
University of Glasgow
Glasgow G12 8QQ, Scotland
Tel. +44-(0)141-330 6292
Fax. +44-(0)141-330 4913
Email: {fabio,leepl}@dcs.gla.ac.uk

Running Title: Searching the Web by Constrained Spreading Activation.

Keywords (from ACM Computing Classification System): hypertext/hypermedia, information search and retrieval, spreading activation, query formulation, intelligent agents.

Abstract

Intelligent Information Retrieval is concerned with the application of intelligent techniques, such as semantic networks, neural networks and inference nets, to Information Retrieval. The field of research has seen a number of applications of Constrained Spreading Activation (CSA) techniques on domain knowledge networks. However, there has never been any application of these techniques to the World Wide Web. The Web is a very important information resource, but users find that looking for a relevant piece of information in the Web can be like "looking for a needle in a haystack". We were therefore motivated to design and develop a prototype system, WebSCSA (Web Search by CSA), that applies a CSA technique to retrieve information from the Web using an ostensive approach to querying similar to query-by-example. In this paper we describe the system and its underlying model. Furthermore, we report on an experiment carried out with human subjects to evaluate the effectiveness of WebSCSA. We tested whether WebSCSA improves retrieval of relevant information on top of Web search engines' results and how well WebSCSA serves as an agent browser for the user. The results of the experiments are promising, and show that there is much potential for further research on the use of CSA techniques to search the Web.

1 Introduction

This paper is concerned with the application of Constrained Spreading Activation (CSA) techniques for retrieving information from the World Wide Web (hereafter referred to as the Web). The Web presents a formidable store of information. It is an interconnected system of over 7 million sites and their pages (in December 1998) accessible through browsers like Mosaic, Netscape Navigator or Microsoft’s Internet Explorer. Although the Web is one of the newer additions to the Internet, it has gained popularity very quickly, becoming the second most frequently used feature of the Internet, the most widely used one being electronic mail (Berners-Lee et al., 1992).

The information stored in the Web differs from the information traditionally dealt with by Information Retrieval (IR) systems in several aspects.

Information organization. The Web is not organized, in the sense that associated or similar documents are not placed in close physical proximity like the collections in a physical library or stored in some archive. Internet directories, like Yahoo!, help organize links to similar documents to ease the retrieval problem, but the categorization process is often done manually and this is expensive and time-consuming. Since the Web is a hypertext/hypermedia system and we do not possess the resources which Internet directories do, the natural way of reaching similar documents from given documents would be to traverse the links on the latter. A retrieval tool for the Web should exploit the links in the Web documents (i.e. Web pages) in its search for documents relevant to a user request.

Information range. Some conventional IR systems contain specialized information, such as medical documentation or patents. Hence, IR systems can sometimes exploit domain knowledge to enhance retrieval performance. In contrast, the subject range of information on the Web is very wide. Any retrieval program built for the Web must be flexible enough to retrieve information on a wide range of subjects and written in different natural languages. Retrieval models that exploit associations between documents are appropriate for retrieval on the Web because these models do not dictate the topical range of relevant information provided at the beginning of the search. They simply search for similar information regardless of the topic of the query (Ellis, 1996).

Change of content. The Web is a very dynamic information collection. Every second, changes are being made to existing Web pages, and pages are added to or deleted from the Web. Conventional IR systems are less dynamic and there is much more control over the changes made to the document collection. A retrieval system for the Web should be able to retrieve documents that are up-to-date and should not rely (at least not completely) on indexes that could become outdated very quickly.

In this paper we present a prototype Web search system that exploits the above distinction between documents usually managed by IR systems and those managed by the Web. The underlying IR model of this prototype is a variation of the model known as Associative Retrieval. Associative Retrieval was first introduced by Salton (1968) and is concerned with exploiting associations between information items at retrieval time. Associations are first determined using citations or statistical techniques (for example, term co-occurrence) and then used by complex retrieval functions. In the work presented in this paper we do not use citations or statistical associations, but we use the existing associations represented by hypertext links between Web documents. What we consider important in Associative Retrieval is the idea behind this form of retrieval, i.e. that it is possible to retrieve relevant documents by retrieving those that are explicitly associated with some that the user knows to be relevant.

The work presented in this paper integrates Associative Retrieval with Ostensive Retrieval. This novel approach to IR was proposed by Campbell and Van Rijsbergen (1996) and is concerned
