Searching the Web by Constrained Spreading Activation

Fabio Crestani
Puay Leng Lee

Department of Computing Science
University of Glasgow
Glasgow G12 8QQ, Scotland
Tel. +44-(0)141-3306292
Fax. +44-(0)141-3304913
Email: {fabio,leepl}@dcs.gla.ac.uk

Running Title: Searching the Web by Constrained Spreading Activation.

Keywords (from ACM Computing Classification System): hypertext/hypermedia, information search and retrieval, spreading activation, query formulation, intelligent agents.

Abstract

Intelligent Information Retrieval is concerned with the application of intelligent techniques, such as semantic networks, neural networks and inference nets, to Information Retrieval. This field of research has seen a number of applications of Constrained Spreading Activation (CSA) techniques on domain knowledge networks. However, there has never been any application of these techniques to the World Wide Web. The Web is a very important information resource, but users find that looking for a relevant piece of information in the Web can be like "looking for a needle in a haystack". We were therefore motivated to design and develop a prototype system, WebSCSA (Web Search by CSA), that applies a CSA technique to retrieve information from the Web using an ostensive approach to querying similar to query-by-example. In this paper we describe the system and its underlying model. Furthermore, we report on an experiment carried out with human subjects to evaluate the effectiveness of WebSCSA. We tested whether WebSCSA improves retrieval of relevant information on top of Web search engine results and how well WebSCSA serves as an agent browser for the user. The results of the experiments are promising, and show that there is much potential for further research on the use of CSA techniques to search the Web.

1 Introduction

This paper is concerned with the application of Constrained Spreading Activation (CSA) techniques for retrieving information from the World Wide Web (hereafter referred to as the Web). The Web presents a formidable store of information. It is an interconnected system of over 7 million sites and their pages (in December 1998) accessible through browsers like Mosaic, Netscape Navigator or Microsoft's Internet Explorer. Although the Web is one of the newer additions to the Internet, it has gained popularity very quickly, becoming the second most frequently-used feature of the Internet, the most widely-used one being electronic mail (Berners-Lee et al., 1992).

The information stored in the Web differs from the information traditionally dealt with by Information Retrieval (IR) systems in several aspects.

Information organization. The Web is not organized, in the sense that associated or similar documents are not placed in close physical proximity like the collections in a physical library or stored in some archive. Internet directories, like Yahoo!, help organize links to similar documents to ease the retrieval problem, but the categorization process is often done manually and this is expensive and time-consuming. Since the Web is a hypertext/hypermedia system and we do not possess the resources which Internet directories do, the natural way of reaching similar documents from given documents would be to traverse the links on the latter. A retrieval tool for the Web should exploit the links in the Web documents (i.e. Web pages) in its search for documents relevant to a user request.

Information range. Some conventional IR systems contain specialized information, such as medical documentation or patents. Hence, IR systems can sometimes exploit domain knowledge to enhance retrieval performance. In contrast, the subject range of information on the Web is very wide. Any retrieval program built for the Web must be flexible enough to retrieve information on a wide range of subjects and written in different natural languages. Retrieval models that exploit associations between documents are appropriate for retrieval on the Web because these models do not dictate the topical range of relevant information provided at the beginning of the search. They simply search for similar information regardless of the topic of the query (Ellis, 1996).

Change of content. The Web is a very dynamic information collection. Every second, changes are being made to existing Web pages, and pages are added to or deleted from the Web. Conventional IR systems are less dynamic and there is much more control over the changes made to the document collection. A retrieval system for the Web should be able to retrieve documents that are up-to-date and should not rely (at least not completely) on indexes that could become outdated very quickly.

In this paper we present a prototype Web search system that exploits the above distinctions between documents usually managed by IR systems and those managed by the Web. The underlying IR model of this prototype is a variation of the model known as Associative Retrieval. Associative Retrieval was first introduced by Salton (1968) and is concerned with exploiting associations between information items at retrieval time. Associations are first determined using citations or statistical techniques (for example, term co-occurrence) and then used by complex retrieval functions. In the work presented in this paper we do not use citations or statistical associations; instead, we use the existing associations represented by hypertext links between Web documents. What we consider important in Associative Retrieval is the idea behind this form of retrieval, i.e. that it is possible to retrieve relevant documents by retrieving those that are explicitly associated with some that the user knows to be relevant.

The work presented in this paper integrates Associative Retrieval with Ostensive Retrieval. This novel approach to IR was proposed by Campbell and Van Rijsbergen (1996) and is concerned