PIDALION: Implementation issues of a Java-based Multimedia Search Engine over the web

Dimitris E. Charilas, Ourania I. Markaki
National Technical University of Athens, Department of Electrical and Computer Engineering

Keywords: multimedia content, queries, content-based retrieval, multimedia crawler, metadata, image histogram, hierarchical presentation

Abstract - Fuelled by the rapid expansion of broadband connectivity and increasing interest in online multimedia-rich applications, the growth of digital multimedia content has skyrocketed. Among other things, this growth is compounding the need for more effective methods for searching multimedia information. The automated web search engines currently in use rely only on text descriptions and, as a result, provide matches of poor quality in the case of multimedia content. The services of a multimedia search engine are therefore something that internet users still lack. Thus, the scope of this paper is to present an implementation approach for a personalized web-based multimedia search engine in the Java programming language. This approach combines the characteristics of current search engines with new, innovative features which guarantee at the same time the system’s quick response and better search results. In this paper the reader can find an analytical presentation of all the components required to form a multimedia search engine, as well as indications on how to implement key algorithms and functions.

1. INTRODUCTION

The web creates new challenges for information retrieval. The amount of information on the web is growing rapidly, and so is the number of new users inexperienced in the art of web research. It is estimated that 1-2 Exa-Bytes (millions of Tera-Bytes) of new information are created each year over the Web. This huge amount of information is anticipated to grow by a factor of 10 in the following two years. Automated search engines that rely on keyword matching usually return too many low-quality matches. The situation is worse as far as multimedia content is concerned. The most popular search engine, Google [1], relies only on keywords to search for images and does not contain any information on semantic content.

Content-based image retrieval (CBIR) systems try to solve this problem. Many CBIR systems have recently been proposed and implemented in the literature. Examples include the QBIC system [2], where colour information is exploited, the PicToSeek system [3], which combines colour and shape invariant features to perform image retrieval, and Virage [4], which allows users to manually regulate the importance of the extracted descriptors according to their own perception. Fuzzy organization of the descriptors is proposed in [5] for increasing the retrieval precision at a certain recall value, while 3D searching is discussed in [6]. Applications of content-based retrieval systems are examined in [7], while in [8] a system regarding music access is proposed. Personalized retrieval is examined in the work presented in [9]. Last but not least, Marvel, the latest and more intelligent content-based search engine, developed by the IBM research centre, USA, in 2004 [10], tries to increase retrieval precision by incorporating semantic annotation in the media volumes.

However, all the adopted approaches have only static and local access to the system’s database and thus cannot retrieve content from the web [11]. Furthermore, the aforementioned works focus on the algorithms for efficient content-based retrieval and not on the practical issues regarding the implementation of a large-scale multimedia search engine over the Web. So far, several different techniques for making distributed multimedia content searchable have been proposed. In [12] there is information on the techniques of checking the outgoing links, analyzing the referring page, mining for textual information in the media file and utilizing metadata using the Dublin Core metadata model or the MPEG-7 standard.

This paper focuses on describing a multimedia search engine that combines features from existing search engines and enhances their functionalities through innovative algorithms and mechanisms. Our goal is not only to describe the system’s architecture and interconnectivity, but also to explain how the algorithms can be implemented in Java code. The proposed system, named PIDALION, runs in a Windows environment, while the Java Server Pages (JSP) and Java Servlets technologies are adopted to ensure the system’s interoperability and dynamic behaviour. The system’s database runs on SQL Server 2000.

One of the key features of the proposed search engine is the provision of fully personalized retrieval services: users of PIDALION may share their personal content either with all web users or within the frame of groups, as well as maintain a personal profile, where their preferences are stored. Personalized retrieval can be achieved through the creation of social groups and the use of dynamic relevance feedback mechanisms, which tailor the system’s performance to the current user’s preferences.

This paper is organized as follows: Section 2 presents the system’s architecture, explaining briefly the role of each main component. Sections 3 to 7 present the functionality, architecture and key features and innovations of each component. Key algorithms are depicted in the form of pseudo-code. Finally, in Section 8 the issues covered in this paper are summarized and future expansions are proposed.

2. SYSTEM OVERVIEW

The platform described in this paper consists of the following subsystems:
• The multimedia crawling subsystem, whose role is to index multimedia content and handle the updating of the indexing process
• The multimedia metadata subsystem, which extracts metadata from multimedia content, according to the MPEG-7 descriptors achieving in this way
