Learning in Real-Time Search: A Unifying Framework

Journal of Artificial Intelligence Research 25 (2006) 119-157 Submitted 04/05; published 02/06
©2006 AI Access Foundation. All rights reserved.

Learning in Real-Time Search: A Unifying Framework

Vadim Bulitko BULITKO@UALBERTA.CA
Greg Lee GREGLEE@CS.UALBERTA.CA
Department of Computing Science
University of Alberta
Edmonton, Alberta T6G 2E8, CANADA

Abstract

Real-time search methods are suited for tasks in which the agent is interacting with an initially unknown environment in real time. In such simultaneous planning and learning problems, the agent has to select its actions in a limited amount of time, while sensing only a local part of the environment centered at the agent's current location. Real-time heuristic search agents select actions using a limited lookahead search and evaluate the frontier states with a heuristic function. Over repeated experiences, they refine the heuristic values of states to avoid infinite loops and to converge to better solutions. The wide spread of such settings in autonomous software and hardware agents has led to an explosion of real-time search algorithms over the last two decades. Not only is a potential user confronted with a hodgepodge of algorithms, but the user also faces the choice of the control parameters those algorithms use. In this paper we address both problems. The first contribution is the introduction of a simple three-parameter framework (named LRTS) which extracts the core ideas behind many existing algorithms. We then prove that the LRTA*, ε-LRTA*, SLA*, and γ-Trap algorithms are special cases of our framework. Thus, they are unified and extended with additional features. Second, we prove completeness and convergence of any algorithm covered by the LRTS framework. Third, we prove several upper bounds relating the control parameters and solution quality. Finally, we analyze the influence of the three control parameters empirically in the realistic, scalable domains of real-time navigation on initially unknown maps from a commercial role-playing game and of routing in ad hoc sensor networks.

1. Motivation

In this paper, we consider a simultaneous planning and learning problem. One motivating application lies with navigation on an initially unknown map under real-time constraints. As an example, consider a robot driving to work every morning. Imagine the robot to be a newcomer to the town. The first route the robot finds may not be optimal because traffic jams, road conditions, and other factors are initially unknown. With the passage of time, the robot continues to learn and eventually converges to a nearly optimal commute. Note that planning and learning happen while the robot is driving and are therefore subject to time constraints.

Present-day mobile robots are often plagued by localization problems and power limitations, but their simulation counterparts already allow researchers to focus on the planning and learning problem. For instance, the RoboCup Rescue simulation league (Kitano, Tadokoro, Noda, Matsubara, Takahashi, Shinjou, & Shimada, 1999) requires real-time planning and learning with multiple agents mapping out an unknown terrain. Pathfinding is done in real time as various crises, involving fire spread and human victims trapped in rubble, progress while the agents plan.

Similarly, many current-generation real-time strategy games employ a priori known maps. Full knowledge of the maps enables complete search methods such as A* (Hart, Nilsson, & Raphael, 1968) and Dijkstra's algorithm (Dijkstra, 1959). Prior availability of the maps allows pathfinding engines to pre-compute various data to speed up on-line navigation. Examples of such data include visibility graphs (Woodcock, 2000), influence maps (Pottinger, 2000), space triangulation (Kallmann, Bieri, & Thalmann, 2003), state abstraction hierarchies (Holte, Drummond, Perez, Zimmer, & MacDonald, 1994; Holte, 1996; Botea, Müller, & Schaeffer, 2004) and route waypoints (Reece, Krauss, & Dumanoir, 2000). However, the forthcoming generations of commercial and academic games (Buro, 2002) will require the agent to cope with initially unknown maps via exploration and learning during the game. They will therefore greatly limit the applicability of complete search algorithms and pre-computation techniques.

Incremental search methods such as dynamic A* (D*) (Stentz, 1995) and D* Lite (Koenig & Likhachev, 2002) can deal with initially unknown maps. They are widely used in robotics, including DARPA's Unmanned Ground Vehicle program, the Mars rover, and other mobile robot prototypes (Hebert, McLachlan, & Chang, 1999; Thayer, Digney, Diaz, Stentz, Nabbe, & Hebert, 2000). They work well when the robot's movements are slow with respect to its planning speed (Koenig, 2004). In real-time strategy games, however, the AI engine can be responsible for hundreds to thousands of agents traversing a map simultaneously, and the planning cost becomes a major factor. To illustrate: even at the smaller scale of the six-year-old "Age of Empires 2" (Ensemble-Studios, 1999), 60-70% of simulation time is spent in pathfinding (Pottinger, 2000). This gives rise to the following questions:

1. How can planning time per move, and particularly the first-move delay, be minimized so that each agent moves smoothly and responds to user requests nearly instantly?

2. Given real-time execution, local sensory information, and initially unknown terrain, how can the agent learn a near-optimal path and, at the same time, minimize the learning time and memory required?

The rest of the paper is organized as follows. We first introduce a family of real-time search algorithms designed to address these questions. We then make the first contribution by defining a simple parameterized framework that unifies and extends several popular real-time search algorithms. The second contribution lies with a theoretical analysis of the resulting frame