Stochastic shortest path games

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

STOCHASTICSHORTESTPATHGAMESSTEPHEND.PATEKyANDDIMITRIP.BERTSEKASzAbstract.Weconsiderdynamic,two-player,zero-sumgameswherethe\minimizingplayerseekstodriveanunderlyingnite-statedynamicsystemtoaspecialterminalstatealongaleastexpectedcostpath.The\maximizerseekstointerferewiththeminimizer’sprogresssoastomaximizetheexpectedtotalcost.Weconsider,forthersttime,undiscountednite-stateproblems,withcompactactionspaces,andtransitioncoststhatarenotstrictlypositive.Weadmitthattherearepoliciesfortheminimizerwhichpermitthemaximizertoprolongthegameindenitely.Underassumptionswhichgeneralizedeterministicshortestpathproblems,weestablish(i)theexistenceofareal-valuedequilibriumcostvectorachievablewithstationarypoliciesfortheopposingplayersand(ii)theconvergenceofvalueiterationandpolicyiterationtotheuniquesolutionofBellman’sequation.Keywords.gametheory,stochasticgames,optimization,dynamicprogramming,stochasticshortestpathsAMSsubjectclassications.90D15,93E05,49L201.Introduction.Thispaperdevelopsbasictheoryrelatingtostochasticshort-estpathgames.Thesearetwo-player,zero-sum,gameswheretheminimizingplayerseekstodriveanunderlyingnite-statedynamicsystemtoaspecialterminalstatealongaleastexpectedcostpath.Themaximizerseekstointerferewiththemini-mizer’sprogresssoastomaximizetheexpectedtotalcost.Inactualplay,theplayersimplementactionssimultaneouslyateachstage,withfullknowledgeofthestateofthesystembutwithoutknowledgeofeachother’scurrentdecision.Gamesofthistypehavebeenstudiedforsometime.TheeldwasinitiatedbyShapleyinhisclassicalpaper[7].InShapley’sgames,twoplayersaresuccessivelyfacedwithmatrix-games(inmixedstrategies)whereboththeimmediatecostandtransitionprobabilitiestonewmatrix-gamesareinuencedbythestagewisedecisionsoftheplayers.Inthisformulation,thestateofthesystemisthematrix-gamecurrentlybeingplayed.Itisassumedthatthissetofstatesisniteandthatthereisanon-zerominimalprobabilitythat,atanystage,thegamewilltransitiontoaterminalstate,endingthesequenceofrewardsandpayos.Itturnsoutthatthisisequivalenttoaninnite-horizongamewithdiscountedadditivecost.Theanalysisofsuchgamesisstraightforward,themainresultsbeing(i)theexistenceandcharacterizationofauniquereal-valuedequilibriumcostvectorachievableinstationaryrandomizedpoliciesand(ii)theconvergenceofvalueiterationandpolicyiterationtotheequilibriumcost.SinceShapley’swork,gametheoristshaveactivelystudiedextensionstothediscounted-costmodel.In[4],KushnerandChamberlainconsiderundiscounted,pur-suit/evasion,stochasticgameswherethereisaterminalstatecorrespondingtotheevaderbeing\caught.Thestatespaceisassumedtobenite(withn+1elements,oneofwhichistheterminalstate).Makingsomeregularityassumptionsonthetran-sitionprobabilitiesandcostfunctions,theyconsiderpurestrategiesovercompactactionspaces.Inaddition,theyassumethateither,SupportedbytheNationalScienceFoundationGrant9300494-DMIandanOceofNavalRe-searchfellowship.yDepartmentofSystemsEngineering,SchoolofEngineeringandAppliedScience,UniversityofVirginia,OlssonHall,Charlottesville,Virginia,22903(sdp5f@virginia.edu)zMassachusettsInstituteofTechnology,LaboratoryforInformationandDecisionSystems,Room35-210,Cambridge,MA02139(dimitrib@mit.edu)12S.D.PATEKANDD.P.BERTSEKAS1.Then-stageprobabilitytransitionmatrix[P(;)]n(fromnon-terminalstatestonon-terminalstates)isa\uniformcontractioninthestationarypolicypairs(;)ofthetwoplayers.(Thatis,forsome0,[P(;)]nhasrow-sumslessthan1forallstationarypolicypairs(;).)Or,2.Thetransitioncosts(tothepursuer)areuniformlyboundedbelowby0andthereexistsastationarypolicy~forthepursuerthatmakes[P(~;)]nauniformcontractionunderallstationarypoliciesfortheevader.Theyshowthatthereexistsanequilibriumcostvectorforthegamewhichcanbefoundthroughvalueiteration.In[10],vanderWalconsidersaspecialcaseofKushnerandChamberlain’sgames.Undermorerestrictiveassumptionsaboutthepursuer’sabilitytocatchtheevader,hegiveserrorboundsfortheupdatesinvalueiteration.In[3],KumarandShiaugiveadetailedanalysisofstochasticgameswithverymildassumptionsaboutthestatespaceandcontrolconstraintsets.Forthecaseofnonnegativeadditivecost(withnodiscounting),theyestablishtheexistenceofanextendedrealequilibriumcostvectorinnon-Markovrandomizedpolicies(whereforbothplayersthebestmixedactioncandependonallofthepaststatesandcontrols,aswellasthecurrentstate).TheyshowthattheminimizingplayercanachievetheequilibriumusingastationaryMarkovrandomizedpolicyandthat,incasethestatespaceisnite,themaximizingplayercanplay-optimallyusingstationaryrandomizedpolicies.Otherresearchershavestudiedso-called\non-terminatingstochasticgames(alsosometimescalled\undiscountedstochasticgames),wherethecostsarenotdis-countedbutareaveragedinstead.Suchaverage-costgameshavearichmathematicalstructurewhichhasbeenextensivelycoveredintheliterature[13,5].Inthispaper,weconsiderundiscountedadditivecostgameswithoutaveraging.Weadmitthattherearepoliciesfortheminimizerwhichallowthemaximizertoprolongthegameindenitelyatinnitecosttotheminimizer.Wedonotassumenonnegativityofcost,asin[4]and[3].Wemakealternativeassumptionswhichguaranteethat

1 / 20
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功