Optimality inequalities for average cost Markov de

2020-01-24

Optimality Inequalities for Average Cost Markov Decision Processes and the Stochastic Cash Balance Problem

Eugene A. Feinberg
Department of Applied Mathematics & Statistics
State University of New York at Stony Brook
Stony Brook, NY 11794-3600
efeinberg@notes.cc.sunysb.edu
631-632-7189

Mark E. Lewis
School of Operations Research & Industrial Engineering
Cornell University
226 Rhodes Hall, Ithaca, NY 14853
mel47@cornell.edu
607-255-0757

March 6, 2007

Keywords: Markov decision process, average cost per unit time, optimality inequality, optimal policy, inventory control

MSC 2000 Subject Classifications: Primary: 90C40; Secondary: 90B05

OR/MS subject classifications: Primary: Dynamic Programming/optimal control/Markov/Infinite state; Secondary: Inventory/production/Uncertainty/Stochastic

Abstract

For general state and action space Markov decision processes, we present sufficient conditions for the existence of solutions of the average cost optimality inequalities. These conditions also imply the convergence of both the optimal discounted cost value function and policies to the corresponding objects for the average costs per unit time case. Inventory models are natural applications of our results. We describe structural properties of average cost optimal policies for the cash balance problem, an inventory control problem where the demand may be negative and the decision-maker can produce or scrap inventory. We also show the convergence of optimal thresholds in the finite horizon case to those under the expected discounted cost criterion, and of those under the expected discounted cost criterion to those under the average costs per unit time criterion.

1 Introduction

In a discrete-time Markov decision process (MDP) the usual method to study the average cost criterion is to find a solution to the average cost optimality equations. A policy that achieves the minimum in this system of equations is then average cost optimal. When the state and action spaces are infinite, one may be required to replace the equations with inequalities, yet the conclusions are the same: a policy that achieves the minimum in the inequalities is average cost optimal.

Schäl [27] provides two groups of general conditions that imply the existence of a solution to the average cost optimality inequalities (ACOI). The first group, referred to as Assumptions (W) in Schäl [27], requires weak continuity of the transition probabilities. The second group, Assumptions (S), requires setwise continuity of the transition probabilities. In either case, for each state a compact action set was assumed in [27]. The purpose of this paper is to adapt Schäl's [27] conditions to problems with noncompact action sets, in particular to those related to inventory control. As was noted in [12], typical inventory control models (with general demand distributions) require weak continuity; setwise continuity is not enough to yield the conclusion that the ACOI have a solution. On the other hand, when the demand distribution is restricted to be continuous or the inventory is restricted to be integer, we show that setwise continuity does suffice.

The books by Sennott [28] and Hernández-Lerma and Lasserre [19] deal with countable and general state MDPs, respectively. Hernández-Lerma and Lasserre [19, Chapter 5], Hernández-Lerma [18], and Fernández-Gaucherand [14] present results for noncompact action sets but assume setwise continuity. The results of Hernández-Lerma [18] extend Schäl's [27] results on MDPs with setwise continuous transition probabilities from compact action sets to noncompact action sets.

In this paper we study MDPs with weakly continuous transition probabilities. The major motivation for this study is their relevance to inventory control problems. Section 5.7 in Hernández-Lerma and Lasserre [19] provides conditions for the existence of stationary optimal policies for an MDP with weakly continuous transition probabilities, but the derivation is done directly, without deriving the optimality equations or inequalities. We are interested not only in the existence of optimal policies but also in the validity of the optimality inequalities. This is an important step since these inequalities can be used to prove structural properties of optimal stationary policies and to prove convergence of discounted cost optimal policies to average cost optimal policies. We recall that, according to the example constructed by Cavazos-Cadena [4], optimality inequalities may hold for an MDP for which optimality equalities do not hold. In addition, optimality inequalities imply the existence of optimal policies [27, Proposition 1.3].

In this paper we consider a class of problems with noncompact action sets and introduce an additional condition, Assumption (LB), which states that a certain function, relevant to the relative value functions, is locally bounded from above. This assumption is important for the following reasons: (i) if it is satisfied, noncompact action sets can be reduced to compact action subsets in a way that the value functions remain unchanged and Schäl's assumptions [27] hold, (ii) it can be verified easily, and (iii) it typically holds for inventory control problems. Thus, this paper provides straightforward tools to analyze inventory control problems with the average costs per unit time criterion and without the assumption that the demand is either discrete or continuous.

Though the optimality of (s, S) policies for periodic review inventory control problems is a well-known fact, for average cost problems its rigorous proof without the assumption that the demand is either discrete or continuous was established not long ago by Chen and Simchi-Levi [7]. Even so, the proof provided in [7] is nontrivial and problem-specific. Feinberg and Lewis [13] provided a straightforward proof of this fact by using the results descri