Modeling and Simulation of a Product Sorting System Based on Simio


Resource Description

HIERARCHICAL REINFORCEMENT LEARNING IN CONTINUOUS STATE AND MULTI-AGENT ENVIRONMENTS

A Dissertation Presented by MOHAMMAD GHAVAMZADEH

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

September 2005

Computer Science

© Copyright by Mohammad Ghavamzadeh 2005
All Rights Reserved

HIERARCHICAL REINFORCEMENT LEARNING IN CONTINUOUS STATE AND MULTI-AGENT ENVIRONMENTS

A Dissertation Presented by MOHAMMAD GHAVAMZADEH

Approved as to style and content by:

Sridhar Mahadevan, Chair
Andrew G. Barto, Member
Victor R. Lesser, Member
Weibo Gong, Member
W. Bruce Croft, Department Chair, Computer Science

To my parents.

ACKNOWLEDGMENTS

I must begin by thanking my mother and then proceed to ask her to forgive me for yet another failing: I am absolutely incapable of expressing the depth of my gratitude for her endless love, support, and encouragement.

I am deeply grateful to my advisor Sridhar Mahadevan, whose guidance, support, and patience were instrumental in bringing this work to fruition. Sridhar gave me tremendous freedom to explore and try new ideas, which has had an essential role in my growth as a researcher. Thank you Sridhar.

During my graduate studies at UMass I have had the opportunity to collaborate with Andy Barto. I have found Andy an outstanding and visionary researcher, and a wonderful human being. It was a great honor and a real pleasure for me to have him as a member of my thesis committee.

I am also indebted to the other members of my committee for their patience in reading drafts of my thesis, their insightful comments, and their stimulating questions during my defense. I thank Victor Lesser for his constant support, and for helping me better understand research directions in multi-agent systems; and Weibo Gong for inspiring conversations.

I must thank Doina Precup heartily for her unwavering support while a long visa delay had interrupted my research and almost every other aspect of my life. It is amazing how one's career and dignity can fall at the mercy of such a seemingly banal uncertainty as a visa delay. I am indebted for her support at such a time: she made every effort to make me feel part of the community at the computer science department at McGill University.

Many others have shared their insights and contributed to the development of the ideas in the thesis. I especially thank Balaraman Ravindran and my old buddy Khashayar Rohanimanesh for many useful conversations and, more importantly, for their precious friendship. I thank Andy Fagg and Mike Rosenstein for exposing me to a wide variety of topics in continuous state and action reinforcement learning. I never forget Andy's friendship, his down-to-earth manner, and his tasty and fresh salsas. I thank Mike, who made organizing a workshop at AAAI-2004 a joyful and educational experience for me.

I want to thank Caro Lucas and Ali M. Eydgahi, my M.S. and B.S. advisors from University of Tehran, Iran. They taught me how to be a researcher, how to better express my ideas, and helped me in writing my first research papers. I also want to thank Abdol Esfahanian, without whom it would not have been possible for me to pursue my education in the United States of America.

I would like to thank all the members of the Autonomous Learning Laboratory at UMass, past and present, for their friendship, for their constant support and encouragement, for giving useful feedback during my practice talks and lab-meeting presentations, and finally for taking care of my cubicle during my unwanted one-year absence. Thank you Colin Barringer, Jad Davis, Andy Fagg, Jeffrey Johns, Anders Jonsson, George Konidaris, Victoria Manfredi, Amy McGovern, Sarah Osentoski, Ted Perkins, Marc Pickett, Balaraman Ravindran, Khashayar Rohanimanesh, Mike Rosenstein, Suchi Saria, Ashvin Shah, Özgür Şimşek, Andrew Stout, Chris Vigorito, and Pippin Wolfe for making our lab such an excellent and enjoyable environment for research.

I am also grateful to the members of our small Autonomous Agents Laboratory at Michigan State University, with whom I learned about new research directions, open problems, and solution techniques in Artificial Intelligence, Machine Learning, and Reinforcement Learning: Natalia Hernandez Gardiol, Rajbala Makar, Silviu Minut, Khashayar Rohanimanesh, and Georgios Theocharous.

I am proud to belong to an intellectual community that treats hopeful, young graduate students with the same respect as senior researchers. Some of the members of this community who have been particularly helpful and kind to me, and whose useful comments contributed to the quality of this document, are David Andre, Bernhard Hengst, Shie Mannor, Doina Precup, Richard Sutton, and Prasad Tadepalli.

The material in this work is based upon work carried out in the Autonomous Agents Laboratory in the Department of Computer Science and Engineering at Michigan State University, under the DARPA contract DAANO2-98-C-4025, and the Autonomous Learning Laboratory in the Department of Computer Science at University of Massachusetts Amherst, under the NASA contract NAg-1445#1 and the NSF grant ECS-0218125.

ABSTRACT

HIERARCHICAL REINFORCEMENT LEARNING IN CONTINUOUS STATE AND MULTI-AGENT ENVIRONMENTS

SEPTEMBER 2005

MOHAMMAD GHAVAMZADEH
B.Sc., UNIVERSITY OF TEHRAN, IRAN
M.Sc., UNIVERSITY OF TEHRAN, IRAN
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Sridhar Mahadevan

This dissertation investigates the use of hierarchy and abstraction as a means of solving complex sequential decision making problems, such as those with continuous state and/or continuous action spaces, and domains with multiple cooperative agents. This thesis develops several novel extensions to hierarchical reinforcement learning (HRL), and designs algorithms that are appropriate for such problems. It has b
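For readers new to the area, the sketch below illustrates only the general idea named in the abstract: a root policy that selects temporally extended subtasks ("options"), each running primitive actions until it terminates, with the root trained by SMDP-style Q-learning over subtask outcomes. The corridor task, the subtask set (go_mid, go_goal, go_start), and all parameter values are assumptions made for this illustration; this is not the algorithms developed in the dissertation.

```python
# Minimal, generic sketch of a two-level hierarchical RL agent:
# the root level chooses among temporally extended subtasks, each subtask
# runs primitive steps until it terminates, and the root is trained with
# SMDP Q-learning over subtask outcomes.  Toy example only -- the task,
# subtasks, and parameters are assumptions for this sketch, not the
# dissertation's methods.
import random

N = 10                                   # corridor cells 0..N-1; goal is N-1
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1

# Each subtask walks one cell at a time toward a target and stops there.
SUBTASKS = [("go_mid", N // 2), ("go_goal", N - 1), ("go_start", 0)]
Q = {(s, o): 0.0 for s in range(N) for o in range(len(SUBTASKS))}

def available(s):
    # Initiation set: a subtask is only offered where it has work to do.
    return [o for o, (_, target) in enumerate(SUBTASKS) if target != s]

def run_subtask(s, target):
    # Execute primitive steps until termination; reward is -1 per step,
    # accumulated with discounting inside the subtask.
    total, k = 0.0, 0
    while s != target:
        s += 1 if target > s else -1
        total += (GAMMA ** k) * (-1.0)
        k += 1
    return s, total, k

for _ in range(2000):
    s = 0
    while s != N - 1:
        opts = available(s)
        # Epsilon-greedy choice of subtask at the root level.
        if random.random() < EPS:
            o = random.choice(opts)
        else:
            o = max(opts, key=lambda i: Q[(s, i)])
        s2, r, k = run_subtask(s, SUBTASKS[o][1])
        # SMDP Q-learning update: a k-step subtask is discounted by gamma**k.
        nxt = 0.0 if s2 == N - 1 else max(Q[(s2, i)] for i in available(s2))
        Q[(s, o)] += ALPHA * (r + (GAMMA ** k) * nxt - Q[(s, o)])
        s = s2

best = max(available(0), key=lambda i: Q[(0, i)])
print("greedy subtask at the start cell:", SUBTASKS[best][0])
```

Run as a plain script; after training, the greedy subtask at the start cell should be the one that heads directly to the goal, which is the point of the hierarchy: the root reasons over a handful of subtasks rather than over every primitive step.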
