Layered Learning in Multi-Agent Systems



Peter Stone

December 15, 1998
CMU-CS-98-187

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213-3891

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Thesis Committee:
Manuela M. Veloso, Chair
Andrew W. Moore
Herbert A. Simon
Victor R. Lesser (University of Massachusetts, Amherst)

Copyright © 1998 Peter Stone

The work has been supported through the generosity of the NASA Graduate Student Research Program (GSRP). This research is also sponsored in part by the Defense Advanced Research Projects Agency (DARPA), and Rome Laboratory, Air Force Materiel Command, USAF, under agreement numbers F30602-95-1-0018, F30602-97-2-0250 and F30602-98-2-0135 and in part by the Department of the Navy, Office of Naval Research under contract number N00014-95-1-0591. Views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of NASA, the Defense Advanced Research Projects Agency (DARPA), the Air Force Research Laboratory (AFRL), the Department of the Navy, Office of Naval Research, or the U.S. Government.

Keywords: Multi-agent systems, machine learning, multi-agent learning, control learning, hierarchical learning, reinforcement learning, decision tree learning, neural networks, robotic soccer, network routing

Abstract

Multi-agent systems in complex, real-time domains require agents to act effectively both autonomously and as part of a team. This dissertation addresses multi-agent systems consisting of teams of autonomous agents acting in real-time, noisy, collaborative, and adversarial environments. Because of the inherent complexity of this type of multi-agent system, this thesis investigates the use of machine learning within multi-agent systems. The dissertation makes four main contributions to the fields of Machine Learning and Multi-Agent Systems.

First, the thesis defines a team member agent architecture within which a flexible team structure is presented, allowing agents to decompose the task space into flexible roles and allowing them to smoothly switch roles while acting. Team organization is achieved by the introduction of a locker-room agreement as a collection of conventions followed by all team members. It defines agent roles, team formations, and pre-compiled multi-agent plans. In addition, the team member agent architecture includes a communication paradigm for domains with single-channel, low-bandwidth, unreliable communication. The communication paradigm facilitates team coordination while being robust to lost messages and active interference from opponents.

Second, the thesis introduces layered learning, a general-purpose machine learning paradigm for complex domains in which learning a mapping directly from agents' sensors to their actuators is intractable. Given a hierarchical task decomposition, layered learning allows for learning at each level of the hierarchy, with learning at each level directly affecting learning at the next higher level.

Third, the thesis introduces a new multi-agent reinforcement learning algorithm, namely team-partitioned, opaque-transition reinforcement learning (TPOT-RL). TPOT-RL is designed for domains in which agents cannot necessarily observe the state changes when other team members act. It exploits local, action-dependent features to aggressively generalize its input representation for learning and partitions the task among the agents, allowing them to simultaneously learn collaborative policies by observing the long-term effects of their actions.

Fourth, the thesis contributes a fully functioning multi-agent system that incorporates learning in a real-time, noisy domain with teammates and adversaries. Detailed algorithmic descriptions of the agents' behaviors as well as their source code are included in the thesis.

Empirical results validate all four contributions within the simulated robotic soccer domain. The generality of the contributions is verified by applying them to the real robotic soccer, and network routing domains. Ultimately, this dissertation demonstrates that by learning portions of their cognitive processes, selectively communicating, and coordinating their behaviors via common knowledge, a group of independent agents can work towards a common goal in a complex, real-time, noisy, collaborative, and adversarial environment.

Acknowledgements

I would like to thank many people for their support, encouragement and guidance during my years as a graduate student here at CMU. First and foremost, this dissertation represents a great deal of time and effort not only on my part, but on the part of my advisor, Manuela Veloso. She has helped me shape my research from day one, pushed me to get through the inevitable research setbacks, and encouraged me to achieve to the best of my ability. Without Manuela, this dissertation would not have happened.

I also thank my other three committee members, Andrew Moore, Herb Simon, and Victor Lesser for valuable discussions and comments regarding my research.

Almost all research involving robots is a group effort. The members of the CMU robosoccer lab have all contributed to making my research possible. Sorin Achim, who has been with our project almost from the beginning has tirelessly experimented with different robot architectures, always managing to pull things together and create working hardware in time for competitions. Kwun Han was a partner in the software development of the CMUnited-97 team, as well as an instrumental hardware developer for CMUnited-98. Mike Bowling successfully created a new software approach for the CMUnited-98 robots. He also collaborated on an early simulator agent implementation.