

Monitoring and Evaluation of Parallel and Distributed Systems

Richard Hofmann
University Erlangen, IMMD VII
Martensstr. 3, D-91058 Erlangen
phone: ++49-9131-85-7026
email: rhofmann@informatik.uni-erlangen.de

Abstract

Due to the complex interactions between activities in parallel processes, the dynamic behavior of the system cannot be quantified a priori. However, a profound knowledge of what is going on in the system is the basis for balancing the load in order to optimally utilize the potential power of such a parallel system. Monitoring is a valuable aid in getting the necessary insight into this dynamic behavior of interacting processes.

In the first part of the tutorial, the principles of measurement-based performance analysis in parallel and distributed systems are discussed. General topics concerned with hardware, software, and hybrid monitoring are presented with examples, and rules are given for choosing the appropriate monitoring technique. As an example, ZM4, a universal distributed monitor system, is introduced.

The second part of the tutorial deals with all tasks related to the process of presenting the meaning of trace data to human beings. Trace evaluation can be performed with statistics-oriented tools that compute common trace statistics, find activities, and validate assertions on system behavior, as well as with interactive graphics-oriented tools that present state time diagrams or draw causality diagrams between process traces. All these tools will be introduced with examples from measurements on practical parallel and distributed systems.

1 Introduction

The raw computing power of modern computers is growing rapidly with time. One should expect that the power at the user's disposal grows at the same rate. However, experience shows that the amount of power that can really be used sometimes rises and sometimes falls. The reasons for this phenomenon are manifold: users expect a more comfortable environment that costs computing power, and security mechanisms also account for a significant part of the raw processor power. It is probably not possible to get rid of these effects.

Another source of wasted processor power can be remedied by careful design of software systems on the one hand and thorough analysis of the runtime behavior on the other hand. While performance analysis is an important issue in monoprocessor systems, it is an indispensable task in parallel and distributed systems. This is caused by the complex interactions between the different program parts, all cooperating in order to solve a common task. The necessary use of shared resources causes problems with process synchronization, waiting times, deadlocks and the like. Beyond the merely functional problems caused by this difficulty in managing parallel tasks, there is a high probability of wasting processor power, i.e. not exploiting the processor power at a sufficiently high level.

This tutorial paper first deals with the basic problems in parallel and distributed systems in order to establish a common understanding of the reasons for such a power loss. In general, this topic can be treated by regarding causal relationships between events on different cooperating processors. In the second part, an introduction to monitoring of parallel and distributed systems will be presented. It will be shown how different monitoring approaches can be designed by systems programmers as well as by users of a parallel and distributed system.

Parallel and distributed systems require a monitoring facility that is able to cope with a larger number of processors as well as with spatial distribution. For this reason, ZM4, a monitor system that is being used for many projects, will be introduced as an example for structuring and using a universal distributed monitor system.

Using event-based monitoring typically yields large traces even if the events are chosen carefully. In order to concentrate work on promising parts of the event trace, it is necessary to point out the location of the problem. Therefore, statistical methods are used for a quick overview and a coarse analysis. With that insight, more detailed methods can be applied. Their common goal is not only to have a measure for the performance of these systems in part or in total, but also to get insight into the dynamic behavior of the interacting processes.

The most important methods for visualizing the dynamic behavior of parallel processes are time state diagrams (Gantt charts) and causality diagrams (Hasse diagrams). Each of these methods is discussed with an example in the remainder of this paper. Based on this information, the system can be reprogrammed in order to improve its performance.

2 Performance Problems in P&D Systems

Tuning programs for single processor machines is fairly easy: use profiling to find those parts of the program that are used predominantly. Typically, this is only a small fraction of the whole code. Rewriting these parts of the program yields a higher performance.

This simple conclusion does not hold for programs running on parallel and distributed systems. For example, a program will not profit from tuning a part that has to wait for an intermediate result from another process; that part simply has to wait for a longer time.

In order to determine why a program behaves the way it does, the reason for this behavior must be sought. This leads to regarding causality in computer systems. As can be seen in a later section, analyzing parallel and distributed systems from a causality point of view can lead to interesting results.

2.1 Causality and Computer Systems

Generally, the term causality denotes a law where a specific action always leads to the same specific result. Adapted to computer systems, causality means that the behavior of their processes is ruled by the laws expressed in the program. Here, the future of each process depend
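
The waiting argument from Section 2 can be made concrete with a small timing model. The following sketch is not taken from the paper: the function name consumer_timeline, the two-process setup, and all durations are assumptions chosen purely for illustration. It models a consumer process that does some work of its own, then needs an intermediate result from a producer process, then finishes.

```python
def consumer_timeline(producer_time, pre_work, post_work):
    """Hypothetical model: the consumer runs pre_work, then blocks until the
    producer's intermediate result is available, then runs post_work.
    Returns (wait_time, finish_time) in arbitrary time units."""
    ready = pre_work                          # consumer reaches the sync point
    wait = max(0.0, producer_time - ready)    # idle time until the result exists
    finish = max(ready, producer_time) + post_work
    return wait, finish


# Assumed durations, for illustration only.
baseline = consumer_timeline(producer_time=10.0, pre_work=6.0, post_work=2.0)
tuned_consumer = consumer_timeline(producer_time=10.0, pre_work=3.0, post_work=2.0)
tuned_producer = consumer_timeline(producer_time=7.0, pre_work=6.0, post_work=2.0)

print("baseline       (wait, finish):", baseline)
print("tuned consumer (wait, finish):", tuned_consumer)
print("tuned producer (wait, finish):", tuned_producer)
```

Under these assumed numbers, halving the consumer's own work leaves the finish time unchanged and only lengthens the wait, whereas speeding up the producer shortens the overall run. A profile of the consumer alone would not reveal this, which is why the cause has to be sought in the other process.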
