TORQUE-Administrators-Guide


TORQUE® Administrator Guide version 3.0.2

TORQUE Admin Manual version 3.0.2

Legal Notices
Preface
    Documentation Overview
    Introduction
    Glossary
1.0 Overview
    1.1 Installation
    1.2 Initialize/Configure TORQUE on the Server (pbs_server)
    1.3 Advanced Configuration
    1.4 Manual Setup of Initial Server Configuration
    1.5 Server Node File Configuration
    1.6 Testing Server Configuration
    1.7 TORQUE on NUMA Systems
    1.8 TORQUE Multi-MOM
2.0 Submitting and Managing Jobs
    2.1 Job Submission
    2.2 Monitoring Jobs
    2.3 Canceling Jobs
    2.4 Job Preemption
    2.5 Keeping Completed Jobs
    2.6 Job Checkpoint and Restart
    2.7 Job Exit Status
    2.8 Service Jobs
3.0 Managing Nodes
    3.1 Adding Node
    3.2 Configuring Node Properties
    3.3 Changing Node State
    3.4 Host Security
    3.5 Linux Cpuset Support
    3.6 Scheduling Cores
    3.7 Scheduling GPUs
4.0 Setting Server Policies
    4.1 Queue Configuration
    4.2 Server High Availability
5.0 Interfacing with a Scheduler
    5.1 Integrating Schedulers for TORQUE
6.0 Configuring Data Management
    6.1 SCP/RCP Setup
    6.2 NFS and Other Networked Filesystems
    6.3 File Stage-In/Stage-Out
7.0 Interfacing with Message Passing
    7.1 MPI (Message Passing Interface) Support
8.0 Managing Resources
    8.1 Monitoring Resources
9.0 Accounting
    9.1 Accounting Records
10.0 Logging
    10.1 Job Logging
11.0 Troubleshooting
    11.1 Troubleshooting
    11.2 Compute Node Health Check
    11.3 Debugging
Appendices
    Appendix A: Commands Overview
        Client Commands
            momctl
            pbsdsh
            pbsnodes
            qalter
            qchkpt
            qdel
            qhold
            qmgr
            qrerun
            qrls
            qrun
            qsig
            qstat
            qsub
            qterm
            tracejob
        Server Commands
            pbs_mom
            pbs_server
            pbs_track
    Appendix B: Server Parameters
    Appendix C: MOM Configuration
    Appendix D: Error Codes and Diagnostics
    Appendix E: Considerations Before Upgrading
    Appendix F: Large Cluster Considerations
    Appendix G: Prologue and Epilogue Scripts
    Appendix H: Running Multiple TORQUE Servers and Moms on the Same Node
    Appendix I: Security Overview
    Appendix J: Submit Filter (aka qsub Wrapper)
    Appendix K: torque.cfg File
    Appendix L: TORQUE Quick Start Guide
Changelog

Legal Notices

Copyright © 2011 Adaptive Computing Enterprises, Inc. All rights reserved. Distribution of this document for commercial purposes in either hard or soft copy form is strictly prohibited without prior written consent from Adaptive Computing Enterprises, Inc.

Trademarks

Adaptive Computing, Cluster Resources, Moab, Moab Workload Manager, Moab Cluster Manager, Moab Cluster Suite, Moab Grid Scheduler, Moab Grid Suite, Moab Access Portal, and other Adaptive Computing products are either registered trademarks or trademarks of Adaptive Computing Enterprises, Inc. The Adaptive Computing logo and the Cluster Resources logo are trademarks of Adaptive Computing Enterprises, Inc. All other company and product names may be trademarks of their respective companies.

Acknowledgments

TORQUE includes software developed by NASA Ames Research Center, Lawrence Livermore National Laboratory, and Veridian Information Solutions, Inc. Visit [...]

[...] (optional) qmgr options necessary to get the system up and running. System Testing is also covered.

The 2.0 Submitting and Managing Jobs section covers different actions applicable to jobs. The first section, 2.1 Job Submission, details how to submit a job and request resources (nodes, software licenses, and so forth) and provides several examples. Other actions include monitoring, canceling, preemption, and keeping completed jobs.

The 3.0 Managing Nodes section covers administrator tasks relating to nodes, which include the following: adding nodes, changing node properties, and identifying state. Section 3.4 Host Security also explains how to configure restricted user access to nodes.

The 4.0 Setting Server Policies section details server-side configuration of queues and high availability.

The 5.0 Interfacing with a Scheduler section offers information about using the native scheduler versus an advanced scheduler.

The 6.0 Configuring Data Management section deals with issues of data management. For non-network file systems, the SCP/RCP Setup section details setting up SSH keys and nodes to automate transferring data. The NFS and Other Networked Filesystems section covers configuration for these file systems. This chapter also addresses the use of File Stage-In/Stage-Out using the stagein and stageout directives of the qsub command.

The 7.0 Interfacing with Message Passing section offers details supporting MPI (Message Passing Interface).

The 8.0 Managing Resources section covers configuration, utilization, and states of resources.

The 9.0 Accounting section explains how jobs are tracked by TORQUE for accounting purposes.

The 11.0 Troubleshooting section is a troubleshooting guide that offers help with general problems; it includes an FAQ (Frequently Asked Questions) list and instructions for how to set up and use compute node checks and how to debug TORQUE.

The numerous appendices provide tables of commands, parameters, configuration options, error codes, the Quick Start Guide, and so forth.

A. Commands Overview
B. Server Parameters
C. MOM Configuration
D. Error Codes and Diagnostics
E. Considerations Before Upgrading
F. Large Cluster Considerations
G. Prologue and Epilogue Scripts
H. Running Multiple TORQUE Servers and Moms
