Data Mining Using SAS® Enterprise Miner: A Case Study Approach, Second Edition

The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2003. Data Mining Using SAS® Enterprise Miner™: A Case Study Approach, Second Edition. Cary, NC: SAS Institute Inc.

Copyright © 2003, SAS Institute Inc., Cary, NC, USA
ISBN 1-59047-395-7
All rights reserved. Produced in the United States of America.

Your use of this e-book shall be governed by the terms established by the vendor at the time you acquire this e-book.

U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set forth in FAR 52.227-19, Commercial Computer Software - Restricted Rights (June 1987).

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513.

1st printing, April 2003

SAS Publishing provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-copy books, visit the SAS Publishing Web site at support.sas.com/pubs or call 1-800-727-3228.

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

Contents

Chapter 1 Introduction to SAS Enterprise Miner
    Starting Enterprise Miner
    Setting Up the Initial Project and Diagram
    Identifying the Interface Components
    Data Mining and SEMMA
    Accessing SAS Data through SAS Libraries
Chapter 2 Predictive Modeling
    Problem Formulation
    Creating a Process Flow Diagram
    Data Preparation and Investigation
    Fitting and Comparing Candidate Models
    Generating and Using Scoring Code
    Generating a Report Using the Reporter Node
Chapter 3 Variable Selection
    Introduction to Variable Selection
    Using the Variable Selection Node
Chapter 4 Clustering Tools
    Problem Formulation
    Overview of Clustering Methods
Chapter 5 Association Analysis
    Problem Formulation
Chapter 6 Link Analysis
    Problem Formulation
    Examining Web Log Data
Appendix 1 Recommended Reading
    Recommended Reading
Index

CHAPTER 1
Introduction to SAS Enterprise Miner

Starting Enterprise Miner
Setting Up the Initial Project and Diagram
Identifying the Interface Components
Data Mining and SEMMA
    Definition of Data Mining
    Overview of the Data
    Predictive and Descriptive Techniques
    Overview of SEMMA
    Overview of the Nodes
        Sample Nodes
        Explore Nodes
        Modify Nodes
        Model Nodes
        Assess Nodes
        Scoring Nodes
        Utility Nodes
    Some General Usage Rules for Nodes
Accessing SAS Data through SAS Libraries

Starting Enterprise Miner

To start Enterprise Miner, start SAS and then type miner on the SAS command bar. Submit the command by pressing the Return key or by clicking the check mark icon next to the command bar.

Alternatively, select from the main menu

Solutions → Analysis → Enterprise Miner

For more information, see Getting Started with SAS Enterprise Miner.

Setting Up the Initial Project and Diagram

Enterprise Miner organizes data analyses into projects and diagrams. Each project may have several process flow diagrams, and each diagram may contain several analyses. Typically, each diagram contains an analysis of one data set.

Follow these steps to create a project.

1. From the SAS menu bar, select File → New → Project.
2. Type a name for the project, such as My Project.
3. Select the Client/server project check box if necessary.
   Note: You must have access to a server that runs the same version of Enterprise Miner. For information about building a client/server project, see Getting Started with SAS Enterprise Miner or the online Help.
4. Modify the location of the project folder by either typing a different location in the Location field or by clicking Browse.
5. Click Create. The project opens with an initial untitled diagram.
6. Select the diagram title and type a new name, such as My First Flow.

Identifying the Interface Components

The SAS Enterprise Miner window contains the following interface components:

- Project Navigator: enables you to manage projects and diagrams, add tools to the Diagram Workspace, and view HTML reports that are created by the Reporter node. Note that when a tool is added to the Diagram Workspace, the tool is referred to as a node. The Project Navigator has three tabs:
    - Diagrams tab: lists the current project and the diagrams within the project. By default, the project window opens with the Diagrams tab activated.
    - Tools tab: contains the Enterprise Miner tools palette. This tab enables you to see all of the tools (or nodes) that are available in Enterprise Miner. The tools are grouped according to the SEMMA data-mining methodology. Many of the commonly used tools are shown on the Tools Bar at the top of the window. You can add additional tools to the Tools Bar by dragging them from the Tools tab onto the Tools Bar. In addition, you can rearrange the tools on the Tools Bar by dragging each tool to a new location on the Tools Bar.
    - Reports tab: displays the HTML reports that are generated by using the Reporter node.
- Diagram Workspace: enables you to build, edit, run, and save process flow diagrams.
- Tools Bar: contains a customizable subset of Enterprise Miner tools that are commonly used to build process flow diagrams in the Diagram Workspace. You can add or delete tools from the Tools Bar.
- Progress Indicator: displays a progress indicator bar that indicates the execution status of an Enterprise Miner task.
- Message P