8-1 Data Warehousing and Data Mining


Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use

Chapter 20: Data Analysis

Chapter 20: Data Analysis
- Decision Support Systems
- Data Warehousing
- Data Mining
- Classification
- Association Rules
- Clustering

Decision Support Systems
- Decision-support systems are used to make business decisions, often based on data collected by on-line transaction-processing systems.
- Examples of business decisions:
  - What items to stock?
  - What insurance premium to change?
  - To whom to send advertisements?
- Examples of data used for making decisions:
  - Retail sales transaction details
  - Customer profiles (income, age, gender, etc.)

Decision-Support Systems: Overview
- Data analysis tasks are simplified by specialized tools and SQL extensions
  - Example tasks
    - For each product category and each region, what were the total sales in the last quarter and how do they compare with the same quarter last year
    - As above, for each product category and each customer category
- Statistical analysis packages (e.g., S++) can be interfaced with databases
  - Statistical analysis is a large field, but not covered here
- Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from large databases.
- A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site.
  - Important for large businesses that generate data from multiple divisions, possibly at multiple sites
  - Data may also be purchased externally

Data Warehousing
- Data sources often store only current data, not historical data
- Corporate decision making requires a unified view of all organizational data, including historical data
- A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a unified schema, at a single site
  - Greatly simplifies querying, permits study of historical trends
  - Shifts decision support query load away from transaction processing systems
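The example task from the decision-support overview (total sales per product category and region in one quarter, compared with the same quarter a year earlier) can be sketched as a single grouped query. This is only an illustration: the `sales` table, its columns, the sample figures, and the helper `quarterly_comparison` are assumptions made for this sketch and do not come from the slides.

```python
import sqlite3

# Hypothetical fact table: one row per sale with category, region,
# year, quarter, and amount. All names and figures are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (category TEXT, region TEXT, year INTEGER, "
    "quarter INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("clothing", "east", 2023, 4, 100.0),
        ("clothing", "east", 2023, 4, 50.0),
        ("clothing", "east", 2022, 4, 120.0),
        ("clothing", "west", 2023, 4, 80.0),
        ("clothing", "west", 2022, 4, 60.0),
        ("electronics", "east", 2023, 4, 300.0),
        ("electronics", "east", 2022, 4, 250.0),
        ("electronics", "east", 2023, 3, 999.0),  # different quarter, filtered out
    ],
)


def quarterly_comparison(conn, year, quarter):
    """Return (category, region, this_year_total, last_year_total) rows
    for the given quarter and the same quarter one year earlier."""
    query = """
        SELECT category, region,
               SUM(CASE WHEN year = :y THEN amount ELSE 0 END) AS this_year,
               SUM(CASE WHEN year = :y - 1 THEN amount ELSE 0 END) AS last_year
        FROM sales
        WHERE quarter = :q AND year IN (:y, :y - 1)
        GROUP BY category, region
        ORDER BY category, region
    """
    return conn.execute(query, {"y": year, "q": quarter}).fetchall()


rows = quarterly_comparison(conn, 2023, 4)
for category, region, this_year, last_year in rows:
    # The last column is the year-over-year change for that group.
    print(category, region, this_year, last_year, this_year - last_year)
```

Replacing `region` with a customer-category column in the `GROUP BY` would give the second example task; in a warehouse setting the same query would typically run against the unified schema rather than against the transaction-processing systems.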
Data Warehousing (figure)

Design Issues
- When and how to gather data
  - Source driven architecture: data sources transmit new information to warehouse, either continuously or periodically (e.g., at night)
  - Destination driven architecture: warehouse periodically requests new information from data sources
  - Keeping warehouse exactly synchronized with data sources (e.g., using two-phase commit) is too expensive
    - Usually OK to have slightly out-of-date data at warehouse
    - Data/updates are periodically downloaded from online transaction processing (OLTP) systems.
- What schema to use
  - Schema integration

More Warehouse Design Issues
- Data cleansing
  - E.g., correct mistakes in addresses (misspellings, zip code errors)
  - Merge address lists from different sources and purge duplicates
- How to propagate updates
  - Warehouse schema may be a (materialized) view of schema from data sources
- What data to summarize
  - Raw data may be too large to store on-line
  - Aggregate values (totals/subtotals) often suffice
  - Queries on raw data can often be transformed by query optimizer to use aggregate values

Warehouse Schemas
- Dimension values are usually encoded using small integers and mapped to full values via dimension tables
- Resultant schema is called a star schema
  - More complicated schema structures
    - Snowflake schema: multiple levels of dimension tables
    - Constellation: multiple fact tables

Data Warehouse Schema (figure)

Data Mining
- Data mining is the process of semi-automatically analyzing large databases to find useful patterns
- Prediction based on past history
  - Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ..) and past history
  - Predict if a pattern of phone calling card usage is likely to be fraudulent
- Some examples of prediction mechanisms:
  - Classification
    - Given a new item whose class is unknown, predict to which class it belongs
  - Regression formulae
    - Given a set of mappings for an unknown function, predict the function result for a new parameter value

Data Mining (Cont.)
- Descriptive Patterns
  - Associations
    - Find books that are often bought by "similar" customers. If a new such customer buys one such book, suggest the others too.
    - Associations may be used as a first step in detecting causation
      - E.g., association between exposure to chemical X and cancer
  - Clusters
    - E.g., typhoid cases were clustered in an area surrounding a contaminated well
    - Detection of clusters remains important in detecting epidemics

Classification Rules
- Classification rules help assign new objects to classes.
  - E.g., given a new automobile insurance applicant, should he or she be classified as low risk, medium risk or high risk?
- Classification rules for above example could use a variety of data, such as educational level, salary, age, etc.
  - ∀ person P, P.degree = masters and P.income > 75,000 ⇒ P.credit = excellent
  - ∀ person P, P.deg
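The first classification rule in this section (a masters degree together with an income above 75,000 implies excellent credit) can be read as a simple predicate over a person record. The sketch below is a minimal illustration, assuming a hypothetical `Person` record and a hypothetical `classify_credit` function; the `>` comparison is taken as the intended reading of the rule, and the second rule is cut off in the source, so it is deliberately not implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical record holding only the attributes the rule mentions.
    degree: str
    income: float


def classify_credit(p: Person) -> Optional[str]:
    """Apply the one complete rule from the text; return None if it does not fire."""
    # Rule: P.degree = masters and P.income > 75,000  =>  P.credit = excellent
    if p.degree == "masters" and p.income > 75_000:
        return "excellent"
    # Any further rules are truncated in the source and are not guessed here.
    return None


applicants = [
    Person("masters", 90_000),
    Person("masters", 75_000),
    Person("bachelors", 90_000),
]
for person in applicants:
    print(person, "->", classify_credit(person))
```

A fuller rule set would add more branches of the same shape, one per rule, with a default class for applicants that no rule covers.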
