Lec-3 Data Warehousing and OLAP Technology 09


2020/3/9
lgao@mail.xidian.edu.cn

Data Warehousing and OLAP Technology
Gaolin
School of Computer Science and Technology
Xidian University

Contents of last course
- Why preprocess the data?
- Data cleaning
- Data integration
- Data transformation
- Data reduction
- Data discretization
- Summary

Contents of today
- What is a data warehouse?
- A multi-dimensional data model
- From data warehousing to data mining

What is Data Warehouse?
- Defined in many different ways, but not rigorously.
  - A decision support database that is maintained separately from the organization's operational database
  - Support information processing by providing a solid platform of consolidated, historical data for analysis
- "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process."
- Data warehousing: the process of constructing and using data warehouses

Data Warehouse: Subject-Oriented
- Organized around major subjects, such as customer, product, sales
- Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
- Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

Data Warehouse: Integrated
- Constructed by integrating multiple, heterogeneous data sources
  - relational databases, flat files, on-line transaction records
- Data cleaning and data integration techniques are applied
  - Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
    - E.g., hotel price: currency, tax, breakfast covered, etc.
  - When data is moved to the warehouse, it is converted

Data Warehouse: Time Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems
  - Operational database: current value data
  - Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
- Every key structure in the data warehouse
  - Contains an element of time, explicitly or implicitly
  - But the key of operational data may or may not contain "time element"

Data Warehouse: Nonvolatile
- A physically separate store of data transformed from the operational environment
- Operational update of data does not occur in the data warehouse environment
  - Does not require transaction processing, recovery, and concurrency control mechanisms
  - Requires only two operations in data accessing: initial loading of data and access of data

Data Warehouse vs. Heterogeneous DBMS
- Traditional heterogeneous DB integration:
  - Build wrappers/mediators program on top of heterogeneous databases
  - Query driven approach
    - When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer set
    - Complex information filtering, compete for resources
- Data warehouse: update-driven, high performance
  - Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis

Data Warehouse vs. Operational DBMS
- OLTP (on-line transaction processing)
  - Major task of traditional relational DBMS
  - Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
- OLAP (on-line analytical processing)
  - Major task of data warehouse system
  - Data analysis and decision making
- Distinct features (OLTP vs. OLAP):
  - User and system orientation: customer vs. market
  - Data contents: current, detailed vs. historical, consolidated
  - Database design: ER (entity-relationship) + application vs. star + subject
  - View: current, local vs. evolutionary, integrated
  - Access patterns: update vs. read-only but complex queries

Why Separate Data Warehouse?
- High performance for both systems
  - DBMS, tuned for OLTP: access methods, indexing, concurrency control, recovery
  - Warehouse, tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
- Different functions and different data:
  - missing data: decision support requires historical data which operational DBs do not typically maintain
  - data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources
  - data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled

Data Warehousing and OLAP Technology for Data Mining
- What is a data warehouse?
- A multi-dimensional data model
- From data warehousing to data mining

From Tables and Spreadsheets to Data Cubes
- A data warehouse is based on a multidimensional data model which views data in the form of a data cube
- A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
  - Dimension tables, such as item (item_name, brand, type), or time (day, week, month, quarter, year)
  - Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables
- In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

2-D view
- A 2-D view of sales data for AllElectronics according to the dimensions time and item, where the sales are from branches located in the city of Vancouver. The measure displayed is dollars sold (in thousands).

3-D view

3-D data cube representation

4-D data cube representation

Cube: A Lattice of Cuboids

Schemas for Multidimensional Databases: Star Schema

Schemas for Multidimensional Databases: Snowflake Schema

Schemas for Multidimensional Databases: Fact Constellation

A Concept Hierarchy: Dimension (location)
A concept hierar...
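
As a rough illustration of the fact-table and dimension-table idea above, here is a minimal star schema sketch using Python's standard sqlite3 module. The table names (item_dim, time_dim, sales_fact), the key columns, and every data row are assumptions made up for this example; only the attribute names item_name, brand, type, day, month, quarter, year and the measure dollars_sold come from the slides. The time dimension is simplified (no week column).

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema filled with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item_dim (
            item_key  INTEGER PRIMARY KEY,
            item_name TEXT, brand TEXT, type TEXT
        );
        CREATE TABLE time_dim (
            time_key INTEGER PRIMARY KEY,
            day INTEGER, month INTEGER, quarter TEXT, year INTEGER
        );
        -- The fact table holds the measure plus one key per dimension.
        CREATE TABLE sales_fact (
            time_key INTEGER REFERENCES time_dim(time_key),
            item_key INTEGER REFERENCES item_dim(item_key),
            dollars_sold REAL
        );
    """)
    # Hypothetical dimension rows.
    conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?, ?)", [
        (1, "TV", "BrandA", "home entertainment"),
        (2, "laptop", "BrandB", "computer"),
    ])
    conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?, ?)", [
        (1, 15, 1, "Q1", 2020),
        (2, 20, 2, "Q1", 2020),
        (3, 10, 4, "Q2", 2020),
    ])
    # Hypothetical facts: (time_key, item_key, dollars_sold).
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", [
        (1, 1, 100.0), (2, 1, 50.0), (2, 2, 200.0),
        (3, 1, 70.0), (3, 2, 30.0),
    ])
    return conn


def sales_by_quarter_and_type(conn):
    """Join the fact table to both dimensions and total dollars_sold."""
    return conn.execute("""
        SELECT t.quarter, i.type, SUM(f.dollars_sold)
        FROM sales_fact AS f
        JOIN time_dim AS t ON f.time_key = t.time_key
        JOIN item_dim AS i ON f.item_key = i.item_key
        GROUP BY t.quarter, i.type
        ORDER BY t.quarter, i.type
    """).fetchall()


if __name__ == "__main__":
    for row in sales_by_quarter_and_type(build_star_schema()):
        print(row)
```

The query is read-only and aggregates across many fact rows, which is the OLAP access pattern contrasted with OLTP updates earlier in the lecture.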

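The lattice of cuboids can also be sketched in a few lines of Python. The example below assumes three dimensions (time, item, location) with no concept hierarchies, so the lattice has 2^3 = 8 cuboids, one per subset of dimensions. The fact rows and the helper names cuboid and lattice are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location")

# Hypothetical base facts; dollars_sold is the measure.
FACTS = [
    {"time": "Q1", "item": "TV", "location": "Vancouver", "dollars_sold": 100.0},
    {"time": "Q1", "item": "laptop", "location": "Vancouver", "dollars_sold": 200.0},
    {"time": "Q2", "item": "TV", "location": "Toronto", "dollars_sold": 70.0},
    {"time": "Q2", "item": "TV", "location": "Vancouver", "dollars_sold": 30.0},
]


def cuboid(facts, dims):
    """Sum dollars_sold grouped by the given tuple of dimensions."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dims)
        totals[key] += row["dollars_sold"]
    return dict(totals)


def lattice(facts, dimensions):
    """Build every cuboid: one per subset of the dimensions."""
    return {
        dims: cuboid(facts, dims)
        for k in range(len(dimensions) + 1)
        for dims in combinations(dimensions, k)
    }


if __name__ == "__main__":
    cube = lattice(FACTS, DIMENSIONS)
    print(len(cube))         # number of cuboids in the lattice
    print(cube[()])          # apex (0-D) cuboid: the grand total
    print(cube[DIMENSIONS])  # base (3-D) cuboid: the finest level
```

Here the empty tuple of dimensions plays the role of the apex cuboid and the full tuple plays the role of the base cuboid. A real warehouse would usually materialize only some of these cuboids rather than all of them.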