Module1IntroductiontoDataWarehousingModuleOverview•数据仓库概述•考虑数据仓库解决方案Lesson1:数据仓库概述•商业难题•什么是数据仓库?•数据仓库架构•数据仓库解决方案组件•数据仓库项目•数据仓库项目角色•SQLServer作为数据仓库平台业务难题•关键业务数据分布在多个业务系统•找到业务决策的信息是耗时的和容易出错的•基本的业务问题很难回答?WhatIsaDataWarehouse?•一个集中存放用于报表和数据的信息容器•通常,一个数据仓库:包含大量的历史数据优化了数据查询(而不是插入和更新)定期加载新的业务数据为企业商务智能解决方案提供依据DataWarehouseArchitecturesCentralizedDataWarehouseDepartmentalDataMartHubandSpokeComponentsofaDataWarehousingSolution•从业务系统和其他数据源抽取数据加载•数据通常最终加载到数据仓库•数据清洗和重复数据的删除,确保数据仓库中数据的质量•MDM提供确切的业务数据实体DataWarehouseStagingDatabaseETLLoadProcessETLStagingProcessMasterDataManagement1011000110DataCleansingReportingandAnalysisDataSourcesDataWarehousingProjects1.首先确定数据仓库需要解决的业务问题2.确定回答这些问题所需的数据3.识别所需数据的数据源4.评估关键业务目标价值可行性,从现在的数据回答每个问题•对大量数据的项目,使用增量更新比较有效:把项目分解为多个子项目每个子项目处理一个特定的主题DataWarehousingProjectRoles•Projectmanager•Solutionarchitect•Datamodeler•Databaseadministrator•Infrastructurespecialist•ETLdeveloper•Businessusers/analyst•Testers•DatastewardsSQLServerAsaDataWarehousingPlatformSQLServerAnalysisServicesSQLServerDatabaseEngineMicrosoftSQLServerIntegrationServicesSQLServerMasterDataServices1011000110SQLServerDataQualityServicesMicrosoftSQLAzureandtheWindowsAzureMarketplaceMicrosoftSharePointServerMicrosoftPowerPivotTechnologiesMicrosoftExcel•DataMiningAdd-In•PowerPivotAdd-In•MDSAdd-InPowerViewSQLServerReportingServicesReports,KPIs,andDashboardsDataWarehousingBusinessIntelligenceLesson2:ConsiderationsforaDataWarehouseSolution•DataWarehouseDatabaseandStorage•DataSources•Extract,Transform,andLoadProcesses•DataQualityandMasterDataManagementDataWarehouseDatabaseandStorage考虑数据仓库包括:Databaseschema•Logical:typicallydenormalizedforoptimalreadperformance•Physical:oftenpartitionedforperformanceandmanagementHardware•Queryprocessingandmemory•Storage•NetworkHighavailabilityanddisasterrecovery•Hardwareredundancy•BackupstrategySecurity•Serveraccess•DatapermissionsDataSources数据源连接类型凭证和权限数据格式数据采集窗口Extract,Transform,andLoadProcesses•临时表:存放临时数据•所需的转换:提取数据时所需的数据转换和清洗•增量ETL:数据的变化加载DataQualityandMasterDataManagement•Dataquality:Cleansingdata:•Validatingdatavalues•Ensuringdataconsistency•IdentifyingmissingvaluesDeduplicatingdata•Masterdatamanagement:EnsuringconsistentbusinessentitydefinitionsacrossmultiplesystemsApplyingbusinessrulestoensuredatavalidity1011000110ModuleReviewandTakeaways•WhymightyouconsiderincludingastagingareainyourETLsolution?•WhatoptionsmightyouconsiderforperformingdatatransformationsinanETLsolution?•Whywouldyouassignthedatastewardroletoabusinessuserratherthanadatabasetechnologyspecialist?