多表连接查询优化的相关研究吕彬2009.3.5motivation星型连接效果对比图050010001500200025008.00E+045.00E+051.05E+06三元组数时间(ms)Star-join优化前Star-join优化后'Z'型连接效果对比图050001000015000200008.00E+045.00E+051.05E+06三元组数时间(ms)z-join优化前z-join优化后图中表示连接顺序对查询效率的影响:准确估计选择度要考虑属性间的相关性问题关键:高效地计算属性间的相关度AgendaMulti-tablejoinoverview•HeuristicandrandomizedoptimizationforthejoinorderingproblemMichaelSteinbrunn,etal,TheVLDBJournal(1997)6:191–208Attributecorrelationdetection•BHUNT:AutomaticDiscoveryofFuzzyAlgebraicConstraintsinRelationalDataPaulG.BrownPeterJ.Haas,Proceedingsofthe29thVLDBConference,2003•CORDS:AutomaticDiscoveryofCorrelationsandSoftFunctionalDependenciesIhabF.Ilyas,VolkerMarkl,etal,SIGMOD2004,June13–18,2004,•COCA:MoreAccurateMultidimensionalHistogramsoutofMoreAccurateCorrelationsDetectionCAOWei1,QINXiongpai,WANGShan,WAIM2008Starjoin•StarGazingfromatopyourDB2z/OSDatabaseServerTerryPurcell,etal,IntelligentOptimizer•Starjoinrevisited:PerformanceinternalsforclusterarchitecturesJosepAguilar-Saborit,Data&KnowledgeEngineering63(2007)995–1013Heuristicandrandomizedoptimizationforthejoinorderingproblem•Choosingjointypebasedoncost•Solutionspaceforthejoinorderingproblem•Joinorderingstrategies•Quantitativeanalysis•ConclusionMulti-tablejoinoverviewChoosingjointypebasedoncost•Costmodels•Nestedloopjoin•Sort-mergejoin•HashjoinMulti-tablejoinoverviewIOCostElapsedTimeRowCostCPUCostBaseCostScanCostPageCostSolutionspaceforthejoinorderingproblem•Left-deeptreesn!waystoallocatenbaserelationstothetree’sleavesgoodsolutionsbecauseofexploitingthecost-reducingpipeliningtechnique•Bushytreesanadaptableplanenumerationstrategylineargraphs(n3−n)/6stargraphs(n−1)2n−2Multi-tablejoinoverviewJoinorderingstrategies•Deterministicalgorithmsheuristicorexhaustivesearch•RandomizedalgorithmsDefineasetofmoveswhichconstituteedgesbetweenthedifferentsolutionsofthesolutionspaceperformsarandomwalkalongtheedgesaccordingtocertainrules,terminatingassoonasnomoreapplicablemovesexistoratimelimitisexceeded•Geneticalgorithmsmakeuseofarandomizedsearchstrategyverysimilartobiologicalevolution•HybridalgorithmscombinethestrategiesofpuredeterministicandpurerandomizedalgorithmsMulti-tablejoinoverviewQuantitativeanalysis•Classofthejoingraph•RelationcardinalitiesanddomainsizesMulti-tablejoinoverviewQuantitativeanalysis•SolutionspacesMulti-tablejoinoverviewQuantitativeanalysis•deterministicalgorithmsMulti-tablejoinoverviewQuantitativeanalysis•deterministicalgorithmsMulti-tablejoinoverviewQuantitativeanalysis•RandomizedandgeneticalgorithmsMulti-tablejoinoverviewQuantitativeanalysis•RandomizedandgeneticalgorithmsMulti-tablejoinoverviewQuantitativeanalysis•RandomizedandgeneticalgorithmsMulti-tablejoinoverviewQuantitativeanalysis•RandomizedandgeneticalgorithmsMulti-tablejoinoverviewQuantitativeanalysis•Totalrunningtime•FindfinalsolutiontimeMulti-tablejoinoverviewConclusion•HeuristicalgorithmsVsRandomizedandgeneticalgorithmscomputequickly,butfarfromtheoptimum.bettersuitedforjoinoptimizations;althoughrequirealongerrunningtime•SolutionspaceExceptthestarjoingraph,thebushytreeispreferablethanleft-deepprocessingtrees.•TheextensibilityofrandomizedandgeneticalgorithmsMulti-tablejoinoverviewQuery-Drivenapproaches•Queryworkload•Queryfeedback•Multidimensionalhistogram•SASHalgorithm•AdvantageanddisadvantageData-Drivenapproaches•StatisticalChi-squaretest,Log-linearmodel•ProbabilityBayesiannetwork,Markovnetwork•Fullscanvs.Sample•hardFDs,softFDs,CorrelationAttributecorrelationdetectionBHUNT:AutomaticDiscoveryofFuzzyAlgebraicConstraintsinRelationalData•ExampleAttributecorrelationdetectionOverviewofBHUNT•AlgebraicConstraints•Examplea1:deliveries.deliveryDate,a2:orders.shipDate,⊕:asthesubtractionoperator,P:orders.orderID=deliveries.orderID,I={2,3,4,5}∪{12,13,...,19}∪{31,32,33,34,35}.AttributecorrelationdetectionOverviewofBHUNT•GeneratingCandidatesC=(a1,a2,P,⊕).GeneratingPairingRulesTurningPairingRulesIntoCandidates•IdentifyingFuzzyConstraintsConstructingBumpIntervals,byapplyingstatisticalhistogramming,segmentation,orclusteringtechniquestoasampleofthecolumnvaluesChoosingtheSampleSize,Thesamplesizeisselectedtocontrolthenumberof“exception”recordsthatfailtosatisfytheconstraint.•ExploitingtheConstraintsIdentifythemostusefulsetofconstraints,andcreate“exceptiontables”toholdalloftheexceptionrecords.•Modifythequeriestoincorporatetheconstraintstheoptimizerusestheconstraintstoidentifynew,moreefficientaccesspaths.AttributecorrelationdetectionExperimentalresultsofBHUNTAttributecorrelationdetectionCORDS:AutomaticDiscoveryofCorrelationsandSoftFunctionalDependencies•.AttributecorrelationdetectionCORDS:AutomaticDiscoveryofCorrelationsandSoftFunctionalDependencies•softfunctionaldependencies•C1=C2,thevalueofC1determinesthevalueofC2notwithcertainty,butmerelywithhighprobability.•hardfunctionaldependencies•thevalueofC1completelydeterminesthevalueofC2.AttributecorrelationdetectionCORDSoverview•First:EnumeratingandPruning•PruningruleTypeconstraintStatisticalconstraintParingconstraintWorkloadcon