高级数据库技术问答题(附答案)

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

DatabaseSystemProblem1Thehashjoinalgorithmdescribedinthetextbookcomputesthenaturaljoinoftworelations.Writepseudocodeforaniteratorthatimplementsahashbasedalgorithmforcomputingan“anti-join”operation:Ranti-joinScontainsalltuplesinRthathavenomatchingtuplesinS.Althoughanti-joinisnotanoperationthatcanbespecifieddirectlyinSQL,itisimplementedinthequeryprocessortosupportsometypesofqueries.Yourpseudocodemustdefinethestandarditeratorfunctionsopen(),next(),andclose(),andmustfetchtheinputtuplesusingiteratorcalls(inotherwords,RandSmaynotbebaserelations,butmaybeoutputsofotheroperators).Showwhatstateinformationtheiteratormustmaintainbetweencalls.Forsimplicity,youcanassumethatSissmallandeasilyfitsinmemory.Herearemoredetailsonanti-join::(a)Forthediskthatwillholdtheindex,thetimetoreadagivenblockintomemorycanbeapproximatedby(25+.02*m)milliseconds.The25millisecondsrepresenttheseekandlatencycomponentsoftheread,the.02*mmillisecondsisthetransfertime.(Thatis,asmbecomeslarger,thelargertheblockwillbeandthemoretimeitwilltaketoreaditintomemory.)(b)Oncetheblockisinmemoryabinarysearchisusedtofindtherightpointer.Sothetimetoprocessablockinmainmemoryis(a+blog_2m)milliseconds,forsomeconstantsa,b.(log_2mmeanslogbase2ofm.)(c)Themainmemorytimeconstantaismuchsmallerthanthediskseekandlatencytimeof25milliseconds.(d)Theindexisfull,sothatthenumberofblocksthatmustbeexaminedpersearchislog_mN.Questions:(i)Whatvalueofmminimizesthetimetosearchforagivenrecord?AnapproximateanswerisOK.(HINT:Ifyoucomeupwithanequationwhichishardtosolvealgebraically,tryplugginginvaluestolocatetherootoftheequation.)Thevalueyouobtainshouldbeindependentofb!(ii)Whathappensastheseekandlatencyconstant(25ms)decreases?Forinstance,ifthisconstantiscutinhalf,howdoestheoptimummvaluechange?Answer:LetT1bethetimetakentoreadoneblockintomainmemoryandT2bethetimetoprocessthatblockinmemory.Ifnblocksareexaminedtosearchforagivenrecord,thenthetimetakenforthissearchis:n*(T1+T2)Substitutingtheappropriateexpressionsforallthevariables,wegetT(m)=[(25+0.02m)+(a+blog2m)]*logmNIgnoringaincomparisonwith30,wegetT(m)=[25+0.02m+blog2m]*logmNSubstitutinglog2mbyln(m)/ln(2)andlogmNbyln(N)/ln(m),wegetT(m)=[(25+0.02m)/ln(m)]*ln(N)+b*ln(N)/ln(2)Hence,tominimizeT(m),weneedtominimizef(m)=(25+0.02m)/ln(m).Anumberoftechniquescanbeappliedtodeterminethevalueofmthatminimizesthisexpression.Forexample,takingthederivativeandsettingittozero(f'(m)=0),wegetm*(ln(m)-1)=1250Thiscanbesolvedtoyieldanintegralvalueof272.Itiseasytoverifythatthisvaluedoesindeedminimizef(m)andhenceT(m).Seek+latencyconstantdecreases:LetussplitT(m)intothreecomponentsasfollows(here,tsistheseek+latencytimeconstantwhichwas30intheabovecase):TotalseekandlatencytimeTa=ts*ln(N)/ln(m)TotaltransfertimeTb=0.02*m*ln(N)/ln(m)TotalbinarysearchtimeTc=b*ln(N)/ln(2)Ofthese,Tcisindependentofm,TaisadecreasingfunctionofmwhereasTbisanincreasingfunctionofm.Iftsisverysmall,thenTaisnegligiblecomparedtoTb.Hence,wemustchooseasmallvalueofmtominimizethesearchtime.Ontheotherhand,ifTaisverylargecomparedtoTb,thentheformerdominatesandwechoosealargevalueofmtominimizesearchtimes.Fromtheselimitingcases,itiseasytoseethatiftsdecreases,theoptimumvalueofmdecreases.Inparticular,ifts=25/2=12.5,wegetanoptimumvalueof155(byproceedingalongthesamelinesasabove).Problem3Tobuildahashindexforamulti-attributesearchkey,wecanuseanapproachcalledpartitionedhashing.Thepartitionedhashfunctionisreallyalistofhashfunctions,oneforeachattributeinthesearchkey.SupposethatwewishtobuildapartitionedhashindexonR(A,B)with2nbucketsnumbered0to2n1.Inthiscase,thepartitionedhashfunctionconsistsoftwohashfunctionshAandhB.HashfunctionhAtakesavalueofAasinputandproducesaresultwithnAbits,andhBtakesavalueofBasinputandproducesaresultwith(nnA)bits.Thetworesultsareconcatenatedtogethertoproducetheresultofthepartitionedhashfunction,whichisthenusedtoindexthebuckets.TolocaterecordswithA=aandB=b,wesimplygodirectlytothebucketnumberedhA(a)hB(b)(inbinary).(a)WhichbucketsdowehavetoexamineinordertolocateallrecordswithA=a?(b)Supposewearegivenaquerymix.EachqueryinthismixwilleitheraskforrecordswithagivenvalueofA,oritwillaskforrecordswithagivenvalueofB(butneverboth).Withprobabilityp,thevalueofAwillbespecified.Giveaformulaintermsofn,nA,andpfortheexpectednumberofbucketsthatmustbeexaminedtoanswerarandomqueryfromthemix.(c)FindthevalueofnA,asafunctionofnandp,thatminimizestheexpectednumberofbucketsexaminedperquery?Answer:(a)(b)(c)Firstf(x)=thendeterminethevalueofNathatminimizesthef(x)(Tips:useyourcalculusknowledge)

1 / 4
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功