ASampleofSamplers:AComputationalPerspectiveonSamplingOdedGoldreichDepartmentofComputerScienceandAppliedMathematicsWeizmannInstituteofScience,Rehovot,Israel.Email:oded@wisdom.weizmann.ac.ilMay26,1997AbstractWeconsidertheproblemofestimatingtheaverageofahugesetofvalues.Thatis,givenoracleaccesstoanarbitraryfunctionf:f0;1gn7![0;1],weneedtoestimate2 nPx2f0;1gnf(x)uptoanadditiveerrorof .Weareallowedtoemployarandomizedalgorithmwhichmayerrwithprobabilityatmost .Wesurveyknownalgorithmsforthisproblemandfocusontheideasunderlyingtheircon-struction.Inparticular,wepresentanalgorithmwhichmakesO( 2 log(1= ))queriesandusesn+O(log(1= ))+O(log(1= ))cointosses,bothcomplexitiesbeingveryclosetothecorrespondinglowerbounds.Keywords:Sampling,randomnesscomplexity,savingrandomness,pairwiseindependentrandomvariables,Expandergraphs,randomwalksongraphs,lowerbounds.01IntroductionInmanysettingsrepeatedsamplingisusedtoestimatetheaveragevalueofahugesetofvalues.Namely,thereisavaluefunction de nedoverahugespace,say :f0;1gn7![0;1],andonewishestoapproximate def=12nPx2f0;1gn (x)withouthavingtoinspectthevalueof ontheentiredomain.Wecommentthatitisessentialtohavetherangeof bebounded(orelsenoreasonableapproximationmaybepossible).Ourconventionofhaving[0;1]betherangeof isadoptedforsimplicity,andtheproblemforother(predetermined)rangescanbetreatedanalogously.1.1FormalSettingOurnotionofapproximationdependsontwoparameters:accuracy(denoted )anderrorprobability(denoted ).Wewishtohaveanalgorithmwhichwithprobabilityatleast1 ,getswithin ofthecorrectvalue.Thisleadstothefollowingde nition.De nition1.1(sampler):Asamplerisarandomizedalgorithmthatoninputparametersn(length), (accuracy)and (error),andoracleaccesstoanyfunction :f0;1gn7![0;1],outputs,withprob-abilityatleast1 ,avaluethatisatmost awayfrom def=12nPx2f0;1gn (x).Namely,Pr(jsampler (n; ; ) j ) wheretheprobabilityistakenovertheinternalcointossesofthesampler.Weareinterestedin\thecomplexityofsamplingquanti edasafunctionoftheparametersn, and .Speci cally,wewillconsiderthreecomplexitymeasures1.SampleComplexity:thenumberoforaclequeriesmadebythesampler.2.RandomnessComplexity:thenumberof(unbiased)cointossesperformedbythesampler.3.ComputationalComplexity:therunning-timeofthesampler.Wesaythatasampleise cientifitsrunning-timeispolynomialinthetotallengthofitsqueries(i.e.,polynomialinbothitssamplecomplexityandinn).Wewillfocusone cientsamplers.Furthermore,wewillfocusone cientsamplerswhichhaveoptimal(uptoaconstantfactor)samplecomplexity,andwillbeinterestedinhavingtherandomnesscomplexitybeaslowaspossible.1.2OverviewThestraightforwardmethod(orthenaivesampler)consistsofuniformlyandindependentlyselectingsu cientlymanysamplepoints(queries),andoutputtingtheaveragevalueofthefunctiononthesepoints.UsingCherno BoundoneeasilydeterminesthatO(log(1= ) 2)samplepointssu ce.Thenaivesamplerisoptimal(uptoaconstantfactor)initssamplecomplexity,butisquitewastefulinrandomness.InSection2,wediscussthenaivesamplerandpresentlower(andupper)boundsonthesampleandrandomnesscomplexitiesofsamplers.Thesewillguideourquestforimprovements.Pairwise-independentsamplingyieldsagreatsavingintherandomnesscomplexity.InSec-tion3wepresentthePairwise-IndependentSampler,anddiscussitsadvantagesanddisadvantages.Speci cally,forconstant 0,thePairwise-IndependentSamplerisoptimaluptoaconstantfactor1inbothitssampleandrandomnesscomplexities.However,forsmall (i.e., =o(1)),itssamplecomplexityiswasteful.Anewideaisrequiredforgoingfurther,andarelevanttool{randomwalksonexpandergraphs(seeAppendixA){isneededtoo.InSection4,wecombinethePairwise-IndependentSamplerwiththeExpanderRandomWalkTechniquetoobtainanewsampler.Looselyspeaking,thenewsamplerusesarandomwalkonanexpandertogenerateasequenceof‘def=O(log(1= ))(related)randompadsfor‘invocationsofthePairwise-IndependentSampler.Eachoftheseinvocationsreturnsan -closeapproximationwithprobabilityatleast0:9.Theexpanderwalktechniqueyieldsthat,withprobabilityatleast1 exp( ‘)=1 ,mostofthese‘invocationsreturnan -closeapproximation.Thus,themedianvalueisan( ; )-approximationtothecorrectvalue.Theresultingsampler,calledtheMedian-of-AveragesSampler,hassamplecomplexityO(log(1= ) 2)andrandomnesscomplexity2n+O(log(1= )).InSection5,wepresentanalternativesamplerwhichimprovesoverthepairwise-independentsampler.Maintainingthesamplecomplexityofthelatter(i.e.,O(1= 2)),thenewsamplerhasrandomnesscomplexityn+O(log(1= ))(ratherthan2n).CombiningthisnewsamplerwiththeExpanderRandomWalkTechnique,weobtainsamplecomplexityO(log(1= ) 2)andrandomnesscomplexityn+O(log(1= ))+O(log(1= )).Betterboundsareobtainedforthecaseof\Booleansamplers(i.e.,algorithmswhichmustonlywell-approximateBooleanfunctions).Inaddition,inSection5wepresenttwogeneraltechniquesforre ningsamplers.Weconcludewithsomeopenproblems.Inparticular,wediscussthenotionof\oblivious(or\averaging)samplers.TheHittingProblemisconsideredinAppendixC.Here,givenanoracletoafunctionhavingatleastan fractionof1’s,oneisrequiredto ndaninputwhichevaluatesto1.Clearly,eachsamplercanbeusedforthispurpose,butthisisanover-kill.Neve