OntheBayesRiskinInformation-HidingProtocols∗KonstantinosChatzikokolakisCatusciaPalamidessiINRIAandLIX,´EcolePolytechniquePalaiseau,France{kostas,catuscia}@lix.polytechnique.frPrakashPanangadenMcGillUniversityMontreal,Quebec,Canadaprakash@cs.mcgill.caAbstractRandomizedprotocolsforhidingprivateinformationcanberegardedasnoisychannelsintheinformation-theoreticsense,andtheinferenceoftheconcealedinformationcanberegardedasahypothesis-testingprob-lem.WeconsidertheBayesianapproachtotheproblem,andinvestigatetheprobabilityoferrorassociatedtotheMAP(MaximumAposterioriProbability)inferencerule.Ourmainresultisaconstructivecharacter-izationofaconvexbaseoftheprobabilityoferror,whichallowsustocomputeitsmaximumvalue(overallpossibleinputdistributions),andtoidentifyupperboundsforitintermsofsimplefunctions.Asasideresult,weareabletoimprovetheHellman-RavivandtheSanthi-Vardyboundsexpressedintermsofconditionalentropy.WethendiscussanapplicationofourmethodologytotheCrowdsprotocol,andinparticularweshowhowtocomputetheboundsontheprobabilitythatanadversarybreakanonymity.1IntroductionInformation-hidingprotocolstrytohidetherelationbetweencertainfacts,thatwewishtomaintainhidden,andtheobservableconsequencesofthesefacts.ExampleofsuchprotocolsareanonymityprotocolslikeCrowds[23],OnionRouting[29],andFreenet[8].Oftentheseprotocolsuserandomizationtoob-fuscatethelinkbetweentheinformationthatwewishtokeephiddenandthe∗ThisworkhasbeenpartiallysupportedbytheINRIADREI´EquipeAssoci´eePRINT-EMPS.TheworkofKonstantinosChatzikokolakisandCatusciaPalamidessihasbeenalsosupportedbytheINRIAARCprojectProNoBiS.observedevents.Crowds,forinstance,triestoconcealtheidentityoftheorigi-natorofamessagebyforwardingthemessagerandomlyuntilitsdestination,sothatifanattackerinterceptsthemessage,itcannotbesurewhetherthesenderistheoriginatororjustaforwarder.Inmostcases,protocolsliketheonesabovecanberegardedasinformation-theoreticchannels,wheretheinputsarethefactstokeephidden,theoutputsaretheobservables,andthematrixrepresentsthecorrelationbetweenthefactsandtheobservedevents,intermsofconditionalprobabilities.AnadversarycantrytoinferthefactsfromtheobservedeventsusingtheBayesianmethod,whichisbasedontheprincipleofassuminganaprioriprobabilitydistributiononthehiddenfacts(hypotheses),andderivingfromthat(andfromthematrix)theaposterioridistributionafteracertaineventhasbeenobserved.ItiswellknownthatthebeststrategyfortheadversaryistoapplytheMAP(MaximumAposterioriProbability)criterion,which,asthenamesays,dictatesthatoneshouldchoosethehypothesiswiththemaximumaposterioriprobabilitygiventheobservation.“Best”meansthatthisstrategyinducesthesmallestprobabil-ityofguessingthewronghypothesis.Theprobabilityoferror,inthiscase,isalsocalledBayesrisk.Intuitively,theBayesriskismaximumwhentherowsofthechannel’smatrixareallthesame;thiscasecorrespondsindeedtocapacity0,whichmeansthattheinputandtheoutputareindependent,i.e.wedonotlearnanythingabouttheinputsbyobservingtheoutputs.Thisistheidealsituation,fromthepointofviewofinformation-hidingprotocols.Inpractice,however,itisdifficulttoachievesuchdegreeofprivacy.WearetheninterestedinmaximizingtheBayesrisk,sotocharacterizequantitativelytheprotectionofferedbytheprotocol.ThemainpurposeofthispaperistoinvestigatetheBayesrisk,inrelationtothechannel’smatrix,andtoproducetightboundsonit.Theinterestinfindinggoodboundsfortheprobabilityoferrorismotivatedalsobythefactthatinsomecasethedecisionregioncanhaveacomplicatedgeometry,orthedecisionfunctioncanbeverysensitivetosmallvariationsintheinputdistribution,thusmakingitdifficulttocomputetheprobabilityoferror.Someexamplesofsuchsituationsareillustratedin[26].Goodboundsbasedon“easy”functions(i.e.functionseasytocompute,andnottoosensitivetocomputationalerrors)arethereforeveryusefulinsuchsituationsastheycanbeusedasanapproximationoftheprobabilityoferror.Itisparticularlynicetohaveconvexboundssincetheyboundanyestimatebasedonlinearinterpolation.Sinceourboundisbasedontheconvexhullitisthebestconvexboundthatmatchesthecornerpoints.TherearemanyboundsknowninliteraturefortheBayesrisk.Oneoftheseistheequivocationbound,duetoR´enyi[24],whichstatesthattheprobabilityoferrorisboundedbytheconditionalentropyofthechannel’sinputgiventheoutput.Later,HellmanandRavivimprovedthisboundbyhalf[15].Recently,SanthiandVardyhaveproposedanewbound,thatdependsexponentiallyonthe(oppositeofthe)conditionalentropy,andwhichconsiderablyimprovestheHellman-Ravivboundinthecaseofmulti-hypothesistesting[26].Thelatterisbetter,however,inthecaseofbinaryhypothesistesting.2TheBayesapproachtohypothesistestingisoftencriticizedbecauseitas-sumestheknowledgeoftheaprioridistribution,oratleastofagoodapprox-imationofit,whichisoftenanunjustifiedassumption.However,eveniftheadversarydoesnotknowtheaprioridistribution,themethodisstillvalidasymptotically,undertheconditionthatthematrix’srowsareallpairwisedis-tinguished.Undersuchconditionindeed,asshownin[3],byrepeatingtheexperimentthecontributionoftheaprioriprobabilitybecomeslessandlessrelevantforthecomputationoftheBayesianrisk,andit“washesout”inthelimit.Furthermore,t