arXiv:0711.2102v1[cs.IT]14Nov2007Patternsofi.i.d.SequencesandTheirEntropy-PartII:BoundsforSomeDistributions∗GilI.ShamirDepartmentofElectricalandComputerEngineeringUniversityofUtahSaltLakeCity,UT84112,U.S.Ae-mail:gshamir@ece.utah.edu.AbstractApatternofasequenceisasequenceofintegerindiceswitheachindexdescribingtheorderoffirstoccurrenceoftherespectivesymbolintheoriginalsequence.Inarecentpaper,tightgeneralboundsontheblockentropyofpatternsofsequencesgeneratedbyindependentandidenticallydistributed(i.i.d.)sourceswerederived.Inthispaper,preciseapproximationsareprovidedforthepatternblockentropiesforpatternsofsequencesgeneratedbyi.i.d.uniformandmonotonicdistributions,includingdistributionsovertheintegers,andthegeometricdis-tribution.Numericalboundsonthepatternblockentropiesofthesedistributionsareprovidedevenforveryshortblocks.Tightboundsareobtainedevenfordistributionsthathaveinfinitei.i.d.entropyrates.Theapproximationsareobtainedusinggeneralboundsandtheirderivationtechniques.Conditionalindexentropyisalsostudiedfordistributionsoversmalleralphabets.IndexTerms:patterns,monotonicdistributions,uniformdistributions,entropy.1IntroductionRecentwork(see,e.g.,[1],[5],[6],[10],[13],[14])hasconsidereduniversalcompressionforpatternsofindependentandidenticallydistributed(i.i.d.)sequences.Thepatternofasequencexn△=(x1,x2,...,xn)isasequenceψn△=ψ△=Ψ(xn)ofpointersthatpointtotheactualalphabetletters,wherethealphabetlettersareassignedindicesinorderoffirstoccurrence.Forexample,thepatternofallsequencesxn=lossless,xn=sellsoll,xn=12331433,andxn=76887288,whichisalphabetindependent,isψn=Ψ(xn)=12331433.CapitalΨ(·)denotesthepatternoperator.Patternsareinterestinginuniversalcompressionwithunknownalphabets,wherethedictionaryandthepatternofxncanbecompressedseparately(see,e.g.,[1]).Patternentropyisalsoimportantinlearningapplications.Considerallthenewspeciesanexplorerobserves.Theexplorercanidentifythesespecieswiththefirsttimeeachwasseen,andassignindicestospeciesinorderoffirstoccurrence.Theentropyofpatternscanthusmodeluncertaintyofsuchprocesses.∗SupportedinpartbyNSFGrantCCF-0347969.PartsofthematerialinthispaperwerepresentedattheIEEEInternationalSymposiumonInformationTheory,Seattle,WA,USA,July,2006.1Initialworkonpatterns[5],[6],[10],[13],[14],focusedonshowingdiminishinguniversalcom-pressionredundancyrates.Thefirstresultsonpatternentropyin[10],[12],[14],however,showedthatforsufficientlylargealphabets,thepatternblockentropymustdecreasefromthei.i.d.oneevenmoresignificantlythantheuniversalcodingpenaltyforcodingpatterns.SinceΨ(xn)istheresultofdataprocessing,itsentropymustbenogreaterthanHθ(Xn).Foralphabetsizek,nHθ(X)−log[k!/(max{0,k−n})!]≤Hθ(Ψn)≤nHθ(X),(1)wherecapitallettersdenoterandomvariables,andθistheparametervectorgoverningthesource1.Theboundsin(1)alreadyshowthatfork=o(n)thepatternentropyrateequalsthei.i.d.one2fornon-diminishingHθ(X).Subsequentlytotheresultsin[12],itwasindependentlyshownin[4]and[7]thatfordiscretei.i.d.sources,thepatternentropyrateisequaltothatoftheunderlyingi.i.d.process.Incontrastwiththeblockentropy,forsmalleralphabets,theconditionalnextindexentropyHθ Ψℓ|Ψℓ−1isguaranteedtostartincreasingfromHθ(X)aftersometimeℓ1.Thegain(decrease)inblockentropyisthusduetofirstoccurrencesofnewsymbols,andgainsonlyoccurbeforefirstoccurrencesbecomesparse.Thisobservation,pointedoutalsoin[4],[7],givesrisetoapossibilityofdiminishingo(1)overallperblockdecreasesofHθ(Ψn)fromnHθ(X).In[11],generaltightupperandmatchinglowerboundsontheblockentropywerederived.Thispapercontinuestheworkin[11],andusestheboundsderivedin[11]andtheirderivationmethodstoprovideveryaccurateapproximationsofthepatternblockentropiesforuniformandseveralmonotonici.i.d.distributions.Thecompleterangeofuniformdistributions,fromoverfixedsmallalphabets,tooverinfinitealphabets,isstudied.Monotonicdistributionsfromslowlytofastdecayingonesareconsidered.Itisshownthatthepatternentropycanbeapproximatedevenforslowlydecayingmonotonicdistributionswithinfinitei.i.d.entropyrates.Then,smallalphabetsandtheirconditionalnextindexentropiesarestudied.Thederivationmethodsarebasedonthosein[11].Theprobabilityspaceispartitionedintoagridofpoints.Betweeneachtwopoints,thereisabin.Symbolswhoseprobabilitieslieinthesamebincanbeexchangedinxntoprovideanothersequencex′nwithΨ(xn)=Ψ(x′n)andalmostequalprobability.Countingsuchsequences,packinglowprobabilitysymbolsintosinglepointmasses,leadstoboundsonHθ(Ψn).Properchoicesofgridsarekeyfortighteningbounds.Section2givessomepreliminaries.Generalbounds(somewhatmodified)from[11]arereviewedinSection3.Next,Section4summarizespatternentropiesfordifferentdistributions.Finally,Sec-tions5and6containtheproofsforuniformdistributionsandmonotonicdistributions,respectively.2PreliminariesLetxnbeann-tuplewithcomponentsxi∈Σ△={1,2,...,k}(wherethealphabetisdefinedwithoutlossofgenerality).Theasymptoticregimeisthatn→∞.However,thegeneralboundsarestatedalsoforfiniten.Thealphabetsizekmaybegreaterthannorinfinite.Thevector1Logarithmsaretakentobase2,hereandelsewhere.Thenaturallogarithmisdenotedbyln.2Fortwofunctionsf(n)andg(n),f(n)=o(g(n))