Bregman-Divergences

整理文档很辛苦,赏杯茶钱您下走!

免费阅读已结束,点击下载阅读编辑剩下 ...

阅读已结束,您可以下载文档离线阅读编辑

资源描述

MatrixNearnessProblemsusingBregmanDivergencesInderjitS.DhillonTheUniversityofTexasatAustinHouseholderSymposiumXVISevenSprings,PAMay24,2005jointworkwithA.Banerjee,H.Cho,J.Ghosh,Y.Guan,S.Merugu,D.Modha,S.SraandJ.TroppBregmanDivergencesLetϕ:S→Rbeadifferentiable,strictlyconvexfunctionof“Legendretype”(S⊆Rd)TheBregmanDivergenceDϕ:S×int(S)→RisdefinedasDϕ(x,y)=ϕ(x)−ϕ(y)−(x−y)T∇ϕ(y)BregmanDivergencesLetϕ:S→Rbeadifferentiable,strictlyconvexfunctionof“Legendretype”(S⊆Rd)TheBregmanDivergenceDϕ:S×int(S)→RisdefinedasDϕ(x,y)=ϕ(x)−ϕ(y)−(x−y)T∇ϕ(y)yxϕ(z)=12zTzh(z)Dϕ(x,y)=12kx−yk2SquaredEuclideandistanceisaBregmandivergenceBregmanDivergencesLetϕ:S→Rbeadifferentiable,strictlyconvexfunctionof“Legendretype”(S⊆Rd)TheBregmanDivergenceDϕ:S×int(S)→RisdefinedasDϕ(x,y)=ϕ(x)−ϕ(y)−(x−y)T∇ϕ(y)yxDϕ(x,y)=xlogxy−x+yh(z)ϕ(z)=zlogzRelativeEntropy(alsocalledKL-divergence)isanotherBregmandivergenceBregmanDivergencesLetϕ:S→Rbeadifferentiable,strictlyconvexfunctionof“Legendretype”(S⊆Rd)TheBregmanDivergenceDϕ:S×int(S)→RisdefinedasDϕ(x,y)=ϕ(x)−ϕ(y)−(x−y)T∇ϕ(y)yxDϕ(x,y)=xy−logxy−1h(z)ϕ(z)=−logzItakura-SaitoDistance(usedinsignalprocessing)isanotherBregmandivergenceBregmanDivergencesFunctionNameϕ(x)domϕDϕ(x,y)Squarednorm12x2(−∞,+∞)12(x−y)2Shannonentropyxlogx−x[0,+∞)xlogxy−x+yBitentropyxlogx+(1−x)log(1−x)[0,1]xlogxy+(1−x)log1−x1−yBurgentropy−logx(0,+∞)xy−logxy−1Hellinger−√1−x2[−1,1](1−xy)(1−y2)−1/2−(1−x2)1/2`pquasi-norm−xp(0p1)[0,+∞)−xp+pxyp−1−(p−1)yp`pnorm|x|p(1p∞)(−∞,+∞)|x|p−pxsgny|y|p−1+(p−1)|y|pExponentialex(−∞,+∞)ex−(x−y+1)eyInverse1/x(0,+∞)1/x+x/y2−2/yPropertiesofBregmanDivergencesDϕ(x,y)≥0,andequals0iffx=yPropertiesofBregmanDivergencesDϕ(x,y)≥0,andequals0iffx=yNotametric(symmetry,triangleinequalitydonothold)PropertiesofBregmanDivergencesDϕ(x,y)≥0,andequals0iffx=yNotametric(symmetry,triangleinequalitydonothold)Strictlyconvexinthefirstargument,butnotconvex(ingeneral)inthesecondargumentPropertiesofBregmanDivergencesDϕ(x,y)≥0,andequals0iffx=yNotametric(symmetry,triangleinequalitydonothold)Strictlyconvexinthefirstargument,butnotconvex(ingeneral)inthesecondargumentThree-pointpropertygeneralizesthe“Lawofcosines”:Dϕ(x,y)=Dϕ(x,z)+Dϕ(z,y)−(x−z)T(∇ϕ(y)−∇ϕ(z))BregmanProjectionsNearnessinBregmandivergence:the“Bregman”projectionofyontoaconvexsetΩ,PΩ(y)=argminω∈ΩDϕ(ω,y)BregmanProjectionsNearnessinBregmandivergence:the“Bregman”projectionofyontoaconvexsetΩ,PΩ(y)=argminω∈ΩDϕ(ω,y)yPΩ(y)ΩBregmanProjectionsNearnessinBregmandivergence:the“Bregman”projectionofyontoaconvexsetΩ,PΩ(y)=argminω∈ΩDϕ(ω,y)yxPΩ(y)ΩBregmanProjectionsNearnessinBregmandivergence:the“Bregman”projectionofyontoaconvexsetΩ,PΩ(y)=argminω∈ΩDϕ(ω,y)yxPΩ(y)ΩGeneralizedPythagorasTheorem:Dϕ(x,y)≥Dϕ(x,PΩ(y))+Dϕ(PΩ(y),y)WhenΩisanaffineset,theaboveholdswithequalityHistoricalReferencesL.M.Bregman.“Therelaxationmethodoffindingthecommonpointofconvexsetsanditsapplicationtothesolutionofproblemsinconvexprogramming.”USSRComputationalMathematicsandPhysics,7:200-217,1967.Problem:minϕ(x)subjecttoaTix=bi,i=0,...,m−1Bregman’scyclicprojectionmethod:1.Startwithappropriatex(0).Computex(t+1)tobetheBregmanprojectionofx(t)ontothei-thhyperplane(i=tmodm)fort=0,1,2,...Convergestogloballyoptimalsolution.Thiscyclicprojectionmethodcanbeextendedtohalfspaceandconvexconstraints,whereeachprojectionisfollowedbyacorrection.Question:WhatrolecanBregmanDivergencesplayindataanalysis?ExponentialFamiliesofDistributionsDefinition.AregularexponentialfamilyisafamilyofprobabilitydistributionsonRdwithdensityfunctionparameterizedbyθ:pψ(x|θ)=exp{xTθ−ψ(θ)−gψ(x)}ψistheso-calledcumulantfunction,andisaconvexfunctionofLegendretypeExponentialFamiliesofDistributionsDefinition.AregularexponentialfamilyisafamilyofprobabilitydistributionsonRdwithdensityfunctionparameterizedbyθ:pψ(x|θ)=exp{xTθ−ψ(θ)−gψ(x)}ψistheso-calledcumulantfunction,andisaconvexfunctionofLegendretypeExample—considersphericalGaussiansparameterizedbymeanμ(withfixedvarianceσ):p(x)=1(2πσ2)dexp−12σ2kx−μk2=1(2πσ2)dexpxTμσ2−σ22μσ22−xTx2σ2Thusθ=μσ2,andψ(θ)=σ22θ2ExponentialFamiliesofDistributionsDefinition.AregularexponentialfamilyisafamilyofprobabilitydistributionsonRdwithdensityfunctionparameterizedbyθ:pψ(x|θ)=exp{xTθ−ψ(θ)−gψ(x)}ψistheso-calledcumulantfunction,andisaconvexfunctionofLegendretypeExample—considersphericalGaussiansparameterizedbymeanμ(withfixedvarianceσ):p(x)=1(2πσ2)dexp−12σ2kx−μk2=1(2πσ2)dexpxTμσ2−σ22μσ22−xTx2σ2Thusθ=μσ2,andψ(θ)=σ22θ2Note:Gaussiandistribution←→SquaredLossExamplePoissonDistribution:p(x)=λxx!e−λ,x∈Z+ThePoissonDistributionisamemberoftheexponentialfamilyIsthereaDivergenceassociatedwiththePoissonDistribution?ExamplePoissonDistribution:p(x)=λxx!e−λ,x∈Z+ThePoissonDistributionisamemberoftheexponentialfamilyIsthereaDivergenceassociatedwiththePoissonDistribution?YES—p(x)canbewrittenasp(x)=exp{−Dϕ(x,μ)−gϕ(x)},whereDϕistheRelativeEntropy,i.e.,Dϕ(x,μ)=xlogxμ−x+μExamplePoissonDistribution:p(x)=λxx!e−λ,x∈Z+ThePoissonDistributionisamemberoftheexponentialfamilyIsthereaDivergenceassociatedwiththePoissonDistribution?YES—p(x)canbewrittenasp(x)=exp{−Dϕ(x,μ)−gϕ(x)},whereDϕistheRela

1 / 61
下载文档,编辑使用

©2015-2020 m.777doc.com 三七文档.

备案号:鲁ICP备2024069028号-1 客服联系 QQ:2149211541

×
保存成功