Spectral Clustering 3

Spectral Clustering
Royi Itzhak

Spectral Clustering
• Algorithms that cluster points using eigenvectors of matrices derived from the data
• Obtain a data representation in a low-dimensional space that can be easily clustered
• A variety of methods use the eigenvectors differently
• Difficult to understand…

Elements of Graph Theory
• A graph $G = (V, E)$ consists of a vertex set $V$ and an edge set $E$.
• If $G$ is a directed graph, each edge is an ordered pair of vertices.
• A bipartite graph is one in which the vertices can be divided into two groups, so that all edges join vertices in different groups.

Similarity Graph
• As distance decreases, similarity increases
• Represent the dataset as a weighted graph $G(V, E)$
  – $V = \{x_i\}$: set of $n$ vertices representing data points (in the example, $\{v_1, v_2, \ldots, v_6\}$)
  – $E = \{W_{ij}\}$: set of weighted edges indicating pair-wise similarity between points
[Figure: six-vertex weighted similarity graph with edge weights 0.1, 0.2, 0.6, 0.7 and 0.8]

Similarity Graph
• $W_{ij}$ represents the similarity between vertices $i$ and $j$
• $W_{ij} = 0$ where there is no similarity
• $W_{ii} = 0$

Graph Partitioning
• Clustering can be viewed as partitioning a similarity graph
• Bi-partitioning task:
  – Divide the vertices into two disjoint groups $(A, B)$ with $V = A \cup B$
Graph partitioning is NP-hard.
[Figure: the six-vertex graph divided into groups A and B]

Clustering Objectives
• Traditional definition of a "good" clustering:
  1. Points assigned to the same cluster should be highly similar.
  2. Points assigned to different clusters should be highly dissimilar.
• Apply these objectives to our graph representation: minimise the weight of between-group connections.

Graph Cuts
• Express the partitioning objectives as a function of the "edge cut" of the partition.
• Cut: the set of edges with only one vertex in a group. We want to find the minimal cut between groups; the groups with the minimal cut form the partition.

  $\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$

[Figure: the example graph partitioned into groups A and B, with cut(A, B) = 0.3]

Graph Cut Criteria
• Criterion: Minimum-cut
  – Minimise the weight of connections between groups: $\min \mathrm{cut}(A, B)$
• Problem:
  – Only considers external cluster connections
  – Does not consider internal cluster density
• Degenerate case:
[Figure: a graph on which the minimum cut differs from the optimal cut]

Graph Cut Criteria (continued)
• Criterion: Normalised-cut (Shi & Malik, '97)
  – Consider the connectivity between groups relative to the density of each group.

  $\min \mathrm{Ncut}(A, B) = \dfrac{\mathrm{cut}(A, B)}{\mathrm{vol}(A)} + \dfrac{\mathrm{cut}(A, B)}{\mathrm{vol}(B)}$

  – Normalise the association between groups by volume.
• $\mathrm{vol}(A)$: the total weight of the edges originating from group $A$.
• Why use this criterion?
  – Minimising the normalised cut is equivalent to maximising the normalised association.
  – Produces more balanced partitions.

Second option
• The previous criterion was based on edge weight; this criterion is based on the size of the groups.
• $|A|$: number of vertices in $A$; $|B|$: number of vertices in $B$.
[Formula not recoverable from the source]

Example – 2 Spirals
• The dataset exhibits complex cluster shapes.
• K-means performs very poorly in this space due to its bias toward dense spherical clusters.
[Figure: two-spirals dataset, both axes running from -2 to 2]
• In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.
[Figure: the same data in the embedded space of the two leading eigenvectors]

Spectral Graph Theory
• Possible approach
  – Represent a similarity graph as a matrix
  – Apply knowledge from linear algebra…
• Spectral graph theory
  – Analyse the "spectrum" of the matrix representing a graph.
  – Spectrum: the eigenvectors of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$.
• The eigenvalues and eigenvectors of a matrix provide global information about its structure.

  $\begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \lambda \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$

Matrix Representations
• Adjacency matrix (A)
  – n x n matrix
  – $A = [w_{ij}]$: edge weight between vertices $x_i$ and $x_j$

        x1    x2    x3    x4    x5    x6
  x1    0     0.8   0.6   0     0.1   0
  x2    0.8   0     0.8   0     0     0
  x3    0.6   0.8   0     0.2   0     0
  x4    0     0     0.2   0     0.8   0.7
  x5    0.1   0     0     0.8   0     0.8
  x6    0     0     0     0.7   0.8   0

• Important properties:
  – Symmetric matrix, so the eigenvalues are real and the eigenvectors can be chosen to form an orthogonal basis

Matrix Representations (continued)
• Important application:
  – Normalise the adjacency matrix
• Degree matrix (D)
  – n x n diagonal matrix
  – $D(i, i) = \sum_j w_{ij}$: total weight of the edges incident to vertex $x_i$

        x1    x2    x3    x4    x5    x6
  x1    1.5   0     0     0     0     0
  x2    0     1.6   0     0     0     0
  x3    0     0     1.6   0     0     0
  x4    0     0     0     1.7   0     0
  x5    0     0     0     0     1.7   0
  x6    0     0     0     0     0     1.5

Matrix Representations (continued)
• Laplacian matrix (L)
  – n x n symmetric matrix
  – $L = D - A$

        x1    x2    x3    x4    x5    x6
  x1    1.5  -0.8  -0.6   0    -0.1   0
  x2   -0.8   1.6  -0.8   0     0     0
  x3   -0.6  -0.8   1.6  -0.2   0     0
  x4    0     0    -0.2   1.7  -0.8  -0.7
  x5   -0.1   0     0    -0.8   1.7  -0.8
  x6    0     0     0    -0.7  -0.8   1.5

• Important properties:
  – Eigenvalues are non-negative real numbers
  – Eigenvectors are real and orthogonal
  – Eigenvalues and eigenvectors provide an insight into the connectivity of the graph…

Another option – normalized Laplacian
• Laplacian matrix (L)
  – n x n symmetric matrix
  – $L = D^{-0.5} (D - A) D^{-0.5}$
[Matrix for the example graph garbled in the source; the surviving entries are 1.00 on the diagonal and off-diagonal values such as -0.52, -0.50, -0.47, -0.44 and -0.06]
• Important properties:
  – Eigenvectors are real and normalised
  – Each off-diagonal entry ($i \neq j$) equals $-A_{ij} / \sqrt{D_{ii} D_{jj}}$

Find An Optimal Min-Cut (Hall '70, Fiedler '73)
• Express a bi-partition $(A, B)$ as a vector $p$ with

  $p_i = \begin{cases} +1 & \text{if } x_i \in A \\ -1 & \text{if } x_i \in B \end{cases}$

• We can minimise the cut of the partition by finding a non-trivial vector $p$ that minimises the function

  $f(p) = \sum_{\{i, j\} \subseteq V} w_{ij} (p_i - p_j)^2 = p^T L p$

  where each unordered pair of vertices is counted once and $L$ is the Laplacian matrix.
• The Laplacian is positive semi-definite.
• The Rayleigh theorem shows:
  – The minimum value of $f(p)$ over non-trivial unit vectors $p$ (real-valued and orthogonal to the constant vector) is given by the 2nd smallest eigenvalue $\lambda_2$ of the Laplacian $L$.
  – The optimal solution for $p$ is given by the eigenvector corresponding to $\lambda_2$, referred to as the Fiedler vector.

Proof
• Based on "Consistency of Spectral Clustering" by Ulrike von Luxburg, Mikhail Belkin and Olivier Bousquet (Max Planck Institute for Biological Cybernetics), pages 2-6.

Proof
[Derivation in terms of $w_{ij}$, $\deg(i)$ and $\mathrm{vol}(S)$; garbled and cut off in the source] …
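
To make the matrix definitions and the Fiedler-vector recipe concrete, here is a minimal sketch in plain Python that uses the six-vertex example graph from the slides. It builds D and L = D - A, approximates the eigenvector of the 2nd smallest eigenvalue, splits the vertices by sign, and evaluates cut and Ncut. The eigenvector is found by power iteration on cI - L, which is an assumption made here only to keep the example dependency-free; the slides do not say how to compute it, and a library eigen-solver would normally be used instead. The names `fiedler_vector`, `group_a` and `group_b` are hypothetical.

```python
# Adjacency matrix of the example graph: symmetric weights w_ij, w_ii = 0.
A = [
    [0.0, 0.8, 0.6, 0.0, 0.1, 0.0],
    [0.8, 0.0, 0.8, 0.0, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.2, 0.0, 0.0],
    [0.0, 0.0, 0.2, 0.0, 0.8, 0.7],
    [0.1, 0.0, 0.0, 0.8, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.7, 0.8, 0.0],
]
n = len(A)

# Degree of each vertex: D(i, i) = sum_j w_ij.
degrees = [sum(row) for row in A]

# Unnormalised Laplacian L = D - A.
L = [[(degrees[i] if i == j else 0.0) - A[i][j] for j in range(n)]
     for i in range(n)]


def fiedler_vector(L, iterations=500):
    """Approximate the eigenvector of the 2nd smallest eigenvalue of L.

    Assumed method (not from the slides): power iteration on (c*I - L),
    projecting out the constant vector (eigenvalue 0) at every step.
    c = 2 * max degree is a Gershgorin upper bound on the eigenvalues of L.
    """
    size = len(L)
    c = 2.0 * max(L[i][i] for i in range(size))
    v = [float(i) for i in range(size)]  # arbitrary deterministic start
    for _ in range(iterations):
        mean = sum(v) / size
        v = [x - mean for x in v]  # remove the constant component
        v = [c * v[i] - sum(L[i][j] * v[j] for j in range(size))
             for i in range(size)]  # v <- (c*I - L) v
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


p = fiedler_vector(L)

# Rayleigh quotient p^T L p of the unit vector p estimates lambda_2.
lambda_2 = sum(p[i] * L[i][j] * p[j] for i in range(n) for j in range(n))

# Bi-partition: split vertices (numbered 1..6) by the sign of their entry.
group_a = {i + 1 for i in range(n) if p[i] >= 0}
group_b = {i + 1 for i in range(n) if p[i] < 0}

# cut(A, B): total weight of edges with one endpoint in each group.
cut = sum(A[i - 1][j - 1] for i in group_a for j in group_b)

# vol(S): total degree of the vertices in S; Ncut as defined by Shi & Malik.
vol_a = sum(degrees[i - 1] for i in group_a)
vol_b = sum(degrees[i - 1] for i in group_b)
ncut = cut / vol_a + cut / vol_b

print(sorted(group_a), sorted(group_b), round(cut, 2), round(ncut, 4))
```

On this graph the sign split should separate vertices {1, 2, 3} from {4, 5, 6}, giving the cut of 0.3 shown on the Graph Cuts slide. Which of the two groups ends up labelled `group_a` depends on the sign of the eigenvector and carries no meaning.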
