GraphPartitioningandSpectralClusteringOutlineOverviewofGraphPartitioningGraphrepresentationHowdowedefinea“good”graphpartition?Howdowefinda“good”graphpartition?SpectralGraphTheoryMatrixrepresentationTheoreticalbackgroundSpectralClusteringAlgorithmBi-partitioningK-waypartitioningCo-clusteringConclusion0.10.20.80.70.60.80.80.8E={wij}Setofweightededgesindicatingpair-wisesimilaritybetweenpointsSimilarityGraphRepresentdatasetasaweightedgraphG(V,E)Exampledataset},...,,{621xxx123456V={xi}SetofnverticesrepresentingdatapointsGraphPartitioningClusteringcanbeviewedaspartitioningasimilaritygraphBi-partitioningtask:Divideverticesintotwodisjointgroups(A,B)123456ABRelevantIssues:Howcanwedefinea“good”partitionofthegraph?Howcanweefficientlyidentifysuchapartition?ClusteringObjectivesTraditionaldefinitionofa“good”clustering:1.Pointsassignedtosameclustershouldbehighlysimilar.2.Pointsassignedtodifferentclustersshouldbehighlydissimilar.2.Minimiseweightofbetween-groupconnections0.10.21.Maximiseweightofwithin-groupconnections0.80.70.60.80.80.8123456ApplytheseobjectivestoourgraphrepresentationGraphCutsExpresspartitioningobjectivesasafunctionofthe“edgecut”ofthepartition.Cut:Setofedgeswithonlyonevertexinagroup.BjAiijwBAcut,),(0.10.20.80.70.60.80.81234560.8ABcut(A,B)=0.3GraphCutCriteriaCriterion:Minimum-cutMinimiseweightofconnectionsbetweengroupsmincut(A,B)OptimalcutMinimumcutProblem:OnlyconsidersexternalclusterconnectionsDoesnotconsiderinternalclusterdensityDegeneratecase:GraphCutCriteria(continued)Criterion:Normalised-cut(Shi&Malik,’97)Considertheconnectivitybetweengroupsrelativetothedensityofeachgroup.Normalisetheassociationbetweengroupsbyvolume.Vol(A):ThetotalweightoftheedgesoriginatingfromgroupA.)(),()(),(),(minBvolBAcutAvolBAcutBANcutWhyusethiscriterion?Minimisingthenormalisedcutisequivalenttomaximisingnormalisedassociation.Producesmorebalancedpartitions.Howdoweefficientlyidentifya“good”partition?Problem:ComputinganoptimalcutisNP-hardSpectralGraphTheoryPossibleapproachRepresentasimilaritygraphasamatrixApplyknowledgefromLinearAlgebra…SpectralGraphTheoryAnalysethe“spectrum”ofmatrixrepresentingagraph.Spectrum:Theeigenvectorsofagraph,orderedbythemagnitudeoftheircorrespondingeigenvalues.},...,,{21nTheeigenvaluesandeigenvectorsofamatrixprovideglobalinformationaboutitsstructure.nnnnnnxxλxxaaaa111111MatrixRepresentationsAdjacencymatrix(A)nxnmatrix:edgeweightbetweenvertexxiandxjx1x2x3x4x5x6x100.80.600.10x20.800.8000x30.60.800.200x40.800.200.80.7x50.1000.800.8x60000.70.800.10.20.80.70.60.80.81234560.8Importantproperties:SymmetricmatrixEigenvectorsarerealandorthogonal][ijwAMatrixRepresentations(continued)Importantapplication:NormaliseadjacencymatrixDegreematrix(D)nxndiagonalmatrix:totalweightofedgesincidenttovertexxix1x2x3x4x5x6x11.500000x201.60000x3001.6000x40002.500x500001.70x6000001.50.10.20.80.70.60.80.81234560.8jijwiiD),(MatrixRepresentations(continued)Laplacianmatrix(L)nxnsymmetricmatrixImportantproperties:Eigenvaluesarenon-negativerealnumbersEigenvectorsarerealandorthogonalEigenvaluesandeigenvectorsprovideaninsightintotheconnectivityofthegraph…0.10.20.80.70.60.80.81234560.8L=D-Ax1x2x3x4x5x6x11.5-0.8-0.60-0.10x2-0.81.6-0.8000x3-0.6-0.81.6-0.200x4-0.80-0.22.5-0.8-0.7x5-0.1000.81.7-0.8x6000-0.7-0.81.5FindAnOptimalMin-Cut(Hall’70,Fiedler’73)Expressabi-partition(A,B)asavectorBif1Aif1iiixxpTheRayleighTheoremshows:Theminimumvalueforf(p)isgivenbythe2ndsmallesteigenvalueoftheLaplacianL.Theoptimalsolutionforpisgivenbythecorrespondingeigenvectorλ2,referredastheFiedlerVector.2,)()(jiVjiijppwpfWecanminimisethecutofthepartitionbyfindinganon-trivialvectorpthatminimisesthefunctionpLpTLaplacianmatrixSofar…Howcanwedefinea“good”partitionofagraph?Minimiseagivengraphcutcriterion.SpectralClustering(Simonet.al,’90)Howcanweefficientlyidentifysuchapartition?Approximateusinginformationprovidedbytheeigenvaluesandeigenvectorsofagraph.SpectralClusteringAlgorithmsThreebasicstages:1.Pre-processingConstructamatrixrepresentationofthedataset.2.DecompositionComputeeigenvaluesandeigenvectorsofthematrix.Mapeachpointtoalower-dimensionalrepresentationbasedononeormoreeigenvectors.3.GroupingAssignpointstotwoormoreclusters,basedonthenewrepresentation.SpectralBi-partitioningAlgorithm(Simon,’90)x11.5-0.8-0.60-0.10x2-0.81.6-0.8000x3-0.6-0.81.6-0.200x4-0.80-0.22.5-0.8-0.7x5-0.1000.81.7-0.8x6000-0.7-0.81.51.Pre-processingBuildLaplacianmatrixLofthegraph0.90.80.5-0.2-0.70.4-0.2-0.6-0.8-0.4-0.70.4-0.6-0.40.20.9-0.40.40.6-0.20.0-0.20.20.40.30.4-0.0.10.20.4-0.9-0.20.40.10.20.43.02.52.32.20.40.0Λ=X=2.DecompositionFindeigenvaluesXandeigenvectorsΛofthematrixLHowdowefindtheclusters?-0.7x6-0.7x5-0.4x40.2x30.2x20.2x1Mapverticestocorrespondingcomponentsofλ2SpectralBi-partitioning(continued)GroupingSortcomponentsofreduced1-dimensionalvector.Identifyclustersbysplittingthesortedvectorintwo.Howtochooseasplittingpoint?Naïveapproaches:Splitat0,meanorm