UncoveringtheoverlappingcommunitystructureofcomplexnetworksinnatureandsocietyGergelyPalla1,2,ImreDere´nyi2,Ille´sFarkas1&Tama´sVicsek1,2Manycomplexsystemsinnatureandsocietycanbedescribedintermsofnetworkscapturingtheintricatewebofconnectionsamongtheunitstheyaremadeof1–4.Akeyquestionishowtointerprettheglobalorganizationofsuchnetworksastheco-existenceoftheirstructuralsubunits(communities)associatedwithmorehighlyinterconnectedparts.Identifyingtheseaprioriunknownbuildingblocks(suchasfunctionallyrelatedproteins5,6,industrialsectors7andgroupsofpeople8,9)iscrucialtotheunderstandingofthestructuralandfunctionalpropertiesofnetworks.Theexistingdeterministicmethodsusedforlargenet-worksfindseparatedcommunities,whereasmostoftheactualnetworksaremadeofhighlyoverlappingcohesivegroupsofnodes.Hereweintroduceanapproachtoanalysingthemainstatisticalfeaturesoftheinterwovensetsofoverlappingcommu-nitiesthatmakesasteptowardsuncoveringthemodularstructureofcomplexsystems.Afterdefiningasetofnewcharacteristicquantitiesforthestatisticsofcommunities,weapplyanefficienttechniqueforexploringoverlappingcommunitiesonalargescale.Wefindthatoverlapsaresignificant,andthedistributionsweintroducerevealuniversalfeaturesofnetworks.Ourstudiesofcollaboration,word-associationandproteininteractiongraphsshowthatthewebofcommunitieshasnon-trivialcorrelationsandspecificscalingproperties.Mostrealnetworkstypicallycontainpartsinwhichthenodes(units)aremorehighlyconnectedtoeachotherthantotherestofthenetwork.Thesetsofsuchnodesareusuallycalledclusters,communities,cohesivegroupsormodules8,10,11–13;theyhavenowidelyaccepted,uniquedefinition.Inspiteofthisambiguity,thepresenceofcommunitiesinnetworksisasignatureofthehierarchicalnatureofcomplexsystems5,14.Theexistingmethodsforfindingcommunitiesinlargenetworksareusefulifthecommu-nitystructureissuchthatitcanbeinterpretedintermsofseparatedsetsofcommunities(seeFig.1bandrefs10,15,16–18).However,mostrealnetworksarecharacterizedbywell-definedstatisticsofoverlappingandnestedcommunities.Thiscanbeillustratedbythenumerouscommunitiesthateachofusbelongsto,includingthoserelatedtoourscientificactivitiesorpersonallife(school,hobby,family)andsoon,asshowninFig.1a.Furthermore,membersofourcommunitieshavetheirowncommunities,resultinginanextremelycomplicatedwebofthecommunitiesthemselves.Thishaslongbeenunderstoodbysociologists19buthasneverbeenstudiedsystem-aticallyforlargenetworks.Another,biological,exampleisthatalargefractionofproteinsbelongtoseveralproteincomplexessimultaneously20.Ingeneral,eachnodeiofanetworkcanbecharacterizedbyamembershipnumbermi,whichisthenumberofcommunitiesthatthenodebelongsto.Inturn,anytwocommunitiesaandbcansharesova;bnodes,whichwedefineastheoverlapsizebetweenthesecommunities.Naturally,thecommunitiesalsoconstituteanetwork,withtheoverlapsbeingtheirlinks.Thenumberofsuchlinksofcommunityacanbecalleditscommunitydegree,dcoma:Finally,thesizescomaofanycommunityacanmostnaturallybedefinedasthenumberofitsnodes.Tocharacterizethecommunitystructureofalargenetworkweintroducethedistributionsofthesefourbasicquantities.InparticularwefocusontheircumulativedistributionLETTERSFigure1|Illustrationoftheconceptofoverlappingcommunities.a,Theblackdotinthemiddlerepresentseitheroftheauthorsofthispaper,withseveralofhiscommunitiesaround.Zoominginonthescientificcommunitydemonstratesthenestedandoverlappingstructureofthecommunities,anddepictingthecascadesofcommunitiesstartingfromsomemembersexemplifiestheinterwovenstructureofthenetworkofcommunities.b,Divisiveandagglomerativemethodsgrosslyfailtoidentifythecommunitieswhenoverlapsaresignificant.c,Anexampleofoverlappingk-cliquecommunitiesatk¼4.Theyellowcommunityoverlapstheblueoneinasinglenode,whereasitsharestwonodesandalinkwiththegreenone.Theseoverlappingregionsareemphasizedinred.Noticethatanyk-clique(completesubgraphofsizek)canbereachedonlyfromthek-cliquesofthesamecommunitythroughaseriesofadjacentk-cliques.Twok-cliquesareadjacentiftheysharek21nodes.1BiologicalPhysicsResearchGroupoftheHungarianAcademyofSciences,Pa´zma´nyP.stny.1A,H-1117Budapest,Hungary.2DepartmentofBiologicalPhysics,Eo¨tvo¨sUniversity,Pa´zma´nyP.stny.1A,H-1117Budapest,Hungary.Vol435|9June2005|doi:10.1038/nature03607814©2005NaturePublishingGroupfunctionsdenotedbyP(m),P(sov),P(dcom)andP(scom).Fortheoverlapsize,forexample,P(sov)meanstheproportionofthoseoverlapsthatarelargerthansov.Furtherrelevantstatisticalfeatureswillbeintroducedlater.Thebasicobservationonwhichourcommunitydefinitionreliesisthatatypicalcommunityconsistsofseveralcomplete(fullycon-nected)subgraphsthattendtosharemanyoftheirnodes.Thus,wedefineacommunity,ormorepreciselyak-cliquecommunity,asaunionofallk-cliques(completesubgraphsofsizek)thatcanbereachedfromeachotherthroughaseriesofadjacentk-cliques(whereadjacencymeanssharingk21nodes)21–23.Thisdefinitionseekstorepresentthefactthatitisanessentialfeatureofacommunitythatitsmemberscanbereachedthroughwell-connectedsubsetsofnodes.Thereareotherpartsofthewholenetworkthatarenotreachablefromaparticulark-clique,buttheypotentiallycontainfurtherk-cliquecommunities.Int