© 2001 Oxford University Press
Nucleic Acids Research, 2001, Vol. 29, No. 2 545–552

A phylogenomic approach to microbial evolution

Thomas Sicheritz-Pontén and Siv G. E. Andersson*

Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University, 752 36 Uppsala, Sweden

Received August 3, 2000; Revised and Accepted November 6, 2000

*To whom correspondence should be addressed. Tel: +46 18 471 4374; Fax: +46 18 471 6404; Email: siv.andersson@ebc.uu.se
Present address: Thomas Sicheritz-Pontén, Department of Biotechnology, the Technical University of Denmark, DK-2800 Lyngby, Denmark

ABSTRACT

To study the origin and evolution of biochemical pathways in microorganisms, we have developed methods and software for automatic, large-scale reconstructions of phylogenetic relationships. We define the complete set of phylogenetic trees derived from the proteome of an organism as the phylome and introduce the term phylogenetic connection as a concept that describes the relative relationships between taxa in a tree. A query system has been incorporated into the system so as to allow searches for defined categories of trees within the phylome. As a complement, we have developed the pyphy system for visualising the results of complex queries on phylogenetic connections, genomic locations and functional assignments in a graphical format. Our phylogenomics approach, which links phylogenetic information to the flow of biochemical pathways within and among microbial species, has been used to examine more than 8000 phylogenetic trees from seven microbial genomes. The results have revealed a rich web of phylogenetic connections. However, the separation of Bacteria and Archaea into two separate domains remains robust.

INTRODUCTION

The classification of microorganisms represents a major challenge in biology (1). Molecular phylogenetics based on rRNAs and selected proteins have laid the foundation for a modern classification system, conceptually represented by the ‘universal tree of life’ (2). However, microbial genomes are highly dynamic in structure and horizontal gene transfer events have been suggested to occur much more frequently than was previously thought (3). The acquisition of foreign DNA combined with intra-genomic rearrangement and duplication events may provide an explanation for the remarkable ability of bacteria to constantly explore new growth habitats. However, a continuous flow of genetic material within and among bacterial species is problematical in the sense that conflicting evolutionary relationships are to be expected from phylogenetic reconstructions based on individual gene sequences (4). Indeed, an analysis of the complete genome of the hyperthermophilic bacterium Thermotoga maritima has shown that about a quarter of the genes are most similar to their homologues in Archaea (5). Similarly, it has been suggested that almost 20% of the Escherichia coli genes are of recent foreign origin (6,7). Thus, individual gene trees may not necessarily reflect the ‘correct’ species tree.

To quantify the frequency at which horizontal gene transfer events occur in bacteria we need to compare phylogenetic data at the genomic level and relate results based on thousands of individual gene sequences to functional annotations and metabolic information. The term ‘phylogenomics’ refers to such large-scale, genomic approaches to phylogenetic analyses (8).

A series of important scientific issues have to be addressed in these global analyses. First, we need a better understanding of the distribution of horizontal gene transfer events on an evolutionary time scale. A few alternative hypotheses have been proposed to explain the complex patterns of sequence relationships observed in microbial genomes. The ‘continual horizontal transfer’ hypothesis suggests that gene acquisitions are ongoing processes in microorganisms (9), whereas the ‘early massive transfer’ hypothesis proposes that massive exchanges occurred early in prokaryotic evolution, long before the diversification of modern microbial species (10).

We also have to determine whether genes are equally amenable to horizontal gene transfer or whether some genes are more suitable for tracing evolutionary relationships than others. Genes related to processes that are essential to life, such as replication, transcription and translation, have long been thought to be less likely to be horizontally transferred than genes of importance only for growth in highly specialised milieus (9,11).

The third important issue concerns the methods used for analysis and how the results of these methods are interpreted. With the rapidly accumulating number of sequences in the public databases, sequence similarity is most often defined as the closest match in database searches (best hit) using programs such as BLAST (12). This method is fast and simple and can easily be automated for the analysis of thousands of genes. The so-called ‘best hits’ have therefore routinely been used as a basis for gene annotations in genome sequencing projects. By analogy, examples of ‘odd similarities’ have been taken as indications of horizontal gene transfer events (5,13). However, indications of unexpected relationships based on sequence similarity searches may be affected by factors such as gene duplication and divergence and/or by differences in nucleotide substitution rates, which are not taken into account in these simple methods. Therefore, great caution should be exercised when trying to infer functional equivalence and evolutionary relationships solely from the results of simple database searches or from pair-wise sequence similarity measures (14).

To quantify and authenticate horizontal gene transfers it is necessary to establish that the transferred genes are positioned deepl