clusterProfiler R Package: A Detailed Tutorial


Using clusterProfiler to identify and compare functional profiles of gene lists

Guangchuang Yu
School of Biological Sciences
The University of Hong Kong, Hong Kong SAR, China
email: guangchuangyu@gmail.com

October 13, 2014

Contents

1 Introduction
2 Citation
3 Supported organisms
4 Gene Ontology Classification
5 Enrichment Analysis
  5.1 Hypergeometric model
  5.2 Gene set enrichment analysis
  5.3 GO enrichment analysis
  5.4 KEGG pathway enrichment analysis
  5.5 DO enrichment analysis
  5.6 Reactome pathway enrichment analysis
  5.7 Function call
  5.8 Visualization
    5.8.1 barplot
    5.8.2 enrichMap
    5.8.3 cnetplot
    5.8.4 gseaplot
    5.8.5 pathview from pathview package
6 Biological theme comparison
7 Session Information

1 Introduction

In recent years, high-throughput experimental techniques such as microarray, RNA-Seq and mass spectrometry have made it possible to detect cellular molecules at the systems level. These kinds of analyses generate huge quantities of data, which need to be given a biological interpretation. A commonly used approach is clustering in the gene dimension, which groups different genes based on their similarities [1].

To search for shared functions among genes, a common way is to incorporate biological knowledge, such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), to identify the predominant biological themes of a collection of genes.

After clustering analysis, researchers not only want to determine whether there is a common theme in a particular gene cluster, but also to compare the biological themes among gene clusters. The manual step of choosing interesting clusters, followed by enrichment analysis on each selected cluster, is slow and tedious. To bridge this gap, we designed clusterProfiler [2] for comparing and visualizing functional profiles among gene clusters.

2 Citation

Please cite the following article when using clusterProfiler.

G Yu, LG Wang, Y Han, QY He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5), 284-287.

3 Supported organisms

At present, clusterProfiler supports about 20 species, as shown below:

• Arabidopsis
• Anopheles
• Bovine
• Canine
• Chicken
• Chimp
• E. coli strain K12
• E. coli strain Sakai
• Fly
• Human
• Malaria
• Mouse
• Pig
• Rat
• Rhesus
• Worm
• Xenopus
• Yeast
• Zebrafish

All of these species are supported by both GO and KEGG analyses. GO analyses additionally support Coelicolor and Gondii.

4 Gene Ontology Classification

In clusterProfiler, groupGO is designed for gene classification based on GO distribution at a specific level.

    library(clusterProfiler)
    library(DOSE)      # provides the geneList example data set
    data(geneList)
    # keep genes whose absolute value in geneList exceeds 2
    gene <- names(geneList)[abs(geneList) > 2]
    head(gene)
    ## [1] "4312"  "8318"  "10874" "55143" "55388" "991"

    # classify the selected genes by GO Biological Process terms at level 3
    ggo <- groupGO(gene = gene, organism = "human", ont = "BP", level = 3, readable = TRUE)
    head(summary(ggo))
    ##                    ID                              Description Count GeneRatio
    ## GO:0019953 GO:0019953                      sexual reproduction    10    10/138
    ## GO:0019954 GO:0019954                     asexual reproduction     0     0/138
    ## GO:0032504 GO:0032504      multicellular organism reproduction    11    11/138
    ## GO:0032505 GO:0032505 reproduction of a single-celled organism     0     0/138
    ## GO:0051321 GO:0051321                       meiotic cell cycle     5     5/138
    ## GO:0006807 GO:0006807      nitrogen compound metabolic process    76    76/138
    ##            geneID
    ## GO:0019953 ASPM/CDK1/TRIP13/AURKA/CCNB1/PTTG1/GAMT/BMP4/DNALI1/PGR
    ## GO:0019954
    ## GO:0032504 ASPM/TRIP13/AURKA/CCNB1/CSN3/PTTG1/GAMT/BMP4/ERBB4/STC2/PGR
    ## GO:0032505
    ## GO:0051321 CDC20/TOP2A/NEK2/TRIP13/AURKA
    ## GO:0006807 CDC45/MCM10/S100A9/FOXM1/KIF23/CENPE/MYBL2/S100A8/TOP2A/NCAPH/E2F8/CXCL10/RRM2/UGT8/HJURP/NUSAP1/ISG20/CXCL13/CXCL11/SLC7A5/RAD51AP1/CXCL9/CENPN/CCNA2/CDK1/GINS1/PAX6/KIF18A/CDT1/BIRC5/KIF11/EZH2/NCAPG/AURKB/GINS2/CHAF1B/CHEK1/TRIP13/KIFC1/KIF18B/QPRT/KIF20A/IDO1/DTL/NUDT1/CCNB1/PIR/KIF4A/MCM5/PTTG1/MAOB/ADIPOQ/DACH1/ZNF423/AK5/RNASE4/TBC1D9/OMD/NOVA1/EMX2/PSD3/FABP4/GAMT/BMP4/SLC44A4/ABLIM3/ERBB4/NDP/FOXA1/CRY2/ABCA8/GATA3/TFAP2B/PGR/ADIRF/OGN

5 Enrichment Analysis

5.1 Hypergeometric model

Enrichment analysis [3] is a widely used approach to identify biological themes. Here we implement a hypergeometric model to assess whether the number of selected genes associated with disease is larger than expected. To determine whether
any terms annotate a specified list of genes at a frequency greater than would be expected by chance, clusterProfiler calculates a p-value using the hypergeometric distribution:

$$p = 1 - \sum_{i=0}^{k-1} \frac{\binom{M}{i}\binom{N-M}{n-i}}{\binom{N}{n}}$$

In this equation, N is the total number of genes in the background distribution, M is the number of genes within that distribution that are annotated (either directly or indirectly) to the node of interest, n is the size of the list of genes of interest, and k is the number of genes within that list which are annotated to the node. By default, the background distribution is all the genes that have annotation.

P-values are adjusted for multiple comparisons, and q-values are also calculated for FDR control.

5.2 Gene set enrichment analysis

A common approach to analyzing gene expression profiles is to identify differentially expressed genes that are deemed interesting. The enrichment analyses demonstrated previously were based on these differentially expressed genes. This approach will find genes where the difference is large, but it will not detect a situation where the difference is small but is evidenced in a coordinated way across a set of related genes. Gene Set Enrichment Analysis (
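As a small numerical check of the hypergeometric p-value defined in Section 5.1, the sketch below evaluates the sum directly and compares it with base R's phyper. The counts N, M, n and k are hypothetical values chosen only for illustration, and the extra p-values passed to p.adjust are made up. The Benjamini-Hochberg method is used here as one common adjustment and is an assumption, not a statement of what clusterProfiler does internally.

```r
# Hypothetical counts, chosen only for illustration (not taken from the vignette).
N <- 20000  # genes in the background distribution
M <- 200    # background genes annotated to the node of interest
n <- 150    # size of the gene list of interest
k <- 8      # genes in the list that are annotated to the node

# Direct evaluation of p = 1 - sum_{i=0}^{k-1} C(M,i) C(N-M,n-i) / C(N,n).
# lchoose() works on the log scale, which avoids overflow in the binomials.
i <- 0:(k - 1)
log_terms <- lchoose(M, i) + lchoose(N - M, n - i) - lchoose(N, n)
p_sum <- 1 - sum(exp(log_terms))

# The same upper-tail probability P(X >= k) from the hypergeometric CDF.
# phyper(q, m, n, k): q successes drawn, m successes and n failures in the
# population, k draws; lower.tail = FALSE gives P(X > q).
p_phyper <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)

# Multiple-comparison adjustment over several terms; the other three
# p-values are invented, and "BH" is an assumed choice of method.
p_values <- c(p_phyper, 0.04, 0.20, 0.50)
p_adj <- p.adjust(p_values, method = "BH")

print(c(direct = p_sum, phyper = p_phyper))
print(p_adj)
```

The two p-values should agree up to floating-point error. In practice phyper is preferable to the explicit sum, since subtracting a value close to 1 loses precision when the p-value is very small.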
