Transcriptome Data Analysis: Interpretation and Hands-on Examples
Luo Qibin (罗奇斌)
奇云诺德 QY NODE
Technical University of Munich, Germany

Second generation sequencers

Routine analysis

Experimental workflow

Tools required for analysis
• Bowtie software – http://bowtie-bio.sourceforge.net/index.shtml/
• SAM tools – http://samtools.sourceforge.net/
• TopHat software – http://tophat.cbcb.umd.edu/
• Cufflinks software – http://cufflinks.cbcb.umd.edu/
• CummeRbund software – http://compbio.mit.edu/cummeRbund/
* Linux, 64-bit CPU, 16 GB memory

• RNA-seq is a powerful tool for detecting the whole transcriptome in cells and tissues.
• Earlier RNA-seq research focused on mRNA, but recent studies show that some functional noncoding transcripts and protein-coding RNAs lack a poly(A) tail.

Content of the transcriptome
1. Genes: expression, alternative splicing
2. Noncoding RNA: snoRNA, mRNA-like ncRNA, snRNA, some antisense transcripts, pseudogenes, retrotransposons, and other functional RNAs
3. Some repeat elements

Biological replicates and standards for RNA-seq
1. Use at least two biological replicates, except for densely sampled time courses (overlapping time points with high temporal resolution); technical replicates are not required.
2. For species with good gene annotation, when the study only compares expression quantitatively, more than 20M usable reads are sufficient; for a transcriptome used to annotate a genome, use more than 100M.
3. Ideally include absolute-quantification controls (spike-ins) of differing concentrations and lengths, to assess mapping quality, sequencing uniformity, and the quantitative performance of RNA-seq.
4. The 3'-end/5'-end ratio is a key indicator of RNA integrity (the ideal value is 1) and should also be calculated and assessed.
5. Document everything completely: sample processing protocol, library construction protocol, sequencing instrument, sequencing type, analysis software, key sample quality metrics, and key results such as RPKM values.

Background

mRNA-seq

Mapping and Assembly tools
BWA - a fast, lightweight tool that aligns relatively short sequences (queries) to a sequence database (target), such as the human reference genome.
SeqMap - a tool for mapping millions of short sequences to the genome.
MAQ - stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences.
ERANGE - Mapping and Quantifying Mammalian Transcriptomes by RNA-Seq.
Cufflinks - assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
iAssembler - a standalone package to assemble ESTs generated using Sanger and/or Roche-454 pyrosequencing technologies into contigs.
MapPER - an RNA-seq paired-end read (PER) protocol.

Support splice mapping and quantification
TopHat - a fast splice junction mapper for RNA-Seq reads.
SpliceMap - a de novo splice junction discovery tool. It offers high sensitivity and support for arbitrarily long RNA-seq read lengths.
MapSplice - Splice Junction Mapping Tool.
Trinity RNA-Seq Assembly - software solutions targeted to the reconstruction of full-length transcripts and alternatively spliced isoforms from Illumina RNA-Seq data.
PALMapper - a combination of the spliced alignment method QPALMA with the short read alignment tool GenomeMapper.

RNA-Seq Data Analysis Tools

Web-based tools
rQuant.web - a web service that provides convenient access to tools for the quantitative analysis of RNA-Seq data.
Galaxy - mapping pipeline for Illumina, 454, and SOLiD sequencing data.
UCSC Genome Browser - this site contains the reference sequence and working draft assemblies for a large collection of genomes. It also provides portals to the ENCODE and Neandertal projects.
Bioconductor - an open source and open development software project for the analysis and comprehension of genomic data.
ExpEdit - a web application for assessing RNA editing in human at known or user-specified sites supported by transcript data obtained by RNA-Seq experiments.
Myrna - a cloud computing tool for RNA sequence.
GenePattern - a powerful genomic analysis platform that provides access to more than 100 tools for gene expression analysis, proteomics, SNP analysis and common data processing tasks.

Others
Scripture - a method for transcriptome reconstruction that relies solely on RNA-Seq reads and an assembled genome to build a transcriptome ab initio.
CisGenome - an integrated tool for tiling array, ChIP-seq, genome and cis-regulatory element analysis.
ArrayExpressHTS - an R-based pipeline for pre-processing, expression estimation and data quality assessment of high-throughput sequencing transcriptional profiling (RNA-seq) datasets.
RSEQtools - a modular framework to analyze RNA-Seq data using compact, anonymized data summaries.
RNA-MATE - a recursive mapping strategy for high-throughput RNA-sequencing data.
SAMMate - an RNA-seq analysis pipeline; allows processing of SAM/BAM files and is compatible with both single-end and paired-end sequencing technologies.
Oqtans - Online Quantitative Transcriptome Analysis.
DESeq - digital gene expression analysis based on the negative binomial distribution.
edgeR

Gene expression normalization
Fragment
Reads: RPKM: quantified transcript levels in reads per kilobase of exon model per million mapped r