Data Analysis in Second-Generation Sequencing (Genome)
罗奇斌 (Luo Qibin)
6/22/13

1 Types of second-generation sequencing analysis
• Genome
  – Whole-genome / exome sequencing: SNP, small InDel, SNP annotation
  – Targeted-region deep sequencing: SNP annotation
  – De novo sequencing: genome assembly
• Transcriptome
  – mRNA sequencing: gene expression
  – Small RNA sequencing: annotation and target prediction

2 Second-generation sequencing analysis tools
• More than 1,000 analysis tools
• Routine analysis
  – calling, quality control, alignment/assembly, SNP/InDel discovery, SNP annotation
• Advanced analysis
  – functional polymorphism, disease/phenotype, genomic coordinate

3 Data from second-generation sequencing platforms
• Illumina Genome Analyzer II (Solexa)
  – Read length: 80-120 bp
  – Format: fastq
• ABI SOLiD
  – Read length: 50 bp
  – Format: csfasta
• Roche GS FLX (454)
  – Read length: ~400 bp
  – Format: sff/fasta

3.1 Solexa – fastq format

3.2 SOLiD – csfasta format

3.3 fasta format

4 Routine genome analysis

4.1 Routine analysis workflow
• Reads correction
• Assembly
  – short reads: Solexa
  – long reads: 3730, 454 reads
  – hybrid reads: short + long reads
• SNP/INDEL calling

4.2 Routine analysis tools

4.3 Solexa data
• BWA
• SAMtools
• SOAP2
• SOAPsnp
* Linux, 64-bit CPU, 4 GB memory

4.3 Solexa data: BWA
• Index reference sequences
  – bwa index -a is/bwtsw ref.fa
  – is: for references smaller than 2 Gb
  – bwtsw: for references larger than 2 Gb
• Mapping
  – bwa aln ref.fa short_read.fq > aln_sa.sai
• Output alignments in the SAM format
  – bwa samse ref.fa aln_sa.sai short_read.fq > aln.sam
  – bwa sampe ref.fa aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam

4.3 Solexa data: SAM format

4.3 Solexa data: SOAP2
• Index reference sequences
  – 2bwt-builder ref.fa
• Mapping
  – single-end: soap -a reads.fq -D ref.fa.index -o output
  – paired-end: soap -a reads1.fq -b reads2.fq -D ref.fa.index -o PE_output -2 SE_output -m min_insert_size -x max_insert_size

4.4 SOLiD data: BioScope

4.5 454 data: newbler
• runMapping -o outputdir ref.fa 1.sff …
• 454ReadStatus.txt

4.6 SNP/INDEL calling
• SAMtools
  – $ samtools mpileup -uf ref.fa aln1.bam aln2.bam | bcftools view -bvcg - > var.raw.bcf
  – $ bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf
  – Output is in the VCF format (Variant Call Format)
• GATK: Genome Analysis Toolkit

5 Routine analysis

5.1 Routine analysis workflow
• Reads correction
• Assembly
  – short reads: Solexa
  – long reads: 3730, 454 reads
  – hybrid reads: short + long reads
• Scaffolding
• Fix gap
• Gene and genomics annotation

5.2 De novo analysis tools

5.3 Solexa data
• Correction tool for SOAPdenovo
• SOAPdenovo
• Velvet ~zerbino/velvet/
• ABySS
* Linux, 64-bit CPU, 4-256 GB memory

5.3 Solexa data: output files
• *.contig: contigs file
• *.scafSeq: scaffolds file

5.4 SOLiD data
• Reads correction
  – SOLiD Accuracy Enhancement Tool (SAET)
• Assembly
  – 1. SOLiD de novo Accessory Tools
  – 2. Velvet ~zerbino/velvet/

5.5 454 data
• runAssembly -o outputdir (-large) 1.sff
• Result files
  – 454AllContigs.fna
  – 454LargeContigs.fna
  – 454ReadStatus.txt (Assembled/Singleton/Repeat)
  – 454Contigs.ace

5.6 Gene and genome annotation
• De novo prediction
  – GENSCAN
  – Augustus
• Homology-based prediction
• Reference gene set

Thank you!
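
Supplementary sketch for section 3.1 (fastq format): the slides name the format without showing a record, so the Python below illustrates one common way to read it. It is a minimal sketch, not part of the original deck and not the method used by any tool listed above. It assumes the simple four-line layout (an "@" header, the bases, a "+" separator, and one quality character per base) with no line wrapping. The names `FastqRecord`, `read_fastq` and `phred_offset` are made up for this illustration. The quality offset is an assumption that depends on the pipeline version: 33 is the Sanger convention, while older Illumina pipelines wrote qualities with offset 64.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: its name, its bases, and one quality score per base."""
    name: str
    sequence: str
    qualities: List[int]


def read_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream.

    phred_offset is an assumption about the encoding: 33 (Sanger) by
    default, 64 for older Illumina pipeline output.
    """
    while True:
        line = handle.readline()
        if line == "":          # end of file
            return
        header = line.strip()
        if header == "":        # tolerate blank lines between records
            continue
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Usage on an in-memory example; 'I' is ASCII 73, so 73 - 33 = 40.
example = "@read1\nACGT\n+\nIIII\n"
for record in read_fastq(io.StringIO(example)):
    print(record.name, record.sequence, record.qualities)
# prints: read1 ACGT [40, 40, 40, 40]
```

Reading with the wrong offset shifts every score by 31, so it is worth confirming the encoding of a given run before filtering reads on quality.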