Introduction to Next-Generation Sequencing Data Analysis
Tong Chunfa (童春发)
2013.12.23

Outline
• Principles and workflow of resequencing
• Data structure and quality assessment
• The SRA database and data retrieval
• Using Bowtie2, BWA and SAMtools

Principles and workflow of resequencing

Data structure and quality assessment
• FASTQ format
• FastQC

FASTQ format
@SEQ_ID
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65

Illumina sequence identifiers
@HWUSI-EAS100R:6:73:941:1973#0/1
Versions of the Illumina pipeline since 1.4 appear to use #NNNNNN instead of #0 for the multiplex ID, where NNNNNN is the sequence of the multiplex tag.
With Casava 1.8 the format of the '@' line has changed:
@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG

Quality
• A quality value Q is an integer mapping of p (i.e., the probability that the corresponding base call is incorrect).
• Phred quality score: Q_PHRED = -10 log10(p)
• The Solexa pipeline (i.e., the software delivered with the Illumina Genome Analyzer) earlier used: Q_solexa = -10 log10(p / (1 - p))

Quality Encoding
• Sanger format can encode a Phred quality score from 0 to 93 using ASCII 33 to 126.
• Illumina's newest version (1.8) of its pipeline, CASAVA, directly produces FASTQ in Sanger format.
• Solexa/Illumina 1.0 format can encode a Solexa/Illumina quality score from -5 to 62 using ASCII 59 to 126.
• Starting with Illumina 1.3 and before Illumina 1.8, the format encoded a Phred quality score from 0 to 62 using ASCII 64 to 126.
• Starting in Illumina 1.5 and before Illumina 1.8, the Phred scores 0 to 2 have a slightly different meaning.
Figure: American Standard Code for Information Interchange (ASCII)

FastQC
• Double-click "run_fastqc.bat" to run FastQC
• Analysis results are reported by 11 modules
• Green tick: normal
• Orange triangle: slightly abnormal
• Red cross: very unusual

Basic Statistics
Measure              Value
Filename             NHS066-47_L4_1.fq.gz
File type            Conventional base calls
Encoding             Sanger / Illumina 1.9
Total Sequences      3992798
Filtered Sequences   0
Sequence length      100
%GC                  37

Per Base Sequence Quality
• The central red line is the median value.
• The yellow box represents the inter-quartile range (25-75%).
• The upper and lower whiskers represent the 10% and 90% points.
• The blue line represents the mean quality.

Per Sequence Quality Scores
• A warning is raised if the most frequently observed mean quality is below 27; this equates to a 0.2% error rate.
• An error is raised if the most frequently observed mean quality is below 20; this equates to a 1% error rate.

Per Base Sequence Content
• This module issues a warning if the difference between A and T, or between G and C, is greater than 10% in any position.
• This module will fail if the difference between A and T, or between G and C, is greater than 20% in any position.

Per Base GC Content
• This module issues a warning if the GC content of any base strays more than 5% from the mean GC content.
• This module will fail if the GC content of any base strays more than 10% from the mean GC content.

Per Sequence GC Content
• A warning is raised if the sum of the deviations from the normal distribution represents more than 15% of the reads.
• This module will indicate a failure if the sum of the deviations from the normal distribution represents more than 30% of the reads.

Per Base N Content
• This module raises a warning if any position shows an N content of more than 5%.
• This module will raise an error if any position shows an N content of more than 20%.

Sequence Length Distribution
• This module will raise a warning if not all sequences are the same length.
• This module will raise an error if any of the sequences have zero length.

Duplicate Sequences
• This module will issue a warning if non-unique sequences make up more than 20% of the total.
• This module will issue an error if non-unique sequences make up more than 50% of the total.

Overrepresented Sequences
Sequence                                            Count  Percentage  Possible Source
AATTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGATCTCG  65311  1.636       TruSeq Adapter, Index 10 (97% over 36bp)
ATTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGATCTCGT  6464   0.162       TruSeq Adapter, Index 10 (97% over 36bp)
AATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGATCTCGT  4633   0.116       TruSeq Adapter, Index 10 (97% over 36bp)
AATTAGTCGGAAGAGCACACGTCTGAACTCCAGTCACTCGAAGATCTCGT  4463   0.112       TruSeq Adapter, Index 10 (97% over 34bp)
AATTATGGATAATTAAAGTATTCCCCCCTTTTTTTTATGATATTTTTGAC  3994   0.100       No Hit
• Warning if any sequence makes up more than 0.1% of the total; failure if more than 1%.

Overrepresented Kmers
• This module will issue a warning if any k-mer is enriched more than 3-fold overall, or more than 5-fold at any individual position.
• This module will issue an error if any k-mer is enriched more than 10-fold at any individual base position.

Saving a Report
NHS066-47_L4_1.fq_fastqc.zip

The SRA database and data retrieval
Viewing and downloading SRR576183

fastq-dump: converting SRA files to FASTQ format
• fastq-dump --split-files -DQ "+" ./SRR576183.sra
• fastq-dump --split-files -DQ "+" --gzip ./SRR576183.sra

Downloading data directly in FASTQ format

Aligning reads to a reference sequence
• BWA
• Bowtie2
• SOAP
• SAMtools

BWA
• wget
• tar -xjvf bwa-0.7.5a.tar.bz2
• cd bwa-0.7.5a
• make
• Download test.tar.gz from
• ../bwa-0.7.5a/bwa index ref.fa
• ../bwa-0.7.5a/bwa mem ref.fa test_PE1.fa > aln-se.sam
• ../bwa-0.7.5a/bwa mem ref.fa test_PE1.fa test_PE2.fa > aln-pe.sam

Bowtie2
• Download bowtie2-2.1.0-linux-x86_64.zip
• unzip bowtie2-2.1.0-linux-x86_64.zip
• mv bowtie2-2.1.0 bowtie2
• cd bowtie2/example
• mkdir work
• cd work

Bowtie2
• Index a reference genome
  ../../bowtie2-build ../reference/lambda_virus.fa lambda_virus
• Align single-end reads
  ../../bowtie2 -x lambda_virus -U ../reads/reads_1.fq -S eg1.sam
• Align paired-end reads
  ../../bowtie2 -x lambda_virus -1 ../reads/reads_1.fq -2 ../reads/reads_2.fq -S eg2.sam
-U: unpa
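The four-line FASTQ record and the Casava 1.8 identifier from the FASTQ format and Illumina sequence identifiers slides can be read with a short Python sketch. It assumes unwrapped records (exactly four lines each); `read_fastq`, `parse_casava18` and the field names are hypothetical, chosen here for illustration, and are not part of any tool mentioned in these slides.

```python
from __future__ import annotations

from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str      # identifier without the leading '@'
    sequence: str  # base calls
    quality: str   # one ASCII-encoded quality character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from unwrapped FASTQ text (four lines per record).

    Hypothetical helper: accepts an open file or any iterable of lines."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between or after records
        try:
            sequence, plus, quality = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def parse_casava18(name: str) -> dict[str, str]:
    """Split a Casava 1.8 identifier into named fields (names are illustrative)."""
    location, description = name.split(" ", 1)
    location_keys = ["instrument", "run_id", "flowcell", "lane", "tile", "x", "y"]
    description_keys = ["read", "is_filtered", "control_number", "index"]
    fields = dict(zip(location_keys, location.split(":")))
    fields.update(zip(description_keys, description.split(":")))
    return fields


if __name__ == "__main__":
    text = ["@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG", "GATT", "+", "!''*"]
    for record in read_fastq(text):
        print(parse_casava18(record.name))
```

Wrapped (multi-line) FASTQ is not handled here; for real data an established parser such as Biopython's `SeqIO.parse(handle, "fastq")` is the safer choice.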
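The Phred and Solexa formulas and the ASCII offsets in the Quality and Quality Encoding slides can be checked with a minimal Python sketch. It assumes offset 33 for Sanger / Illumina 1.8+ data and offset 64 for Illumina 1.3 to 1.7 data; the function names are hypothetical. The Solexa-to-Phred conversion comes from solving Q_solexa = -10 log10(p / (1 - p)) for p and substituting into Q_PHRED = -10 log10(p), which gives Q_PHRED = 10 log10(10^(Q_solexa/10) + 1).

```python
from __future__ import annotations

import math


def decode_quality(quality: str, offset: int = 33) -> list[int]:
    """ASCII characters -> integer scores (33: Sanger/Illumina 1.8+; 64: Illumina 1.3-1.7)."""
    return [ord(ch) - offset for ch in quality]


def phred_error_probability(q: float) -> float:
    """Invert Q_PHRED = -10*log10(p): probability that the base call is wrong."""
    return 10 ** (-q / 10)


def solexa_to_phred(q_solexa: float) -> float:
    """Convert a Solexa score, Q = -10*log10(p/(1-p)), to the equivalent Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


if __name__ == "__main__":
    qual = "!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65"
    scores = decode_quality(qual)
    print(scores[:5], max(scores))
    print(phred_error_probability(20))    # Q20 corresponds to a 1% error rate
    print(round(solexa_to_phred(-5), 2))  # the lowest Solexa score maps to roughly Phred 1
```

At high scores the two scales converge (Solexa 40 is almost exactly Phred 40); they differ noticeably only at low qualities, which is why old Solexa/Illumina 1.0 files need conversion before tools that assume Phred scores.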
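The Per Sequence Quality Scores thresholds can be expressed as a small Python sketch: compute each read's mean Phred score, take the most frequent (rounded) mean, and compare it with 27 and 20. This is a simplified illustration of the thresholds stated on the slide, not FastQC's own code; the rounding step, the default offset of 33 and the function names are assumptions.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def mean_quality(quality: str, offset: int = 33) -> float:
    """Average Phred score of one read (Sanger offset 33 assumed by default)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def per_sequence_quality_status(qualities: Iterable[str], offset: int = 33) -> str:
    """Return 'pass', 'warn' or 'fail' from the most frequent rounded mean quality.

    Thresholds from the slide: below 27 (~0.2% error) warns, below 20 (1% error) fails."""
    counts = Counter(round(mean_quality(q, offset)) for q in qualities if q)
    if not counts:
        raise ValueError("no non-empty quality strings given")
    most_common_mean, _n = counts.most_common(1)[0]
    if most_common_mean < 20:
        return "fail"
    if most_common_mean < 27:
        return "warn"
    return "pass"


if __name__ == "__main__":
    reads = ["IIIIIIIIII"] * 3 + ["++++++++++"]  # three Q40 reads, one Q10 read
    print(per_sequence_quality_status(reads))
```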
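The Per Base N Content rule (warning above 5%, error above 20% at any position) can likewise be sketched in Python by counting N calls at each read position. The function name and the 'pass'/'warn'/'fail' labels are hypothetical; reads of unequal length are handled by counting, at each position, only the reads long enough to reach it.

```python
from __future__ import annotations

from typing import Iterable


def per_base_n_content(sequences: Iterable[str]) -> tuple[list[float], str]:
    """Return the percentage of N calls at each position and an overall status."""
    n_counts: list[int] = []
    totals: list[int] = []
    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            if pos == len(totals):  # first read long enough to reach this position
                n_counts.append(0)
                totals.append(0)
            totals[pos] += 1
            if base == "N":
                n_counts[pos] += 1
    percentages = [100.0 * n / t for n, t in zip(n_counts, totals)]
    worst = max(percentages, default=0.0)
    if worst > 20:
        status = "fail"
    elif worst > 5:
        status = "warn"
    else:
        status = "pass"
    return percentages, status


if __name__ == "__main__":
    print(per_base_n_content(["ACGN"] + ["ACGT"] * 9))
```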
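The Percentage column of the Overrepresented Sequences table is each count divided by Total Sequences from Basic Statistics (3992798). A minimal Python sketch of that arithmetic, combined with the 0.1% warning and 1% failure thresholds, is shown here; the function name is hypothetical.

```python
from __future__ import annotations


def overrepresented_status(count: int, total: int) -> tuple[float, str]:
    """Return (percentage of total reads, status).

    Thresholds from the slide: more than 1% fails, more than 0.1% warns."""
    pct = 100.0 * count / total
    if pct > 1.0:
        status = "fail"
    elif pct > 0.1:
        status = "warn"
    else:
        status = "pass"
    return pct, status


if __name__ == "__main__":
    total = 3992798  # Total Sequences from the Basic Statistics table
    for seq_count in (65311, 6464, 4633, 4463, 3994):
        pct, status = overrepresented_status(seq_count, total)
        print(f"{seq_count:>6}  {pct:.3f}%  {status}")
```

With the five counts from the table, the first sequence (1.636%) falls in the failure range and the other four (0.100% to 0.162%) in the warning range; the adapter hits suggest the reads need adapter trimming before alignment.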
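The SAM files written by bwa mem (aln-se.sam) and bowtie2 (eg1.sam, eg2.sam) can be summarised from their FLAG column. The Python sketch below counts mapped and unmapped reads using FLAG bit 0x4 (segment unmapped) and skips secondary (0x100) and supplementary (0x800) records so each read is counted once, following the SAM specification; in practice `samtools flagstat` reports these numbers. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Iterable

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_mapped(sam_lines: Iterable[str]) -> tuple[int, int]:
    """Return (mapped, unmapped) counts of primary alignment records in SAM text."""
    mapped = unmapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @PG, ...)
        flag = int(line.split("\t")[1])  # FLAG is the second tab-separated column
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        if flag & UNMAPPED:
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    ]
    print(count_mapped(example))
```

On a real file, `with open("eg1.sam") as handle: count_mapped(handle)` works because file objects iterate line by line.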