lecture7-breseq

Re-sequencing hundreds of evolved E. coli genomes: Finding non-SNP mutations, analyzing mixed populations, and knowing what you don't know

Jeff Barrick
Microbiology and Molecular Genetics
Laboratory of Rich Lenski
Wednesday, June 9, 2010

[Figure: % population with genotype (0% to 100%) vs. generation (0 to 5000); labeled genotypes: ancestor, rbs1, rbs2, rbs3, topA, spoT, glmUS, pykF, mrdA, ??????]

Lenski long-term evolution experiment

[Diagram: single cell, 24 hrs growth, 1:100 dilution, 24 hrs growth, 1:100 dilution, repeat...]

❖ 12 independent populations evolved 20 yrs. Frozen “fossil record” has been archived.
❖ How many and what mutations?
❖ Compare rates of genomic change and fitness increase, monitor diversity in the population, understand molecular basis of adaptation.

Overview

• Application: Re-sequencing (microbes)
• Main Platform: Illumina Genome Analyzer

1. Overview of strategies and sequencing data
2. breseq (“brēz-sēk”) bacterial re-sequencing pipeline
   • characteristics of a typical dataset
   • identifying different kinds of mutations
   • analysis of SNPs in a mixed population
3. Asking evolutionary questions

Before I forget...

Thanks to the Lenski lab, particularly Brian Baer, Neerja Hajela, Zachary Blount, and Justin Meyer.

Thanks to collaborators
Jihyun Kim et al. (KRIBB)
Dom Schneider et al. (Grenoble)
Genoscope

Thanks for coding discussions/contributions
Dave Knoester

Strategies for finding mutations

de novo assembly
❖ assemble reads by overlap (velvet, ABySS...)
❖ map contigs to reference genome
❖ infer mutations

re-sequencing
❖ map reads to known reference genome (ssaha2, maq, bowtie, ...)
❖ infer mutations

Strategies for library preparation

single-end: independent reads
paired-end: two inwardly oriented reads separated by ~200 nt
mate-paired: two outwardly oriented reads separated by ~3000 nt

Sequence Data Sets

E. coli genome: 4.6 Mb
148 samples at RTSF (shown)
120 samples at Genoscope with Dom Schneider et al.
mostly 1 lane per genome, 36-bp single-end reads

[Figure: megabases (PF) (0 to 1200) and avg coverage (0 to 250) per sample, June 2008 to March 2010]

breseq re-sequencing pipeline

Pipeline stages: read alignment, mutation identification, mutation annotation

[Diagram labels: FASTQ, breseq, ssaha2, SAM file of alignments, other tools, HTML/TXT output]

Mutation types identified:
1. single-base substitutions (SNPs)
2. small within-alignment indels
3. large deletions
4. new junctions (IS insertions)
5. copy number variation
6. mixed population SNP analysis

implementation:
• command line tool
• pipeline with parts in Perl, R, and C++
• emphasis on accuracy over speed
• runs on Unix, HPCC, Mac OS X

SAM: Sequence Alignment/Map format

SAMtools
Created to support 1000 Genomes project by a team at the Sanger Center.
• C library with bindings to Java, Perl, Python, Lisp, etc.
• Command-line tools for manipulation, consensus/indel calling, viewing alignments as text, ...
• Many read aligners output in SAM format

Text (SAM) and binary (BAM) files organized for quick retrieval of reads aligned to a certain position.

Knowing what you don’t know

1. Theoretical limits: Read length and pair distance.
2. Practical limits: Base quality and coverage evenness.

[Table: rows are IS insertions, duplications, inversions across IS, SNPs in repeats, insertion of new seq; columns are paired-end, single-end, mate-paired; cell marks as extracted, layout not recoverable: **–––**–––****–]

IS = bacterial mobile elements 0.8-1.5 kb in length.

Need standardized metrics to describe completeness of re-sequencing data on a per-base per-genome basis.

Typical Base Error Rates

• Most bases in a run have error frequencies between 10^{-4} and 10^{-3}. Overall error rates agree well with Phred quality scores [E = 10^{-Q/10}].

Typical Base Error Spectrum

• There is variation in the frequency at which different base errors occur at a given quality score.

[Figure: error rate (10^{-8} to 1) vs. base quality score (30 to 60) for reference base A; observed bases T, C, G, Δ]

[Figure, sample ZDB294: error rate (10^{-8} to 1) vs. base quality score (30 to 60), one panel per reference base (A, C, T, G, Δ), with one curve per observed base (A, T, C, G, Δ). Annotations: low pr G→A, low pr T→A, high pr A→C; only single base indels tabulated]

Typical Coverage Distribution

Coverage per position fits a negative binomial distribution (overdispersed Poisson).
Consider only positions where all read alignments are unique.

[Figure, sample ZDB294: num ref bases with coverage (0 to 300000) vs. read depth coverage (0 to 120); series: observed coverage, Poisson fit, negative binomial fit]
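
The coverage slide's claim that per-position coverage is an overdispersed Poisson can be checked with a few lines of Python. The following is only a minimal sketch, assuming a method-of-moments fit; the slides do not say how breseq itself fits the distribution, and the function name `fit_negative_binomial` and the toy depths are hypothetical. A Poisson has variance equal to its mean, so a variance-to-mean ratio well above 1 indicates overdispersion, and the negative binomial size parameter then follows from the mean and variance.

```python
from statistics import mean, pvariance


def fit_negative_binomial(coverage):
    """Method-of-moments fit of a negative binomial to read depths.

    Returns (mean, dispersion_index, size):
      dispersion_index = variance / mean (equal to 1 for a Poisson);
      size = mean^2 / (variance - mean), the negative binomial size
             parameter, or None when the data are not overdispersed.
    """
    m = mean(coverage)
    if m <= 0:
        raise ValueError("mean coverage must be positive")
    v = pvariance(coverage, mu=m)
    dispersion_index = v / m
    # The size parameter is only defined when variance exceeds the mean.
    size = m * m / (v - m) if v > m else None
    return m, dispersion_index, size


# Toy per-position read depths (made-up numbers, not from any real sample).
toy_coverage = [10, 30, 20, 50, 40]
avg, dispersion, size = fit_negative_binomial(toy_coverage)
print(f"mean={avg:.1f} variance/mean={dispersion:.2f} size={size:.2f}")
```

As the slide notes, such a fit would only be meaningful on positions where all read alignments are unique, so repeat regions would need to be filtered out before calling the function.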
