2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 10, Lecture 19
Nick Stoler
The Huck Institutes of the Life Sciences
Penn State

Sequence data to genotypes
● A common sequencing workflow
  - Sequencing reads (FASTQ): a list of short sequences
  - Alignments (SAM/BAM): a list of short sequences and where they are in the genome
  - Variant calls (VCF): a list of locations in the genome and what the base is at each

What are variant calls?
● Naive variant calling
  - Check all the reads that cover base chr1:291
  - Add up the bases at chr1:291
  - e.g. 10 A's, 2 G's
    ∙ Is this an A/G heterozygous site or two sequencing errors?
● Actual variant callers
  - Estimate likelihood of a variant site vs a sequencing error
    ∙ Sequencing error rate
    ∙ Quality scores

VCF: Variant Call Format
● Represent a list of locations and the variant call at each
  - Simple, right?
● Yes and no.
  - Simple foundation
    ∙ Location and base
  - Complex “bonus features”
    ∙ Indels, structural variants, etc.
    ∙ Multiple samples
    ∙ Haplotype phasing

VCF: The simple part
● location, reference base, your base
  - CHROM/POS, REF, ALT
  - a lot like wgsim's mutations.txt

VCF: The rest

VCF: The full column list

● Variant call confidence
  - like Phred score and MAPQ

Multiple variants
● What if your reads have more than 1 base at one location?
  - wgsim's mutations.txt
    ∙ IUPAC notation
● VCF just gives comma-separated lists
  - REF  ALT
  - A    A,C

Complex variants
● Can show short indels
  - REF C, ALT CT (insert T)
  - REF ACG, ALT A (delete CG)

VCF: Multiple samples
● VCF can have a variable number of columns!
● Column headings are the sample names

● VCF can represent SNV calls
● and much, much more
  - Indels (G → GC)
  - Multiple variants per site (in ALT column)
  - Multiple samples (SAMPLE columns)
● Check poster for quick overview
● Check full specification for details

● Samtools mpileup → BCF
  - BCF is to VCF as BAM is to SAM
    ∙ (roughly)
  - The BCF doesn't hold actual calls
    ∙ encodes likelihoods for all variants
● Bcftools view → VCF
  - Performs the actual variant calling

  -u: uncompressed output
  -D: include read depth in output
  -f: use ../refs/sc.fa as reference
  -v: only output non-reference sites
  -c: do SNP calling
  -g: call genotypes at variant sites

Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (2011) 27(21):2987-2993.

More mpileup tricks
● Combine multiple BAM files into one BCF
● Only include one region

Homework 19
● Take your mutations.txt file from wgsim (or create another one) and create a partial VCF file from the first 10 lines (but skip ones with indels)
  - Only the last header line (#CHROM)
  - Only the first 5 columns
  - Refer to IUPAC nucleic acid codes for non-ACGT bases
    ∙ W means it generated reads with both A and T at this location
● Use samtools/bcftools to create a full VCF file from the alignments you created in the previous homework
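
Sketch: variant site vs sequencing error
The "10 A's, 2 G's" question from the "What are variant calls?" slide can be made concrete with a rough calculation. This is only a toy illustration, not the model samtools/bcftools uses. It assumes a flat 1% sequencing error rate (a made-up value), ignores per-base quality scores, and treats each read at a heterozygous site as an even coin flip between the two alleles. The function name site_likelihoods is hypothetical.

```python
from math import comb

def site_likelihoods(ref_count, alt_count, error_rate=0.01):
    """Compare two explanations for the bases observed at one site.

    Toy model (an assumption, not what real callers do):
      - homozygous reference: every non-reference base is a sequencing error
      - heterozygous: each read shows either allele with probability 0.5
    Returns (likelihood_hom_ref, likelihood_het).
    """
    n = ref_count + alt_count
    ways = comb(n, alt_count)  # number of ways to pick which reads carry the alt base
    hom_ref = ways * (error_rate ** alt_count) * ((1 - error_rate) ** ref_count)
    het = ways * (0.5 ** n)
    return hom_ref, het

# The example from the slide: 10 A's and 2 G's at chr1:291
hom_ref, het = site_likelihoods(ref_count=10, alt_count=2)
print(f"P(reads | homozygous A, G's are errors) = {hom_ref:.4f}")
print(f"P(reads | A/G heterozygous)             = {het:.4f}")
print("more likely:", "heterozygous" if het > hom_ref else "sequencing errors")
```

Under these assumptions the heterozygous explanation comes out more likely for 10 A's and 2 G's, while a single G among 12 reads is better explained as an error. Real callers refine this by using the quality score of each base instead of one fixed error rate.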