2013 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 8, Lecture 16
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center, Penn State

Binary SAM (BAM) files

SAM file:
– information on the alignment of each read
– optimized for readability and sequential access

BAM (binary SAM):
– compression → saves space (optimized for size)
– may be sorted + indexed → location query (optimized for random access)
– the file is not readable by eye

Your default format should be BAM; only turn it into SAM when viewing the file.

SAM/BAM hierarchy

[Diagram: SAM file, Sorted SAM file, BAM file, Sorted BAM file, Indexed BAM file, connected by arrows labeled "transform (view)", "sorting", and "index"]

Some tools have certain requirements of what type of SAM/BAM they take.
Your default data format should be a sorted, indexed BAM file!

Download and ‘make’ SAMTOOLS

http://samtools.sourceforge.net/

Samtools is a suite of commands.
Most actions will provide help on their usage.

Default Operation

• By default samtools expects a BAM file as input and will produce a SAM file as output
• Every alignment result should be stored as a sorted and indexed BAM file

Transform SAM to BAM

    samtools view -Sb input.sam > tempfile.bam     transform to BAM
    samtools sort -f tempfile.bam output.bam       sort BAM file
    samtools index output.bam                      index BAM file

Add these commands to the previous week’s shell script.
(In samtools 1.x the sort step is written: samtools sort -o output.bam tempfile.bam)

Filtering SAM/BAM files

    samtools view -f     required flag (keep if matches)
    samtools view -F     filtering flag (remove if matches)

Flags use a bitwise representation:

     1 = 00000001 → paired end read
     2 = 00000010 → mapped as proper pair
     4 = 00000100 → read unmapped
     8 = 00001000 → read mate unmapped
    16 = 00010000 → read mapped on reverse strand

    -c            count the lines instead of printing them
    -f number     keep reads that match
    -F number     remove reads that match

Align the reads contained in the data for lecture 11

A sorted file will stay sorted during transformation

• Once sorted, all output will stay sorted regardless of the output type (SAM, BAM)
• You can create a second, smaller, filtered file that does not need to be sorted again.
• You do need to index the new file though!

Explore other commands

    samtools flagstat data.bam         flag statistics
    samtools idxstats data.bam         index stats
    samtools depth data.bam | head     depth of coverage

Querying a BAM file

Samtools allows querying by region, written as name:start-end

    samtools view data.bam chrV:100062000-

Homework 16

Generate a sorted and indexed BAM file based on the data lect15.fq.gz

1. Find the number of uniquely mapped reads
2. Find the number of high quality alignments (MAPQ30) for each strand separately
3. A genomic feature has its start site on the forward strand on chromosome I at position 111,000.
   – How many reads fall within 500 bp upstream of this location?
   – Print the position of each read (hint: there are not that many)
   – Report the number of reads in this region for each strand separately.
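
Appendix sketch: how the -f and -F flag tests work

The bitwise flag tests from the filtering slide can be tried out in plain shell arithmetic, without any BAM file. The sketch below is only an illustration: the function names passes_required, passes_filter, and describe_flag are hypothetical helpers, not part of samtools, and only the five flag bits listed in the lecture are decoded. It assumes the usual reading of the two options: -f keeps a read when every requested bit is set, and -F keeps a read when none of the requested bits is set.

```shell
# Hypothetical helpers (NOT samtools commands) that mimic the bitwise
# tests behind `samtools view -f` and `samtools view -F`.

# passes_required FLAG BITS: true if every bit in BITS is set in FLAG (-f).
passes_required() {
    [ $(( $1 & $2 )) -eq "$2" ]
}

# passes_filter FLAG BITS: true if no bit in BITS is set in FLAG (-F).
passes_filter() {
    [ $(( $1 & $2 )) -eq 0 ]
}

# describe_flag FLAG: print the names of the lecture's five bits set in FLAG.
describe_flag() {
    names=""
    [ $(( $1 & 1 )) -ne 0 ]  && names="$names paired"
    [ $(( $1 & 2 )) -ne 0 ]  && names="$names proper_pair"
    [ $(( $1 & 4 )) -ne 0 ]  && names="$names unmapped"
    [ $(( $1 & 8 )) -ne 0 ]  && names="$names mate_unmapped"
    [ $(( $1 & 16 )) -ne 0 ] && names="$names reverse"
    echo "${names# }"   # strip the leading space
}

# Flag 19 = 16 + 2 + 1
describe_flag 19
passes_required 19 16 && echo "flag 19 is kept by -f 16"
passes_filter 19 4 && echo "flag 19 is kept by -F 4"
```

Because the bits are additive, several conditions can be combined into one number: for example, 20 = 16 + 4 tests the "reverse strand" and "unmapped" bits together.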