Pairwise Sequence Alignment
School of Basic Medicine, Peking Union Medical College

Recommended reading
- David Mount, 2004. Bioinformatics: Sequence and Genome Analysis, 2nd ed.
- Jonathan Pevsner, 2003. Bioinformatics and Functional Genomics.
- Notes on Bioinformatics (生物信息学札记)
- R. Durbin et al. Biological Sequence Analysis.

Outline
- Basic concepts of sequence alignment
- Basic methods of sequence alignment
- Basic principles of the dynamic programming algorithm
- Scoring matrices in sequence alignment
- Statistical analysis of sequence alignments
- Pairwise sequence comparison with NCBI Blast2Seq

Pairwise sequence alignment is the most basic computation in bioinformatics
- Used to determine whether two proteins (or genes) are structurally or functionally related
- Used to identify conserved domains shared between proteins
- The basis for searching biological sequence databases with BLAST (covered in the next lecture)
- Used in genome analysis
- Used in predicting the three-dimensional structure of proteins
- ……

Pairwise alignment of human retinol-binding protein and β-lactoglobulin

RBP             1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG  50
lactoglobulin   1 ...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD.  44

RBP            51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE  97
lactoglobulin  45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK  93

RBP            98 DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV...........QYSC 136
lactoglobulin  94 IPAVFKIDALNENKVL........VLDTDYKKYLLFCMENSAEPEQSLAC 135

RBP           137 RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQ.EELCLARQYRLIV 185
lactoglobulin 136 QCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI....... 178

Gaps are shown as dots within the sequences. In the original figure, a match line between each pair of rows marks identical residues with a bar (|), very similar residues with two dots (:), and somewhat similar residues with one dot (.).

RBP and β-lactoglobulin are homologous proteins that share related three-dimensional structures
- retinol-binding protein (NP_006735)
- β-lactoglobulin (P02754)

Alignment
The process of lining up two or more sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology.
This lecture covers pairwise alignment, the alignment of exactly two sequences.

Key concepts (I)
- Homology: similarity attributed to descent from a common ancestor.
- Identity: the extent to which two (nucleotide or amino acid) sequences are invariant.

RBP         26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84
glycodelin  23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81

Key concepts (II)
- Identity: the extent to which two sequences are invariant.
- Conservation: changes at a specific position of an amino acid (or, less commonly, DNA) sequence that preserve the physico-chemical properties of the original residue.
- Similarity: the extent to which nucleotide or protein sequences are related. It is based upon identity plus conservation.

Pairwise alignment of retinol-binding protein and β-lactoglobulin: identity
Identical residue pairs are marked with a bar. Taking the first row of the alignment shown earlier as an example: 44 residue pairs are aligned and 11 of them are identical, so the percent identity is 25% (11/44).

Pairwise alignment of retinol-binding protein and β-lactoglobulin: similarity
Somewhat similar pairs are marked with one dot and very similar pairs with two dots. Taking the first row as an example: 44 residue pairs are aligned, 11 are identical and 3 are similar, so the percent similarity is 32% (14/44).

Pairwise alignment of retinol-binding protein and β-lactoglobulin: gaps
The same alignment, annotated with four labels: internal gap, terminal gap, gap opening, and gap extension.

Gaps
- In a sequence alignment, a gap forms at any position where a residue has no matching residue in the other sequence.
- Gaps are usually given a negative score (the gap penalty).
- Because a single mutational event can insert or delete more than one residue, the presence of a gap is weighted more heavily than its length.
- When running BLAST, the default gap penalties generally do not need to be changed.

Key concepts (III)
Two types of homologous sequence:
- Orthologs: homologous sequences in different species that arose from a common ancestral gene during speciation.
- Paralogs: homologous sequences within a single species that arose by gene duplication.

Homologous sequences
Orthologs and paralogs are two types of homologous sequences. Orthology describes genes in different species that derive from a common ancestor. Orthologous genes may or may not have the same function. Paralogy describes homologous genes within a single species that diverged by gene duplication.
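
The two numerical ideas from the slides on identity and on gaps can be checked with a short sketch. It recomputes the percent identity of the first row of the RBP/β-lactoglobulin alignment (11 identical out of 44 aligned pairs) and shows one common way to weight the presence of a gap more heavily than its length, an affine gap penalty. The function names, the use of "." as the gap character, and the penalty values (11 to open, 1 per gapped position) are assumptions chosen for illustration, not something the slides specify; the convention used here charges the opening cost once plus the extension cost for every gapped position.

```python
GAP = "."  # gap character used in the alignment rows on the slide


def percent_identity(row_a, row_b, gap=GAP):
    """Return (identical, aligned, percent) for two equal-length alignment rows.

    Only columns where both sequences carry a residue count as aligned pairs,
    which matches the slide's denominator of 44 for the first row.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    identical = sum(1 for a, b in pairs if a == b)
    return identical, len(pairs), 100.0 * identical / len(pairs)


def affine_gap_penalty(length, gap_open=11, gap_extend=1):
    """Penalty for one gap of the given length (illustrative values).

    Assumed convention: opening cost once, plus extension cost per position.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


# First row of the alignment, copied from the slide.
rbp = "MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG"
blg = "...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD."

identical, aligned, pct = percent_identity(rbp, blg)
print(f"{identical}/{aligned} identical = {pct:.0f}% identity")

# One gap of length 8 costs less than two separate gaps of length 4,
# so the scoring prefers a single longer insertion or deletion event.
print(affine_gap_penalty(8), 2 * affine_gap_penalty(4))
```

Percent similarity is not recomputed here because it depends on which substitutions a scoring matrix counts as similar, and that information is not given on these slides.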