2003/10/21AlgorithmsinBioinformatics,Lecture51生物資訊相關演算法AlgorithmsinBioinformatics呂學一(中央研究院資訊科學所)~hil/2003/10/21AlgorithmsinBioinformatics,Lecture52TodayApplicationsofsuffixtrees–Substringproblem(暖身)–“Exactstringmatching”revisited–Linearizationofcircularstring(挪移乾坤)–Longestcommonsubstring(異中求同)Intermission–小巨’smagicshow–TheLureofQueen(皇后的魅惑)2003/10/21AlgorithmsinBioinformatics,Lecture53ApplicationOneSubstringProblem(recapasawarm-up)2003/10/21AlgorithmsinBioinformatics,Lecture54SubstringProblemInput:twostringsPandS,–whereSisallowedtobepreprocessedinO(|S|)time.Output:anoccurrenceofPinS.Objective:doneinO(|P|)time.2003/10/21AlgorithmsinBioinformatics,Lecture5512345678S=bbabbaab[1,–]1[3,–][1,1][2,–][3,–]12[7,–][2,3][4,–][7,–]1[4,–]1[7,–][3,3][4,–]1[3,3]2003/10/21AlgorithmsinBioinformatics,Lecture5612345678S=bbabbaab[1,1][7,–][2,3][4,–][7,–][4,–][7,–][3,3][4,–][3,3]Q:Whereareabba,baa,bb?2003/10/21AlgorithmsinBioinformatics,Lecture5712345678S=bbabbaab12[1,1]34[7,–][2,3][4,–]5[7,–][4,–]6[7,–][3,3][4,–][3,3]3211Q:Whereareabba,baa,bb?2003/10/21AlgorithmsinBioinformatics,Lecture58ApplicationTwoExactStringMatching2003/10/21AlgorithmsinBioinformatics,Lecture59ExactStringMatchingInput:twostringsPandS,–whereSisallowedtobepreprocessedinO(|S|)time.Output:alloccurrencesofPinS.Challenge:solvingthisinO(|P|+k)time,–wherekisthenumberofoccurrencesofPinS.2003/10/21AlgorithmsinBioinformatics,Lecture510IdeaEachinternalnodekeepsthelabelsofallitsdescendantleaves.2003/10/21AlgorithmsinBioinformatics,Lecture51112345678S=bbabbaab12[1,1]34[7,–][2,3][4,–]5[7,–][4,–]6[7,–][3,3][4,–][3,3]Q:Something’smissing?6,35,24,15,2,4,1Q:Howdowefixthisproblem?2003/10/21AlgorithmsinBioinformatics,Lecture512123456789S=bbabbaab$12[1,1]34[7,–][2,3][4,–]5[7,–][4,–]6[7,–][3,3][4,–][3,3]5,24,15,2,4,1,817[9,–][4,4][5,–]8[9,–]9[9,–]6,3,7Q:ObtainableinO(|S|)time?2003/10/21AlgorithmsinBioinformatics,Lecture513Perhapsnot…S=aaaaa$1654321,2,3,4,51,2,3,41,2,31,22003/10/21AlgorithmsinBioinformatics,Lecture514AnobservationConsiderthesequenceLofleavesfromlefttoright.ThedescendantleavesofeachinternalnodehastobeconsecutiveinL.2003/10/21AlgorithmsinBioinformatics,Lecture515123456789S=bbabbaab$12[1,1]34[7,–][2,3][4,–]5[7,–][4,–]6[7,–][3,3][3,3]5,24,15,2,4,1,87[9,–][4,4][5,–]8[9,–]9[9,–]6,3,7123456789L=6375241891,34,84,56,72003/10/21AlgorithmsinBioinformatics,Lecture516ApplicationThreeCircularStringLinearization(挪移乾坤)2003/10/21AlgorithmsinBioinformatics,Lecture517NotationLet挪(S,i)denotethestringS[i…|S|]S[1…i–1].Si挪(S,i)2003/10/21AlgorithmsinBioinformatics,Lecture518TheproblemInput–astringS.Output–anindexithatmaximizesthealphabeticalorderof挪(S,i).12345678挪(S,1)=bbabbaab挪(S,2)=babbaabb挪(S,3)=abbaabbb挪(S,4)=bbaabbba挪(S,5)=baabbbab挪(S,6)=aabbbabb挪(S,7)=abbbabba挪(S,8)=bbbabbaabbabbaab2003/10/21AlgorithmsinBioinformatics,Lecture519Naïvealgorithmletj=1;fori=2to|S|do{if(挪(S,i)挪(S,j)){letj=i;}}outputj;Timecomplexity?2003/10/21AlgorithmsinBioinformatics,Lecture520Q:CanwebeatO(|S|2)?12345678挪(S,1)=bbabbaab挪(S,2)=babbaabb挪(S,3)=abbaabbb挪(S,4)=bbaabbba挪(S,5)=baabbbab挪(S,6)=aabbbabb挪(S,7)=abbbabba挪(S,8)=bbbabbaabbabbaab2003/10/21AlgorithmsinBioinformatics,Lecture521Linear-TimeAlgorithmviaSuffixTree2003/10/21AlgorithmsinBioinformatics,Lecture522Firstattempt–goingright12345678bbabbaabbabbaababbaabbbaabbaabaababbQ:Howtofixtheproblem?2003/10/21AlgorithmsinBioinformatics,Lecture523SecondAttemptSuffixtreeforSS2003/10/21AlgorithmsinBioinformatics,Lecture524KeyobservationEachlength-|S|substringofSSisa挪(S,j)forsomeindexjwith1≤j≤|S|.Each挪(S,j)with1≤j≤|S|isalength-|S|substringofSS.2003/10/21AlgorithmsinBioinformatics,Lecture5251234567890123456SS=bbabbaabbbabbaab1[1,–]1[3,–][1,1][2,–][3,–]12[7,–][2,3][4,–][7,–]1[4,–]1[7,–][3,3][4,–]12[10,–][4,5][6,–]1[10,–][2,2][3,3]2345[3,3]2003/10/21AlgorithmsinBioinformatics,Lecture526[1,1][7,–][4,–][7,–][4,–][7,–][3,3][10,–][4,5][6,–][10,–][2,2][3,3][3,3]Q:Howtousethissuffixtree?1234567890123456SS=bbabbaabbbabbaab2003/10/21AlgorithmsinBioinformatics,Lecture527Equivalently,…OutputtheindexisuchthatSS[i…|SS|]correspondstotherightmostleafofthesuffixtreeforSS.–Clearly,thistakesO(|S|)time.2003/10/21AlgorithmsinBioinformatics,Lecture528ApplicationFourLongestcommonsubstring(異中求同)2003/10/21AlgorithmsinBioinformatics,Lecture529TheproblemInput:twostringsAandB.Output:alongeststringCthatoccursinbothAandB.A=bbbabbaabB=baabbabbabC=bbC=baabC=abbaC=bbabba2003/10/21AlgorithmsinBioinformatics,Lecture530NaïvealgorithmbuildsuffixtreeforB;forL=|A|downto1dofori=1to|A|-L+1do{ifA[i…i+L-1]occursinB{outputA[i…i+L-1]andhalt;}}}output“nocommonsubstring”;Timecomplexity?2003/10/21AlgorithmsinBioinformatics,Lecture531O(|A|3+|B|)()3||12||1||1||||||)()1|(|AOAiiAOiOiAALALAL=+-=+-====Thefor-looptakestimeCanwedobetterthanthis?2003/10/21AlgorithmsinBioinformatics,Lecture532AfasteralgorithmbuildsuffixtreeforB;fori=1to|A|do{findthelargestintegerL(i)suchthatA[i…i+L(i)-1]occursinBbybinarysearch;}outputA[i…L(i)]fortheiwiththelargestL(i);Timeco