Analyzing Transcription Factor Binding Sites


Resource description

Chapter 6: Gene Prediction and Gene Structure Analysis (I)
Bioinformatics

Genome sequencing strategies

Genome sequencing: QUICKER, SMALLER, CHEAPER
Nature Biotechnology 26, 1135-1145 (2008)
13 years, $3 billion
1 day, $1000 (2008)

Applications of sequencing
• identifying new genes
• looking at chromosome organization and structure
• finding gene regulatory sequences
• comparative genomics

Where are the Genes in the Genome?
[Slide figure: several thousand bases of raw, unannotated genomic sequence, beginning GAGAAAATCAATTGGTTTAGAAGGTTTGGACTCACTTGACAGGTTCAGTTGGAGACGATCATAGG...]

Genes (i.e., protein coding)
But... only 2% of the human genome encodes proteins.

Other than protein-coding genes, what is there?
• genes for noncoding RNAs (rRNA, tRNA, miRNAs, etc.)
• structural sequences (scaffold attachment regions)
• regulatory sequences
• non-functional "junk"?
It is still uncertain and controversial how much of the genome is composed of each of these classes. The answers will come from experimentation and bioinformatics.

Complexity of genome
Published by AAAS, Science 306, 636-640 (2004)
The ENCODE Project: ENCyclopedia Of DNA Elements

What's in a genome?
– Protein-coding genes
• In long open reading frames
• ORFs interrupted by introns in eukaryotes
• Take up most of the genome in prokaryotes, but only a small portion of the eukaryotic genome
– RNA-only genes
• Transfer RNA, ribosomal RNA, snoRNAs (guide ribosomal and transfer RNA maturation), intron splicing, guiding mRNAs to the membrane for translation, gene regulation; this is a growing list
– Gene control sequences
• Promoters
• Regulatory elements
– Transposable elements, both active and defective
• DNA transposons and retrotransposons
• Many types and sizes
– Repeated sequences
• Centromeres and telomeres
• Many with unknown (or no) function
– Unique sequences that have no obvious function
• As a general rule, each part of a genomic sequence has only one function: protein-coding gene, RNA gene, control signal, transposable element, repeat sequence, or maybe no function at all. Most sequence elements overlap only slightly, if at all.

Protein-coding genes, non-protein-coding genes
• easier to find than other functional elements
• why?
• genes are transcribed, which means that we can identify them by looking at RNA
• traditionally this has been done by cDNA or EST sequencing, more recently by microarray, SAGE, MPSS, etc.

Protein-coding genes have recognizable features
1. open reading frames (ORFs)
2. codon bias
3. known transcription and translational start and stop motifs (promoters, 3' poly-A sites)
4. splice consensus sequences at intron-exon boundaries

Finding protein-coding genes
[Slide figure: gene structure model over the sequence alphabet A, T, G, C. Labels: begin gene region, start translation, donor splice site, acceptor splice site, stop translation, end gene region; single exon, initial exon, exon, final exon; 5' UTR, 3' UTR, intron]

Finding non-protein-coding genes
• e.g., tRNA, rRNA, snoRNA, miRNA, various other ncRNAs
• Harder to find than protein-coding genes
• Why?
• often not poly-A tailed, so they don't end up in cDNA libraries
• no ORF
• constraint on sequence divergence at nucleotid…
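
The first recognizable feature listed for protein-coding genes, the open reading frame, is the one that non-coding genes lack, and it is simple enough to sketch in code. The Python below is a minimal illustration, not a method taken from the slides: the function name `find_orfs` and the `min_codons` threshold are hypothetical, it assumes the standard stop codons TAA, TAG and TGA, and it scans only the three forward-strand frames. It ignores introns, so it reflects the prokaryotic case; in eukaryotes ORFs are interrupted by introns and a scan like this would fragment a gene.

```python
# Standard stop codons (assumes the standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=30):
    """Find ORFs on the forward strand of a DNA string.

    Returns a list of (start, end, frame) tuples, where start is the
    0-based index of the ATG, end is the index just past the stop codon,
    and frame is 0, 1 or 2. min_codons counts the codons from the ATG up
    to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open an ORF at the first in-frame ATG
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning this frame
    return orfs
```

For example, `find_orfs("CCATGGCTGCTGCTGCTTAAGG", min_codons=5)` reports one ORF in frame 2, running from the ATG at index 2 through the TAA stop. A real gene finder would also scan the reverse complement and combine ORF length with the other features on the list (codon bias, start and stop motifs, splice consensus sequences).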
