Lecture 1 (Penn State next-generation sequencing data analysis course)


BMMB 597D: Analyzing Next Generation Sequencing Data
BMMB 852: Applied Bioinformatics
Week 1, Lecture 1
István Albert
Bioinformatics Consulting Center, Penn State, 2013

Introductions
Lecturer: Istvan Albert (iua1@psu.edu)
TA: Nicholas Stoler (nicholas.stoler@psu.edu)
Office hours: Mon-Wed from 1-3pm in 502B Wartik
Email: iua1@psu.edu
Course Webpage: http://

• Life sciences are becoming a data driven science
• Data is represented as text files in various formats that are transformed one step at a time
• Most bioinformatics classes are focused on computer science or algorithms.
• We will focus on information processing and applications

Requirements
• Recommended: latest Mac OS X 10.8.4 (properly set up)
• Or another Unix based operating system
• If you have a Windows computer, please install Linux
– Ubuntu Live CD
– Dual boot Linux and Windows
– Use VirtualBox and install Linux into it

Lecture topics
15 weeks, two lectures per week = 30 lectures
• core informatics competency
• computational foundations
• biological data formats
• statistical methods and visualization
• software tools and their applications

Lecture Formats
• Background information
• Practical examples that tie in with the topic
• Finishing with in-class exercises + homework
• We'll try to make it simple and easy to follow

Homework
• Homework will be given out during each lecture and will correspond to the lecture. Labeled 1, 2 … 30
• Homework is due on the Tuesday of the week after it was given out.
• For example: homework 1 and 2 will be due next Tuesday.
• Note: there are office hour(s) between each homework's due date (Wed and Mon)
• Homework usually fits on one sheet of paper. Show the commands and their output.

Grading
• Grades will be the average of all homework + final project
• The final project is given out in the last week and is due on Monday of finals week.
• For homework and projects you may work in teams

Computation and Thought
• Computational approaches reflect and affect the thought process
• When we learn informatics, we learn how to think in a way that is easy to translate into computation
• There is no magic. It is just like any other subject matter: it needs a lot of practice (the brain is a muscle)
• Similar to learning a foreign language: there is a vocabulary, a grammar, idiomatic expressions

Realistic News
Bioinformatics will never be easy or trivial!
It is like high altitude mountain hiking.
Never underestimate it.

Bioinformatics: it is like hiking up to Hallet's Peak
A typical bioinformatics project is like hiking up to Hallet's Peak in the Rocky Mountains.
It is hard work and takes a lot of effort, but if you keep it up and pay attention, you'll get there.
There is a steep but not overly dangerous trail in the back.
There is no special skill other than proper walking technique and not giving up.
There are no magical shortcuts that you will learn.

Expectations
You can only learn by doing it
Spend 3-6 hours outside class each week:
– Explore behaviors
– Expand the scope of the study
– Try new solutions
Time flies when you know what you are doing.

Complexity versus decision making
• Most bioinformatics analyses consist of a very large number of very simple decisions
• Most of which need to be correct!
• This is what makes it difficult
• There are no strict rules, only guidelines: dare to improvise and adapt

Bioinformatics today
Large data sets generated by complex equipment
1. Data management: storage, transfer, data transformation (domain of Information Technology)
2. Data analysis: mapping, assembly, algorithm scaling (domain of Computer Science)
3. Statistical challenges: traditional statistics is not well suited for modeling systematic errors over a large number of observations (domain of Statistics)
4. Biological hypothesis testing: data interpretation (domain of Life Science)

Analysis Scaling
• Analysis algorithms almost never scale linearly with the amount of data.
• For example, naïve sequence comparisons scale as N*N: in order to compare N sequences against themselves we need to do N*N operations
• Going from N = 1 to N = 10^3, the analysis time increases from 1 to 10^6.

Origins of Classical Statistics
Developed in the era of
• Limited computational capabilities
• Small and expensive data sets
Operates on concepts such as "null hypothesis" and "p-values"
Currently in life sciences
• Powerful computational capabilities
• Cheap and extremely large data sets
Small systematic deviations strongly influence any test, and we are unable to separate the many influences
The era of absurd (silly?) p-values, p = 10^-19

Data characteristics
• Random errors and systematic errors accumulate and compound during each step: from sample extraction, through preparation, to measurements
• A large number of measurements makes unlikely events very common
Data produced by equipment: novel information
Example: 74 new SNVs (single nucleotide variations) per individual per generation

Challenges: shifting terminology
What is the difference between a SNP (single nucleotide polymorphism) and a SNV (single nucleotide variation)?
A SNV is a private mutation while a SNP is a mutation that is shared amongst a population
At what point does a SNP turn into a SNV?

BioStar: http://
• I started the site in 2009 during the first year that BMMB 597D was offered!
• It was meant to support questions for this course
• Today it has grown to attract over 40K unique visitors per month and over 2 million page views per year

BioStar: http://

1. Update your Mac OS to the latest version 10.8.4 (Mountain Lion)
2. Using the App Store, download and install XCode
3. Download and install the command line tools: Xcode preferences, Downloads

On Linux
• Install a well supported Linux version: Ubuntu, Debian, Fedora etc.
• Use a package manager to install dependencies, which leads to incantations such as:
    apt-get install zlib1g-dev

Succ
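To make the Analysis Scaling slide concrete, here is a minimal Python sketch of a naive all-against-all comparison. The function names (`mismatches`, `naive_all_vs_all`) and the toy sequences are illustrative assumptions, not part of the course material. The only point is that the number of comparisons grows as N*N.

```python
from itertools import product


def mismatches(a, b):
    """Count positions where two equal-length sequences differ (toy comparison)."""
    return sum(x != y for x, y in zip(a, b))


def naive_all_vs_all(seqs):
    """Compare every sequence against every sequence, including itself.

    Returns the mismatch table and the number of comparisons performed,
    which is always len(seqs) * len(seqs).
    """
    table = {}
    comparisons = 0
    for (i, a), (j, b) in product(enumerate(seqs), repeat=2):
        table[(i, j)] = mismatches(a, b)
        comparisons += 1
    return table, comparisons


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TCGA"]  # hypothetical toy data
    table, n = naive_all_vs_all(seqs)
    print("comparisons for N = 3:", n)
    # The slide's example: N = 1 versus N = 10^3
    for size in (1, 1000):
        print("N =", size, "->", size * size, "operations")
```

Real tools avoid this quadratic cost with indexing and heuristics; the sketch shows only the naive case named on the slide.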
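The Data characteristics slide notes that a large number of measurements makes unlikely events very common. A back-of-the-envelope Python sketch follows. It assumes independent measurements with a fixed per-measurement probability; both the independence assumption and the numbers used are illustrative, not taken from the lecture.

```python
import math


def expected_events(p, n):
    """Expected number of occurrences of an event with probability p in n independent trials."""
    return p * n


def prob_at_least_one(p, n):
    """Probability that the event occurs at least once in n independent trials.

    Computes 1 - (1 - p)**n via log1p/expm1 to stay accurate for tiny p.
    """
    return -math.expm1(n * math.log1p(-p))


if __name__ == "__main__":
    p = 1e-6      # hypothetical one-in-a-million event per measurement
    n = 10 ** 8   # hypothetical number of measurements
    print("expected occurrences:", expected_events(p, n))
    print("P(at least one):", prob_at_least_one(p, n))
```

With these made-up numbers, a one-in-a-million event is expected about a hundred times, so seeing it at least once is close to certain.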
