WoLF-PSORT: Protein Subcellular Localization Prediction

Resource description

Nucleic Acids Research, 2007, Vol. 35, Web Server issue W585–W587
doi:10.1093/nar/gkm259

WoLF PSORT: protein localization predictor

Paul Horton¹, Keun-Joon Park¹,², Takeshi Obayashi³, Naoya Fujita¹,³, Hajime Harada¹, C.J. Adams-Collier⁴ and Kenta Nakai³,*

¹Computational Biology Research Center, AIST, Tokyo, Japan, ²Center for Genome Science, National Institute of Health, Korea Center for Disease Control & Prevention, 5 Nokbeon-Dong, Eunpyung-Gu, Seoul 122-701 Korea, ³Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan and ⁴Collier Technologies, Everett, WA, USA

Received January 30, 2007; Revised March 26, 2007; Accepted April 8, 2007

*To whom correspondence should be addressed. Tel: +81-3-5449-5131; Fax: +81-3-5449-5133; Email: knakai@ims.u-tokyo.ac.jp

© 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

WoLF PSORT is an extension of the PSORT II program for protein subcellular location prediction. WoLF PSORT converts protein amino acid sequences into numerical localization features based on sorting signals, amino acid composition and functional motifs such as DNA-binding motifs. After conversion, a simple k-nearest neighbor classifier is used for prediction. Using html, the evidence for each prediction is shown in two ways: (i) a list of proteins of known localization with the most similar localization features to the query, and (ii) tables with detailed information about individual localization features. For convenience, sequence alignments of the query to similar proteins and links to UniProt and Gene Ontology are provided. Taken together, this information allows a user to understand the evidence (or lack thereof) behind the predictions made for particular proteins. WoLF PSORT is available at wolfpsort.org.

INTRODUCTION

Bilipid membranes divide eukaryotic cells into various types of organelles containing characteristic proteins and performing specialized functions. Thus, subcellular localization information gives an important clue to a protein's function. Although localization signals in mRNA appear to play some role (1), the main determinant of a protein's localization resides in the protein's amino acid sequence. (We recommend wikipedia.org/wiki/Protein_targeting for a brief overview and Alberts et al. (2) for a textbook description.)

Numerous experiments to determine protein localization have been performed to date. These can broadly be classified as: small-scale experiments, the results of which continue to accumulate in public databases such as UniProt (3) and Gene Ontology (4); and large-scale experiments using epitope (5) or green fluorescent protein (GFP) (6) tagging, or separation of organelles by centrifugation combined with protein identification by mass spectrometry (7,8). Although they provide invaluable information, the coverage of experimental data is only high for model organisms, particularly yeast. Moreover, the agreement amongst large-scale experimental data is only 75–80% (6–9). Thus, computational prediction of localization from amino acid sequence remains an important topic.

Numerous computational methods are available [reviewed in (10,11)]. Some (including WoLF PSORT) have recently been benchmarked by Sprenger et al. (12), who found the computational methods to be useful for sites, such as the nucleus, for which many training examples can be easily obtained from UniProt (which is the source of most or all of the training data for most prediction methods, including WoLF PSORT). The different methods they benchmarked were found to have different strengths. Here, we describe the public server for our WoLF PSORT method.

PREDICTION METHOD

WoLF PSORT is an extension of PSORT II (13,14) and also uses the PSORT (15) localization features for prediction. In addition, WoLF PSORT uses some features from iPSORT (16) and amino acid composition. Those features are used to convert amino acid sequences into numerical vectors, which are then classified with a weighted k-nearest neighbor classifier. WoLF PSORT uses a wrapper method to select and use only the most relevant features. This reduces the amount of information which needs to be considered (and displayed) for the user to interpret individual predictions and may also make the predictor less prone to overlearning. The prediction method has been described in more detail elsewhere (17).

Dataset

The WoLF PSORT dataset is divided into fungi, plant and animal, containing 2113, 2333 and 12771 proteins, respectively. The current data was primarily obtained from UniProt (3) version 45, but subcellular localization information from Gene Ontology (4) was also used. Entries with evidence codes {TAS, IDA, IMP} were included, with manual revisions in a few cases. We intend to update these datasets regularly in the future.

LOCALIZATION SITES AND PREDICTION ACCURACY

WoLF PSORT classifies proteins into more than 10 localization sites, including dual localization such as proteins which shuttle between the cytosol and nucleus. Based on our cross-validation studies (17), we estimate sensitivity and specificity of around 70% for: nucleus, mitochondria, cytosol, plasma membrane, extracellular and (in plants) chloroplast. For other sites, such as peroxisome, Golgi, etc., the sensitivity is very low, but useful predictions are still made in some cases. For example, the Arabidopsis seed protein 12S1_ARATH is reasonably predicted to localize to the vacuole even though only one of its neighbors (see below) shares significant sequence similarity. An independent test (12) on mouse protein
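
The general scheme outlined under PREDICTION METHOD (sequences converted into numerical vectors, then classified by a weighted k-nearest neighbor rule) can be illustrated with a minimal Python sketch. It is not the WoLF PSORT implementation: it uses amino acid composition as the only feature, whereas WoLF PSORT also uses PSORT and iPSORT features selected by a wrapper method. The weighting shown (inverse-distance voting) is an assumption, since the text above does not specify the scheme, and the training sequences, their labels and all function names are hypothetical.

```python
from collections import Counter, defaultdict
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Return the 20-dimensional amino acid frequency vector of seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_predict(query_seq, training, k=3):
    """Predict a site for query_seq from (sequence, site) training pairs.

    Returns the predicted site and the k nearest neighbors as
    (distance, site, sequence) tuples, so the evidence can be inspected.
    """
    q = composition(query_seq)
    scored = sorted(
        ((euclidean(q, composition(s)), site, s) for s, site in training),
        key=lambda t: t[0],
    )
    neighbors = scored[:k]
    votes = defaultdict(float)
    for dist, site, _ in neighbors:
        # Assumed weighting: closer neighbors count more (inverse distance).
        votes[site] += 1.0 / (dist + 1e-9)
    return max(votes, key=votes.get), neighbors

# Hypothetical toy training set; sequences and labels are made up.
TOY_TRAINING = [
    ("KKRKRKKSAK", "nucleus"),
    ("LLVVAILLFA", "plasma membrane"),
    ("DEDEEDSGDE", "cytosol"),
]

if __name__ == "__main__":
    site, neighbors = knn_predict("KRKKRSAKKK", TOY_TRAINING, k=2)
    print(site)
    for dist, label, seq in neighbors:
        print(f"{dist:.3f}\t{label}\t{seq}")
```

Returning the neighbors together with the prediction mirrors the idea, stated in the abstract, of showing the user the proteins of known localization that are most similar to the query.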
