I.J. Modern Education and Computer Science, 2015, 8, 69-84
Published Online August 2015 in MECS
DOI: 10.5815/ijmecs.2015.08.08
Copyright © 2015 MECS

Web Pages Retrieval with Adaptive Neuro Fuzzy System based on Content and Structure

Mohammad Saber Iraji
Faculty Member of Department of Computer Engineering and Information Technology, Payame Noor University, I.R. of Iran
Email: iraji.ms@gmail.com

Hakimeh Maghamnia
Department of Computer Engineering and Information Technology, Payame Noor University, I.R. of Iran
Email: h.maghamnia@gmail.com

Marzieh Iraji
Department of Computer Engineering and Information Technology, University College of Rouzbahan, Sari, Iran
Email: marziehiraji@gmail.com

Abstract—The volume of web pages and information on the web is constantly increasing. In this paper, we present a system for retrieving pages relevant to a query, which can be used by search engines. The proposed system uses content features, the page content of neighbors, and connectivity (link analysis) features, and considers the fuzzy Sugeno method and the adaptive neuro-fuzzy network method. The results show that the neural method has a lower error than the other methods in retrieving web pages tailored to the user's search query on the Web, and can therefore increase the efficiency of search engines.

Index Terms—Web pages retrieval, adaptive neuro fuzzy, search engines.

I. INTRODUCTION

One application of computers and the Internet is searching large volumes of pages, a task that concerns information retrieval researchers and computer users alike. Most people search by submitting a query to a search engine such as Google. The volume of information available on the Internet is increasing every moment. Because users are looking for useful information within this mass, it is important for search engines to provide useful information to them. The main challenge for search engines is to determine which documents are relevant and which are irrelevant to the query. Search engines benefit from spiders (Web robots) that use different algorithms to retrieve Web pages relevant to a specific domain. Filtering methods are divided into four categories:

1. The relevance of a Web page to a subject is determined manually by experts [1].
2. The suitability of a Web page for a specific topic is judged by the number of occurrences of keywords [2].
3. TF-IDF (term frequency-inverse document frequency) is computed based on a lexicon [3].
4. Text classification methods are applied to Web pages [4].

Web page filtering can be useful in search engines and in Web applications such as Web content management. Spamming, after e-mail, has also spread to web pages. Web spamming decreases the quality of search engine results: useless pages are indexed by the search engines, and the cost of query processing increases [5]. Providing appropriate information to Internet users based on the content and links of web pages is therefore a challenge for servers.

This article is motivated by the design of a neuro-fuzzy system that accurately retrieves Web pages according to Internet users' queries. The aim of this study is to examine and discuss the web pages retrieval system and to optimize the web pages retrieval algorithm.

The paper is organized in five sections. After the introduction in Section I, Section II introduces related work on web page filtering and continues with adaptive neuro-fuzzy models for the proposed system, with examples in Section III. Sections IV and V present the results and conclusions of the research. The paper ends with a list of references.

II. WORK HISTORY

Google Scholar is a search engine used by researchers. In Google Scholar, the citation count is the highest-weighted factor in the retrieval process, and the occurrence of a search word in an article's title has a strong impact on the article's ranking [6]. Rongmei Li evolved clicked pages from clicked domains in order to improve the efficiency of Web information retrieval [9]. Hema Dubey and B. N. Roy offer a new page rank algorithm that is based on mean page ranks and reduces algorithm complexity [10]. Bhamidipati et al. introduce a score fusion technique that is applied when two pages have the same ranking [11]. Sharma et al. compared different methods for ranking web pages with different algorithms [12]. Minnie et al. have implemented the Links algorithm and other algorithms [13]; a result is retrieved only if an exact match occurs. Qiu, Hemmje, et al. offer a page filtering system based on page links, using page link information to improve search query algorithms [14].
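Several of the works above rank or filter pages from their link structure. As an illustration only, the following is a minimal sketch of the standard PageRank power iteration. It is not the mean-based variant of [10] or the method of any other cited work. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the toy link graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank.

    links: dict mapping a page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by dangling pages (no out-links) is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes its rank in equal shares along its out-links.
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new_rank[q] += damping * rank[p] / len(outs)

        # Stop once the total change between iterations is negligible.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Toy graph (assumed for illustration): D has no in-links, C has the most.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
toy_ranks = pagerank(toy_links)
```

In this toy graph, page C collects links from three pages and so receives the largest share of rank, while page D, which no page links to, keeps only the baseline share. A link-based feature of this kind can serve as one input among several when link analysis is combined with content features.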
The authors of [15] report a machine-learning-based method that combines Web content and structure analysis. They represent each Web page by a set of content-based and link-based features. They used a machine learning classifier, namely the support vector machine, and compared their proposed method with two existing web page filtering methods, a keyword-based method and a lexicon-based method; the proposed method performed better. Scarselli et al. have introduced a machine learning approach for web spam detection based on Graph Neural Networks and PM-GraphSOMs. They use link-based features (degree-related measures, PageRank, TrustRank, Truncated PageRank, estimation of supporters) and content-based features (fraction of anchor text, fraction of visible text, compression rate, corpus precision and corpus recall, query precision and query recall, independent trigram likelihood, entropy of trigrams) in their proposed system [5]. They evaluated their system on a training set (8339 pages) and a test set (1851 pages) from the WEBSPAM-UK2006 dataset. The results show the improvement achieved by their method. Khokale et al. offered a Web information retrieval approach with fuzzy logic. In th