3 Data Mining for Web Personalization

Bamshad Mobasher

Center for Web Intelligence
School of Computer Science, Telecommunication, and Information Systems
DePaul University, Chicago, Illinois, USA
mobasher@cs.depaul.edu

Abstract. In this chapter we present an overview of the Web personalization process viewed as an application of data mining requiring support for all the phases of a typical data mining cycle. These phases include data collection and pre-processing, pattern discovery and evaluation, and finally applying the discovered knowledge in real time to mediate between the user and the Web. This view of the personalization process provides added flexibility in leveraging multiple data sources and in effectively using the discovered models in an automatic personalization system. The chapter provides a detailed discussion of a host of activities and techniques used at different stages of this cycle, including the preprocessing and integration of data from multiple sources, as well as pattern discovery techniques that are typically applied to this data. We consider a number of classes of data mining algorithms used particularly for Web personalization, including techniques based on clustering, association rule discovery, sequential pattern mining, Markov models, and probabilistic mixture and hidden (latent) variable models. Finally, we discuss hybrid data mining frameworks that leverage data from a variety of channels to provide more effective personalization solutions.

3.1 Introduction

The ultimate goal of any user-adaptive system is to provide users with what they need without them asking for it explicitly [89]. Automatic personalization, therefore, is a central technology used in such systems. In the context of the Web, personalization implies the delivery of dynamic content, such as textual elements, links, advertisements, product recommendations, etc., that is tailored to the needs or interests of a particular user or a segment of users.

We distinguish between “automatic personalization” and what is sometimes referred to as “customization”. Both customization and personalization refer to the delivery of content tailored to a particular user. What separates these two notions is who controls the creation of user profiles as well as the presentation of interface elements to the user. In customization, the users are in control of (often manually) specifying their preferences or requirements, based on which the interface elements are created. Examples of customization on the Web include customized Web sites, such as My Yahoo (), and a variety of e-commerce Web sites (such as ) that allow for manual configurations of systems or services before purchase.

Automatic personalization, on the other hand, implies that the user profiles are created, and potentially updated, automatically by the system with minimal explicit control by the user. Examples of automatic personalization in commercial systems include Amazon.com’s personalized recommendations, music or playlist recommenders such as Mystrand.com, and a variety of news filtering agents available today.

Traditional approaches to automatic personalization have included content-based, collaborative, and rule-based filtering systems. Each of these approaches is distinguished by the specific type of data collected to construct user profiles, and by the specific type of algorithmic approach used to provide personalized content. Generally, the process of personalization consists of a data collection phase, in which the information pertaining to user interests is obtained, and a learning phase, in which user profiles are constructed from the data collected. Learning from data can be classified into memory-based (also known as lazy) learning and model-based (or eager) learning, depending on whether the learning is done online while the system is performing the personalization tasks or offline using training data.

Standard user-based collaborative filtering and most content-based filtering systems that use lazy learning algorithms are examples of the memory-based approach to personalization, while item-based and other collaborative filtering approaches that learn models prior to deployment are examples of model-based personalization systems.

Memory-based systems simply memorize all the data and generalize from it at the time of generating recommendations. They are therefore more susceptible to scalability issues. Model-based approaches, which perform the computationally expensive learning phase offline, generally tend to scale better than memory-based systems during the online deployment stage. On the other hand, as more data is collected, memory-based systems are generally better at adapting to changes in user interests than model-based techniques, which must either be incremental or be rebuilt to account for the new data. These advantages and shortcomings have led to an extensive body of research and practice comprising a variety of personalization or recommender systems that generally fall into the aforementioned categories.

Our goal in this chapter is not to provide an overview of automatic personalization in general. Rather, we focus more specifically on Web personalization, where the recommended objects come from a repository of Web objects (items or pages) browseable through navigation of links between the objects, usually in a particular Web site. Furthermore, we are particularly interested in a data mining approach to personalization where the goal is to leverage all available information about users of the Web site to deliver a personal experience. Kohavi et al. [62] suggest five desiderata for success in data mining applications:

– data rich with descriptions to enable search for patterns beyond simple correlations;
– large volume of data to allow for building reliable mo