SpatialDataMiningandItsApplications空间数据挖掘及其应用XingXieMicrosoftResearchAsiaJun.8,2010SpatialDataSpatialdata:anykindofdatathathasoneormorefieldsconcerningwithlocation,shape,areaandsimilarattributesHuman,animals,restaurants,photos,queries,tripsPoints,lines,polygons,etc.SpatialattributesTopologicalAdjacencyorinclusioninformationGeometricPosition(longitude/latitude),area,perimeter,boundarypolygonSpatialDataMiningSpatialdatamining:understandingspatialdata,discoveringrelationshipsbetweenspatialandnon-spatialdata,constructionofspatialknowledgebasesSpatialpatternsClustering/classificationAssociations,co-locationsSpatialoutlieranalysisLocationpredictionSequentialpatternsLocationBasedServicesGPSwillbeinstalledon40+%phonesby2011worldwideLocationbasedservice(LBS)willbecomea13Bbusinessby20130%10%20%30%40%50%60%70%80%90%100%20042005200620072008200920102011PercentageofTotalSalesAfricaAsia/PacificEasternEuropeJapanLatinAmericaMiddleEastNorthAmericaWesternEuropeTotalSource:GartnerDataquesteLocationBasedSocialNetworksLooptFoursquareBedo(贝多)ProjectsinMSRAsiaGeoLife:BuildingSocialNetworksUsingHumanLocationHistory()QueryCo-LocationPatternDiscovery(LocWeb2008,ACMGIS2008)TreasureMap:AGameBasedApproachtoAssignGeographicalRelevancetoWebImages()GeoLife:BuildingSocialNetworksUsingHumanLocationHistoryPOI/YPDBApplicationsUnderstandingPeopleSimilarusers:FriendrecommendationExperiencedusers:TravelexpertsrecommendationGroupusers:CommunitydiscoveryUnderstandingLocationsPersonalizedlocationrecommendationMininginterestingLocationsDetectingclassicaltravelsequencesGPSDevicesandUsers60devicesand138usersFromMay2007~present16%45%30%9%age=2222age=2526=age29age=3018%14%10%58%MicrosoftemplyeesEmployeesofothercompaniesGovernmentstaffColleagestudentsALarge-ScaleGPSDataset(byFeb.18,2009)10+millionGPSpoints260+millionmeters36citiesinChinaandafewcityintheUSA,KoreaandJapanUnderstandingUserMobility-1InferringtransportationmodesfromGPSdataDifferentiatedriving,ridingabike,takingabusandwalkingDifficultiesVelocity-basedmethodcannothandlethisproblemwell(0.5accuracy)PeopleusuallytransfertheirtransportationmodesinatripTheobservationofamodeisvulnerabletotrafficconditionandweatherUnderstandingUserMobility-2The1stfinding:walkingisatransitionbetweenothermodesPartitionatrajectoryintosegmentsofdifferentmodesHandlecongestiontosomeextentWalkBusCertainSegmentDenotesanon-WalkPoint:P.VVtorP.aatDenotesapossibleWalkpoint:P.VVtandP.aat(b)(c)BackwardForwardDriving(a)CertainSegment3UncertainSegmentsDrivingUnderstandingUserMobility-3The2ndfinding:manyfeaturesaremorediscriminativethanvelocityHeadingChangeRate(HCR)StopRate(SR)Velocitychangerate(VCR)0.65accuracyH1p1p2p3p1.V1p2.V2L1,T1p1.headp2.headVelocityVelocityVelocityDistanceDistanceDistancea)Drivingb)Busc)WalkingVsVsVsUnderstandingUserMobility-4Post-processingTransitionprobabilitybetweendifferenttransportationmodesP(Bike|Walk)andP(Bike|Driving)TypicaluserbehaviorsbasedonlocationConstrainsoftherealworldSegment[i-1]:DrivingSegment[i]:WalkSegment[i+1]:BikeP(Driving):75%P(Bus):10%P(Bike):8%P(Walk):7%P(Bike):62%P(Walk):24%P(Bus):8%P(Driving):6%P(Bike):40%P(Walk):30%P(Bus):20%P(Driving):10%GroundTruthInferenceresultTransitionP(Walk|Driving)TransitionP(Bike|Walk)BusstopBusstopUnderstandingUserMobility-5The3rdfinding:users’GPSlogsimplyroadnetworkUsethelocationconstrainsandtypicaluserbehaviorsasprobabilisticcuesBeingindependentofthemapinformationM={Driving,Walk,Bike,Bus},E.g.,P(M0)=P(Driving);P(M3|M1)=P(Bus|Walk);N1N2N7N8N6N5N3N1N2N5N3N4N1N4N8N5P18(Mi)P185(Mi|Mj)BuildingGraph(3)Spatialindexing(4)ProbabilitycalculationN7N8N6Changepointsandstart/endpoints(1)(2)AstartorendpointAchangepointP85(Mi)P54(Mi)P854(Mi|Mj)P581(Mi|Mj)P458(Mi|Mj)UnderstandingUserMobility-6ADCP/PCP/RVelocity-basedmethod0.490.150.58Advancedfeatures(SR+HCR+VCR)0.650.250.72Velocity-basedfeatures+advancedfeatures0.7280.270.78EF+normalpost-processing0.7410.310.77EF+graph-basedpost-processing0.7620.340.77MiningUserSimilarityBasedonLocationHistory-1FriendrecommendationpersonalizedlocationrecommendationMotivationFirstlawofthegeographySignificanceofusersimilarityincommunitiesIncreasingavailabilityofuser-generatedtrajectoriesDifficultiesHowtouniformlymodelusers’locationhistoriesHowtomeasureusersimilarityMiningUserSimilarityBasedonLocationHistory-2LocationhistoryrepresentationStaypointdetectionHierarchicalclusteringPersonalgraphbuildingP6P2P5P7P8P9P1StayPoint2StayPoint1P3P4Latitude,Longitude,TimeP1:Lat1,Lngt1,T1P2:Lat2,Lngt2,T2………...Pn:Latn,Lngtn,TnabcdeABLayer1Layer2Layer3G3G1G2aecABMiningUserSimilarityBasedonLocationHistory-3SimilarsequencesSamevisitingorder:ai==biSimilartransitiontime:SimilarityestimationThelengthofthematchedsimilarsequenceThelayerofthematchedsimilarsequenceLayer1Layer2Layer3G3G1G2HighLowabecABLayer1Layer2Layer3G3G1G2HighLowabdecABUser2:bdUser1:ABUser1:aceUser1:ABUser3:ABABceABUser1:aceUser2:ABUser3:bceUser1:User3User2MiningUserSimilarityBasedonLocationHistory-40.720.760.80.840.880.920.96MAPM