搜索
您的当前位置:首页Mining Mobile Sequential Patterns in a

Mining Mobile Sequential Patterns in a

来源:爱问旅游网
278IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

MiningMobileSequentialPatternsinaMobileCommerceEnvironment

Ching-HuangYunandMing-SyanChen,Fellow,IEEE

Abstract—Inthispaper,weexploreanewdataminingcapa-bilityforamobilecommerceenvironment.Tobetterreflectthecustomerusagepatternsinthemobilecommerceenvironment,weproposeaninnovativeminingmodel,calledminingmobilesequen-tialpatterns,whichtakesboththemovingpatternsandpurchasepatternsofcustomersintoconsideration.Howtostrikeacom-promiseamongtheuseofvariousknowledgetosolvetheminingonmobilesequentialpatternsisachallengingissue.Wedevisethreealgorithms(algorithmTJLS,algorithmTJPT,andalgo-rithmTJPF)fordeterminingthefrequentsequentialpatterns,whicharetermedlargesequentialpatternsinthispaper,fromthemobiletransactionsequences.AlgorithmTJLSisdevisedinlightoftheconceptofassociationrulesandisusedasthebasicscheme.AlgorithmTJPTisdevisedbytakingboththeconceptsofassocia-tionrulesandpathtraversalpatternsintoconsiderationandgainsperformanceimprovementbypathtrimming.AlgorithmTJPFisdevisedbyutilizingthepatternfamilytechniquewhichisde-velopedtoexploittherelationshipbetweenmovingandpurchasebehaviors,andthusisabletogeneratethelargesequentialpat-ternsveryefficiently.Asimulationmodelforthemobilecommerceenvironmentisdeveloped,andasyntheticworkloadisgeneratedforperformancestudies.Inminingmobilesequentialpatterns,itisshownbyourexperimentalresultsthatalgorithmTJPFsignifi-cantlyoutperformsothersinbothexecutionefficiencyandmemorysaving,indicatingtheusefulnessofthepatternfamilytechniquede-visedinthispaper.Itisshownbyourresultsthatbytakingbothmovingandpurchasepatternsintoconsideration,onecanhaveabettermodelforamobilecommercesystemandisthusabletoex-ploittheintrinsicrelationshipbetweenthesetwoimportantfactorsfortheefficientminingofmobilesequentialpatterns.

IndexTerms—Datamining,mobilecomputing,mobilesequen-tialpatterns,userbehavior.

Fig.1.Illustrativeexampleforamobiletransactionsequencewherecellsareunderlinedifitemsarepurchasedthere.

I.INTRODUCTION

T

HEEMERGENCEofpowerfulportabledevices,alongwithadvanceinwirelesscommunicationtechnologies,hasmadethemobileservicesavailable.Inthenearfuture,itisexpectedthattensofmillionsofuserswillcarrymobilephonesorportabledevicesthatusewirelessconnectiontoac-cessaworldwideinformationnetworkforbusinessorpersonalusefromanywhereatanytime,makingthemobilecommerce(MC)areality[1],[57],[58].Forexample,eNetworkWebExpress[19]enablesmobileuserstousecommercialWebap-plicationsoverwide-areawirelessnetworks(WANs).Bluetoothtechnology[20]allowsterminalsandcashregisterstotalkdi-rectlytoeachotherforthepurposeofmobilecommerce.The

ManuscriptreceivedMay2,2003;revisedJuly9,2004.Theworkwassup-portedinpartbytheNationalScienceCouncilofTaiwan,R.O.C.,underContractNSC93-2752-E-002-006-PAE.

TheauthorsarewiththeDepartmentofElectricalEngineering,NationalTaiwanUniversity,Taipei,Taiwan,R.O.C.(e-mail:mschen@cc.ee.ntu.edu.tw).DigitalObjectIdentifier10.1109/TSMCC.2005.855504

WirelessAccessProtocol(WAP)[21]bringstheMCenviron-mentaworld-widestandardforprovidingInternetcommuni-cationstodigitalmobilephones.InanMCenvironment,cus-tomerscanmakeanytransactionfromanywhereatanytimewiththepaymentmechanismprovidedbybanksorcreditcardcompanies[58].Inaddition,somekindofNokiamobilephonesprovidethewalletapplicationthatenablescustomerstogeteasyaccesstomobileservicesandtomakeconvenientonlinemo-biletransactions[2].Inthewallet,customerscanstoresensitivepersonalinformation,suchaspaymentandloyaltycarddetails,deliveryaddresses,andnotes,aswellasserviceprofiles.Inaddi-tion,withthewalletapplication,theNokiamobilephoneshavethecapabilityofstoringthetransactionswithmovingpatternsandpurchasingpatternsofcustomers.

Example1.1:Oneexamplescenarioenvisionedforamo-biletransactionsequenceisshowninFig.1,whereacustomermovesinthemobilecommerceenvironmentandmakestrans-actionsinthecorrespondingcellthroughthemobiledevice.Fig.1(a)showsthemovingpatternsofthiscustomerandthemobiletransactionsequencedataisrecordedinFig.1(b),whereforexample,itemi1waspurchasedwhenthecustomermovedtothecellA.

ItisimportanttonotethatsincecustomersaremovingalonganMCenvironmenttosearchfordesireditemstopurchase,theimplicationsfrommovingpatternsandpurchasepatternsareinfactentangled,andbothareofgreatimportanceforstudyingcustomerbehaviors.Clearly,thedistinctivefeaturesofknowl-edgediscoveryinanMCenvironmentincreasethedifficultyof

1094-6977/$25.00©2007IEEE

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT279

extractinginformationfromthemobiletransactionsequences.However,asthesemobilecommerceservicesarebecomingin-creasinglypopularnowadays,itisimperativetodeviseefficientalgorithmsforderivingcustomerbuyingbehaviortoimprovethequalityoftheseservices.Asaresult,thedesignanddevel-opmentofefficientminingalgorithmsforknowledgediscoveryinanMCenvironmentwhilefullyexploringtheintrinsicrela-tionshipbetweenmovingandpurchasepatternsistakenastheobjectiveofthispaper.ConductingtheminingonthemovingandpurchasepatternsofcustomersinanMCenvironmentiscalledtheminingofmobilesequentialpatterns(i.e.,largese-quentialpatterns)inthispaper.Inaddition,anovelknowledge,calledmobilesequentialrules,canbederivedfromthemobilesequentialpatternsforthemeasurementofcustomerpurchasebehaviorassociation.

Example1.2:FortheexampleshowninFig.1,thecustomerhasonekindofmovingpatternABCandtwokindsofpurchas-ingpatterns{󰀇A;t1󰀈andIfthere󰀇C;t9are󰀈}wheresufficientitemsetcustomerst1={i1having}anditemsett9={i2,i3}.thesamepatterns,themobilesequentialpatternisanimplica-tionoftheform󰀇{󰀇A;t1󰀈,󰀇C;t9󰀈}:ABC󰀈,whichmeansthatmostcustomersusuallypurchaseitemsett1incellAandthenpurchaseitemsett9incellCwiththespecificpathABC.Inad-dition,themobilesequentialruleisanimplicationoftheform󰀇{󰀇A;t1󰀈=⇒󰀇C;t9󰀈:ABC󰀈whichmeansthatcustomerspur-chasingitemsett1incellAareusuallymovingalongpathABCtocellCforpurchasingitemsett9.Withthemobilesequentialrule,whenacustomerpurchasesitemsett1incellA,thecellularphonecompanycouldsendthecouponsofproducts(i.e.,itemi2anditemi3)initemsett9toboostthesalesthroughthebasestationsinthecellsA,B,orCinaccordancewiththeirbroad-castingschedules.Moredescriptionaboutmobilecommerceisavailablein[1].

ThedetailsofrelatedworksaregiveninSectionII-A.De-spitesomeeffortshavingbeenelaborateduponexaminingtheuserbehavior,noneofthepriorwork,tothebestofourknowl-edge,hastakenbothmovingandpurchasepatternstogetherintoconsiderationtomodelthecustomerbehaviorinamobilecom-merceenvironment.Thiscaninpartbeexplainedbythefactthatthecostisexpensivetotrackandlogdetailedmovementsofmobileuserstoday.1However,itisexpectedthatsuchcostwilldecreasesoonandthecellularphonewillbecomethepopularinterfaceoftheinterconnectionnetworksforaccessingvariousservices[57],thusjustifyingthepracticalityandnecessityofconductingmobilesequentialpatternmining.Itisunderstoodthattherecordsofcellsvisitedanditemspurchased,requiredforminingmobilesequentialpatterns,maybelongtodifferentcompanies,andforthesecompanies,theymayhavedifferentconsiderationsonusingtheirdatatoimprovethemobilecom-merceservicesprovided.Itshouldgowithoutsayingthatsuchdataanalysisshouldbedonesolelyforthepurposesofsystemandserviceimprovementsandshouldbeconductedinacontin-gentwaythatneitheranylawisviolatednoristheprivacyofcustomersintruded.Nevertheless,withthelegalityandprivacy

1The

costtolocateamobileuserisestimatedtobeaboutUS$0.01to

US$0.05eachtimeaccordingtoamajormobilephoneserviceprovider.

Fig.2.Notionofminingmobilesequentialpatterns.

issuesconsidered,theknowledgediscoveryfromtheMCdataisbelievedtobeanincreasinglychallengingtechnicalprob-lemwhichisofgreatpracticalimportancefortheevolvingMCtechniques.

Consequently,tobetterreflectthecustomerbuyingbehav-iorintheMCenvironment,weproposeaninnovativeminingmodelthattakesboththemovingpatternsandpurchasepatternsintoconsideration.Inessence,theminingofmobilesequentialpatternsaggregatestheconceptsonminingassociationrules,miningpathtraversalpatterns,andminingsequentialpatterns,andthusrequiresacombineduseofcorrespondingtechniques.ThenotionofminingmobilesequentialpatternsisshowninFig.2,wheretherelationshipamongtheseminingcapabili-tiesisdepicted.Howtostrikeacompromiseamongtheuseofvariousknowledgetosolvetheminingonmobilesequentialpatternsisachallengingissue.Asanefforttosolvethisprob-lem,wedeviseaprocedure,namelymobilesequentialpatternsMSPs),toconducttheminingofmobilesequentialpatterns.WiththedetailsdescribedintheSectionII-C,theprocedureMSPsplitstheproblemofminingmobilesequentialpatternsintofourphases,namely:1)thelarge-transactiongenerationphase;2)thelarge-transactiontransformationphase;3)thesequential-patterngenerationphase;and4)thesequential-rulegenerationphase.

Inthispaper,theperformancebottleneckisinphase3),i.e.,thesequential-patterngenerationphase.Byhavingdifferentprioritiesonthefactorsinvolvinglargeitemsets,traversalpathsandordersofpurchases,wedevisethreealgorithms(algorithmTJLS,algorithmTJPT,andalgorithmTJPF)todeterminemobilesequentialpatterns.First,algorithmTJLSisdevisedinlightoftheconceptofitemsetjoininginassociationrulesmining[6].However,aswillbeseenlater,withoutfullyutilizingthetraversalpathsofmobilesequentialpatterns,algo-rithmTJLStendstocountthesupportsofalotofout-of-pathsequentialpatterns(i.e.,thesequentialpatternswhichdonotstaywithinapath),thusdegradingtheperformance.Next,toeliminatetheout-of-pathsequentialpatterns,algorithmTJPTisdevisedbytakingboththeconceptsofassociation

280IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

rules[6]andpathtraversalpatterns[12]intoconsiderationandgainsperformanceimprovementbypathtrimming.However,algorithmTJPTincursthecomparisonalongthepathwithtimecomplexityO(P),wherePistheaveragepathlength,andmaygeneratesomeuncertaincandidatesequentialpatternswhichundesirablyrequirefurthersubpatternidentifications.Consequently,byfullyexploringtheintrinsicrelationshipbe-tweenmovingandpurchasebehaviorsofcustomers,algorithmTJPFisdevelopedinlightofthepatternfamilytechnique.Thepatternfamilyofapatternconsistsofthepatternitselfandallitssubpatternsgeneratedineachround.Inessence,thepatternfamilytechniqueisafilteringtechniquethatfullyexploitsthepurchasepatternsgeneratedintheintermediatestagesofmo-bilesequentialpatternmining.Forbetterreadability,wedeferthedetaileddescriptionofthepatternfamilytechniqueandthecorrespondingtheoreticalpropertiestoSectionIII.Itwillbeshownthatutilizingtheinformationofpatternfamily,algo-rithmTJPFcomparesthepathwithtimecomplexityO(1)andgeneratesfeweruncertaincandidatesequentialpatterns,thusre-ducingthecorrespondingcomputationaloverheadandmemoryconsumption.

Afterallmobilesequentialpatternsareobtained,themobilesequentialrulescanbederivedwithastraightforwardwayanditisdescribedclearlyinSectionII-C4.AsimulationmodelfortheMCenvironmentisdevelopedandasyntheticworkloadisgeneratedforperformancestudies.Byutilizingpatternfamilytechnique,TJPFisshowntobeabletodeterminelargese-quentialpatternsveryefficiently.Asvalidatedbythesyntheticworkload,itisshownbyourexperimentalresultsthatalgorithmTJPFsignificantlyoutperformsothersinboththeexecutionef-ficiencyandthememorysaving.Itisshownbyourresultsthatbytakingbothmovingpatternsandpurchasepatternsintocon-sideration,onecanhaveabettermodelforanMCsystemandisthusabletoexploittheintrinsicrelationshipbetweenthesetwocustomerbehaviors.

Thispaperisorganizedasfollows.PreliminariesaregiveninSectionII.InSectionIII,threealgorithms(TJLS,TJPT,andTJPF)aredevisedfordetermininglargesequentialpatterns.ExperimentalstudiesareconductedinSectionIV.ThispaperconcludeswithSectionV.

II.PRELIMINARIES

Inthissection,theproblemofminingmobilesequentialpat-ternsisdescribedinSectionII-A,therelatedworksaredescribedinSectionII-B,andtheprocedureofminingmobilesequentialrulesisoutlinedinSectionII-C.A.ProblemFormulation

Inthemobilecommerceenvironmentwhereitemsaresoldinvariouscells,customersmaymoveamongthecellstopur-chaseitemsofinterestwitheithertraditionalorelectroniccom-mercetradingmechanisms.Ineithercase,customerspayforitemsthroughthemobiledevicesandthepurchasingrecordsarelogged.LetN={n1,n2,...,ng}bebeasetasetofofcellsitemsinthesoldMCenvironmentandI={i1,i2,...,ih}inthatenvironment.Wearegivenadatabaseofmobiletransaction

TABLEI

PATTERNSANDTHEIRNOTATIONS

sequences,whereeachmobiletransactionsequenceconsistsofsequence-id,cellsvisited,andalistofitemsetspurchasedinthecorrespondingcells,orderedbycustomermovementsamongcells.

Apathisdenotedby󰀇n1n2...ny󰀈,wherenj∈N,for1≤j≤y.Thus,thesequenceofcellsvisitedimplicitlyformsthepathofthemobiletransactionsequence.Inthispa-per,thediscoveredpatternsandtheirnotationaregiveninTableI.Atransaction,denotedas󰀇C;{i1,i2,...,ip}󰀈,meansthatitemset{i1,i2,...,ip}wasboughtincellC,whereC∈N,and{i1,i2,...,ip}⊆I.Thus,thelistofitemsetspurchasedinthecorrespondingcellsimplicitlyformsthelistoftransactionsofthemobiletransactionsequence.Asare-sult,eachmobiletransactionsequencecontainstheinforma-tionofpathandalistoftransactions.GivenadatabaseDMofmobiletransactionsequences,theproblemofminingmo-bilesequentialpatternsistodiscoverthefrequentsequentialpatternssamongallmobiletransactionsequences.Asequentialpatternisrepresentedbytheform󰀇listoftransactions:path󰀈,wherethetransactionsaremadealongthepath.Thesupportforasequentialpatternisdefinedasthenumberofmobiletrans-actionsequenceswhichsupportthissequentialpattern.Alargesequentialpatternisasequentialpatternwiththeminimumsupport(i.e.,asequentialpatternthatappearedinasufficientnumberofmobiletransactionsequences).

Thelengthofalargesequentialpatternisthenumberoftrans-actionsinthatlargesequentialpattern.Alargesequentialpatternoflengthkiscalledalargek-sequentialpattern.Thus,alarge1-sequentialpatterncanberepresentedbytheform󰀇transaction:cell󰀈,wherethetransactionismadeinthecell.Notethateachtransactioninalargek-sequentialpatternmustmeetthemini-mumsupport.In[7],anitemsetwithminimumsupportiscalledalargeitemsetorlitemset.Similarly,wecallatransactionwithminimumsupportlargetransactionorL-transaction,whichcanberepresentedas󰀇C;tj󰀈,wheretjrepresentsalitemsetincellC.Thus,ifthetransaction󰀇C;{i1,i2,...,ip}󰀈hasthemini-mumsupport,thelitemset{i1,i2,...,ip}willberepresentedastjandtheL-transaction󰀇C;{i1,i2,...,ip}󰀈willberepre-sentedas󰀇C;tj󰀈.Sinceeachtransactioninalargek-sequentialpatternwillhavetheminimumsupport,alargek-sequentialpatterncanberepresentedas󰀇{x1,x2,...,xk}:n1n2...ny󰀈,wherexjisanL-transactionmadealongthepathn1n2...ny.Recallthatinassociationrules[6],alargeitemsetisafrequentlypurchaseditemset.Insequentialpatterns[7],alarge

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT281

sequenceisafrequentlypurchasedsetofitemsetsorderedbythepurchasetime.Intraversalpatterns[12],alargereferenceisafrequentlytraveledpath.Inthispaper,alargesequentialpatternisapatterncontaining:1)thefrequentlypurchaseditemset(meaningthattheitemsetinanL-transactionmusthavetheminimumsupport);2)thefrequentlypurchasedsetofitemsetsorderedbythepurchasetime(meaningthatthesetofL-transactionsinalargesequentialpatternmusthavetheminimumsupport);and3)thefrequentlytraveledpath(meaningthatthepathinalargesequentialpatternmusthavetheminimumsupport).B.RelatedWork

Recently,miningofdatabaseshasattractedagrowingamountofattentionindatabasecommunitiesduetoitswideapplica-bilitytostudyingthebuyingbehaviorsofcustomers[11],[18].Miningassociationrulesisemployedtodiscovertheimpor-tantassociationsamongitemssuchthatthepresenceofsomeitemsinatransactionwillimplythepresenceofotheritemsinthesametransaction[5].Afterthat,severaltechnologiesonassociationrulemininghavebeendevelopedincluding:1)al-gorithmimprovements[3],[6],[10],[15],[27],[32],[43],[68];2)constraint-based[25],[28],[46];3)incrementalupdating[9],[14],[30];4)multipleminimumsupports[35],[60];5)fre-quentcloseditemsets[45],[47],[67];and6)generalized[53],multilevel[23],intertransaction[56],quantitative[54],andmul-tidimensional[61],[62].

Miningsequentialpatternswasfirstintroducedin[7]forfindingtheintertransactionpatternsinthetraditionalretailingenvironments.Afterthat,severaltechnologiesonsequentialpat-ternmininghavebeendeveloped,including:1)algorithmim-provements[26],[39],[48],[66];2)constraint-based[36],[55],[65];3)incrementalupdating[33],[44],[69];and4)general-ized[55].

Severaltemporalassociationruleminingtechniquesaread-dressedin[8],[13],[29],and[41].Episodemininghasbeenstudiedin[31]and[38]fordiscoveringfrequentpatternsinasequenceoftimeevents.Dasetal.[17]investigatedtheprob-lemoffindingrulesrelatingpatternsinatimeseriestootherpatternsinthatseries,orpatternsinoneseriestopatternsinanotherseries.Miningseriesofintervaleventswasdiscussedin[59]fordiscoveringthetemporalcontainmentrelationshipsofeventsequences.Atemporallogicapproachisproposedin[42]forfindingtemporalpatterns.Miningpartialordersfromthesequentialdataisexploredin[37].Miningsegmentwiseperi-odicpatternsisdiscussedin[24].Miningasynchronousperiodicpatternsisinvestigatedwithinasubsequenceshiftedbydistur-bance[63].Searchingforpartialperiodicpatternsintime-seriesdatabasesisdiscussedin[22].

AstudyonefficientminingofpathtraversalpatternsforcapturingWebuserbehaviorwasconductedin[12].WEB-MINER[16]wasdesignedforminingWebusageassociationrulesandsequentialpatterns.SeveralWWWserverlogsareanalyzedin[50]forderivingthepathdistributionpatternsoftheWebusers.Withmininglongestrepeatedsubsequences,arobustmethodwasproposedin[51]forreducingthecomplexity

Fig.3.

Flowchartofthewholeprocedureofminingmobilesequentialpatterns.

whilepreservingthepredictabilityoftheWebusersurfingpaths.Recently,severalECWebsiteshavebeenusingrecommendedsystemsforanalyzingthecustomerbehaviorstohelptheircus-tomerstofindproductsforpossiblepurchases[4],[52].ForcapturingtheuserbehaviorintheECenvironment,astudyonefficientminingWebtransactionpatternswasreportedin[64].Notethatbytreatingthecellsasotheritemsinthepatterns,onemayextendalgorithmGSP,whichwasdesignedforminingconventionalsequentialpatternsin[55],tofindmobilesequen-tialpatterns.However,suchanextensiontoalgorithmGSPisnotdeemedidealforminingmobilesequentialpatternsfortworeasons.First,sinceourapproachesprocessthecellsandtheitemsindividually,andthemodifiedGSPtreatsthecellasan-otheriteminthepatterns,itisexpectedthattheformerwillhaveasmallerdomaintoprocess,therebyhavingbetterefficiency,thanthelatter.Moreimportantly,bysimplytreatingthecellasanotherattribute,GSPisnotabletoutilizetheintrinsicrela-tionshipbetweenmovingandpurchasebehaviorsofcustomers,thusnotattainingtheminingefficiencywecouldhaveowingtothenatureofthisproblem.

C.ProcedureforMiningMobileSequentialPatternsWiththeaggregateconceptofminingonassociationrules,pathtraversalpatternsandsequentialpatterns,theproblemofminingmobilesequentialpatternscannotbesolvedbyasimpleadditionofpriortechniquessincefactorsinthesecompanionminingcapabilitiesareinfactentangled.Thisfactjustifiesthenecessityofdevisinganewminingprocedureformobilese-quentialpatterns.Asthemobilecommercebusinesshasbeenidentifiedbyseveralleadingindustrialcompaniesasthekeydirectiontomoveforyearstocome,itisbelievedthatmin-ingmobilesequentialpatternshasbecomeaverytimelyandimportantissuetoaddress.

TheflowchartforthewholeprocedureisshowninFig.3andthemeaningsofsymbolsaregiveninFig.4.Intheoverallprocedure,theproposedmethodsforminingmobilesequentialpatternsisoutlinedasfollows.

282IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

Fig.4.Meaningsofsymbolusedinminingmobilesequentialpatterns.

ProcedureMSP(MobileSequentialPatterns):

1)Large-TransactionGenerationPhase:Determinethe(L-transactionslargetransactions)fromthemobiletrans-actionsequences.

2)Large-TransactionTransformationPhase:EmployalgorithmLarge-TransactionTransformationwithSequence-Trimming(LTTST)totransformallmobiletransactionsequencesintothemaximalL-transactionsequences.

3)Sequential-PatternGenerationPhase:Employoneofthefollowingthreealgorithms[TJLS(TransactionsetJoinwithLarge-transactionset),TJPT(TransactionsetJoinwithPathTrimming),andTJPF(TransactionsetJoinwithPatternFamily)]todeterminethelargesequentialpatternsfromthemaximalL-transactionsequences.

4)Sequential-RuleGenerationPhase:Derivemobilese-quentialrulesfromthelargesequentialpatterns.

1)Large-TransactionGenerationPhase:Foreachcell,weapplyamodifiedalgorithmDHP[43]forfindingthesetofallL-transactionsTL.Similarlytotheapproachtakenby[7],thesetoflitemsetsismappedtoasetofcontiguousintegersforreducingthetimerequiredtocheckifamobilesequentialpatterniscontainedinamobiletransactionsequence.Notethatweareabletosimultaneouslydiscoverthesetofalllarge1-sequentialpatterns,sincethissetismainly{󰀇x:C󰀈|x∈TL,Cisthecellcontainingitemsetx}.

Example2.1:AnillustrativedatabaseforthisproblemisshowninFig.5,whereSequenceIDentification(SID)100isthemobiletransactionsequenceshowninFig.1.FortheexampledatabaseinFig.5,theL-transactionsareshowninFig.6(a).FortheL-transactionsshowninFig.6(a),afterthemappingshowninFig.6(b),thesetoflarge1-sequentialpatternsisshowninFig.6(c).

2)Large-TransactionTransformation(LTT)Phase:AswillbeseeninSectionIII,weneedtorepeatedlydeterminewhichpartofagivensetoflargesequentialpatternswillappearinthemobilesequentialpatterns.Forefficientlyminingthepatterns,weemployalgorithmLTTSTtotransformeachmobiletrans-

Fig.5.IllustrativeexampledatabaseDMthatstoressixmobiletransactionsequences.

Fig.6.Mappingtableshownin(b)mapsthelargetransactionsin(a)tothelarge1-sequentialpatternsin(c).

actionsequenceintoamaximalL-transactionsequenceinthisphase.

Example2.2:WiththemobiletransactionsequenceshowninFig.1andthemappingtableshowninFig.6(b),Fig.7illustratestheoperationsinalgorithmLTTST.InFig.7,thefirstcolumncorrespondstothesequenceofmovements,thesecondcolumncontainsthenodesvisitedandthethirdcolumnhastheitemspurchasedinSID100.Thefourthcolumngivestheon-goingL-transactioninthebufferandthefifthcolumngivestheon-goingstringinthebuffer.ThesixthcolumnshowstheL-transactionsetandtheseventhcolumnshowsthepathofthemaximalL-transactionsequencegeneratedbyLTTST.

Notethatthesameitemsetsindifferentcellsareviewedasdifferenttransactions.Thus,thesamelitemsetssoldindifferentcellswillbetransformedtodifferentintegers.

Example2.3:Forexample,transactions󰀇A;{i󰀇B;{ig,ih}󰀈,g,ih}󰀈,and󰀇C;{ig,ih}󰀈allhaveitemset{ig,ih}.Inad-dition,{ig,ih}isalitemsetinbothcellsAandBbutisnotincellC.Afterthisphase,{ig,ih}incellAand{ig,ih}incellBaretransformedtodifferentintegers(say,tyandtz)whereas{ig,ih}incellCwillbetrimmed.ForthedatabaseDMshown

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT283

Fig.7.Exampleforproducingthemaximallargetransactionsequences.

inFig.5,thetransformeddatabaseDT,storingmaximalL-transactionsequences,isshowninthefirsttableofFig.8forillustrativepurposes.

3)Sequential-PatternGenerationPhase:AfterallthemobiletransactionsequencesaretransformedtomaximalL-transactionsequences,threealgorithms(algorithmTJLS,algorithmTJPT,andalgorithmTJPF)aredevisedformininglargesequentialpatternsfromthetransformeddatabaseDT.Alargek-sequentialpatternisrepresentedas󰀇{x1,x2,...,xk}:n1n2...ny󰀈,wherexjisanL-transactionmadealongthepath{n1n2...ny}.ThedetailsofalgorithmsofthisphasewillbedescribedinSectionIII.

Example2.4:Thelargesequentialpatterns,generatedinthesequential-patterngenerationphasefromtheexampledatabaseDT,areshowninFig.8.Forexample,󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈}:ABCDEF󰀈isonelarge3-sequentialpattern,whoseL-transactionsetandpathappearinSID100,SID200,SID400,andSID500.Thesupportisthus4.

4)Sequential-RuleGenerationPhase:Afterthesequential-patterngenerationphase,wecanfindthemobilesequentialrulesfromthelargesequentialpatternsinthisphaseinastraightfor-wardmanner.Unliketheassociationrule[6],themobilese-quentialrule,derivedfrommobilesequentialpatternsinthispaper,isanimplicationoftheform󰀇X=⇒Z:n1,n2,...,ny󰀈,whereXandZarebothsetsofL-transactions,X∩Z=Φ,and{n1,n2,...,ny}⊆N.Therule󰀇X=⇒Z:n1n2...ny󰀈hassupportsifthenumberofmobiletransactionsequencesinDMcontaining󰀇X∪Z:n1n2...ny󰀈iss.Also,therule󰀇X=⇒Z:n1,n2,...,ny󰀈holdswithconfidencecifc%ofmobiletransactionsequencesinDMthatcontainXalsocon-

tainZalongthepath{n1,n2,...,ny}.Explicitly,support(󰀇X=⇒Z:n1n2...ny󰀈)=support(󰀇X∪Z:n1n2...ny󰀈),andconfidence(󰀇X=⇒Z:n1,n2,...,ny󰀈)=(󰀇X∪Z:n1n2...nh...ny󰀈)/(󰀇X:n1n2...nh󰀈}).

Example2.5:Forexample,supposethat󰀇{󰀇A;t1󰀈,󰀇C;t3󰀇,󰀇F;t4󰀈}:ABCDEF󰀈isonelarge3-sequentialpat-ternwithsupport=4and󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈}:ABC󰀈isonelarge2-sequentialpatternwithsupport=5.Then,wecanderiveonemobilesequentialrule󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈}=⇒{󰀇F;t4󰀈}:ABCDEF󰀈withthesupportequaltosupport(󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈}=⇒{󰀇F;t4󰀈}:ABCDEF󰀈)=

󰀇C;t3󰀈,󰀇F;t4󰀈}:ABCDEF󰀈)support(󰀇{󰀇A;t1󰀈,

=4andtheconfidence(󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈}=⇒

ABCDEF󰀈)=(support(󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,{󰀇F;t4󰀈}:

󰀇F;t4󰀈}:ABCDEF󰀈))/(support(󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈}:ABC󰀈)}=80%.

III.ALGORITHMSFORMININGMOBILE

SEQUENTIALPATTERNS

OncethedatabasecontainsmaximalL-transactionsequencesforallmobileusers,wecanderivethelargesequentialpatternsbyidentifyingthefrequentlyoccurringtransactionsequences.LetSkbethesetoflargek-sequentialpatterns,Rkbethesetofcandidatek-L-transactionsets,andCkrepresentthesetofcan-didatek-sequentialpatterns.RkisthetransactioncomponentofCk,andSkisasubsetofCk.Byhavingdifferentprioritiesonthefactorsinvolvinglargeitemsets,traversalpathsandordersofpurchases,wedevisethreealgorithms(algorithmTJLS,algo-rithmTJPT,andalgorithmTJPF)todeterminelargesequential

284IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

Fig.8.Largesequentialpatternsgeneratedinsequential-patterngenerationphasefromtheexampledatabaseDT.

patterns.BecausebothalgorithmTJPTandTJPFgenerateSkalongwiththegenerationofCk+1,weuseroundktorefertotheprocedureperformedtoobtain(Sk,Ck+1).ForalgorithmTJLS,weuseroundktorefertotheprocedureperformedtoobtain(Sk,Rk+1).NotethatS1isobtainedinthelarge-transactiongenerationphase,wethususeroundonetorefertotheprocedureperformedtoobtain(R2).Thesealgorithmsaredevisedstepbystepinlightofthefeaturesofthecandidategenerationofsequentialpatternsandareoutlinedasfollows.GeneralizedDescriptionsofAlgorithms:

1)AlgorithmTJLS:Byderivingastraightforwardextensionfrompriorworks,algorithmTJLSisdevisedasavariantofalgorithmaprioriin[6]byusingatwo-levelhashtreeinmininglargesequentialpatterns.

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT285

2)AlgorithmTJPT:Inlightoftheconceptofthepathtrimmingtechnique,algorithmTJPTisdevisedbytakingthepathintoconsiderationingeneratingthecandidatepatterns.

3)AlgorithmTJPF:Inlightoftheconceptofthepatternfamilytechnique,algorithmTJPFisdevisedbyusingtheshared-pathtreeingeneratingthecandidatepatterns.A.AlgorithmTJLS(TransactionsetJoinWithLarge-

TransactionSet)

AlgorithmTJLSisavariantofalgorithmaprioriin[6].Algo-rithmTJLSessentiallyutilizestheconceptofjoiningitemsetsinassociationrulemining[6],[55]whilesolvingthediscrepancybetweenlargesequentialpatternsandlargeitemsets.Similarlytoalgorithmapriori[6],TJLSjoinstheL-transactionsetsoflarge(k−1)-sequentialpatternsforthegenerationofcandidatek-L-transactionsetsintheproceduretodiscoverlargesequentialpatterns.However,unlikealgorithmapriori,TJLSemploysatwo-levelhashtree,calledthemobilesequencetree,tostorethecandidatesequentialpatterns.Byutilizingthetwo-levelhashingtechnique,TJLScanjointheL-transactionsetstoconstructthetransactioncomponentofthemobilesequencetreeinthecan-didategeneration.Then,inthedatabasescanforcountingthesupport,TJLSconstructsthepathcomponentbyextractingthecorrespondingpathfromthemaximalL-transactionsequenceswhoseL-transactionsetscontainthecorrespondingcandidateL-transactionsets.

Inthetwo-levelhashtree,anodeeithercontainsalistofpatterns(aleafnode)orahashtable(aninternalnode).Inaninternalnode,eachbucketofthehashtablepointstoanothernode.Thepatternsarestoredintheleafnodes.Therootofthehashtreeisdefinedtobeatdepth1.Aninternalnodeatdepthdpointstonodesatdepthd+1.WhenTJLSaddsapatternp,TJLSstartsfromtherootandgodownthetreeuntilreachingaleaf.Ataninternalnodeatdepthdinthetransactioncomponent,TJLSdecideswhichbranchtofollowbyapplyingahashfunctiontothedthL-transactionoftheL-transactionsetofpatternp.Similarly,ataninternalnodeatdepthginthepathcomponent,TJLSdecideswhichbranchtofollowbyapplyingahashfunctiontothegthcellofthepathofpatternp.

InthebeginningofhashingamaximalL-transactionsequencem,TJLSfindsallthecandidatesequentialpatternscontainedinmasfollows.IfTJLSreachesaninternalnodebyhashingtheL-transactionl(cellc),ithashesoneachL-transaction(cell)thatcomesafterl(c)inmandrecursivelyappliesthisproceduretothenodeinthecorrespondingbucket.IfTJLSreachesaleafnode,itfindswhichofthepatternsintheleafnodearecontainedinmandaddssupportcountstothem.

Example3.1:Fig.8isthelargesequentialpatternsgener-atedinsequential-patterngenerationphasefromtheexampledatabaseDT,andFig.9isthemobilesequencetreestoringS4inFig.8.

Forminingmobilesequentialpatterns,thefirstroundisex-ecutedwithlarge-transactiongenerationphasetoobtainS1,thesetoflarge1-sequentialpatterns,asshowninFig.6(c).In

addition,algorithmTJLSutilizestheL-transactionsinS1astheseedsetforgeneratingR2,thesetofcandidate2-L-transactionsets,whichisstoredinthetransactioncomponentofamo-bilesequencetree.Inthesecondround,TJLSconstructsthecompletemobilesequencetreebyhashingeachcombinationof2-L-transactionssetineachmaximalL-transactionsequenceintothetransactioncomponentandhashingthecorrespondingpathforconstructingthepathcomponent,tocountthesupportofeachcandidatesequentialpattern.Then,TJLSdestructsthemo-bilesequencetreeforderivingS2,thesetoflarge2-sequentialpatterns,andutilizestheL-transactionsetsinS2forgenerat-ingR3.Ineachsubsequentround,TJLSstartswithcandidateL-transactionsetsfoundinthepreviousroundforthecountingofsupportsofcandidatesequentialpatternsandthenidentifieslargesequentialpatterns.TJLSproceedstothegenerationofnewcandidateL-transactionsetsandstoresthemtothemobilesequencetree.Theprocedurecontinuesuntilnolargesequentialpatternsarederived.

Example3.2:ToillustratetheoperationsofalgorithmTJ,󰀇F;tLS,itcanbeseenfromFig.9,{󰀇A;t1󰀈,󰀇C;t3󰀈4󰀈,󰀇G;t5󰀈}isacandidate4-L-transactionsetgeneratedbyjoin-ingtheL-transactionsets{󰀇A;t{󰀇A;t󰀈,󰀇C;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈}and13󰀈,󰀇G;t5󰀈},respectively,fromlarge3-sequentialpatterns󰀇{󰀇A;t󰀇{󰀇A;t󰀇C;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈}:ABCDEF󰀈and1󰀈,3󰀈,󰀇G;t5󰀈}:ABCDEFG󰀈.Inscanningdatabasephase,algorithmTJLSconstructsthepathcomponentofthemobilesequentialtreeandcountssupportsofthecan-didatesequentialpatterns.Forexample,afterthetransac-tioncomponentoftreeisconstructedintheFig.9(a),TJLSscansthedatabaseDTinFig.8toobtainthepathcompo-nentinFig.9(b)whilealsocountingsupports.InSID100,whenthesupportforcandidateL-transactionset{󰀇A;t󰀇C;ttcounted,thecorresponding1󰀈,3󰀈,󰀇F;4󰀈,󰀇G;t5󰀈}isbeingpath󰀇ABCDEFG󰀈willbegeneratedinthepathcomponentofthemobilesequencetreetoaccountforonesupportcountof󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t󰀈5,󰀈}󰀇F;:ABCDEFGt󰀈.Hence,thefi-nalsupportof󰀇{󰀇A;t1󰀈,󰀇C;t34󰀈,󰀇G;t5󰀈}:ABCDEFG󰀈is2,i.e.,fromSID100andSID200.Explicitly,thecor-respondingpathisdividedintoseveralsubpathsbyidenti-fyingthecellsofL-transactions.FortheexampleshowninFig.10,algorithmTJ{󰀇A;t,󰀇P;tLScountsthesupportofL-transactionset1󰀈,󰀇C;t3󰀈7󰀈}inSID100.TJLSfirstlocates󰀇A;t1󰀈onposition(1)and󰀇C;t3󰀈onposition(2)intheL-transactionset,andthecorrespondingsubpath󰀇ABC󰀈isextractedfromthesubpath󰀇ABC󰀈shownin(3).Then,TJLSlocates󰀇C;t3󰀈onposition(2)and󰀇P;t7󰀈onposition(4)intheL-transactionset,andthecorrespondingsubpath󰀇CDEFGLP󰀈isextractedfromthesubpath󰀇CDEFGHQGLP󰀈shownin(5).NotethattheredundancyofHQGiseliminatedbecausetheycauseacyclebetweenL-transaction󰀇C;t3󰀈andL-transaction󰀇P;t7󰀈.Afterscanningdatabaseforcountingthesupportofcandi-datesequentialpatterns,TJLSobtainslargesequentialpat-ternsintheprocedureofdestructingthemobilesequencetree.Eachlargesequentialpatternisgeneratedwhenitssup-portexceedstheminimumsupport.Forexample,onecande-structthemobilesequencetreeinFig.9todetermineS4inFig.8.

286IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

Fig.9.Datastructureofamobilesequentialtreeforstoringcandidate4-sequentialpatternsinalgorithmTJLS.

Fig.10.ProcedureofcountingsupportofL-transaction󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇P;t7󰀈}:ABCDEFGLP󰀈inSID100.

set

B.AlgorithmTJPT(TransactionsetJoinWithPathTrimming)Withoutexploitingthepathsoflargesequentialpat-terns,algorithmTJLStendstocountthesupportsofalotofout-of-pathsequentialpatterns(i.e.,thesequentialpatternsthatdonotstaywithinthepath),thusdegradingtheperformance.Inlightoftheconceptofpathtrimming,algorithmTJPTisdesignedbytakingboththeL-transactionsetsandpathsoflargesequentialpatternsintoconsiderationtogeneratecandidatesequentialpatterns.Explicitly,duringthegenerationoflargesequentialpatterns,bydestructingthemo-bilesequencetree,TJPTnotonlydetermineslargesequentialpatternsbutalsomaintainsabufferthatcontainstheleafnodesinthetransactioncomponentandthecorrespondingpathsinthepathcomponentsoastoclassifythepatterns.Thepurposeofclassifyingthepatternsisthatthepatterns,whosepathsdonot

containeachother,neednotbeconsideredtogeneratecandidatesequentialpatternstogether.Thus,TJPTcantrimthegenerationofcandidatesequentialpatternsaccordingtothepaths.Thisisreferredtoasthepathtrimmingtechnique.Asaresult,TJPTuti-lizeslargesequentialpatternstogeneratecandidatesequentialpatternsinthecandidategenerationforsolvingtheout-of-pathsequentialpatternprobleminTJLSmentionedabove.

Example3.3:FortheexampleshowninFig.8,inSID300,whenthesupportforcandidate4-L-transactionset{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈}isbeingcounted,thecorrespondingpath󰀇AWBCEFG󰀈willbegeneratedinthepathcomponentofmobilesequencetreetoaccountforonesupportcountfor󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈}:AWBCEFG󰀈.Notethat󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈}:AWBCEFG󰀈hasfoursubpatternsincluding󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈}:AWBCEF󰀈,󰀈{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇G;t5󰀈}:AWBCEFG󰀈,󰀇{󰀇A;t1󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈}:AWBCEFG󰀈,and󰀇{󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈}:CEFG󰀈.However,allofthemarenotlarge3-sequentialpatterns.Instead,theyareout-of-path3-sequentialpatternsinround3inthesensethatnotalloftheirsubpatternsarelarge2-sequentialpatterns.Explicitly,only󰀇{󰀇F;t4󰀈,󰀇G;t5󰀈}:FG󰀈isalarge2-sequentialpatterninthiscase.However,algorithmTJLSstillcountsthesupportsoftheminround3.InalgorithmTJLS,out-of-pathsequentialpatternswillbegeneratedineachroundifthecandidateL-transactionsetsarecontainedintheL-transactionsetsofmaximalL-transactionsequencesinDT.Forexample,theout-of-pathsequentialpatternsgeneratedby

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT287

C.AlgorithmTJPF(TransactionsetJoinWithPatternFamily)AlgorithmTJPFissimilartoalgorithmTJPTinthatitem-ploystheconceptofutilizinglargesequentialpatternsforgener-atingcandidatesequentialpatternstoreducethecomputationaloverheadcausedbyout-of-pathsequentialpatternsbutisdif-ferentfromthelatterinthatalgorithmTJPFbyutilizingtheinformationinpatternsandisabletoreducethenumberofuncertaincandidatesequentialpatternsandstorecandidatese-quentialpatternswithacompactapproach,thusfurtherreducingthecorrespondingoverhead.RecallthatalgorithmTJPTutilizespathtrimmingtechniqueforthegenerationofcandidatesequen-tialpatternsbycomparingthepathsofitssubpatternstoidentifyifonepathcontainsanother.Example3.6:FortheexampleinFig.12,algorithmTJPTgeneratesthecandidate5-sequentialpatterninFig.12(a)byjoiningL-transactionset{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈}insubpattern󰀇1󰀈andL-transactionset{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇Q;t6󰀈}insubpat-tern󰀇2󰀈withthepathtrimmingtechniquetoidentifythatpath󰀇ABCDEFGHQ󰀈containspath󰀇ABCDEFG󰀈.

ComparingthepathsofsubpatternsincursO(|P|)computa-tion,where|P|istheaveragepathlengthoflargesequentialpatterns.Inaddition,algorithmTJPTisrequiredtostorethesamepathsasthebranchesindifferentsubtreesofthetrans-actioncomponentinthemobilesequencetree,whichincursanexcessiveuseofmemory.Notethatevenbytreatingthecellsasotheritemsinthepatterns,modifiedalgorithmGSPstillneedstocomparethewholecellsinthepathandincursO(|P|)computation.Hence,algorithmTJPFsurpassesalgorithmTJPTandthemodifiedGSPinthatwiththepatternfamilytechnique,TJPFisabletogenerateamorecompacttreetostorethepatternstominimizethecorrespondingoverhead.1)RemarksofAlgorithmTJPF:AlgorithmTJPFisdevisedinlightofthepatternfamilytechnique.Tofacilitateourdescrip-tionofalgorithmTJPF,sometheoreticalpropertiesofpatternfamilyaredevisedbelow.

Definition1:Amaximalsequentialpatternisalargesequen-tialpatternthatisnotcontainedinanyotherlargesequentialpattern.Foreachmaximalsequentialpattern,itspatternfamilyconsistsofthepatternitselfandallitssubpatternsgeneratedineachround.

Example3.7:FortheexampleshowninFig.8,oneofthemaximalsequentialpatternsis󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈whichisalsoalarge5-sequentialpattern.ThecorrespondingpatternfamilyisshowninFig.13.

Definition2:ForapatternfamilywhosemaximalsequentialpatternisskwhichconsistsofL-transactions{x1,x2,...,xk}andpath󰀇m1m2...mq󰀈,amaximal-pathlarge2-sequentialpattern(abbreviatedlyasMS2),󰀈{x1,xk}:m1m2...mq󰀈,isalarge2-sequentialpatternwhichhasthesamepathasthemaximalsequentialpatternofthispatternfamily.

Example3.8:Foreachpatternfamily,itisnotedthatMS2isthelarge2-sequentialsubpatternwiththemaximalpath.FortheexampleshowninFig.13,patternsmarkedgrayarethepatternswithMS2=󰀇{󰀇A;t1󰀈,󰀈Q;t6󰀈}:ABCDEFGHQ󰀈,whichisthe

Fig.11.Examplefordescribingtheout-of-pathsequentialpatternprobleminSID300causedbyalgorithmTJLS.

SID300areshowninFig.11.Suchanout-of-pathsequentialpatternproblemwillhappeninroundkfork>2.Thisinturnimpliesthatonecantrimthesupportcountingoftheredundantsequentialpatternsaccordingtothepathstraversed.

RecallthatSkrepresentsthesetoflargek-sequentialpatternsandCkisthesetofcandidatek-sequentialpatterns.Inthecan-didategenerationphase,TJPTconstructsboththetransactionandpathcomponentsofmobilesequentialtreeforstoringCk.Inthecandidategeneration,TJPTjoinstheL-transactionsetsoflarge(k−1)-sequentialpatternsforthegenerationofcandidatek-L-transactionsetandcomparesthepathsoflarge(k−1)-sequentialpatterns.Ifonepathdoesnotcontaintheotherpath,thegeneratedcandidatek-L-transactionsetistrimmed.Ifonepathpcontainstheotherpathq,TJPTgeneratesacandidatek-sequentialpatternsconsistingofthecandidatek-L-transactionsetandp.

Example3.4:ConsidertheexamplescenarioshowninFig.12.InalgorithmTJPT,thecandidate5-sequentialpattern󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈inFig.12(a)isgeneratedbyjoiningL-transactionset{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈}insubpattern󰀇1󰀈andL-transactionset{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇Q;t6󰀈}insubpattern󰀇2󰀈withthepathtrimmingtechniquetoidentifythefactthatpath󰀇ABCDEFGHQ󰀈containspath󰀇ABCDEFG󰀈.Finally,󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈isqualifiedasacandidate5-sequentialpatternafterTJPTidentifiesthattheothersubpatterns(i.e.,subpatterns󰀇3󰀈,󰀇4󰀈,and󰀇5󰀈)arelarge4-sequentialpatternsinFig.12(b).

Byclassifyingthelargek-sequentialpatterns,TJPTcanefficientlygeneratecandidatek-sequentialpatterns.Partic-ularly,byclassifyingthepatternsinSkfork≥2,TJPTwillnotgenerateanyout-of-path(k+1)-sequentialpattern.ThisdemonstratestheveryadvantageofthepathtrimmingtechniqueTJPTemploys.

Example3.5:FortheexampleshowninFig.9,TJPTgen-eratesthecompletemobilesequencetreebyhashingnotonlyL-transactionsetsbutalsopathsincandidategenerationsothatthepath󰀇AWBCEFG󰀈willnotbecountedforthesupport.Notethatsuchout-of-pathsequentialpatternsastheoneshowninFig.11willnotoccuranymore,showingasignificantperformanceimprovementofTJPToverTJLS.

288IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

Fig.12.Candidate5-sequentialpatternshownin(a)isgeneratedbyidentifyingtheexistenceofitsfivelarge4-sequentialsubpatternsshownin(b).

Fig.13.Onepatternfamilyexample.Patternsmarkedgrayarethepatternshavingthemaximal-pathlarge2-sequentialpattern.

large2-sequentialpatternwithpathlengthequalto9,largerthanthoseofotherlarge2-sequentialpatterns.

Definition3:Supposesk,k≥3,isthemaximalsequen-tialpatternofapatternfamily,andskconsistsofL-transactions{x1,x2,...,xk}andpath󰀇m1m2...mq󰀈.Thecentro-subtransactionsetofapatterninthispatternfamilyis{x2,...,xk−1}.

Example3.9:Forexample,thelarge4-sequentialpat-ABCDEFGHQ󰀈tern󰀇{󰀇A;t1󰀈,󰀈C;t3󰀈,󰀇F;t4󰀈,󰀇Q;t6󰀈}:

canbeviewedastwoparts,i.e.,MS2=󰀇{󰀇A;t1󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈andthecentro-subtransactionsetbeing{󰀇C;t3󰀈,󰀇F;t4󰀈}.Notethatalargek-sequentialpatternisapatternconsistingofkL-transactionsandapath.Foreachlargek-sequentialpattern,allitsk(k−1)-sequentialsubpatternsarelarge.Explicitly,foralargek-sequentialpatternwithL-transaction{x1,x2,...,xk},theL-transactionsofitsk(k−1)-sequentialsubpatternscanberepresentedby{x2,x3,...,xk},{x1,x3,...,xk},...,and{x1,x2,...,xk−1}.Then,wehavethefollowingremarks.Remark1:Foralargek-sequentialpatternpk,k≥3,thereexistatleastk−2large(k−1)-subpatternswhosepathsareidenticaltothatofpk.

Example3.10:Forthelarge5-sequentialpattern󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈showninFig.13(a),thereexist3large4-subpatterns,

󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈,󰀇{󰀇A;t1󰀈,

󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇G;t5󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈,and󰀇{󰀇A;t1󰀈,󰀇F;t4󰀈,󰀇G;t5󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈,showninFig.13(b),whosepathsarethesamewiththeoneofthelarge5-sequentialpattern.

Remark2:Notethatamaximalsequentialpatternisalsoalargesequentialpattern.Thus,foramaximalsequen-tialpatternskwithL-transactions{x1,x2,...,xk}andpath󰀇m1m2...mq󰀈,thereexistk−2large(k−1)-sequentialsub-patternswhichhaveidenticalmaximal-pathlarge2-sequentialpattern󰀇{x1,xk}:m1m2...mq󰀈.

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT289

Fig.15.AlgorithmTJPFjoinsthecentro-subtransactionsetswithintegercomparison.

Fig.14.AlgorithmTJPFhashesthelastL-transactionfirstsothatthecentro-subtransactionsetisidentifiedandcomparestheindividualintegersstoredinshared-pathtree.

2)AlgorithmTJPFUsingShared-PathTreeinCan-didateGeneration:AlgorithmTJPFisabletogenerateamaximalsequentialpatternskwithL-transactions{x1,x2,...,xk}andpath󰀇m1m2...mq󰀈asacandidatesequentialpatternbyjoiningthepreviouslargesequentialpatterns󰀇{x1,(x2,x3,...,xk−3,xk−2),xk}:m1,m2,...,mq󰀈and󰀇{x1,(x2,x3,...,xk−3,xk−1),xk}:m1,m2,...,mq󰀈withthepatternfamilytechnique.Explicitly,TJPFobtains:1){x2,x3,...,xk−3,xk−2,xk−1}byjoiningthecentro-subtransactionsets{x2,x3,...,xk−3,xk−2}and{x2,x3,...,xk−3,xk−1}and2)thenewMS2=󰀇{x1,xk}:m1,m2,...,mq󰀈bycomparingtheMS2’sinthepreviouslargesequentialpatterns.ByhashingthelastL-transactionfirstandstoringanintegerforeachpath,TJPFconstructsthemobilesequencetreewithaformthatforeachcandidatesequentialpatterns,twoL-transactionsofMS2comefirst,acentro-subtransactionsetisinthemiddle,andaninteger,whichisreturnedfromshared-pathtreeforindexingthecorrespondingpath,isintheleaf.

Example3.11:Forexample,TJPFstoresthecan-didatesequentialpatterns󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇F;t4󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈and󰀇{󰀇A;t1󰀈,󰀇C;t3󰀈,󰀇G;t5󰀈,󰀇Q;t6󰀈}:ABCDEFGHQ󰀈inFig.12(b)intothemobilese-quencetreeasinFig.14(a).TJPFhashesthelastL-transaction,󰀇Q;t6󰀈,inthefirstpositionofthemobilesequencetree.Notethatpath󰀇ABCDEFGHQ󰀈isrepresentedbytheinteger󰀇p3󰀈,derivedfromthemappingoftheshared-pathtreeinFig.14(b).Then,TJPFcanjointheL-transactionsetsofthesetwolarge4-sequentialpatterns,i.e.,pattern󰀇2󰀈and󰀇3󰀈showninFig.15(b),withabuffertokeepablockthatcontainstheleafnodesinthetransactioncomponentandthecorrespondingintegersinthepathcomponenttoclassifythepatternsforgeneratingthecandidate5-sequentialpatterninFig.12(a)efficiently.

FromRemark2,weknowthatTJPFjoinsSkforgeneratingCk+1withthepatternfamilytechnique,andthepathsofalllarge

sequentialpatternstobejoinedareidenticaltooneanotherinroundk,k≥3.Thus,TJPFjoinsthecentro-subtransactionsetswithcomparingindividualintegersforachievingperformanceimprovement.

Example3.12:FortheexampleshowninFig.15,TJPFjoinsthecentro-subtransactionsetsinpattern󰀇4󰀈and󰀇5󰀈showninFig.15(c)withcomparingintegerswhichareequaltoeachother(i.e.,p3)togeneratepattern󰀇2󰀈inFig.15(b).Similarly,TJPFjoinsthecentro-subtransactionsetsinpattern󰀇2󰀈and󰀇3󰀈showninFig.15(b)bycomparingintegerstogeneratepattern󰀇1󰀈inFig.15(b).

ThemethodforalgorithmTJPFtoreducecomputationaloverheadandmemoryconsumptionisasfollows.Inthefirstround,algorithmTJPFalsojoinstheL-transactionsinS1forgeneratingR2tobestoredinthetransactioncomponentofamo-bilesequencetree.However,inthesecondround,TJPFhasheseachcombinationof2-L-transactionssetineachmaximalL-transactionsequenceintothetransactioncomponentbyhashingthelastL-transactionfirst.Then,TJPFhashesthecorrespond-ingpathintotheshared-pathtreewhichhasanassignedintegerineachleafnodeforrepresentingthepathfromtherootnodetotheparentnodeofthatleafnode.TJPFnextreturnstheinte-gerforconstructingthepathcomponentofthemobilesequencetreewhilekeepingcountingthesupport.Afterthecandidate2-sequentialpatternswiththeminimumsupportareidentifiedasthelarge2-sequentialpatterns,algorithmTJPFjoinstheL-transactionsetswiththepathtrimmingtechniquetogeneratecandidate3-sequentialpatterns.Notethattheshared-pathtreeconstructedinroundtwowillbeusedforthemappingbetweenpathsandintegersinthefollowingrounds.

Example3.13:Forexample,theshared-pathtreeshowninFig.14(b)mapsthepathsoflarge4-sequentialpatternsinFig.12(b)intointegers{p1,p2,p3,p4,p5}.

Inthethirdround,TJPFcountsthesupportsofcandidate3-sequentialpatternsbyhashingintothemobilesequencetreetheL-transactionsetsandintegersreturnedfromtheshared-pathtree.Indestructingthemobilesequencetree,TJPFnotonlydetermineslarge3-sequentialpatternsbutalsousesabuffer

290IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

Fig.16.UncertaincandidatetransactionpatternsgeneratedbyTJTJPTand

PF.

tokeepablockthatcontainstheleafnodesinthetransactioncomponentandthecorrespondingintegersinthepathcompo-nenttoclassifythepatterns.ByhashingthelastL-transactionfirst,themobilesequencetreeiswell-structuredandTJPFcom-parestheindividualintegerstotrimthegenerationofcandidatesequentialpatternsaccordingtothepaths.Ineachsubsequentround,TJPFconstructsthemobilesequencetreebyhashingthelastL-transactionfirstandutilizingtheshared-pathtreeformappingsothatTJPFcomparestheindividualintegersintrimmingthegenerationofcandidatesequentialpatterns.Al-gorithmTJPFthushasO(1)executiontimecomplexityinthisstep,betterthanO(|P|)byalgorithmTJPT,where|P|istheaveragepathlengthoflargesequentialpatterns.Inaddition,un-likealgorithmTJPT,algorithmTJPFutilizesMS2tofilteroutsomeuncertaincandidatesequentialpatternsbeforesubpatternidentification.Foracandidatek-sequentialpatterns,itshouldhaveklarge(k−1)-sequentialpatterns.Foranuncertaincan-didatek-sequentialpatterns,itisgeneratedbyjoiningtwolarge(k−1)-sequentialpatterns.Thus,forprovingthatanuncer-taincandidatek-sequentialpatternsisqualifiedasacandidatek-sequentialpatterns,therearek−2subpatternidentificationstheneedtobeconducted.

Example3.14:Forexample,takingthelarge4-sequentialpatternsshowninFig.8,TJPTgeneratesfouruncertaincan-didate5-sequentialpatternsshowninFig.16(a).However,byutilizingthepatternfamilytechnique,TJPFonlygeneratesoneuncertaincandidate5-sequentialpatternshowninFig.16(b).Thus,TJPTconducts12subpatternidentificationsandTJPFconductsthreesubpatternidentifications.ThisdemonstratestheveryadvantageofthepatternfamilytechniqueTJPFemploys.IV.EXPERIMENTALRESULTS

ToassesstheperformanceofTJLS,TJPT,andTJPF,weconductedseveralexperimentstodeterminelargesequentialpatterns.Theseexperimentsareperformedonacomputerwitha1-GHzIntelCPUand512MBofmemory.ThemethodusedtogeneratesyntheticdataisdescribedinSectionIV-A.InSectionIV-B,performanceofTJLS,TJPT,andTJPFiscomparativelystudied.

A.GenerationofSyntheticMobileTransactionSequencesIntheexperiments,themovingscenariowithtransactionsmadeinamobilecommerceenvironmentissimulated.Sincethemobilecommerceserviceisanewapplicationinthenearfuture,webelievethatthecustomershavethesimilarbehaviorstothoseoftheminthecurrentdatanetworkwhentheyfirstusethisservice.Afterthisserviceisusedbycustomers,thebehav-

Fig.17.Parametersusedinthesimulation.

Fig.18.Meshnetworktosimulatemobilecommerceenvironment.

iorswillthenbechangedaccordingtotheirusageexperiences.Currently,thereisnorealscenariothatwecanmimic.Thus,inthispaper,thesimulationmodelforgeneratingsyntheticmobiletransactionsequencesisinfactsimilartothatinthecompanionpapers[43],[49].Explicitly,themethodforgeneratingmovingpatternsissimilartothatin[49]andthemethodforgeneratingtransactionsissimilartothatin[43].

Fig.17summarizesthemeaningsofvariousparametersusedintheexperiments.First,weconstructann×nmeshnetwork[40]withamodificationbytakingthegeographicboundaryintoconsiderationtolimitthenumberofneighborssoastomimicthemobileenvironment,whereeachnoderepresentsonecell[49].Thenumberofitemsineachcellisdeterminedfromauniformdistributionwithinagivenrange,denotedbynI.Foreachcell,theadvancingprobabilityPaofeachneighboristheprobabilityforacustomertomovetoneighboringcellstopurchasetheitemssoldthere.Inessence,eachdirectededgefromonecellAtoanothercellBisassignedwithaweight,correspondingtotheadvancingprobabilityofBforA.Inthemodel,theadvancingprobabilityisobtainedbytheratioofthenumberofitemssoldineachneighbortothosenum-bersofotherneighbors.Forthe3×3meshnetworkexampleshowninFig.18(a),therearefourneighbors󰀇N1,N2,N3,N4󰀈forcellYwiththecorrespondingadvancingprobabili-ties󰀇Pa1,Pa2,Pa3,Pa4󰀈.Inaddition,󰀇iN1,iN2,iN3,iN4󰀈arethenumbersofitemssoldincells󰀇N1,N2,N3,N4󰀈andwehavePa1=(iN1)/(iN1+iN2+iN3+iN4)andPa2=(iN2)/(iN1+iN2+iN3+iN4).

Intheexperiments,|D|isthenumberofmobiletransactionsequencesgenerated.Whenacustomermovesamongcellsfor

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT291

Fig.19.(a)ExecutiontimeofalgorithmsTJLS,TJPT,andTJPFwhenminimumsupportvariesand(b)executiontimeofalgorithmsTJLS,TJPT,andTJPFwhennumberofmobiletransactionsequencesvaries.

shoppingintheMCenvironment,themobiletransactionse-quencecompletedbythiscustomerconsistsofamovingpathandasetoftransactionsmadeinthecorrespondingcells.Thestartingpositionofeachmobilesequentialpatterncanbeeithervistorlocationregister(VLR)orhomelocationregister(HLR)andisrandomlyselectedamongthesecells[34].Amovingpathconsistsofcellsmovedbyauser.ThesizeofeachmovingpathisdeterminedfromaPoissondistributionwithmeanequalto|P|.Whenacustomermovestoacell,theprobabilitythatthiscustomermakesthetransactioninthiscellisdenotedbyPb.NotethatthenumberofitemsineachcellisdeterminedfromauniformdistributionwithinagivenrangenI.Foreachcell,oncethenumberofitemsisdetermined,theitemsthatcouldbepurchasedineachcellarefixed.Themethodforgeneratingtransactiondataineachcellissimilartotheoneinthepriorwork[43].Inthemobilecommerceenvironment,peopletendtobuysetsofitemstogether,whicharealsocalledpotentialmaximalfrequentsets.Thesizeofthemaximalelementsisclusteredaroundameanwithafewlongitemsets.Atransactionmaycontainoneormoreofsuchfrequentsets.Thetransactionsizeisalsoclusteredaroundamean,whichisdenoted|T|.Theprobabilitythatauserwillmovefromthecurrentcellbacktothecellfromwhichhe/shecame,calledthebackwardweight,isdenotedbyP0,whichisequaltoPa×Pd,wherePdisadamp-ingfactorbecauseofthebackwardmovement.Withoutlossofgenerality,Pdissetto0.8inourexperiments.TheprobabilityofmovingtoeachneighborPmisalsodeterminedbythead-vancingprobabilityandthesumoftheweightsforallthesecellsisequalto1−P0.ForthemeshnetworkshowninFig.18(a),whenoneuservisitscellYfromcellN1,theprobabilitiesoftheneighborsthatthisuserwillmovetoareshowninFig.18(b).B.PerformanceComparison

Inthefollowingexperiments,weconstructan8×8meshnetworkandset|D|=200K,s=0.5%,nI=200,Pb=0.5,Pd=0.8,|T|=4,and|P|=20.

1)ExperimentOne:WhentheMinimumSupportVaries:Inthisexperiment,svariesfrom1.5%to0.25%.Fig.19(a)showsthatTJPTandTJPFingeneral,outperformTJLSforvariousminimumsupports.Withthepathtrimmingandthepatternfamilytechniques,bothTJPTandTJPFcangeneratefewercandidatesequentialpatternsthanTJLS,whichsuffersalotofout-of-pathsequentialpatternsineveryround.Astheminimumsupportdecreases,theexecutiontimesofallthealgorithmsincreasebecauseoftheincreasesinthetotalnumberofcandidateandlargesequentialpatterns.

2)ExperimentTwo:WhentheNumberofMobileTransactionSequencesVaries:Inthisexperiment,|D|variesfrom200to1000K.Fig.19(b)showsthattheexecutiontimesofTJPTandTJPFincreaselinearlyasthedatabasesizeincreases,indicatingthegoodscale-upfeatureofTJPTandTJPF.

3)ExperimentThree:WhenPurchasingProbabilityVaries:NotethatalgorithmTJLSsufferstheout-of-pathsequentialpat-ternproblem.Toaddressthisproblem,weconductthisex-perimentwiththepurchaseprobabilityPbvaryingfrom0.5to0.3,andtheresultisshowninFig.20(a).Foreachal-gorithm,itsexecutiontimeistakenasthebasepointwhenPbis0.5,andFig.20(a)showstheexecutiontimewhenPbvaries.Whenthepurchaseprobabilitydecreases,theexecutiontimesofallthealgorithmsdecreasebecauseofthedecreasesinthetotalnumberofcandidateandlargesequentialpatterns.However,thepathlengthsoftheout-of-pathsequentialpat-ternsincreasebecausetheaveragenumberofcellsvisitedpertransactionincreases.Notethatalthoughthetotalnumberofcandidateandlargesequentialpatternsdecreases,theout-of-pathsequentialpatternproblemcausesalgorithmTJLStostillcountthesupportsofnonlargesequentialpatterns.Asare-sult,whenPbdecreases,thedecreaseoftheexecutiontimeofTJLSisnotasprominentasthoseofTJPTandTJPF.Toprovidemoreinsightintotheperformancecomparisonsofalgo-rithms,itisshowninFig.21thatTJPTandTJPFoutperformTJLSindifferentdatabasesizes,whichindicatesthatTJPTandTJPFarerobustinthesensitivityanalysisofthepurchasingprobability.

4)ExperimentFour:WhentheAveragePathLengthVaries:Toexaminethesensitivityofvaryingtheaveragepathlength,|P|variesfrom10to30.TheresultisshowninFig.20(b).Foreachalgorithm,itsexecutiontimeistakenasthebasepointwhen|P|is10,andFig.20(b)showstheexecutiontimewhen|P|varies.ItcanbeseenthatTJPFislesssensitivetothevariationofpathlengththanTJPT.ThisagreeswiththefactthatTJPFhasO(1)executiontimeforcomparingthepathinthecandidategener-ationstage,whereasthecorrespondingcomplexityofTJPTis

292IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

Fig.20.(a)ExecutiontimeofalgorithmsTJLS,TJPT,andTJPFwhenpurchasingprobabilityvariesand(b)executiontimeofalgorithmsTJPTandTJPFwhenaveragepathlengthofmobiletransactionsequencesvaries.

Fig.21.ExecutiontimeofalgorithmsTJLS,TJPT,andTJPFwhenpurchasingprobabilityvariesindifferentdatabasesizes.

O(|P|).Inaddition,itisalsoshowninFig.22thatTJPFout-performsTJPTinthesensitivityanalysisoftheaveragepathlengthwithdifferentdatabasesizes.ToprovidemoreinsightintothecandidategenerationstageofTJPTandTJPF,itisshowninFig.23thattheratio(TJPT)/(TJPF)ofexecutiontimewhichisincurredbycomparingthepathisalmostequalto(O(|P|))/(O(1)).

5)ExperimentFive:PerformanceComparisonBetweenTJPTandTJPFinEachRound:Toprovidemoreinsightsintothesharedpathtreefeatureexploitedbypatternfamilytechnique,weset|D|=200000,s=0.5%,nI=200,Pb=0.5,Pd=0.8,|T|=4,and|P|=20andcomparetheperfor-manceofTJPTandTJPFineachround.BecauseS1isob-tainedinthelarge-transactiongenerationphase,wethususeroundonetorefertotheprocedureperformedtoobtain(R2)anduseroundtwotorefertotheprocedureperformedtoobtain(C2,S2,C3).NotethatTJPTandTJPFgenerateSkalongwiththegenerationofCk+1,weuseroundk,k≥3torefertotheprocedureperformedtoobtain(Sk,Ck+1).AsshowninFig.24(a),TJPFconsistentlyoutperformsTJPTinallrounds,exceptroundone.Thisagreeswithourintuition.Notethatinroundone,withoutanypathinformation,bothTJPTandTJPF

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT293

Fig.22.ExecutiontimeofalgorithmsTJPTandTJPFwhenaveragepathlengthofmobiletransactionsequencesvariesindifferentdatabasesizes.

Fig.23.Ratioofexecutiontimewhichisincurredbycomparingthepath.

jointheL-transactionsinS1forgeneratingR2tobestoredinthetransactioncomponentofamobilesequencetree.Inroundtwo,TJPTconstructsthepathcomponentofthemobilesequentialtreeforstoringC2.Inthefollowingrounds,whenTJPTstoresthepathinformationofCk,k≥3,TJPTstillneedstoconstructthepathcomponentofthemobilesequentialtreeforstoringCk.However,byutilizingthepatternfamilyrela-tionship,TJPFcanusetheshared-pathtreegeneratedinC2forindexingthepathinformationofCk,k≥3,inthefollow-ingrounds,leadingtomoreefficientexecution.Inaddition,toprovidemoreinsightsintoTJPFandTJPT,thenumbersofbranchesforstoringthepathinformationofCkareshowninFig.24(b).InTJPT,thesebranchesarestoredinthemobilese-quentialtreesforallrounds.InTJPF,thesebranchesarestored

Fig.24.PerformancecomparisonbetweenTJPTandTJPFineachround.(a)Executiontimeand(b)thenumberofpathsstored.

intheshared-pathtreegeneratedinroundtwo,andthus,theamountofmemorysavingsis25.8MB.

V.CONCLUSION

Inthispaper,weexploredadataminingcapabilitywhichinvolvesminingmobilesequentialpatternsforanMCenvi-ronment.Inessence,theminingofmobilesequentialpatternsaggregatestheconceptsofminingassociationrules(miningpathtraversalpatternsandminingsequentialpatterns)andthus

294IEEETRANSACTIONSONSYSTEMS,MAN,ANDCYBERNETICS—PARTC:APPLICATIONSANDREVIEWS,VOL.37,NO.2,MARCH2007

requiresacombineduseofcorrespondingtechniques.Byhav-ingdifferentprioritiesonthefactorsinvolvinglargeitemsets,traversalpaths,andordersofpurchases,wehavedevisedthreealgorithms(algorithmTJLS,algorithmTJPT,andalgorithmTJPF)fordetermininglargesequentialpatternsfrommobiletransactionsequences.TJLSisdevisedinlightoftheconceptofassociationrules,andTJPTisdevisedbytakingboththeconceptsofassociationrulesandpathtraversalpatternsintoconsideration.Byutilizingthepatternfamilytechnique,TJPFisabletogeneratelargesequentialpatternsveryefficiently.AsimulationmodelfortheMCenvironmentwasdeveloped,andasyntheticworkloadwasgeneratedforperformancestud-ies.Inourperformancestudy,theproposedalgorithmTJPFsignificantlyoutperformsothersinbothexecutionefficiencyandmemorysaving,indicatingtheusefulnessofthepatternfamilytechnique.Itisshownbyourresultsthatbytakingbothmovingpatternsandpurchasepatternsintoconsideration,onecanhaveabettermodelforanMCsystemandthusbeabletoexploittheintrinsicrelationshipbetweenthesetwoimportantcustomerbehaviorsfortheefficientminingofmobilesequentialpatterns.

REFERENCES

[1][Online].Available:http://www.mobilecommerceworld.com

[2]WalletApplication[Online].Available:http://www.forum.nokia.com.[3]R.Agarwal,C.Aggarwal,andV.V.V.Prasad,“Atreeprojectionalgorithm

forgenerationoffrequentitemsets,”J.ParallelDistrib.Comput.,2000.[4]C.C.Aggarwal,J.L.Wolf,K.-L.Wu,andP.S.Yu,“Theintelligent

recommendationanalyzer,”inProc.ICDCSInt.WorkshopofKnowledgeDiscoveryandDataMiningintheWorld-WideWeb,Taipei,Taiwan,Apr.10–13,2000,pp.F67–F72.

[5]R.Agrawal,T.Imielinski,andA.Swami,“Miningassociationrulesbe-tweensetsofitemsinlargedatabases,”inProc.ACMSIGMODInt.Conf.ManagementofData,May1993,pp.207–216.

[6]R.AgrawalandR.Srikant,“Fastalgorithmsforminingassociationrules

inlargedatabases,”inProc.20thVLDBConf.,Sep.1994,pp.478–499.[7],“Miningsequentialpatterns,”inProc.11thInt.Conf.DataEngi-neering,Mar.1995,pp.3–14.

[8]J.M.AleandG.H.Rossi,“Anapproachtodiscoveringtemporalassoci-ationrules,”inProc.ACMSymp.AppliedComputing,Como,Italy,Mar.2000,pp.294–300.

[9]A.M.Ayad,N.M.El-Makky,andY.Taha,“Incrementalminingof

constraintedassociationrules,”inProc.1stSIAMConf.DataMining,Chicago,IL,Apr.2001.

[10]Y.Bastide,R.Taouil,N.Pasquier,G.Stumme,andL.Lakhal,“Miningfre-quentpatternswithcountinginference,”ACMSIGKDDExplor.Newslett.,vol.2,no.2,pp.66–75,Dec.2000.

[11]M.-S.Chen,J.Han,andP.S.Yu,“Datamining:Anoverviewfrom

adatabaseperspective,”IEEETrans.Knowl.DataEng.vol.8,no.6,pp.866–833,Dec.1996.

[12]M.-S.Chen,J.-S.Park,andP.S.Yu,“Efficientdataminingforpath

traversalpatterns,”IEEETrans.Knowl.DataEng.,vol.10,no.2,pp.209–221,Apr.1998.

[13]X.ChenandI.Petrounias,“Discoveringtemporalassociationrules:Al-gorithms,languageandsystem,”inProc.16thInt.Conf.DataEng.,SanDiego,CA,Feb.28–Mar.3,2000,p.306.

[14]D.Cheung,J.Han,V.Ng,andC.Y.Wong,“Maintenanceofdiscovered

associationrulesinlargedatabases:Anincrementalupdatingtechniques,”inProc.Int.Conf.DataEng.,Feb.1996,pp.106–114.

[15]F.Coenen,G.Goulbourne,andP.Leng,“Computingassociationrules

usingpartialtotals,”inProc.5thEur.Conf.PrinciplesofDataMiningandKnowledgeDiscovery,Freiburg,Germany,Sep.3–5,2001,pp.54–66.[16]R.Cooley,B.Mobasher,andJ.Srivastava,“Datapreparationformining

worldwidewebbrowsingpatterns,”J.Knowl.Inf.Syst.,vol.1,no.1,1999.

[17]G.Das,K.-I.Lin,H.Mannila,G.Renganathan,andP.Smyth,“Rule

discoveryfromtimeseries,”Proc.4thInt.Conf.KnowledgeDiscoveryandDataMining(KDD-98),Aug.1998.

[18]U.M.Fayyad,G.Piatetsky-Shapiro,P.Smyth,andR.Uthurasamy,Ad-vancesinKnowledgeDiscoveryandDataMining.Cambridge,MA:MITPress,1996.

[19]R.Floyd,B.Housel,andC.Tait,“Mobilewebaccessusingenetworkweb

express,”IEEEPers.Commun.,vol.5,no.5,pp.47–52,Oct.1998.

[20]BluetoothOverview,1999[Online].Available:http://www.bluetooth.com[21]WAPForumWirelessApplicationProtocol.[Online].Available:

http://www.wapforum.org/

[22]J.Han,G.Dong,andY.Yin,“Efficientminingofpartialperiodicpatterns

intimeseriesdatabase,”inProc.15thInt.Conf.DataEngineering,Mar.1999.

[23]J.HanandY.Fu,“Discoveryofmultiple-levelassociationrulesfromlarge

databases,”inProc.21thVLDBConf.,Sep.1995,pp.420–431.

[24]J.Han,W.Gong,andY.Yin,“Miningsegment-wiseperiodicpatternsin

time-relateddatabases,”inProc.4thInt.Conf.KnowledgeDiscoveryandDataMining(KDD-98),Aug.1998,pp.214–218.

[25]J.Han,L.V.S.Lakshmanan,andR.T.Ng,“Constraint-based,multidi-mensionaldatamining,”IEEEComput.,vol.32,no.8,pp.46–50,Aug.1999.

[26]J.Han,J.Pei,B.Mortazavi-Asl,Q.Chen,U.Dayal,andM.-C.Hsu,

“FreeSpan:Frequentpattern-projectedsequentialpatternmining,”inProc.2000Int.Conf.KnowledgeDiscoveryandDataMining,Aug.2000,pp.355–359.

[27]J.Han,J.Pei,andY.Yin,“Miningfrequentpatternswithoutcandidate

generation,”inProc.ACMSIGMODConf.,2000.

[28]L.V.S.Lakshmanan,R.Ng,J.Han,andA.Pang,“Optimization

ofconstrainedfrequentsetquerieswith2-variableconstraints,”inProc.ACM-SIGMODConf.ManagementofData,Jun.1999,pp.157–168.

[29]C.-H.Lee,C.-R.Lin,andM.-S.Chen,“Onmininggeneraltemporal

associationrulesinapublicationdatabase,”inProc.1stIEEEInt.Conf.DataMining(ICDM2001),Nov./Dec.2001.[30],“Sliding-windowfiltering:Anefficientalgorithmforincremental

mining,”inProc.ACM10thInt.Conf.Inf.andKnowledgeManagement,Nov.2001.

[31]C.-R.Lin,C.-H.Yun,andM.-S.Chen,“Usingslicescanandselective

hashforepisodemining,”inProc.WorkshoponTemporalDataMining(SIGKDD2001),Aug.2001.

[32]J.-L.LinandM.H.Dunham,“Miningassociationrules:Anti-skewalgo-rithms,”inProc.Int.Conf.DataEngineering,1998,pp.486–493.

[33]M.-Y.LinandS.-Y.Lee,“Incrementalupdateonsequentialpatternsin

largedatabases,”inProc.10thIEEEInt.Conf.ToolswithArtificialIntel-ligence,Nov.1998,pp.24–31.

[34]Y.-B.Lin,“ModelingtechniquesforPCSnetworks,”IEEECommun.

Mag.,vol.35,no.2,pp.102–107,Feb.1997.

[35]B.Liu,W.Hsu,andY.Ma,“Miningassociationruleswithmultiple

minimumsupports,”inProc.5thACMSIGKDDInt.Conf.KnowledgeDiscoveryandDataMining,Aug.1999.

[36]Ky.S.M.N.GarofalakisandR.Rastogi,“SPIRIT:Sequentialpattern

miningwithregularexpressionconstraints,”VLDBJ.,1999.

[37]H.MannilaandC.Meek,“Globalpartialordersfromsequentialdata,”

inProc.6thACMSIGKDDInt.Conf.KnowledgeDiscoveryandDataMining,Boston,MA,2000,pp.161–168.

[38]H.Mannila,H.Toivonen,andA.I.Verkamo,“Discoveryoffrequent

episodesineventsequences,”DataMiningKnowl.Discov.,vol.1,no.3,pp.259–289,1997.

[39]F.Masseglia,F.Cathala,andP.Poncelet,“ThePSPapproachformining

sequentialpatterns,”inProc.2ndEur.Conf.PrinciplesofDataMiningandKnowledgeDiscovery,Nantes,France,Sep.1998,vol.1510,pp.176–184.

[40]E.ModianoandA.Ephremides,“Efficientalgorithmsforperforming

packetbroadcastsinameshnetwork,”IEEE/ACMTrans.Networking,vol.4,no.4,pp.639–648,Aug.1996.

[41]R.Muntz,W.Wang,andJ.Yang,“TAR:Temporalassociationruleson

evolvingnumericalattributes,”inProc.17thInt.Conf.DataEngineering,2001.

[42]B.PadmanabhanandA.Tuzhilin,“Patterndiscoveryintemporal

databases:Atemporallogicapproach,”inProc.2ndInt.Conf.Knowl-edgeDiscoveryandDataMining,1996,pp.35–354.

[43]J.-S.Park,M.-S.Chen,andP.S.Yu,“Aneffectivehashbasedalgorithm

forminingassociationrules,”inProc.ACMSIGMODConf.,May1995,pp.175–186.

[44]S.Parthasarathy,M.J.Zaki,M.Ogihara,andS.Dwarkadas,“Incremental

andinteractivesequencemining,”inProc.ACMCIKMInt.Conf.Inf.andKnowledgeManagement,1999,pp.251–258.

YUNANDCHEN:MININGMOBILESEQUENTIALPATTERNSINAMOBILECOMMERCEENVIRONMENT295

[45]N.Pasquier,Y.Bastide,R.Taouil,andL.Lakhal,“Discoveringfrequent

closeditemsetsforassociationrules,”inProc.7thInt.Conf.DatabaseTheory,Jan.1999.

[46]J.PeiandJ.Han,“Canwepushmoreconstraintsintofrequentpattern

mining?”inProc.2000Int.Conf.KnowledgeDiscoveryandDataMining,Aug.2000.

[47]J.Pei,J.Han,andR.Mao,“CLOSET:Anefficientalgorithmformining

frequentcloseditemsets,”inProc.ACM-SIGMODWorkshoonRes.IssuesinDataMiningandKnowledgeDiscovery,2000,pp.21–30.

[48]J.Pei,J.Han,H.Pinto,Q.Chen,U.Dayal,andM.-C.Hsu,“PrefixSpan:

Miningsequentialpatternsefficientlybyprefix-projectedpatterngrowth,”inProc.17thInt.Conf.DataEng.,Apr.2001.

[49]W.-C.PengandM.-S.Chen,“Developingdataallocationschemesbyin-crementalminingofusermovingpatternsinamobilecomputingsystem,”IEEETrans.Knowl.DataEng.,vol.15,no.1,pp.70–85,Feb.2003.[50]P.PirolliandJ.E.Pitkow,“DistributionsofSurfers’pathsthroughthe

worldwideweb:Empiricalcharacterization,”WorldWideWeb,vol.2,no.1–2,pp.29–45,1999.

[51]J.E.PitkowandP.Pirolli,“Mininglongestrepeatedsubsequencesto

predictworldwidewebsurfing,”inProc.2ndUSENIXSymp.InternetTechnologiesandSyst.,Oct.1999.

[52]J.B.Schafer,J.Konstan,andJ.Riedl,“RecommendersystemsinE-commerce,”inProc.1stACMConf.ElectronicCommerce,Denver,CO,Nov.1999,pp.158–166.

[53]R.SrikantandR.Agrawal,“Mininggeneralizedassociationrules,”in

Proc.21thVLDBConf.,Sep.1995,pp.407–419.

[54]R.AgrawalandR.Srikant,“Miningquantitativeassociationrulesinlarge

relationaltables,”inProc.1996ACM-SIGMODConf.ManagementofData,1996,pp.1–12.

[55]R.SrikantandR.Agrawal,“Miningsequentialpatterns:Generalizations

andperformanceimprovements,”inProc.1996Int.Conf.ExtendingDatabaseTechnology(EDBT’96),Mar.1996,pp.201–212.

[56]A.K.H.Tung,H.Lu,J.Han,andL.Feng,“Breakingthebarrierof

transactions:Miningintertransactionassociationrules,”inProc.5thACMSIGKDDInt.Conf.KnowledgeDiscoveryandDataMining,Aug.1999,pp.297–301.

[57]U.Varshney,R.J.Vetter,andR.Kalakota,“Mobilecommerce:Anew

frontier,”IEEEComput.,vol.33,no.10,pp.32–38,Oct.2000.

[58]J.Veijalainene,“Transactionsinmobileelectroniccommerce,”inProc.

8thInt.WorkshoponFoundationsofModelsandLanguagesforDataandObjects,Sep.1999,pp.203–224.

[59]R.Villafane,K.A.Hua,D.Tran,andB.Maulik,“Knowledgediscovery

fromseriesofintervalevents,”J.Intell.Inf.Syst.,vol.15,no.1,pp.71–89,2000.

[60]K.Wang,Y.He,andJ.Han,“Miningfrequentitemsetsusingsupport

constraints,”inProc.26thVLDBConf.,Sep.2000,pp.43–52.

[61]K.Wang,S.Q.Zhou,andS.C.Liew,“Buildinghierarchicalclassifiers

usingclassproximity,”inProc.25thVLDBConf.,1999,pp.363–374.[62]C.Yang,U.Fayyad,andP.Bradley,“Efficientdiscoveryoferror-tolerant

frequentitemsetsinhighdimensions,”inProc.7thACMSIGKDDInt.Conf.KnowledgeDiscoveryandDataMining,SanFrancisco,CA,2001,pp.194–203.

[63]J.Yang,W.Wang,andP.S.Yu,“Miningasynchronousperiodicpatterns

intimeseriesdata,”inProc.6thACMSIGKDDInt.Conf.KnowledgeDiscoveryandDataMining,2000,pp.275–279.

[64]C.-H.YunandM.-S.Chen,“Usingpattern-joinandpurchase-combination

forminingwebtransactionpatternsinanelectroniccommerceenviron-ment,”inProc.24thIEEEAnnu.Int.ComputerSoftwareandApplicationConf.,Oct.2000,pp.99–104.

[65]M.J.Zaki,“Sequenceminingincategoricaldomains:Incorporatingcon-straints,”inProc.2000ACMCIKMInt.Conf.Inf.andKnowledgeMan-agement,Nov.2000.[66],“SPADE:Anefficientalgorithmforminingfrequentsequences,”

Mach.Learn.,vol.42,no.1/2,pp.31–60,2001.

[67]M.J.ZakiandC.Hsiao,“CHARM:Anefficientalgorithmforclosed

itemsetmining,”inProc.2ndSIAMConf.DataMining,Apr.2002.[Online].Available:http://www.siam.org/meetings/sdm02/proceedings/sdm02-27.pdf

[68]M.J.Zaki,S.Parthasarathy,M.Ogihara,andW.Li,“Newalgorithmsfor

fastdiscoveryofassociationrules,”inProc.3rdACMSIGKDDInt.Conf.KnowledgeDiscoveryandDataMining,Aug.1997,pp.283–286.

[69]Q.Zheng,K.Xu,S.Ma,andW.Lv,“Thealgorithmsofupdatingsequential

patterns,”inProc.5thInt.WorkshoponHighPerformanceDataMininginconjunctionwith2ndSIAMConf.DataMining,2002.[Online].Available:http://arxiv.org/abs/cs.DB/0203027

Ching-HuangYunreceivedtheB.S.andPh.D.de-greesinelectricalengineeringfromtheNationalTaiwanUniversity,Taipei,Taiwan,R.O.C.,in1997and2005,respectively.

HeiscurrentlyanR&DmanagerwithIBTekInc.,Taipei.Hisresearchinterestsincludedatamining,webtechnologies,andvideosurveillancesystems.

Ming-SyanChen(S’88–M’88–SM’93–F’04)re-ceivedtheB.S.degreeinelectricalengineeringfromtheNationalTaiwanUniversity,Taipei,Taiwan,R.O.C.,andtheM.S.andPh.D.degreesincomputer,information,andcontrolengineeringfromTheUni-versityofMichigan,AnnArbor,in1985and1988,respectively.

HeiscurrentlyaProfessorandtheChairmanoftheGraduateInstituteofCommunicationEngineer-ingandaProfessorwiththeElectricalEngineeringDepartmentandtheComputerScienceandInforma-tionEngineeringDepartment,NationalTaiwanUniversity.HewasaResearchStaffMemberattheIBMThomasJ.WatsonResearchCenter,YorktownHeights,NY,from1988to1996.Hisresearchinterestsincludedatabasesystems,datamining,mobilecomputingsystems,andmultimedianetworking,andhehaspublishedmorethan200papersinhisresearchareas.Heholds,orhasappliedfor,18U.S.patentsandsevenR.O.C.patentsintheareasofdatamining,Webapplications,interactivevideoplayout,videoserverdesign,andconcurrencyandcoherencycontrolprotocols.HeiscurrentlyontheeditorialboardoftheVeryLargeDataBase(VLDB)Journal,theKnowledgeandInformationSystems(KAIS)Journal,theJournalofInformationScienceandEngineering,andtheInternationalJournalofElectricalEngineering.

Dr.ChenservedasanAssociateEditorofIEEETRANSACTIONSONKNOWLEDGEANDDATAENGINEERINGfrom1997to2001.HewasaDistin-guishedVisitoroftheIEEEComputerSocietyforAsia-Pacificfrom1998to2000.HeservedastheProgramChairofPacificAreaKnowledgeDiscoveryandDataMining,InternationalViceChairofINFOCOM2005,andProgramVice-ChairofIEEEICDCS2005,ICPP2003,andVLDB-2002.HewasakeynotespeakeronWebdataminingattheInternationalComputerCongressinHongKong,1999,atutorialspeakeronWebdataminingatDASFAA-1999,andonparalleldatabasesatthe11thIEEEICDEin1995.HewasalsoaGuestCoeditorforIEEETKDEonaspecialissueondatamininginDecember1996.HeisarecipientoftheNationalScienceCouncilDistinguishedResearchAwardandK.-T.LiResearchPenetrationAwardforhisresearchwork.HealsoreceivedtheOutstandingInnovationAwardfromIBMCorporateforhiscontributiontoamajordatabaseproductandreceivednumerousawardsforhisresearch,teaching,inventions,andpatentapplications.HeisaMemberoftheACM.

因篇幅问题不能全部显示,请点此查看更多更全内容

Top