I.J. Modern Education and Computer Science, 2013, 8, 8-17
Published Online October 2013 in MECS
DOI: 10.5815/ijmecs.2013.08.02
Copyright © 2013 MECS

Utilization of Data Mining Techniques for Prediction and Diagnosis of Tuberculosis Disease Survivability

K. R. Lakshmi
Director, IERDS, Maddur Nagar, Kurnool, Andhra Pradesh, India
Email: krlakshmi_cse@yahoo.com

M. Veera Krishna
Department of Mathematics, Rayalaseema University, Kurnool, Andhra Pradesh, India
Email: veerakrishna_maths@yahoo.com

S. Prem Kumar
Professor & Head, Department of CSE & IT, G. Pullaiah College of Engineering & Technology, Nandikotkur Road, Kurnool, Andhra Pradesh, India
E-mail: mcahod@gpcet.ac.in

Manuscript received February 13, 2013; revised July 25, 2013; accepted September 15, 2013. Corresponding author: K. R. Lakshmi.

Abstract—Predicting and diagnosing Tuberculosis survivability has been a challenging problem for many researchers. Since the early days of this research, several related fields have advanced considerably. Innovative biomedical technologies now measure and record more explanatory prognostic factors. Low-cost computer hardware and software technologies now collect and store high-volume, better-quality data automatically. Finally, better analytical methods now process these voluminous data effectively and efficiently. Tuberculosis is one of the leading diseases in developing countries, including India, and is one of the most common causes of death in humans. Its incidence has increased significantly in recent years. In this paper, we discuss various data mining approaches that have been used for Tuberculosis diagnosis and prognosis. We summarize review and technical articles on Tuberculosis diagnosis and prognosis and focus on current research that uses data mining techniques to improve them. We take advantage of these technological advancements to develop the best prediction model for Tuberculosis survivability.

Index Terms—SVM, C4.5, k-NN, PLS-DA, data mining techniques, Tuberculosis, specificity

I. INTRODUCTION

Data mining is a broad area that integrates techniques from several fields for the analysis of large volumes of data. These fields include machine learning, statistics, pattern recognition, artificial intelligence, and database systems. Researchers have developed a large number of data mining algorithms rooted in these fields to perform different data analysis tasks. Data mining is the knowledge discovery process that extracts interesting patterns from large amounts of data. With the amount of data doubling every three years, data mining is becoming an increasingly important tool for transforming these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection, and medical and scientific discovery.

Humans have extracted patterns from data manually for centuries, but the increasing volume of data in modern times has called for more automated approaches. As data sets have grown in size and complexity, direct hands-on analysis has increasingly been augmented with indirect, automatic data processing. Other discoveries in computer science have aided this shift, such as neural networks, clustering, genetic algorithms, decision trees, and support vector machines. Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns.

Many hospital information systems are designed to support patient billing, inventory management, and the generation of simple statistics. Some hospitals use decision support systems, but these systems are largely limited. They can answer simple queries such as "What is the average age of patients who have heart disease?", "How many surgeries resulted in hospital stays longer than 10 days?", and "Identify the female patients who are single, above 30 years old, and who have been treated for cancer." However, they cannot answer complex queries such as "Identify the important preoperative predictors that increase the length of hospital stay", "Given patient records on cancer, should treatment include chemotherapy alone, radiation alone, or both chemotherapy and radiation?", and "Given patient records, predict the probability of patients getting heart disease".

Medical decisions are often made based on doctors' intuition and experience rather than on the knowledge-rich data hidden in the database. This practice leads to unwanted biases, errors, and excessive medical costs, which affect the quality of service provided to patients. Wu et al. proposed that integrating medical decision support with computer-based patient records could reduce medical errors, enhance patient safety, decrease unwanted practice variation, and improve patient outcomes. This suggestion is promising because data modeling and analysis tools, such as data mining, can generate a knowledge-rich environment that significantly improves the quality of medical decisions.

A major challenge facing healthcare organizations (hospitals, medical centers) is the provision of quality services at affordable costs. Quality service implies diagnosing patients correctly and administering effective treatments. Poor clinical decisions can lead to disastrous consequences which are therefore unaccepta