Domain Driven KDD Methodology | |
Introduction to Domain Driven Data Mining | p. 3 |
Why Domain Driven Data Mining | p. 3 |
What Is Domain Driven Data Mining | p. 5 |
Basic Ideas | p. 5 |
D3M for Actionable Knowledge Discovery | p. 6 |
Open Issues and Prospects | p. 9 |
Conclusions | p. 9 |
References | p. 10 |
Post-processing Data Mining Models for Actionability | p. 11 |
Introduction | p. 11 |
Plan Mining for Class Transformation | p. 12 |
Overview of Plan Mining | p. 12 |
Problem Formulation | p. 14 |
From Association Rules to State Spaces | p. 14 |
Algorithm for Plan Mining | p. 17 |
Summary | p. 19 |
Extracting Actions from Decision Trees | p. 20 |
Overview | p. 20 |
Generating Actions from Decision Trees | p. 22 |
The Limited Resources Case | p. 23 |
Learning Relational Action Models from Frequent Action Sequences | p. 25 |
Overview | p. 25 |
ARMS Algorithm: From Association Rules to Actions | p. 26 |
Summary of ARMS | p. 28 |
Conclusions and Future Work | p. 29 |
References | p. 29 |
On Mining Maximal Pattern-Based Clusters | p. 31 |
Introduction | p. 32 |
Problem Definition and Related Work | p. 34 |
Pattern-Based Clustering | p. 34 |
Maximal Pattern-Based Clustering | p. 35 |
Related Work | p. 35 |
Algorithms MaPle and MaPle+ | p. 36 |
An Overview of MaPle | p. 37 |
Computing and Pruning MDS's | p. 38 |
Progressively Refining, Depth-first Search of Maximal pClusters | p. 40 |
MaPle+: Further Improvements | p. 44 |
Empirical Evaluation | p. 46 |
The Data Sets | p. 46 |
Results on Yeast Data Set | p. 47 |
Results on Synthetic Data Sets | p. 48 |
Conclusions | p. 50 |
References | p. 50 |
Role of Human Intelligence in Domain Driven Data Mining | p. 53 |
Introduction | p. 53 |
DDDM Tasks Requiring Human Intelligence | p. 54 |
Formulating Business Objectives | p. 54 |
Setting up Business Success Criteria | p. 55 |
Translating Business Objective to Data Mining Objectives | p. 56 |
Setting up of Data Mining Success Criteria | p. 56 |
Assessing Similarity Between Business Objectives of New and Past Projects | p. 57 |
Formulating Business, Legal and Financial Requirements | p. 57 |
Narrowing down Data and Creating Derived Attributes | p. 58 |
Estimating Cost of Data Collection, Implementation and Operating Costs | p. 58 |
Selection of Modeling Techniques | p. 59 |
Setting up Model Parameters | p. 59 |
Assessing Modeling Results | p. 59 |
Developing a Project Plan | p. 60 |
Directions for Future Research | p. 60 |
Summary | p. 61 |
References | p. 61 |
Ontology Mining for Personalized Search | p. 63 |
Introduction | p. 63 |
Related Work | p. 64 |
Architecture | p. 65 |
Background Definitions | p. 66 |
World Knowledge Ontology | p. 66 |
Local Instance Repository | p. 67 |
Specifying Knowledge in an Ontology | p. 68 |
Discovery of Useful Knowledge in LIRs | p. 70 |
Experiments | p. 71 |
Experiment Design | p. 71 |
Other Experiment Settings | p. 74 |
Results and Discussions | p. 75 |
Conclusions | p. 77 |
References | p. 77 |
Novel KDD Domains & Techniques | |
Data Mining Applications in Social Security | p. 81 |
Introduction and Background | p. 81 |
Case Study I: Discovering Debtor Demographic Patterns with Decision Tree and Association Rules | p. 83 |
Business Problem and Data | p. 83 |
Discovering Demographic Patterns of Debtors | p. 83 |
Case Study II: Sequential Pattern Mining to Find Activity Sequences of Debt Occurrence | p. 85 |
Impact-Targeted Activity Sequences | p. 86 |
Experimental Results | p. 87 |
Case Study III: Combining Association Rules from Heterogeneous Data Sources to Discover Repayment Patterns | p. 89 |
Business Problem and Data | p. 89 |
Mining Combined Association Rules | p. 89 |
Experimental Results | p. 90 |
Case Study IV: Using Clustering and Analysis of Variance to Verify the Effectiveness of a New Policy | p. 92 |
Clustering Declarations with Contour and Clustering | p. 92 |
Analysis of Variance | p. 94 |
Conclusions and Discussion | p. 94 |
References | p. 95 |
Security Data Mining: A Survey Introducing Tamper-Resistance | p. 97 |
Introduction | p. 97 |
Security Data Mining | p. 98 |
Definitions | p. 98 |
Specific Issues | p. 99 |
General Issues | p. 101 |
Tamper-Resistance | p. 102 |
Reliable Data | p. 102 |
Anomaly Detection Algorithms | p. 104 |
Privacy and Confidentiality Preserving Results | p. 105 |
Conclusion | p. 108 |
References | p. 108 |
A Domain Driven Mining Algorithm on Gene Sequence Clustering | p. 111 |
Introduction | p. 111 |
Related Work | p. 112 |
The Similarity Based on Biological Domain Knowledge | p. 114 |
Problem Statement | p. 114 |
A Domain-Driven Gene Sequence Clustering Algorithm | p. 117 |
Experiments and Performance Study | p. 121 |
Conclusion and Future Work | p. 124 |
References | p. 125 |
Domain Driven Tree Mining of Semi-structured Mental Health Information | p. 127 |
Introduction | p. 127 |
Information Use and Management within Mental Health Domain | p. 128 |
Tree Mining - General Considerations | p. 130 |
Basic Tree Mining Concepts | p. 131 |
Tree Mining of Medical Data | p. 135 |
Illustration of the Approach | p. 139 |
Conclusion and Future Work | p. 139 |
References | p. 140 |
Text Mining for Real-time Ontology Evolution | p. 143 |
Introduction | p. 144 |
Related Text Mining Work | p. 145 |
Terminology and Multi-representations | p. 145 |
Master Aliases Table and OCOE Data Structures | p. 149 |
Experimental Results | p. 152 |
CAV Construction and Information Ranking | p. 153 |
Real-Time CAV Expansion Supported by Text Mining | p. 154 |
Conclusion | p. 155 |
Acknowledgement | p. 156 |
References | p. 156 |
Microarray Data Mining: Selecting Trustworthy Genes with Gene Feature Ranking | p. 159 |
Introduction | p. 159 |
Gene Feature Ranking | p. 161 |
Use of Attributes and Data Samples in Gene Feature Ranking | p. 162 |
Gene Feature Ranking: Feature Selection Phase 1 | p. 163 |
Gene Feature Ranking: Feature Selection Phase 2 | p. 163 |
Application of Gene Feature Ranking to Acute Lymphoblastic Leukemia data | p. 164 |
Conclusion | p. 166 |
References | p. 167 |
Blog Data Mining for Cyber Security Threats | p. 169 |
Introduction | p. 169 |
Review of Related Work | p. 170 |
Intelligence Analysis | p. 171 |
Information Extraction from Blogs | p. 171 |
Probabilistic Techniques for Blog Data Mining | p. 172 |
Attributes of Blog Documents | p. 172 |
Latent Dirichlet Allocation | p. 173 |
Isometric Feature Mapping (Isomap) | p. 174 |
Experiments and Results | p. 175 |
Data Corpus | p. 175 |
Results for Blog Topic Analysis | p. 176 |
Blog Content Visualization | p. 178 |
Blog Time Visualization | p. 179 |
Conclusions | p. 180 |
References | p. 181 |
Blog Data Mining: The Predictive Power of Sentiments | p. 183 |
Introduction | p. 183 |
Related Work | p. 185 |
Characteristics of Online Discussions | p. 186 |
Blog Mentions | p. 186 |
Box Office Data and User Rating | p. 187 |
Discussion | p. 187 |
S-PLSA: A Probabilistic Approach to Sentiment Mining | p. 188 |
Feature Selection | p. 188 |
Sentiment PLSA | p. 188 |
ARSA: A Sentiment-Aware Model | p. 189 |
The Autoregressive Model | p. 190 |
Incorporating Sentiments | p. 191 |
Experiments | p. 192 |
Experiment Settings | p. 192 |
Parameter Selection | p. 193 |
Conclusions and Future Work | p. 194 |
References | p. 194 |
Web Mining: Extracting Knowledge from the World Wide Web | p. 197 |
Overview of Web Mining Techniques | p. 197 |
Web Content Mining | p. 199 |
Classification: Multi-hierarchy Text Classification | p. 199 |
Clustering Analysis: Clustering Algorithm Based on Swarm Intelligence and k-Means | p. 200 |
Semantic Text Analysis: Conceptual Semantic Space | p. 202 |
Web Structure Mining: Page Rank vs. HITS | p. 203 |
Web Event Mining | p. 204 |
Preprocessing for Web Event Mining | p. 205 |
Multi-document Summarization: A Way to Demonstrate Event's Cause and Effect | p. 206 |
Conclusions and Future Works | p. 206 |
References | p. 207 |
DAG Mining for Code Compaction | p. 209 |
Introduction | p. 209 |
Related Work | p. 211 |
Graph and DAG Mining Basics | p. 211 |
Graph-based versus Embedding-based Mining | p. 212 |
Embedded versus Induced Fragments | p. 213 |
DAG Mining Is NP-complete | p. 213 |
Algorithmic Details of DAGMA | p. 214 |
A Canonical Form for DAG enumeration | p. 214 |
Basic Structure of the DAG Mining Algorithm | p. 215 |
Expansion Rules | p. 216 |
Application to Procedural Abstraction | p. 219 |
Evaluation | p. 220 |
Conclusion and Future Work | p. 222 |
References | p. 223 |
A Framework for Context-Aware Trajectory Data Mining | p. 225 |
Introduction | p. 225 |
Basic Concepts | p. 227 |
A Domain-driven Framework for Trajectory Data Mining | p. 229 |
Case Study | p. 232 |
The Selected Mobile Movement-aware Outdoor Game | p. 233 |
Transportation Application | p. 234 |
Conclusions and Future Trends | p. 238 |
References | p. 239 |
Census Data Mining for Land Use Classification | p. 241 |
Content Structure | p. 241 |
Key Research Issues | p. 242 |
Land Use and Remote Sensing | p. 242 |
Census Data and Land Use Distribution | p. 243 |
Census Data Warehouse and Spatial Data Mining | p. 243 |
Concerning about Data Quality | p. 243 |
Concerning about Domain Driven | p. 244 |
Applying Machine Learning Tools | p. 246 |
Data Integration | p. 247 |
Area of Study and Data | p. 247 |
Supported Digital Image Processing | p. 248 |
Putting All Steps Together | p. 248 |
Results and Analysis | p. 249 |
References | p. 251 |
Visual Data Mining for Developing Competitive Strategies in Higher Education | p. 253 |
Introduction | p. 253 |
Square Tiles Visualization | p. 255 |
Related Work | p. 256 |
Mathematical Model | p. 257 |
Framework and Case Study | p. 260 |
General Insights and Observations | p. 261 |
Benchmarking | p. 262 |
High School Relationship Management (HSRM) | p. 263 |
Future Work | p. 264 |
Conclusions | p. 264 |
References | p. 265 |
Data Mining For Robust Flight Scheduling | p. 267 |
Introduction | p. 267 |
Flight Scheduling in the Presence of Delays | p. 268 |
Related Work | p. 270 |
Classification of Flights | p. 272 |
Subspaces for Locally Varying Relevance | p. 272 |
Integrating Subspace Information for Robust Flight Classification | p. 272 |
Algorithmic Concept | p. 274 |
Monotonicity Properties of Relevant Attribute Subspaces | p. 274 |
Top-down Class Entropy Algorithm: Lossless Pruning Theorem | p. 275 |
Algorithm: Subspaces, Clusters, Subspace Classification | p. 276 |
Evaluation of Flight Delay Classification in Practice | p. 278 |
Conclusion | p. 280 |
References | p. 280 |
Data Mining for Algorithmic Asset Management | p. 283 |
Introduction | p. 283 |
Backbone of the Asset Management System | p. 285 |
Expert-based Incremental Learning | p. 286 |
An Application to the iShare Index Fund | p. 290 |
References | p. 294 |
Reviewer List | p. 297 |
Index | p. 299 |
Table of Contents provided by Publisher. All Rights Reserved. |