MALCOLM ATKINSON, PhD, is Professor of e-Science in the School of Informatics at the University of Edinburgh in Scotland. He is also Data-Intensive Research Group leader, Director of the e-Science Institute, IT architect for the ADMIRE and VERCE EU projects, and UK e-Science Envoy. Professor Atkinson has led research projects for several decades and has served on many advisory bodies.
CONTRIBUTORS xv
FOREWORD xvii
PREFACE xix
THE EDITORS xxix
PART I STRATEGIES FOR SUCCESS IN THE DIGITAL-DATA REVOLUTION 1
1. The Digital-Data Challenge 5
Malcolm Atkinson and Mark Parsons
1.1 The Digital Revolution / 5
1.2 Changing How We Think and Behave / 6
1.3 Moving Adroitly in this Fast-Changing Field / 8
1.4 Digital-Data Challenges Exist Everywhere / 8
1.5 Changing How We Work / 9
1.6 Divide and Conquer Offers the Solution / 10
1.7 Engineering Data-to-Knowledge Highways / 12
References / 13
2. The Digital-Data Revolution 15
Malcolm Atkinson
2.1 Data, Information, and Knowledge / 16
2.2 Increasing Volumes and Diversity of Data / 18
2.3 Changing the Ways We Work with Data / 28
References / 33
3. The Data-Intensive Survival Guide 37
Malcolm Atkinson
3.1 Introduction: Challenges and Strategy / 38
3.2 Three Categories of Expert / 39
3.3 The Data-Intensive Architecture / 41
3.4 An Operational Data-Intensive System / 42
3.5 Introducing DISPEL / 44
3.6 A Simple DISPEL Example / 45
3.7 Supporting Data-Intensive Experts / 47
3.8 DISPEL in the Context of Contemporary Systems / 48
3.9 Datascopes / 51
3.10 Ramps for Incremental Engagement / 54
3.11 Readers’ Guide to the Rest of This Book / 56
References / 58
4. Data-Intensive Thinking with DISPEL 61
Malcolm Atkinson
4.1 Processing Elements / 62
4.2 Connections / 64
4.3 Data Streams and Structure / 65
4.4 Functions / 66
4.5 The Three-Level Type System / 72
4.6 Registry, Libraries, and Descriptions / 81
4.7 Achieving Data-Intensive Performance / 86
4.8 Reliability and Control / 108
4.9 The Data-to-Knowledge Highway / 116
References / 121
PART II DATA-INTENSIVE KNOWLEDGE DISCOVERY 123
5. Data-Intensive Analysis 127
Oscar Corcho and Jano van Hemert
5.1 Knowledge Discovery in Telco Inc. / 128
5.2 Understanding Customers to Prevent Churn / 130
5.3 Preventing Churn Across Multiple Companies / 134
5.4 Understanding Customers by Combining Heterogeneous Public and Private Data / 137
5.5 Conclusions / 144
References / 145
6. Problem Solving in Data-Intensive Knowledge Discovery 147
Oscar Corcho and Jano van Hemert
6.1 The Conventional Life Cycle of Knowledge Discovery / 148
6.2 Knowledge Discovery Over Heterogeneous Data Sources / 155
6.3 Knowledge Discovery from Private and Public, Structured and Nonstructured Data / 158
6.4 Conclusions / 162
References / 162
7. Data-Intensive Components and Usage Patterns 165
Oscar Corcho
7.1 Data Source Access and Transformation Components / 166
7.2 Data Integration Components / 172
7.3 Data Preparation and Processing Components / 173
7.4 Data-Mining Components / 174
7.5 Visualization and Knowledge Delivery Components / 176
References / 178
8. Sharing and Reuse in Knowledge Discovery 181
Oscar Corcho
8.1 Strategies for Sharing and Reuse / 182
8.2 Data Analysis Ontologies for Data Analysis Experts / 185
8.3 Generic Ontologies for Metadata Generation / 188
8.4 Domain Ontologies for Domain Experts / 189
8.5 Conclusions / 190
References / 191
PART III DATA-INTENSIVE ENGINEERING 193
9. Platforms for Data-Intensive Analysis 197
David Snelling
9.1 The Hourglass Reprise / 198
9.2 The Motivation for a Platform / 200
9.3 Realization / 201
References / 201
10. Definition of the DISPEL Language 203
Paul Martin and Gagarine Yaikhom
10.1 A Simple Example / 204
10.2 Processing Elements / 205
10.3 Data Streams / 213
10.4 Type System / 217
10.5 Registration / 222
10.6 Packaging / 224
10.7 Workflow Submission / 225
10.8 Examples of DISPEL / 227
10.9 Summary / 235
References / 236
11. DISPEL Development 237
Adrian Mouat and David Snelling
11.1 The Development Landscape / 237
11.2 Data-Intensive Workbenches / 239
11.3 Data-Intensive Component Libraries / 247
11.4 Summary / 248
References / 248
12. DISPEL Enactment 251
Chee Sun Liew, Amrey Krause, and David Snelling
12.1 Overview of DISPEL Enactment / 251
12.2 DISPEL Language Processing / 253
12.3 DISPEL Optimization / 255
12.4 DISPEL Deployment / 266
12.5 DISPEL Execution and Control / 268
References / 273
PART IV DATA-INTENSIVE APPLICATION EXPERIENCE 275
13. The Application Foundations of DISPEL 277
Rob Baxter
13.1 Characteristics of Data-Intensive Applications / 277
13.2 Evaluating Application Performance / 280
13.3 Reviewing the Data-Intensive Strategy / 283
14. Analytical Platform for Customer Relationship Management 287
Maciej Jarka and Mark Parsons
14.1 Data Analysis in the Telecoms Business / 288
14.2 Analytical Customer Relationship Management / 289
14.3 Scenario 1: Churn Prediction / 291
14.4 Scenario 2: Cross Selling / 293
14.5 Exploiting the Models and Rules / 296
14.6 Summary: Lessons Learned / 299
References / 299
15. Environmental Risk Management 301
Ladislav Hluchý, Ondrej Habala, Viet Tran, and Branislav Šimo
15.1 Environmental Modeling / 302
15.2 Cascading Simulation Models / 303
15.3 Environmental Data Sources and Their Management / 305
15.4 Scenario 1: ORAVA / 309
15.5 Scenario 2: RADAR / 313
15.6 Scenario 3: SVP / 318
15.7 New Technologies for Environmental Data Mining / 321
15.8 Summary: Lessons Learned / 323
References / 325
16. Analyzing Gene Expression Imaging Data in Developmental Biology 327
Liangxiu Han, Jano van Hemert, Ian Overton, Paolo Besana, and Richard Baldock
16.1 Understanding Biological Function / 328
16.2 Gene Image Annotation / 330
16.3 Automated Annotation of Gene Expression Images / 331
16.4 Exploitation and Future Work / 341
16.5 Summary / 345
References / 346
17. Data-Intensive Seismology: Research Horizons 353
Michelle Galea, Andreas Rietbrock, Alessandro Spinuso, and Luca Trani
17.1 Introduction / 354
17.2 Seismic Ambient Noise Processing / 356
17.3 Solution Implementation / 358
17.4 Evaluation / 369
17.5 Further Work / 372
17.6 Conclusions / 373
References / 375
PART V DATA-INTENSIVE BEACONS OF SUCCESS 377
18. Data-Intensive Methods in Astronomy 381
Thomas D. Kitching, Robert G. Mann, Laura E. Valkonen, Mark S. Holliman, Alastair Hume, and Keith T. Noddle
18.1 Introduction / 381
18.2 The Virtual Observatory / 382
18.3 Data-Intensive Photometric Classification of Quasars / 383
18.4 Probing the Dark Universe with Weak Gravitational Lensing / 387
18.5 Future Research Issues / 392
18.6 Conclusions / 392
References / 393
19. The World at One’s Fingertips: Interactive Interpretation of Environmental Data 395
Jon Blower, Keith Haines, and Alastair Gemmell
19.1 Introduction / 395
19.2 The Current State of the Art / 397
19.3 The Technical Landscape / 401
19.4 Interactive Visualization / 403
19.5 From Visualization to Intercomparison / 406
19.6 Future Development: The Environmental Cloud / 409
19.7 Conclusions / 411
References / 412
20. Data-Driven Research in the Humanities—the DARIAH Research Infrastructure 417
Andreas Aschenbrenner, Tobias Blanke, Christiane Fritze, and Wolfgang Pempe
20.1 Introduction / 417
20.2 The Tradition of Digital Humanities / 420
20.3 Humanities Research Data / 422
20.4 Use Case / 426
20.5 Conclusion and Future Development / 429
References / 430
21. Analysis of Large and Complex Engineering and Transport Data 431
Jim Austin
21.1 Introduction / 431
21.2 Applications and Challenges / 432
21.3 The Methods Used / 434
21.4 Future Developments / 438
21.5 Conclusions / 439
References / 440
22. Estimating Species Distributions—Across Space, Through Time, and with Features of the Environment 441
Steve Kelling, Daniel Fink, Wesley Hochachka, Ken Rosenberg, Robert Cook, Theodoros Damoulas, Claudio Silva, and William Michener
22.1 Introduction / 442
22.2 Data Discovery, Access, and Synthesis / 443
22.3 Model Development / 448
22.4 Managing Computational Requirements / 449
22.5 Exploring and Visualizing Model Results / 450
22.6 Analysis Results / 452
22.7 Conclusion / 454
References / 456
PART VI THE DATA-INTENSIVE FUTURE 459
23. Data-Intensive Trends 461
Malcolm Atkinson and Paolo Besana
23.1 Reprise / 461
23.2 Data-Intensive Applications / 469
References / 476
24. Data-Rich Futures 477
Malcolm Atkinson
24.1 Future Data Infrastructure / 478
24.2 Future Data Economy / 485
24.3 Future Data Society and Professionalism / 489
References / 494
Appendix A: Glossary 499
Michelle Galea and Malcolm Atkinson
Appendix B: DISPEL Reference Manual 507
Paul Martin
Appendix C: Component Definitions 531
Malcolm Atkinson and Chee Sun Liew
INDEX 537