Smarter Data Science

by Neal Fishman; Cole Stryker
  • ISBN13: 9781119693413
  • ISBN10: 1119693411

  • Format: Paperback
  • Copyright: 2020-05-05
  • Publisher: John Wiley & Sons Inc


Summary

Organizations can make data science a repeatable, predictable tool that business professionals use to get more value from their data.

Enterprise data and AI projects are often scattershot, underbaked, siloed, and not adaptable to predictable business changes. As a result, the vast majority fail. These expensive quagmires can be avoided, and this book explains precisely how. 

Data science is emerging as a hands-on tool for not just data scientists, but business professionals as well. Managers, directors, IT leaders, and analysts must expand their use of data science capabilities for the organization to stay competitive. Smarter Data Science helps them achieve their enterprise-grade data projects and AI goals. It serves as a guide to building a robust and comprehensive information architecture program that enables sustainable and scalable AI deployments.

When an organization manages its data effectively, its data science program becomes a fully scalable function that’s both prescriptive and repeatable. With an understanding of data science principles, practitioners are also empowered to lead their organizations in establishing and deploying viable AI. They employ the tools of machine learning, deep learning, and AI to extract greater value from data for the benefit of the enterprise.

By following a ladder framework that promotes prescriptive capabilities, organizations can make data science accessible to a range of team members, democratizing data science throughout the organization. Companies that collect, organize, and analyze data can move forward to additional data science achievements:

  • Improving time-to-value with infused AI models for common use cases
  • Optimizing knowledge work and business processes
  • Utilizing AI-based business intelligence and data visualization
  • Establishing a data topology to support general or highly specialized needs
  • Successfully completing AI projects in a predictable manner
  • Coordinating the use of AI from any compute node, from inner edges to outer edges: cloud, fog, and mist computing

When they climb the ladder presented in this book, businesspeople and data scientists alike will be able to improve and foster repeatable capabilities. They will have the knowledge to maximize their AI and data assets for the benefit of their organizations.

Author Biography

NEAL FISHMAN is a Distinguished Engineer and CTO of Data-Based Pathology at IBM. He is an IBM-certified Senior IT Architect and Open Group Distinguished Chief Architect.

COLE STRYKER is a journalist based in Los Angeles. He is the author of Epic Win for Anonymous and Hacking the Future.

Table of Contents

Foreword for Smarter Data Science

Epigraph

Preamble

Chapter 1 Climbing the AI Ladder

Readying Data for AI

Technology Focus Areas

Taking the Ladder Rung by Rung

Constantly Adapt to Retain Organizational Relevance

Data-Based Reasoning is Part and Parcel in the Modern Business

Toward the AI-Centric Organization

Summary

Chapter 2 Framing Part I: Considerations for Organizations Using AI

Data-Driven Decision-Making

Using Interrogatives to Gain Insight

The Trust Matrix

The Importance of Metrics and Human Insight

Democratizing Data and Data Science

Aye, a Prerequisite: Organizing Data Must Be a Forethought

Preventing Design Pitfalls

Facilitating the Winds of Change: How Organized Data Facilitates Reaction Time

Quae Quaestio (Question Everything)

Summary

Chapter 3 Framing Part II: Considerations for Working with Data and AI

Personalizing the Data Experience for Every User

Context Counts: Choosing the Right Way to Display Data

Ethnography: Improving Understanding Through Specialized Data

Data Governance and Data Quality

The Value of Decomposing Data

Providing Structure Through Data Governance

Curating Data for Training

Additional Considerations for Creating Value

Ontologies: A Means for Encapsulating Knowledge

Fairness, Trust, and Transparency in AI Outcomes

Accessible, Accurate, Curated, and Organized

Summary

Chapter 4 A Look Back on Analytics: More Than One Hammer

Been Here Before: Reviewing the Enterprise Data Warehouse

Drawbacks of the Traditional Data Warehouse

Paradigm Shift

Modern Analytical Environments: The Data Lake

By Contrast

Indigenous Data

Attributes of Difference

Elements of the Data Lake

The New Normal: Big Data is Now Normal Data

Liberation from the Rigidity of a Single Data Model

Streaming Data

Suitable Tools for the Task

Easier Accessibility

Reducing Costs

Scalability

Data Management and Data Governance for AI

Schema-on-Read vs. Schema-on-Write

Summary

Chapter 5 A Look Forward on Analytics: Not Everything Can Be a Nail

A Need for Organization

The Staging Zone

The Raw Zone

The Discovery and Exploration Zone

The Aligned Zone

The Harmonized Zone

The Curated Zone

Data Topologies

Zone Map

Data Pipelines

Data Topography

Expanding, Adding, Moving, and Removing Zones

Enabling the Zones

Ingestion

Data Governance

Data Storage and Retention

Data Processing

Data Access

Management and Monitoring

Metadata

Summary

Chapter 6 Addressing Operational Disciplines on the AI Ladder

A Passage of Time

Create

Stability

Barriers

Complexity

Execute

Ingestion

Visibility

Compliance

Operate

Quality

Reliance

Reusability

The xOps Trifecta: DevOps/MLOps, DataOps, and AIOps

DevOps/MLOps

DataOps

AIOps

Summary

Chapter 7 Maximizing the Use of Your Data: Being Value Driven

Toward a Value Chain

Chaining Through Correlation

Enabling Action

Expanding the Means to Act

Curation

Data Governance

Integrated Data Management

Onboarding

Organizing

Cataloging

Metadata

Preparing

Provisioning

Multi-Tenancy

Summary

Chapter 8 Valuing Data with Statistical Analysis and Enabling Meaningful Access

Deriving Value: Managing Data as an Asset

An Inexact Science

Accessibility to Data: Not All Users are Equal

Providing Self-Service to Data

Access: The Importance of Adding Controls

Ranking Datasets Using a Bottom-Up Approach for Data Governance

How Various Industries Use Data and AI

Benefiting from Statistics

Summary

Chapter 9 Constructing for the Long-Term

The Need to Change Habits: Avoiding Hard-Coding

Overloading

Locked In

Ownership and Decomposition

Design to Avoid Change

Extending the Value of Data Through AI

Polyglot Persistence

Benefiting from Data Literacy

Understanding a Topic

Skillsets

It’s All Metadata

The Right Data, in the Right Context, with the Right Interface

Summary

Chapter 10 A Journey’s End: An IA for AI

Development Efforts for AI

Essential Elements: Cloud-Based Computing, Data, and Analytics

Intersections: Compute Capacity and Storage Capacity

Analytic Intensity

Interoperability Across the Elements

Data Pipeline Flight Paths: Preflight, Inflight, Postflight

Data Management for the Data Puddle, Data Pond, and Data Lake

Driving Action: Context, Content, and Decision-Makers

Keep It Simple

The Silo is Dead; Long Live the Silo

Taxonomy: Organizing Data Zones

Capabilities for an Open Platform

Summary

Appendix Glossary of Terms

Index
