Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

Data Mining : Practical Machine Learning Tools and Techniques

by Ian H. Witten and Eibe Frank
  • ISBN13: 9780120884070
  • ISBN10: 0120884070
  • Edition: 2nd
  • Format: Paperback
  • Copyright: 6/8/2005
  • Publisher: Elsevier Science

Note: Supplemental materials are not guaranteed with Rental or Used book purchases.

Purchase Benefits

  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks, and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks! Enroll Now.
  • Complimentary 7-Day eTextbook Access
    When you rent or buy this book, you will receive complimentary 7-day online access to the eTextbook version from your PC, Mac, tablet, or smartphone. This feature is not included on Marketplace items.
List Price: $75.95 (save up to $26.58)
  • Rent Book: $49.37 (Free Shipping)

    Includes complimentary 7-Day eTextbook Access.
    Usually ships in 24-48 hours.
    *This item is part of an exclusive publisher rental program and requires an additional convenience fee. This fee will be reflected in the shopping cart.

Summary

As with any burgeoning technology that enjoys commercial attention, the use of data mining is surrounded by a great deal of hype. Exaggerated reports tell of secrets that can be uncovered by setting algorithms loose on oceans of data. But there is no magic in machine learning, no hidden power, no alchemy. Instead, there is an identifiable body of practical techniques that can extract useful information from raw data. This book describes these techniques and shows how they work.

The book is a major revision of the first edition, which appeared in 1999. While the basic core remains the same, it has been updated to reflect the changes that have taken place over five years, and it now has nearly double the references. Highlights of the new edition include thirty new technique sections; an enhanced Weka machine learning workbench, which now features an interactive interface; comprehensive information on neural networks; a new section on Bayesian networks; and much more.

  • Algorithmic methods at the heart of successful data mining, including tried-and-true techniques as well as leading-edge methods
  • Performance improvement techniques that work by transforming the input or output
  • Downloadable Weka, a collection of machine learning algorithms for data mining tasks, including tools for data pre-processing, classification, regression, clustering, association rules, and visualization, in a new, interactive interface

Table of Contents

Foreword v
Preface xxiii
Updated and revised content xxvii
Acknowledgments xxix
Part I Machine learning tools and techniques 1
What's it all about? 3
Data mining and machine learning 4
Describing structural patterns 6
Machine learning 7
Data mining 9
Simple examples: The weather problem and others 9
The weather problem 10
Contact lenses: An idealized problem 13
Irises: A classic numeric dataset 15
CPU performance: Introducing numeric prediction 16
Labor negotiations: A more realistic example 17
Soybean classification: A classic machine learning success 18
Fielded applications 22
Decisions involving judgment 22
Screening images 23
Load forecasting 24
Diagnosis 25
Marketing and sales 26
Other applications 28
Machine learning and statistics 29
Generalization as search 30
Enumerating the concept space 31
Bias 32
Data mining and ethics 35
Further reading 37
Input: Concepts, instances, and attributes 41
What's a concept? 42
What's in an example? 45
What's in an attribute? 49
Preparing the input 52
Gathering the data together 52
ARFF format 53
Sparse data 55
Attribute types 56
Missing values 58
Inaccurate values 59
Getting to know your data 60
Further reading 60
Output: Knowledge representation 61
Decision tables 62
Decision trees 62
Classification rules 65
Association rules 69
Rules with exceptions 70
Rules involving relations 73
Trees for numeric prediction 76
Instance-based representation 76
Clusters 81
Further reading 82
Algorithms: The basic methods 83
Inferring rudimentary rules 84
Missing values and numeric attributes 86
Discussion 88
Statistical modeling 88
Missing values and numeric attributes 92
Bayesian models for document classification 94
Discussion 96
Divide-and-conquer: Constructing decision trees 97
Calculating information 100
Highly branching attributes 102
Discussion 105
Covering algorithms: Constructing rules 105
Rules versus trees 107
A simple covering algorithm 107
Rules versus decision lists 111
Mining association rules 112
Item sets 113
Association rules 113
Generating rules efficiently 117
Discussion 118
Linear models 119
Numeric prediction: Linear regression 119
Linear classification: Logistic regression 121
Linear classification using the perceptron 124
Linear classification using Winnow 126
Instance-based learning 128
The distance function 128
Finding nearest neighbors efficiently 129
Discussion 135
Clustering 136
Iterative distance-based clustering 137
Faster distance calculations 138
Discussion 139
Further reading 139
Credibility: Evaluating what's been learned 143
Training and testing 144
Predicting performance 146
Cross-validation 149
Other estimates 151
Leave-one-out 151
The bootstrap 152
Comparing data mining methods 153
Predicting probabilities 157
Quadratic loss function 158
Informational loss function 159
Discussion 160
Counting the cost 161
Cost-sensitive classification 164
Cost-sensitive learning 165
Lift charts 166
ROC curves 168
Recall-precision curves 171
Discussion 172
Cost curves 173
Evaluating numeric prediction 176
The minimum description length principle 179
Applying the MDL principle to clustering 183
Further reading 184
Implementations: Real machine learning schemes 187
Decision trees 189
Numeric attributes 189
Missing values 191
Pruning 192
Estimating error rates 193
Complexity of decision tree induction 196
From trees to rules 198
C4.5: Choices and options 198
Discussion 199
Classification rules 200
Criteria for choosing tests 200
Missing values, numeric attributes 201
Generating good rules 202
Using global optimization 205
Obtaining rules from partial decision trees 207
Rules with exceptions 210
Discussion 213
Extending linear models 214
The maximum margin hyperplane 215
Nonlinear class boundaries 217
Support vector regression 219
The kernel perceptron 222
Multilayer perceptrons 223
Discussion 235
Instance-based learning 235
Reducing the number of exemplars 236
Pruning noisy exemplars 236
Weighting attributes 237
Generalizing exemplars 238
Distance functions for generalized exemplars 239
Generalized distance functions 241
Discussion 242
Numeric prediction 243
Model trees 244
Building the tree 245
Pruning the tree 245
Nominal attributes 246
Missing values 246
Pseudocode for model tree induction 247
Rules from model trees 250
Locally weighted linear regression 251
Discussion 253
Clustering 254
Choosing the number of clusters 254
Incremental clustering 255
Category utility 260
Probability-based clustering 262
The EM algorithm 265
Extending the mixture model 266
Bayesian clustering 268
Discussion 270
Bayesian networks 271
Making predictions 272
Learning Bayesian networks 276
Specific algorithms 278
Data structures for fast learning 280
Discussion 283
Transformations: Engineering the input and output 285
Attribute selection 288
Scheme-independent selection 290
Searching the attribute space 292
Scheme-specific selection 294
Discretizing numeric attributes 296
Unsupervised discretization 297
Entropy-based discretization 298
Other discretization methods 302
Entropy-based versus error-based discretization 302
Converting discrete to numeric attributes 304
Some useful transformations 305
Principal components analysis 306
Random projections 309
Text to attribute vectors 309
Time series 311
Automatic data cleansing 312
Improving decision trees 312
Robust regression 313
Detecting anomalies 314
Combining multiple models 315
Bagging 316
Bagging with costs 319
Randomization 320
Boosting 321
Additive regression 325
Additive logistic regression 327
Option trees 328
Logistic model trees 331
Stacking 332
Error-correcting output codes 334
Using unlabeled data 337
Clustering for classification 337
Co-training 339
EM and co-training 340
Further reading 341
Moving on: Extensions and applications 345
Learning from massive datasets 346
Incorporating domain knowledge 349
Text and Web mining 351
Adversarial situations 356
Ubiquitous data mining 358
Further reading 361
Part II The Weka machine learning workbench 363
Introduction to Weka 365
What's in Weka? 366
How do you use it? 367
What else can you do? 368
How do you get it? 368
The Explorer 369
Getting started 369
Preparing the data 370
Loading the data into the Explorer 370
Building a decision tree 373
Examining the output 373
Doing it again 377
Working with models 377
When things go wrong 378
Exploring the Explorer 380
Loading and filtering files 380
Training and testing learning schemes 384
Do it yourself: The User Classifier 388
Using a metalearner 389
Clustering and association rules 391
Attribute selection 392
Visualization 393
Filtering algorithms 393
Unsupervised attribute filters 395
Unsupervised instance filters 400
Supervised filters 401
Learning algorithms 403
Bayesian classifiers 403
Trees 406
Rules 408
Functions 409
Lazy classifiers 413
Miscellaneous classifiers 414
Metalearning algorithms 414
Bagging and randomization 414
Boosting 416
Combining classifiers 417
Cost-sensitive learning 417
Optimizing performance 417
Retargeting classifiers for different tasks 418
Clustering algorithms 418
Association-rule learners 419
Attribute selection 420
Attribute subset evaluators 422
Single-attribute evaluators 422
Search methods 423
The Knowledge Flow interface 427
Getting started 427
The Knowledge Flow components 430
Configuring and connecting the components 431
Incremental learning 433
The Experimenter 437
Getting started 438
Running an experiment 439
Analyzing the results 440
Simple setup 441
Advanced setup 442
The Analyze panel 443
Distributing processing over several machines 445
The command-line interface 449
Getting started 449
The structure of Weka 450
Classes, instances, and packages 450
The weka.core package 451
The weka.classifiers package 453
Other packages 455
Javadoc indices 456
Command-line options 456
Generic options 456
Scheme-specific options 458
Embedded machine learning 461
A simple data mining application 461
Going through the code 462
Main() 462
MessageClassifier() 462
updateData() 468
classifyMessage() 468
Writing new learning schemes 471
An example classifier 471
buildClassifier() 472
makeTree() 472
computeInfoGain() 480
classifyInstance() 480
main() 481
Conventions for implementing classifiers 483
References 485
Index 505
About the authors 525

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
