9780137151448

Global organizations must quickly and cost-effectively analyze, translate, synthesize, and distill massive amounts of text in multiple languages. The technology needed to automate this process, multilingual natural language processing (NLP), is advancing rapidly. This is the first comprehensive, "one-stop-shop" guide to building robust and accurate multilingual NLP systems. Multilingual Natural Language Processing Applications combines all the essential background and realistic, up-to-date guidance practitioners will need to succeed. Containing new contributions from leading researchers at IBM, Google, Stanford, CMU, Columbia, and ISI, it integrates cutting-edge advances with practical solutions drawn from extensive field experience. Part I focuses primarily on multilingual NLP's core technologies, including technologies for understanding the structure of words and documents; analyzing syntax; modeling language; recognizing entailment; and detecting redundancy. Part II delves into the theoretical and practical considerations involved in using these technologies to construct real-world applications. It contains detailed chapters on information extraction, machine translation, information retrieval and search, summarization, question answering, distillation, and processing pipelines.

Daniel M. Bikel is a senior research scientist at Google, developing new methods for NLP and speech recognition. While at IBM, he architected the distillation system for IBM’s GALE multilingual information extraction and question-answering system. While pursuing his doctorate at Penn, he built the first extensible multilingual syntactic parsing engine.

Imed Zitouni is a senior research scientist at IBM. He has led IBM’s Arabic information extraction and data resources efforts since 2004. He previously led both DIALOCA’s Speech/NLP group and Bell Labs/Alcatel-Lucent’s language modeling and call routing activities. His work involves machine translation, NLP, and spoken dialog systems.

Preface
Acknowledgments

About the Authors

Part I: In Theory

Chapter 1: Finding the Structure of Words

1.1 Words and Their Components

1.2 Issues and Challenges

1.3 Morphological Models

1.4 Summary

Chapter 2: Finding the Structure of Documents

2.1 Introduction

2.2 Methods

2.3 Complexity of the Approaches

2.4 Performances of the Approaches

2.5 Features

2.6 Processing Stages

2.7 Discussion

2.8 Summary

Chapter 3: Syntax

3.1 Parsing Natural Language

3.2 Treebanks: A Data-Driven Approach to Syntax

3.3 Representation of Syntactic Structure

3.4 Parsing Algorithms

3.5 Models for Ambiguity Resolution in Parsing

3.6 Multilingual Issues: What Is a Token?

3.7 Summary

Chapter 4: Semantic Parsing

4.1 Introduction

4.2 Semantic Interpretation

4.3 System Paradigms

4.4 Word Sense

4.5 Predicate-Argument Structure

4.6 Meaning Representation

4.7 Summary

Chapter 5: Language Modeling

5.1 Introduction

5.2 n-Gram Models

5.3 Language Model Evaluation

5.4 Parameter Estimation

5.5 Language Model Adaptation

5.6 Types of Language Models

5.7 Language-Specific Modeling Problems

5.8 Multilingual and Crosslingual Language Modeling

5.9 Summary

Chapter 6: Recognizing Textual Entailment

6.1 Introduction

6.2 The Recognizing Textual Entailment Task

6.3 A Framework for Recognizing Textual Entailment

6.4 Case Studies

6.5 Taking RTE Further

6.6 Useful Resources

6.7 Summary

Chapter 7: Multilingual Sentiment and Subjectivity Analysis

7.1 Introduction

7.2 Definitions

7.3 Sentiment and Subjectivity Analysis on English

7.4 Word- and Phrase-Level Annotations

7.5 Sentence-Level Annotations

7.6 Document-Level Annotations

7.7 What Works, What Doesn’t

7.8 Summary

Part II: In Practice

Chapter 8: Entity Detection and Tracking

8.1 Introduction

8.2 Mention Detection

8.3 Coreference Resolution

8.4 Summary

Chapter 9: Relations and Events

9.1 Introduction

9.2 Relations and Events

9.3 Types of Relations

9.4 Relation Extraction as Classification

9.5 Other Approaches to Relation Extraction

9.6 Events

9.7 Event Extraction Approaches

9.8 Moving Beyond the Sentence

9.9 Event Matching

9.10 Future Directions for Event Extraction

9.11 Summary

Chapter 10: Machine Translation

10.1 Machine Translation Today

10.2 Machine Translation Evaluation

10.3 Word Alignment

10.4 Phrase-Based Models

10.5 Tree-Based Models

10.6 Linguistic Challenges

10.7 Tools and Data Resources

10.8 Future Directions

10.9 Summary

Chapter 11: Multilingual Information Retrieval

11.1 Introduction

11.2 Document Preprocessing

11.3 Monolingual Information Retrieval

11.4 CLIR

11.5 MLIR

11.6 Evaluation in Information Retrieval

11.7 Tools, Software, and Resources

11.8 Summary

Chapter 12: Multilingual Automatic Summarization

12.1 Introduction

12.2 Approaches to Summarization

12.3 Evaluation

12.4 How to Build a Summarizer

12.5 Competitions and Datasets

12.6 Summary

Chapter 13: Question Answering

13.1 Introduction and History

13.2 Architectures

13.3 Source Acquisition and Preprocessing

13.4 Question Analysis

13.5 Search and Candidate Extraction

13.6 Answer Scoring

13.7 Crosslingual Question Answering

13.8 A Case Study

13.9 Evaluation

13.10 Current and Future Challenges

13.11 Summary and Further Reading

Chapter 14: Distillation

14.1 Introduction

14.2 An Example

14.3 Relevance and Redundancy

14.4 The Rosetta Consortium Distillation System

14.5 Other Distillation Approaches

14.6 Evaluation and Metrics

14.7 Summary

Chapter 15: Spoken Dialog Systems

15.1 Introduction

15.2 Spoken Dialog Systems

15.3 Forms of Dialog

15.4 Natural Language Call Routing

15.5 Three Generations of Dialog Applications

15.6 Continuous Improvement Cycle

15.7 Transcription and Annotation of Utterances

15.8 Localization of Spoken Dialog Systems

15.9 Summary

Chapter 16: Combining Natural Language Processing Engines

16.1 Introduction

16.2 Desired Attributes of Architectures for Aggregating Speech and NLP Engines

16.3 Architectures for Aggregation

16.4 Case Studies

16.5 Lessons Learned

16.6 Summary

16.7 Sample UIMA Code

Index


Multilingual Natural Language Processing Applications: From Theory to Practice

0137151446
