Amazon no longer offers textbook rentals. We do!

We're the #1 textbook rental company. Let us show you why.

9781930110007

Data Munging With Perl

by
  • ISBN13: 9781930110007
  • ISBN10: 1930110006
  • Format: Paperback
  • Copyright: 2000-12-01
  • Publisher: Manning Pubns Co
  • Purchase Benefits
  • Free Shipping On Orders Over $35!
    Your order must be $35 or more to qualify for free economy shipping. Bulk sales, POs, Marketplace items, eBooks, and apparel do not qualify for this offer.
  • Get Rewarded for Ordering Your Textbooks!
List Price: $36.95

Summary

This book shows you how to process data productively with Perl. It discusses general munging techniques and how to think about data munging problems. You will learn how to decouple the input, munging, and output stages of a program, design data structures carefully, encapsulate business rules, emulate the Unix filter model, and write audit trails. If you need to work with complex data formats, the book shows you how to handle them and how to build your own tools to process them. It includes detailed techniques for processing HTML and XML, and it shows you how to build your own parsers for data of arbitrary complexity.
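The summary mentions decoupling the stages of a munging program and emulating the Unix filter model. The Perl sketch below is an illustration written for this page, not code from the book. It assumes tab-separated records, one per line, and uses a hypothetical munging rule (upper-casing the first field) purely to have something to do. Input, munging, and output live in separate subroutines, and the script reads standard input and writes standard output so that it can sit in a shell pipeline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Input stage: read tab-separated records (assumed format), one per line.
sub read_input {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank or whitespace-only lines
        push @records, [ split /\t/, $line ];
    }
    return \@records;
}

# Munging stage: transforms records in memory and does no I/O.
# The rule (upper-case the first field) is hypothetical, for illustration.
sub munge {
    my ($records) = @_;
    return [ map { [ uc($_->[0]), @{$_}[1 .. $#$_] ] } @$records ];
}

# Output stage: write records back out as tab-separated lines.
sub write_output {
    my ($fh, $records) = @_;
    print {$fh} join("\t", @$_), "\n" for @$records;
}

# Filter model: read STDIN, write STDOUT, when run as a script.
write_output(\*STDOUT, munge(read_input(\*STDIN))) unless caller;
```

For example, it could be run as perl munge.pl < in.tsv > out.tsv (the file names are hypothetical). Because munge does no I/O, it can be tested on in-memory data, and either end of the pipeline can be changed to a different format without touching the munging logic.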

Table of Contents

foreword xi
preface xiii
about the cover illustration xviii
PART I FOUNDATIONS 1(78)
Data, data munging, and Perl 3(15)
What is data munging? 4(3)
Data munging processes 4(1)
Data recognition 5(1)
Data parsing 6(1)
Data filtering 6(1)
Data transformation 6(1)
Why is data munging important? 7(2)
Accessing corporate data repositories 7(1)
Transferring data between multiple systems 7(1)
Real-world data munging examples 8(1)
Where does data come from? Where does it go? 9(3)
Data files 9(1)
Databases 10(1)
Data pipes 11(1)
Other sources/sinks 11(1)
What forms does data take? 12(2)
Unstructured data 12(1)
Record-oriented data 13(1)
Hierarchical data 13(1)
Binary data 13(1)
What is Perl? 14(2)
Getting Perl 15(1)
Why is Perl good for data munging? 16(1)
Further information 17(1)
Summary 17(1)
General munging practices 18(21)
Decouple input, munging, and output processes 19(1)
Design data structures carefully 20(5)
Example: the CD file revisited 20(5)
Encapsulate business rules 25(6)
Reasons to encapsulate business rules 26(1)
Ways to encapsulate business rules 26(1)
Simple module 27(1)
Object class 28(3)
Use UNIX "filter" model 31(5)
Overview of the filter model 31(1)
Advantages of the filter model 32(4)
Write audit trails 36(2)
What to write to an audit trail 36(1)
Sample audit trail 37(1)
Using the UNIX system logs 37(1)
Further information 38(1)
Summary 38(1)
Useful Perl idioms 39(18)
Sorting 40(7)
Simple sorts 40(1)
Complex sorts 41(1)
The Orcish Manoeuvre 42(1)
Schwartzian transform 43(3)
The Guttman-Rosler transform 46(1)
Choosing a sort technique 46(1)
Database Interface (DBI) 47(2)
Sample DBI program 47(2)
Data::Dumper 49(2)
Benchmarking 51(2)
Command line scripts 53(2)
Further information 55(1)
Summary 56(1)
Pattern matching 57(22)
String handling functions 58(2)
Substrings 58(1)
Finding strings within strings (index and rindex) 59(1)
Case transformations 60(1)
Regular expressions 60(17)
What are regular expressions? 60(1)
Regular expression syntax 61(4)
Using regular expressions 65(5)
Example: translating from English to American 70(3)
More examples: /etc/passwd 73(3)
Taking it to extremes 76(1)
Further information 77(1)
Summary 78(1)
PART II DATA MUNGING 79(68)
Unstructured data 81(15)
ASCII text files 82(5)
Reading the file 82(2)
Text transformations 84(1)
Text statistics 85(2)
Data conversions 87(7)
Converting the character set 87(1)
Converting line endings 88(2)
Converting number formats 90(4)
Further information 94(1)
Summary 95(1)
Record-oriented data 96(31)
Simple record-oriented data 97(11)
Reading simple record-oriented data 97(3)
Processing simple record-oriented data 100(2)
Writing simple record-oriented data 102(3)
Caching data 105(3)
Comma-separated files 108(2)
Anatomy of CSV data 108(1)
Text::CSV_XS 109(1)
Complex records 110(4)
Example: a different CD file 111(2)
Special values for $/ 113(1)
Special problems with date fields 114(9)
Built-in Perl date functions 114(6)
Date::Calc 120(1)
Date::Manip 121(1)
Choosing between date modules 122(1)
Extended example: web access logs 123(3)
Further information 126(1)
Summary 126(1)
Fixed-width and binary data 127(20)
Fixed-width data 128(11)
Reading fixed-width data 128(7)
Writing fixed-width data 135(4)
Binary data 139(5)
Reading PNG files 140(3)
Reading and writing MP3 files 143(1)
Further information 144(1)
Summary 145(2)
PART III SIMPLE DATA PARSING 147(78)
Complex data formats 149(14)
Complex data files 150(4)
Example: metadata in the CD file 150(2)
Example: reading the expanded CD file 152(2)
How not to parse HTML 154(4)
Removing tags from HTML 154(3)
Limitations of regular expressions 157(1)
Parsers 158(4)
An introduction to parsers 158(3)
Parsers in Perl 161(1)
Further information 162(1)
Summary 162(1)
HTML 163(12)
Extracting HTML data from the World Wide Web 164(1)
Parsing HTML 165(2)
Example: simple HTML parsing 165(2)
Prebuilt HTML parsers 167(5)
HTML::LinkExtor 167(2)
HTML::TokeParser 169(2)
HTML::TreeBuilder and HTML::Element 171(1)
Extended example: getting weather forecasts 172(2)
Further information 174(1)
Summary 174(1)
XML 175(34)
XML overview 176(2)
What's wrong with HTML? 176(1)
What is XML? 176(2)
Parsing XML with XML::Parser 178(13)
Example: parsing weather.xml 178(1)
Using XML::Parser 179(2)
Other XML::Parser styles 181(7)
XML::Parser handlers 188(3)
XML::DOM 191(2)
Example: parsing XML using XML::DOM 191(2)
Specialized parsers--XML::RSS 193(4)
What is RSS? 193(1)
A sample RSS file 193(2)
Example: creating an RSS file with XML::RSS 195(1)
Example: parsing an RSS file with XML::RSS 196(1)
Producing different document formats 197(11)
Sample XML input file 197(1)
XML document transformation script 198(7)
Using the XML document transformation script 205(3)
Further information 208(1)
Summary 208(1)
Building your own parsers 209(16)
Introduction to Parse::RecDescent 210(2)
Example: parsing simple English sentences 210(2)
Returning parsed data 212(5)
Example: parsing a Windows INI file 212(1)
Understanding the INI file grammar 213(1)
Parser actions and the @item array 214(1)
Example: displaying the contents of @item 214(2)
Returning a data structure 216(1)
Another example: the CD data file 217(6)
Understanding the CD grammar 218(1)
Testing the CD file grammar 219(1)
Adding parser actions 220(3)
Other features of Parse::RecDescent 223(1)
Further information 224(1)
Summary 224(1)
PART IV THE BIG PICTURE 225(7)
Looking back--and ahead 227(5)
The usefulness of things 228(1)
The usefulness of data munging 228(1)
The usefulness of Perl 228(1)
The usefulness of the Perl community 229(1)
Things to know 229(3)
Know your data 229(1)
Know your tools 230(1)
Know where to go for more information 230(2)
appendix A Modules reference 232(22)
appendix B Essential Perl 254(19)
index 273

Supplemental Materials

What is included with this book?

The New copy of this book will include any supplemental materials advertised. Please check the title of the book to determine if it should include any access cards, study guides, lab manuals, CDs, etc.

The Used, Rental and eBook copies of this book are not guaranteed to include any supplemental materials. Typically, only the book itself is included. This is true even if the title states it includes any access cards, study guides, lab manuals, CDs, etc.
