Blueprints for High Availability: Designing Resilient Distributed Systems

by Evan Marcus; Hal Stern
  • ISBN13: 9780471356011
  • ISBN10: 0471356018

  • Format: Hardcover
  • Copyright: 2000-01-01
  • Publisher: John Wiley & Sons Inc
List Price: $60.00

Summary

"Rely on this book for information on the technologies and methods you'll need to design and implement high-availability systems... It will help you transform the vision of always-on networks into a reality."
- Dr. Eric Schmidt, Chairman and CEO, Novell Corporation

Your system will crash! The reason could be something as complex as network congestion or something as mundane as an operating system fault. The good news is that there are steps you can take to maximize your system availability and prevent serious downtime. This authoritative book will provide you with the tools to deploy a system with confidence.

The authors guide you through the building of a network that runs with high availability, resiliency, and predictability. They clearly show you how to assess the elements of a system that can fail, select the appropriate level of reliability, and provide steps for designing, implementing, and testing your solution to reduce downtime to a minimum. All the while, they help you determine how much you can afford to spend by balancing costs and benefits.

This book of practical, hands-on blueprints:

  • Examines what can go wrong with the various components of your system
  • Provides twenty key system design principles for attaining resilience and high availability
  • Discusses how to arrange disks and disk arrays for protection against hardware failures
  • Looks at failovers, the software that manages them, and sorts through the myriad of different failover configurations
  • Provides techniques for improving network reliability and redundancy
  • Reviews techniques for replicating data and applications to other systems across a network
  • Offers guidance on application recovery
  • Examines Disaster Recovery

Author Biography

EVAN MARCUS is a Senior Systems Engineer at VERITAS Software Corporation and co-designed a key piece of the first commercial Sun-based software for High Availability. He has been the company's consultant for successful implementations of VERITAS High Availability Products around the world.

Table of Contents

Foreword xv
Eric Schmidt
Preface xvii
Introduction 1(8)
Why an Availability Book? 2(1)
Our Approach to the Problem Set 3(1)
What's Not Here 4(1)
Our Mission 4(1)
The Availability Index 5(1)
Summary 6(1)
Organization of the Book 7(1)
Key Points 8(1)
What is Resiliency? 9(22)
Measuring Availability 10(6)
Defining Downtime 11(1)
Causes of Downtime 11(2)
What is Availability? 13(1)
'M' is for Mean 14(2)
Failure Modes 16(6)
Hardware 16(1)
Environmental and Physical Failures 17(1)
Network Failures 18(1)
Database System Failures 19(2)
Web Server Failures 21(1)
File and Print Server Failures 22(1)
Cost/Risk Tradeoffs 22(7)
The Costs of Downtime 22(2)
Explaining the Problems to Management 24(1)
Levels of Availability (The Availability Continuum) 25(1)
Regular Availability: Do Nothing Special 25(1)
Increased Availability: Protect the Data 25(1)
High Availability: Protect the System 26(1)
Disaster Recovery: Protect the Organization 26(1)
Fault-Tolerant Systems 26(1)
Balancing Risk and Rewards 27(1)
Don't Overspend 28(1)
Key Points 29(2)
Twenty Key System Design Principles 31(16)
Spend Money ... but Not Blindly 31(1)
Assume Nothing 32(1)
Remove Single Points of Failure 32(1)
Maintain Tight Security 33(1)
Consolidate Your Servers 33(1)
Automate Common Tasks 34(1)
Document Everything 34(2)
Establish Service Level Agreements 36(1)
Plan Ahead 36(1)
Test Everything 37(1)
Maintain Separate Environments 38(1)
Invest in Failure Isolation 39(1)
Examine the History of the System 39(1)
Build for Growth 40(1)
Choose Mature Software 40(1)
Select Reliable and Serviceable Hardware 41(1)
Reuse Configurations 42(1)
Exploit External Resources 43(1)
One Problem, One Solution 44(1)
KISS: Keep It Simple 44(3)
Highly Available Data Management 47(28)
Fundamental Truths 47(3)
Disk Hardware and Connectivity Terminology 50(8)
SCSI (Small Computer Systems Interface) 50(2)
Fibrechannel 52(1)
Multihosting 53(1)
Multipathing 53(1)
Disk Array 53(1)
JBOD (Just a Bunch of Disks) 53(1)
Hot-Pluggable Disks 54(1)
Warm-Pluggable Disks 54(1)
Hot Spares 54(1)
Write Cache 54(1)
Storage Area Network (SAN) 54(3)
SCSI versus Fibrechannel 57(1)
RAID Technology 58(12)
RAID Levels 58(1)
Striping 58(1)
Mirroring 59(1)
Combining RAID-0 and RAID-1 60(2)
Hamming Encoding 62(1)
Parity RAID 62(2)
Hardware RAID 64(3)
Disk Arrays 67(1)
Software RAID 68(1)
Logical Volume Management 69(1)
The Right Answer 70(1)
Disk Space and FileSystems 70(4)
What Happens When a LUN Fills up? 71(1)
Managing Disk and Volume Availability 72(1)
File System Recovery 73(1)
Key Points 74(1)
Redundant Server Design 75(22)
Server Failures and Failover 75(2)
Logical, Application-Centric Thinking 77(2)
Failover Requirements 79(1)
Servers 80(3)
Failing Over between Incompatible Servers 81(2)
Networks 83(10)
Heartbeat Networks 83(2)
When the Heartbeat Stops 85(1)
Running Heartbeat Networks 86(1)
Public Networks 87(1)
Redundant Network Connectivity 88(1)
Moving Network Identities 88(2)
IP Addresses and Names 90(1)
Selecting Logical Hostnames 91(1)
Administrative Networks 92(1)
Disks 93(3)
Private Disks 93(1)
Shared Disks 94(1)
Placing Critical Applications on Disks 95(1)
Key Points 96(1)
Failover Management 97(10)
Component Monitoring 97(3)
When Component Tests Fail 99(1)
Time to Manual Failover 100(2)
Homegrown Failover Software versus Commercial Software 102(1)
Commercial Failover Management Software 103(2)
Key Points 105(2)
Failover Configurations and Issues 107(30)
Two-Node Failover Configurations 107(9)
Asymmetric 1-to-1 Configuration 108(1)
How Can I Use the Standby Server? 109(3)
Symmetric 1-to-1 Failover 112(2)
Symmetric or Asymmetric? 114(1)
Service Level Failover 115(1)
More Complex Failover Configurations 116(4)
N-to-1 Asymmetric 117(1)
N Host, Networked 118(2)
Offbeat Failover Configurations 120(4)
N-to-1 Symmetric 121(1)
1-to-N (Spray) Asymmetric 121(1)
Round-Robin Symmetric 122(2)
When Good Failovers Go Bad 124(5)
Split-Brain Syndrome 124(1)
Causes and Remedies of Split-Brain Syndrome 125(3)
Undesirable Failovers 128(1)
Verification and Testing 129(3)
State Transition Diagrams 129(2)
Testing the Works 131(1)
Managing Failovers 132(3)
System Monitoring 132(1)
Consoles 133(1)
Utilities 134(1)
Time Matters 135(1)
Key Points 135(2)
Redundant Network Services 137(30)
Network Failure Taxonomy 138(9)
Network Reliability Challenges 138(2)
Network Failure Modes 140(1)
Physical Device Failures 141(1)
IP Level Failures 142(1)
IP Address Configuration 142(1)
Routing Information 143(1)
Congestion-Induced Failures 144(1)
Network Traffic Congestion 144(2)
Design and Operations Guidelines 146(1)
Building Redundant Networks 147(12)
Virtual IP Addresses 148(1)
Redundant Network Connections 149(1)
Redundant Network Attach 150(1)
Multiple Network Attach 150(2)
Interface Trunking 152(1)
Configuring Multiple Networks 153(3)
IP Routing Redundancy 156(3)
Choosing the Failover Mechanism 159(1)
Network Service Reliability 159(7)
Network Service Dependencies 160(4)
Hardening Core Services 164(1)
Denial-of-Service Attacks 165(1)
Key Points 166(1)
Data Service Reliability 167(22)
Network FileSystem Services 168(7)
Detecting RPC Failures 168(2)
NFS Server Constraints 170(1)
Inside an NFS Failover 170(1)
Optimizing NFS Recovery 171(1)
File Locking 172(2)
Stale File Handles 174(1)
Database Servers 175(8)
Managing Recovery Time 176(1)
Database Probes 176(1)
Database Restarts 177(2)
Client Reconnection 179(1)
Surviving Corruption 180(1)
Unsafe at Any (High) Speed 180(1)
Transaction Size and Checkpointing 181(1)
Parallel Databases 181(2)
Web Servers 183(5)
Availability Constraints 183(1)
Web Server Farms 184(1)
High-Availability Pairs 184(1)
Round-Robin DNS 185(1)
IP Redirection 186(1)
Deep or Wide? 187(1)
Key Points 188(1)
Replication Techniques 189(24)
What is Replication? 190(3)
Replication Applications 190(2)
Overview of Replication Techniques 192(1)
Filesystem Replication 193(6)
Archive Distribution 194(2)
Distribution Utilities 196(1)
File Replication with Finesse 197(1)
Software Distribution 198(1)
Database Replication 199(8)
Log Replay 200(1)
Database Replication Managers 201(1)
To Block Copy or Not? 202(1)
Transaction Processing Monitors 203(1)
Queuing Systems 204(3)
Process Replication 207(4)
Redundant Service Processes 207(2)
Process State Multicast 209(1)
Checkpointing 210(1)
Key Points 211(2)
Application Recovery 213(24)
Application Recovery Overview 214(4)
Application Failure Modes 214(1)
Application Recovery Techniques 215(2)
Kinder, Gentler Failures 217(1)
Tolerating Data Service Failures 218(5)
File Server Client Recovery 218(1)
NFS Soft Mounts 219(1)
Automounter Tricks 220(1)
Database Application Recovery 221(1)
Web Client Recovery 222(1)
Application Recovery from System Failures 223(5)
Virtual Memory Exhaustion 224(1)
I/O Errors 225(1)
Network Connectivity 226(1)
Restarting Network Services 227(1)
Internal Application Failures 228(2)
Memory Access Faults 228(1)
Memory Corruption and Recovery 229(1)
Hanging Processes 230(1)
Developer Hygiene 230(5)
Return Value Checks 231(1)
Boundary Condition Checks 232(1)
Value-Based Security 233(1)
Logging Support 234(1)
Assume Nothing, Manage Everything 235(1)
Key Points 236(1)
Backups and Restores 237(32)
The Basic Rules for Backups 237(2)
Backup Software 239(4)
Commercial or Homegrown? 239(1)
Examples of Commercial Backup Software 240(1)
Commercial Backup Software Features 241(2)
Backup Performance 243(9)
Improving Backup Performance: Find the Bottleneck 243(5)
Solving for Performance 248(4)
Backup Styles 252(3)
Incremental Backups of Databases 254(1)
Backup Windows 255(7)
Hot Backups 255(2)
Have Less Data, Save More Time (and Space) 257(1)
Hierarchical Storage Management 257(1)
Archives 258(1)
Use More Hardware 258(1)
Off-Host Backups 259(1)
Third-Mirror Breakoff 259(1)
Backing Up Directly to Disk 260(1)
Sophisticated Software Features 261(1)
Copy-on-Write Snapshots 261(1)
Multiplexed Backups 261(1)
Fast and Flash Backup 262(1)
Handling Backup Tapes and Data 262(2)
General Backup Security 264(1)
Restores 264(2)
Disk Space Requirements for Restores 265(1)
Summary 266(1)
Key Points 267(2)
System Operations 269(28)
System Management and Modifications 269(5)
Maintenance Plans and Processes 270(1)
System Modifications 270(2)
Software Patches 272(1)
Spare Parts Policies 273(1)
Preventative Maintenance 274(1)
Environmental and Physical Issues 274(8)
Data Centers 275(2)
Data Center Racks 277(2)
Electricity 279(2)
Cooling and Environmental Issues 281(1)
Vendor Management 282(5)
Choosing Key Vendors 282(2)
Working with Your Vendors 284(1)
The Vendor's Role in System Recovery 285(1)
Hardware Service 285(1)
Software Service 286(1)
Escalation 286(1)
Vendor Integration 286(1)
Vendor Consulting Services 286(1)
People and Processes 287(8)
Security 287(2)
Data Center Security 289(1)
Documentation 289(2)
System Administrators 291(3)
Internal Escalation 294(1)
Trouble Ticketing 295(1)
Key Points 295(2)
Disaster Recovery 297(20)
Disaster Recovery or High Availability? 298(2)
Local Failover 298(1)
Disaster Recovery 299(1)
Do You NEED Disaster Recovery? 300(1)
Choosing Your Disaster 301(3)
Populating the DR Site 304(6)
What Actually Goes to the DR Site? 304(1)
Filling DR Disks 304(3)
Once the Data Makes It to the Remote Site 307(1)
Prioritization 307(1)
Rerouting Telecommunications 308(1)
Starting the Applications 309(1)
Application Licensing 310(1)
Accessing and Using the DR Site 310(1)
Personnel Issues 311(2)
Other Issues 313(1)
Testing the Whole Thing 314(1)
Whoops! 315(1)
Key Points 315(2)
Parting Shot 317(4)
How We Got Here 317(1)
Where We're Going 318(3)
Appendix A: Glossary 321(8)
Appendix B: A List of URLs 329(2)
Index 331
