Design for Reliability: Information and Computer-Based Systems

by Eric Bauer
  • ISBN13: 9780470604656
  • ISBN10: 0470604654

  • Edition: 1st
  • Format: Hardcover
  • Copyright: 2010-10-04
  • Publisher: Wiley-IEEE Press

Summary

System reliability, availability and robustness are often not well understood by system architects, engineers and developers. Many do not understand what drives customers' availability expectations, how to frame verifiable availability and robustness requirements, how to manage and budget availability and robustness, or how to methodically architect and design systems that meet robustness requirements. The book takes a pragmatic approach: it frames reliability and robustness as a functional aspect of a system, so that architects, designers, developers and testers can address it as a concrete, functional attribute rather than an abstract, non-functional notion.

Author Biography

Eric Bauer is Reliability Engineering Manager in the Wireline Division of Alcatel-Lucent. After two decades of software development experience, he joined the Lucent reliability team to lead a reliability group, and has since worked on reliability engineering for a variety of wireless and wireline products and solutions. Mr. Bauer currently focuses on increasing the reliability of Alcatel-Lucent's IP Multimedia Subsystem (IMS) solution and the network elements that make up the IMS solution. He has been awarded twelve U.S. patents, coauthored Practical System Reliability (Wiley), and has published several papers in the Bell Labs Technical Journal.

Table of Contents

Figures
Tables
Preface
Acknowledgments
Reliability Basics
  Reliability and Availability Concepts
    Reliability and Availability
    Faults, Errors, and Failures
    Error Severity
    Failure Recovery
    Highly Available Systems
    Quantifying Availability
    Outage Attributability
    Hardware Reliability
    Software Reliability
    Problems
    For Further Study
  System Basics
    Hardware and Software
    External Entities
    System Management
    System Outages
    Service Quality
    Total Cost of Ownership
    Problems
  What Can Go Wrong
    Failures in the Real World
    Eight-Ingredient Framework
    Mapping Ingredients to Error Categories
    Applying Error Categories
    Error Category: Field-Replaceable Unit (FRU) Hardware
    Error Category: Programming Errors
    Error Category: Data Error
    Error Category: Redundancy
    Error Category: System Power
    Error Category: Network
    Error Category: Application Protocol
    Error Category: Procedures
    Summary
    Problems
    For Further Study
Reliability Concepts
  Failure Containment and Redundancy
    Units of Design
    Failure Recovery Groups
    Redundancy
    Summary
    Problems
    For Further Study
  Robust Design Principles
    Robust Design Principles
    Robust Protocols
    Robust Concurrency Controls
    Overload Control
    Process, Resource, and Throughput Monitoring
    Data Auditing
    Fault Correlation
    Failed Error Detection, Isolation, or Recovery
    Geographic Redundancy
    Security, Availability, and System Robustness
    Procedural Considerations
    Problems
    For Further Study
  Error Detection
    Detecting Field-Replaceable Unit (FRU) Hardware Faults
    Detecting Programming and Data Faults
    Detecting Redundancy Failures
    Detecting Power Failures
    Detecting Networking Failures
    Detecting Application Protocol Failures
    Detecting Procedural Failures
    Problems
    For Further Study
  Analyzing and Modeling Reliability and Robustness
    Reliability Block Diagrams
    Qualitative Model of Redundancy
    Failure Mode and Effects Analysis
    Availability Modeling
    Planned Downtime
    Problems
    For Further Study
Design for Reliability
  Reliability Requirements
    Background
    Defining Service Outages
    Service Availability Requirements
    Detailed Service Availability Requirements
    Service Reliability Requirements
    Triangulating Reliability Requirements
    Problems
  Reliability Analysis
    Step 1: Enumerate Recoverable Modules
    Step 2: Construct Reliability Block Diagrams
    Step 3: Characterize Impact of Recovery
    Step 4: Characterize Impact of Procedures
    Step 5: Audit Adequacy of Automatic Failure Detection and Recovery
    Step 6: Consider Failures of Robustness Mechanisms
    Step 7: Prioritizing Gaps
    Reliability of Sourced Modules and Components
    Problems
  Reliability Budgeting and Modeling
    Downtime Categories
    Service Downtime Budget
    Availability Modeling
    Update Downtime Budget
    Robustness Latency Budgets
    Problems
  Robustness and Stability Testing
    Robustness Testing
    Context of Robustness Testing
    Factoring Robustness Testing
    Robustness Testing in the Development Process
    Robustness Testing Techniques
    Selecting Robustness Test Cases
    Analyzing Robustness Test Results
    Stability Testing
    Release Criteria
    Problems
  Closing the Loop
    Analyzing Field Outage Events
    Reliability Roadmapping
    Problems
  Design for Reliability Case Study
    System Context
    System Reliability Requirements
    Reliability Analysis
    Downtime Budgeting
    Availability Modeling
    Reliability Roadmap
    Robustness Testing
    Stability Testing
    Reliability Review
    Reliability Report
    Release Criteria
    Field Data Analysis
  Conclusion
    Overview of Design for Reliability
    Concluding Remarks
    Problems
Appendix: Assessing Design for Reliability Diligence
    Assessment Methodology
    Reliability Requirements
    Reliability Analysis
    Reliability Modeling and Budgeting
    Robustness Testing
    Stability Testing
    Release Criteria
    Field Availability
    Reliability Roadmap
    Hardware Reliability
Abbreviations
References
Photo Credits
About the Author
Index
Table of Contents provided by Ingram. All Rights Reserved.
