The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems, Volume 2

by Thomas A. Limoncelli; Strata R. Chalup; Christina J. Hogan
  • Edition: 1st
  • Format: Paperback
  • Copyright: 9/3/2014
  • Publisher: Addison-Wesley Professional

“There’s an incredible amount of depth and thinking in the practices described here, and it’s impressive to see it all in one place.”

—Win Treese, coauthor of Designing Systems for Internet Commerce


The Practice of Cloud System Administration, Volume 2, focuses on “distributed” or “cloud” computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference, which treats the two together.


Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that are useful to all enterprises. The new companion to the best-selling first volume, The Practice of System and Network Administration, Second Edition, this guide offers expert coverage of the following and many other crucial topics:


Designing and building modern web and distributed systems

  • Fundamentals of large system design
  • Understand the new software engineering implications of cloud administration
  • Make systems that are resilient to failure and grow and scale dynamically
  • Implement DevOps principles and cultural changes
  • IaaS/PaaS/SaaS and virtual platform selection

Operating and running systems using the latest DevOps/SRE strategies

  • Upgrade production systems with zero downtime
  • What and how to automate; how to decide what not to automate
  • On-call best practices that improve uptime
  • Why distributed systems require fundamentally different system administration techniques
  • Identify and resolve resiliency problems before they surprise you

Assessing and evaluating your team’s operational effectiveness

  • Manage the scientific process of continuous improvement
  • A forty-page, pain-free assessment system you can start using today


Author Biographies

Thomas A. Limoncelli is an internationally recognized author, speaker, and system administrator with more than twenty years of experience at companies like Google, Bell Labs, and StackExchange.com.


Strata R. Chalup has more than twenty-five years of experience in Silicon Valley focusing on IT strategy, best practices, and scalable infrastructures at firms including Apple, Sun, Cisco, McAfee, and Palm.


Christina J. Hogan has more than twenty years of experience in system administration and network engineering, from Silicon Valley to Italy and Switzerland. She has a master’s degree in computer science and a doctorate in aeronautical engineering, and she has been part of a Formula 1 racing team.

Table of Contents


1. Ideal Environment (What's the Goal?)

2. DevOps Principles and Culture

3. Designing for Operation

4. High-Level Design Patterns

5. Elements of Design

6. Scaling Techniques

7. Reliability Through Redundancy

8. Automation + Repeatability

9. Deployment Lifecycle

10. Metrics/Data-Driven Ops

11. Backups and Restores

12. Fire Drills/Game Days

13. Provisioning in the Automated World

14. SWE [Software Engineering] Applications Sysadmins Need

15. Design Docs (For Big and Small Things)

16. Operational Hygiene

17. Operational Capabilities
