Foundations of Data Intensive Applications: Large Scale Data Analytics under the Hood

  • Edition: 1st
  • Format: Paperback
  • Copyright: 2021-07-21
  • Publisher: Wiley
  • List Price: $55.00


There is an ever-increasing need to store data, process it, and incorporate the resulting knowledge into the everyday business operations of companies. Before big data systems, there were high-performance systems designed to do large calculations. Around the time big data became popular, high-performance computing systems were mature enough to support the scientific community, but they weren’t ready for the enterprise needs of data analytics. Because system support for big data was lacking at the time, a large number of systems were created to store and process data. These systems were built according to different design principles; some thrived through the years while others did not succeed. Because of the diverse nature of the systems and tools available for data analytics, there is a need to understand these systems and their applications from a theoretical perspective. These systems shield users from the underlying details, so users work with them without knowing how they operate. This works for simple applications, but when developing more complex applications that need to scale, users find themselves without the foundational knowledge required to reason about the issues. That knowledge is currently hidden in the systems themselves and in research papers.

The underlying principles behind data processing systems originate from the parallel and distributed computing paradigms. The many systems and APIs for data processing use the same fundamental ideas under the hood, with slight variations. We can break down data analytics systems according to these principles and study them to understand the inner workings of applications.

This book defines the foundational components of large-scale, distributed data processing systems and goes into detail on each, independently of specific frameworks. It draws examples from current systems to explain how these principles are used in practice. Major design decisions around these foundational components determine a system's performance, the types of applications it supports, and its usability. One of the goals of the book is to explain these differences so that readers can make informed decisions when developing applications. Further, it will help readers acquire in-depth knowledge and recognize problems in their applications, such as performance issues, distributed operation issues, and fault tolerance issues.

Where appropriate, this book draws on state-of-the-art research to discuss ideas and the future of data analytics tools.

Table of Contents

Chapter 1: Introduction
Chapter 2: Large Data
Chapter 3: Going Distributed
Chapter 4: Distributing Applications
Chapter 5: Messaging is the Key
Chapter 6: CPUs or GPUs
Chapter 7: In Memory Data Structures
Chapter 8: Programming Abstractions
Chapter 9: Handling Faults
Chapter 10: Performance and Productivity
