This book’s straightforward, step-by-step approach shows you how to deploy, program, optimize, manage, integrate, and extend Spark, now and for years to come. You’ll discover how to create powerful solutions encompassing cloud computing, real-time stream processing, machine learning, and more. Every lesson builds on what you’ve already learned, giving you a rock-solid foundation for real-world success.
Whether you are a data analyst, data engineer, data scientist, or data steward, learning Spark will help you to advance your career or embark on a new career in the booming area of Big Data.
Learn how to:
• Discover what Apache Spark does and how it fits into the Big Data landscape
• Deploy and run Spark locally or in the cloud
• Interact with Spark from the shell
• Make the most of the Spark cluster architecture
• Develop Spark applications with Scala and functional Python
• Program with the Spark API, including transformations and actions
• Apply practical data engineering/analysis approaches designed for Spark
• Use Resilient Distributed Datasets (RDDs) for caching, persistence, and output
• Optimize Spark solution performance
• Use Spark with SQL (via Spark SQL) and with NoSQL (via Cassandra)
• Leverage cutting-edge functional programming techniques
• Extend Spark with streaming, R, and Sparkling Water
• Start building Spark-based machine learning and graph-processing applications
• Explore advanced messaging technologies, including Kafka
• Preview and prepare for Spark’s next generation of innovations
Instructions walk you through common questions, issues, and tasks; Q&As, quizzes, and exercises build and test your knowledge; “Did You Know?” tips offer insider advice and shortcuts; and “Watch Out!” alerts help you avoid pitfalls. By the time you’re finished, you’ll be comfortable using Apache Spark to solve a wide spectrum of Big Data problems.