Practical Machine Learning with Spark: Uncover Apache Spark's Scalable Performance with High-Quality Algorithms Across NLP, Computer Vision and ML

  • 5h 5m
  • Dr. Inder Singh Gupta, Dr. Manish Gupta, Mr. Gourav Gupta
  • BPB Publications
  • 2022

This book provides an up-to-date explanation of machine learning and an in-depth yet straightforward account of the architectural techniques used to analyze data and derive forward-looking insights from it with Apache Spark.

The book walks readers through setting up Hadoop and Spark installations on-premises, on Docker, and on AWS. Readers will learn about Spark MLlib and how to use it in supervised and unsupervised machine learning scenarios. Prominent technologies such as natural language processing and computer vision are evaluated and demonstrated with Spark in a realistic setting. The book also discusses the fundamental components that underlie natural language processing, computer vision, and machine learning, and how you can incorporate these technologies into your business processes.

Towards the end of the book, readers will learn about several deep learning frameworks, such as TensorFlow and PyTorch. Readers will also learn to run distributed processing of deep learning problems using Spark.


  • In-depth practical demonstration of ML/DL concepts using a distributed framework.
  • Covers graphical illustrations and visual explanations for ML/DL pipelines.
  • Includes live codebase for each of NLP, computer vision and machine learning applications.


Explore the cosmic secrets of Distributed Processing for Deep Learning applications.


  • Learn how to get started with machine learning projects using Spark.
  • See how to use Spark MLlib's design for machine learning and deep learning operations.
  • Use Spark in tasks involving NLP, unsupervised learning, and computer vision.
  • Experiment with Spark in a cloud environment and with AI pipeline workflows.
  • Run deep learning applications on a distributed network.


This book is valuable for data engineers, machine learning engineers, data scientists, data architects, business analysts, and technical consultants worldwide. It would be beneficial to have some familiarity with the fundamentals of Hadoop and Python.

About the Author

Mr. Gourav Gupta is a data specialist with 5+ years of experience in Big Data, Artificial Intelligence, Deep Learning, Augmented Intelligence, Internet of Things, and Digital Twin. He has worked on several interdisciplinary real-time projects that combine multiple digital technologies. His expertise is in architectural optimization and technical solutioning for Big Data, AI, Computer Vision, and Internet of Things. He also writes research articles and serves as a reviewer for a Springer journal.

In this Book

  • Preface
  • Introduction to Machine Learning
  • Apache Spark Environment Setup and Configuration
  • Apache Spark
  • Apache Spark MLlib
  • Supervised Learning with Spark
  • Un-Supervised Learning with Apache Spark
  • Natural Language Processing with Apache Spark
  • Recommendation Engine with Spark
  • Deep Learning with Spark
  • Computer Vision with Apache Spark