Data Mining for Business Analytics: Concepts, Techniques and Applications in Python

  • 9h 58m
  • Galit Shmueli, Nitin R. Patel, Peter C. Bruce, Peter Gedeck
  • John Wiley & Sons (US)
  • 2020

Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python presents an applied approach to data mining concepts and methods, using Python software for illustration.

Readers will learn how to implement a variety of popular data mining algorithms in Python (free and open-source software) to tackle business problems and opportunities.

This is the sixth version of this successful text, and the first using Python. It covers both statistical and machine learning algorithms for prediction, classification, visualization, dimension reduction, recommender systems, clustering, text mining and network analysis. It also includes:

  • A new co-author, Peter Gedeck, who brings both experience teaching business analytics courses using Python, and expertise in the application of machine learning methods to the drug-discovery process
  • A new section on ethical issues in data mining
  • Updates and new material based on feedback from instructors teaching MBA, undergraduate, diploma and executive courses, and from their students
  • More than a dozen case studies demonstrating applications for the data mining techniques described
  • End-of-chapter exercises that help readers gauge and expand their comprehension of, and competency with, the material presented
  • A companion website with more than two dozen data sets, and instructor materials including exercise solutions, PowerPoint slides, and case solutions

Data Mining for Business Analytics: Concepts, Techniques, and Applications in Python is an ideal textbook for graduate and upper-undergraduate level courses in data mining, predictive analytics, and business analytics. This new edition is also an excellent reference for analysts, researchers, and practitioners working with quantitative methods in the fields of business, finance, marketing, computer science, and information technology.

About the Authors

GALIT SHMUELI, PHD, is Distinguished Professor at National Tsing Hua University's Institute of Service Science. She has designed and instructed data mining courses since 2004 at University of Maryland, Indian School of Business, and National Tsing Hua University, Taiwan. Professor Shmueli is known for her research and teaching in business analytics, with a focus on statistical and data mining methods in information systems and healthcare. She has authored over 100 publications, including books.

PETER C. BRUCE is President and Founder of the Institute for Statistics Education. He has written multiple journal articles and is the developer of Resampling Stats software. He is the author of Introductory Statistics and Analytics: A Resampling Perspective (Wiley) and co-author of Practical Statistics for Data Scientists: 50 Essential Concepts (O'Reilly).

PETER GEDECK, PHD, is a Senior Data Scientist at Collaborative Drug Discovery, where he helps develop cloud-based software to manage the huge amount of data involved in the drug discovery process. He also teaches data mining at

NITIN R. PATEL, PhD, is cofounder and board member of Cytel Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology and at Harvard University. He is a Fellow of the Computer Society of India and was a professor at the Indian Institute of Management, Ahmedabad, for 15 years.

In this Book

  • Foreword by Gareth James
  • Foreword by Ravi Bapna
  • Introduction
  • Overview of the Data Mining Process
  • Data Visualization
  • Dimension Reduction
  • Evaluating Predictive Performance
  • Multiple Linear Regression
  • k-Nearest Neighbors (k-NN)
  • The Naive Bayes Classifier
  • Classification and Regression Trees
  • Logistic Regression
  • Neural Nets
  • Discriminant Analysis
  • Combining Methods—Ensembles and Uplift Modeling
  • Association Rules and Collaborative Filtering
  • Cluster Analysis
  • Handling Time Series
  • Regression-Based Forecasting
  • Smoothing Methods
  • Social Network Analytics
  • Text Mining
  • Cases
  • References
  • Data Files Used in the Book
  • Python Utilities Functions