Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data, Second Edition

7h 54m
Bruce Ratner
CRC Press
2012

The second edition of a bestseller, Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data is still the only book, to date, to distinguish between statistical data mining and machine-learning data mining. The first edition, titled Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data, contained 17 chapters of innovative and practical statistical data mining techniques. In this second edition, renamed to reflect the increased coverage of machine-learning data mining techniques, the author has completely revised, reorganized, and repositioned the original chapters and produced 14 new chapters of creative and useful machine-learning data mining techniques. In sum, the 31 chapters of simple yet insightful quantitative techniques make this book unique in the field of data mining literature.

The statistical data mining methods effectively consider big data for identifying structures (variables) with the appropriate predictive power in order to yield reliable and robust large-scale statistical models and analyses. In contrast, the author's own GenIQ Model provides machine-learning solutions to common and virtually unapproachable statistical problems. GenIQ makes this possible — its utilitarian data mining features start where statistical data mining stops.

This book contains essays offering detailed background, discussion, and illustration of specific methods for solving the most commonly experienced problems in predictive modeling and analysis of big data. They address each methodology and assign its application to a specific type of problem. To better ground readers, the book provides an in-depth discussion of the basic methodologies of predictive modeling and analysis. While this type of overview has been attempted before, this approach offers a truly nitty-gritty, step-by-step method that both tyros and experts in the field can enjoy playing with.

About the Author

Bruce Ratner, PhD, The Significant Statistician, is president and founder of DM STAT-1 Consulting, the ensample for statistical modeling, analysis and data mining, and machine-learning data mining in the DM Space. DM STAT-1 specializes in all standard statistical techniques and methods using machine-learning/statistics algorithms, such as its patented GenIQ Model, to achieve its clients' goals, across industries including direct and database marketing, banking, insurance, finance, retail, telecommunications, health care, pharmaceutical, publication and circulation, mass and direct advertising, catalog marketing, e-commerce, Web mining, B2B (business to business), human capital management, risk management, and nonprofit fund-raising.

Bruce's par excellence consulting expertise is apparent, as he is the author of the best-selling book Statistical Modeling and Analysis for Database Marketing: Effective Techniques for Mining Big Data. Bruce ensures his clients' marketing decision problems are solved with the optimal problem solution methodology and rapid startup and timely delivery of project results. Client projects are executed with the highest level of statistical practice. He is an often-invited speaker at public industry events, such as the SAS Data Mining Conference, and private seminars at the request of Fortune magazine's top 100 companies.

Bruce has his footprint in the predictive analytics community as a frequent speaker at industry conferences and as the instructor of the advanced statistics course sponsored by the Direct Marketing Association for over a decade. He is the author of over 100 peer-reviewed articles on statistical and machine-learning procedures and software tools. He is a coauthor of the popular textbook The New Direct Marketing and is on the editorial board of the Journal of Database Marketing.

Bruce is also active in the online data mining industry. He is a frequent contributor to KDnuggets Publications, the top resource of the data mining community. His articles on statistical and machine-learning methodologies draw a huge monthly following. Another online venue in which he participates is the professional network LinkedIn. His seminal articles posted on LinkedIn, covering statistical and machine-learning procedures for big data, have sparked countless rich discussions. In addition, he is the author of his own DM STAT-1 Newsletter on the Web.

Bruce holds a doctorate in mathematics and statistics, with a concentration in multivariate statistics and response model simulation. His research interests include developing hybrid modeling techniques, which combine traditional statistics and machine-learning methods. He holds a patent for a unique application in solving the two-group classification problem with genetic programming.

In this Book

Introduction
Two Basic Data Mining Methods for Variable Assessment
CHAID-Based Data Mining for Paired-Variable Assessment
The Importance of Straight Data—Simplicity and Desirability for Good Model-Building Practice
Symmetrizing Ranked Data—A Statistical Data Mining Method for Improving the Predictive Power of Data
Principal Component Analysis—A Statistical Data Mining Method for Many-Variable Assessment
The Correlation Coefficient—Its Values Range between Plus/Minus 1, or Do They?
Logistic Regression—The Workhorse of Response Modeling
Ordinary Regression—The Workhorse of Profit Modeling
Variable Selection Methods in Regression—Ignorable Problem, Notable Solution
CHAID for Interpreting a Logistic Regression Model
The Importance of the Regression Coefficient
The Average Correlation—A Statistical Data Mining Measure for Assessment of Competing Predictive Models and the Importance of the Predictor Variables
CHAID for Specifying a Model with Interaction Variables
Market Segmentation Classification Modeling with Logistic Regression
CHAID as a Method for Filling in Missing Values
Identifying Your Best Customers—Descriptive, Predictive, and Look-Alike Profiling
Assessment of Marketing Models
Bootstrapping in Marketing—A New Approach for Validating Models
Validating the Logistic Regression Model—Try Bootstrapping
Visualization of Marketing Models: Data Mining to Uncover Innards of a Model
The Predictive Contribution Coefficient—A Measure of Predictive Importance
Regression Modeling Involves Art, Science, and Poetry, Too
Genetic and Statistic Regression Models—A Comparison
Data Reuse—A Powerful Data Mining Effect of the GenIQ Model
A Data Mining Method for Moderating Outliers Instead of Discarding Them
Overfitting—Old Problem, New Solution
The Importance of Straight Data—Revisited
The GenIQ Model—Its Definition and an Application
Finding the Best Variables for Marketing Models
Interpretation of Coefficient-Free Models
