Statistical Data Analytics: Foundations for Data Mining, Informatics, and Knowledge Discovery

  • 10h 39m
  • Walter W. Piegorsch
  • John Wiley & Sons (UK)
  • 2015

Applications of data mining and ‘big data’ increasingly take center stage in our modern, knowledge-driven society, supported by advances in computing power, automated data acquisition, social media development and interactive, linkable internet software. This book presents a coherent, technical introduction to modern statistical learning and analytics, starting from the core foundations of statistics and probability. It includes an overview of probability and statistical distributions, basics of data manipulation and visualization, and the central components of standard statistical inference. The majority of the text extends beyond these introductory topics, however, to supervised learning in linear regression, generalized linear models, and classification analytics. Finally, the book introduces unsupervised learning via dimension reduction, cluster analysis, and market basket analysis.

Extensive examples using actual data (with sample R programming code) are provided, illustrating diverse informatic sources in genomics, biomedicine, ecological remote sensing, astronomy, socioeconomics, marketing, advertising and finance, among many others.

Statistical Data Analytics:

  • Focuses on methods critically used in data mining and statistical informatics. Coherently describes the methods at an introductory level, with extensions to selected intermediate and advanced techniques.
  • Provides informative, technical details for the highlighted methods.
  • Employs the open-source R language as the computational vehicle – along with its burgeoning collection of online packages – to illustrate many of the analyses contained in the book.
  • Concludes each chapter with a range of interesting and challenging homework exercises using actual data from a variety of informatic application areas.

This book will appeal as a classroom or training text to intermediate and advanced undergraduates, and to beginning graduate students, with sufficient background in calculus and matrix algebra. It will also serve as a source-book on the foundations of statistical informatics and data analytics to practitioners who regularly apply statistical learning to their modern data.

About the Author

Walter W. Piegorsch is a Professor of Mathematics at the University of Arizona and the Director of Statistical Research & Education at its BIO5 Institute for Collaborative Bioresearch. Professor Piegorsch is an experienced and highly regarded author and editor. He has co-authored one previous book for Wiley, and is a founding and current co-Editor for Wiley's StatsRef: Statistics Reference Online, a comprehensive online reference resource that covers the fundamentals and applications of statistical theory, methods, and practice. He has also been on the editorial board of many scientific journals, and served as joint Editor of the Journal of the American Statistical Association (Theory and Methods Section).

Over the course of a long and distinguished academic career, Professor Piegorsch has taught and developed a number of courses in statistics and quantitative literacy, and he is in an ideal position to write this technical introduction to the use and application of statistical methods for informatics, statistical learning, and data mining.

In this Book

  • Data Analytics and Data Mining
  • Basic Probability and Statistical Distributions
  • Data Manipulation
  • Data Visualization and Statistical Graphics
  • Statistical Inference
  • Techniques for Supervised Learning: Simple Linear Regression
  • Techniques for Supervised Learning: Multiple Linear Regression
  • Supervised Learning: Generalized Linear Models
  • Supervised Learning: Classification
  • Techniques for Unsupervised Learning: Dimension Reduction
  • Techniques for Unsupervised Learning: Clustering and Association