Managing Datasets and Models

  • 3h 36m
  • Oswald Campesato
  • Mercury Learning
  • 2023

This book contains a fast-paced introduction to data-related tasks in preparation for training models on datasets. It presents a step-by-step, Python-based code sample that uses the kNN algorithm to manage a model on a dataset.

Chapter One begins with an introduction to datasets and issues that can arise, followed by Chapter Two on outliers and anomaly detection. Chapter Three explores ways of handling missing data and invalid data, and Chapter Four demonstrates how to train models with classification algorithms. Chapter Five introduces visualization toolkits, such as Sweetviz, Skimpy, Matplotlib, and Seaborn, along with some simple Python-based code samples that render charts and graphs. An appendix includes some basics on using awk. Companion files with code, datasets, and figures are available for downloading.

FEATURES:

  • Covers extensive topics related to cleaning datasets and working with models
  • Includes Python-based code samples and a separate chapter on Matplotlib and Seaborn
  • Features companion files with source code, datasets, and figures from the book

About the Author

Oswald Campesato is an education junkie: a former PhD Candidate in Mathematics (ABD) who has multiple MS and BS degrees. He has written 20 technical books for Mobile and Web development. In a previous career, he worked in South America, Italy, and the French Riviera, which enabled him to travel to 70 countries throughout the world.

He has worked in American and Japanese corporations and various start-ups, with roles ranging from C/C++ and Java developer to CTO. He's comfortable in four languages and aspires to become proficient in Japanese some time in the 21st century.

Currently, he provides training for Deep Learning and teaches graduate-level courses in Deep Learning/TensorFlow and Machine Learning. He is also working on an introductory book for TensorFlow 2.0 and another book about Keras.

In this Book

  • Working with Data
  • Outlier and Anomaly Detection
  • Cleaning Datasets
  • Working with Models
  • Matplotlib and Seaborn