Text Mining with MATLAB, 2nd Edition

  • 8h 27m
  • Rafael E. Banchs
  • Springer
  • 2021

Text Mining with MATLAB® provides a comprehensive introduction to text mining using MATLAB. It is designed to help text mining practitioners, as well as those with little-to-no experience with text mining in general, familiarize themselves with MATLAB and its complex applications.

The book is structured in three main parts. The first part, Fundamentals, introduces basic procedures and methods for manipulating and operating with text within the MATLAB programming environment. The second part, Mathematical Models, is devoted to motivating, introducing, and explaining the two main paradigms of mathematical models most commonly used for representing text data: the statistical and the geometrical approaches. Finally, the third part, Techniques and Applications, addresses general problems in text mining and natural language processing applications such as document categorization, document search, content analysis, summarization, question answering, and conversational systems. This second edition includes updates in line with the recently released “Text Analytics Toolbox” within the MATLAB product and introduces three new chapters and six new sections in existing ones.

All descriptions presented are supported with practical examples that are fully reproducible. Further reading, additional exercises, and projects are proposed at the end of each chapter for readers interested in conducting further experimentation.

About the Author

Rafael E. Banchs is a senior data science manager with more than 25 years of experience in signal processing, data science, and text mining applications. Rafael has a similar number of years of practical experience using the MATLAB® product and has completed multiple projects and developed applications with it. He received a PhD in Electrical Engineering from The University of Texas at Austin in 1998 and has published several papers in peer-reviewed journals and international conferences.

In this Book

  • Introduction
  • Handling Text Data
  • Regular Expressions
  • Basic Operations with Strings
  • Reading and Writing Files
  • The Structure of Language
  • Basic Corpus Statistics
  • Statistical Models
  • Geometrical Models
  • Dimensionality Reduction
  • Document Categorization
  • Document Search
  • Content Analysis
  • Keyword Extraction and Summarization
  • Question Answering and Dialogue