Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining

  • 8h 55m
  • Christian Rubba, Dominic Nyhuis, Peter Meißner, Simon Munzert
  • John Wiley & Sons (UK)
  • 2015

A hands-on guide to web scraping and text mining for both beginners and experienced users of R.

  • Introduces fundamental concepts of the main architecture of the web and databases, and covers HTTP, HTML, XML, JSON, and SQL.
  • Provides basic techniques to query web documents and data sets (XPath and regular expressions).
  • Presents an extensive set of exercises to guide the reader through each technique.
  • Explores both supervised and unsupervised text-mining techniques, as well as advanced techniques such as data scraping and text management.
  • Features case studies throughout, along with examples for each technique presented.

In this Book

  • Introduction
  • HTML
  • XML and JSON
  • XPath
  • HTTP
  • AJAX
  • SQL and Relational Databases
  • Regular Expressions and Essential String Functions
  • Scraping the Web
  • Statistical Text Processing
  • Managing Data Projects
  • Collaboration Networks in the US Senate
  • Parsing Information from Semistructured Documents
  • Predicting the 2014 Academy Awards Using Twitter
  • Mapping the Geographic Distribution of Names
  • Gathering Data on Mobile Phones
  • Analyzing Sentiments of Product Reviews
  • References