Course Details

Data Gathering

To carry out data science, you need to gather data. Extracting, parsing, and scraping data from various sources, both internal and external, is a critical first step in the data science pipeline. In this course, you'll explore practical tools for data gathering.

Target Audience
Individuals with some programming and math experience working toward implementing data science in their everyday work


Expected Duration (hours)

Lesson Objectives

Data Gathering

  • start the course
  • describe problems and software tools associated with data gathering
  • use curl to gather data from the Web
  • use in2csv to convert spreadsheet data to CSV format
  • use agate to extract data from spreadsheets
  • use agate to extract tabular data from dbf files
  • extract data from particular tags in an HTML document
  • distinguish between metadata and data
  • work with metadata in HTTP Headers
  • work with Linux log files
  • work with metadata in email headers
  • perform a secure shell connection to a remote server
  • copy remote data using a secure copy
  • synchronize data from a remote server
  • download an HTML file and explore table data

Course Number: