Course Details

MLlib, GraphX, and R

MLlib is Spark's machine learning library. GraphX is Spark's API for graphs and graph-parallel computation. SparkR exposes the Spark API to R and lets users run jobs on a cluster from the R shell. In this course, you will learn how to work with each of these libraries.

Target Audience
Programmers and developers familiar with Apache Spark who wish to expand their skill sets


Expected Duration (hours)

Lesson Objectives

  • start the course
  • describe data types
  • recall the basic statistics
  • describe linear SVMs
  • perform logistic regression
  • use naïve Bayes
  • create decision trees
  • use collaborative filtering with ALS
  • perform clustering with K-means
  • perform clustering with LDA
  • perform analysis with frequent pattern mining
  • describe the property graph
  • describe the graph operators
  • perform analytics with neighborhood aggregation
  • perform messaging with the Pregel API
  • build graphs
  • describe vertex and edge RDDs
  • optimize representation through partitioning
  • measure vertices with PageRank
  • install SparkR
  • run SparkR
  • use existing R packages
  • expose RDDs as distributed lists
  • convert existing RDDs into DataFrames
  • read and write Parquet files
  • run SparkR on a cluster
  • use the algorithms and utilities in MLlib

Course Number: