Mining of Massive Datasets, Second Edition

10h 56m
Anand Rajaraman, Jeffrey David Ullman, Jure Leskovec
Cambridge University Press
2014

Written by leading authorities in database and Web technologies, this book is essential reading for students and practitioners alike. The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream-processing algorithms for mining data that arrives too fast for exhaustive processing. Other chapters cover the PageRank idea and related tricks for organizing the Web, the problems of finding frequent itemsets and clustering. This second edition includes new and extended coverage of social networks, machine learning and dimensionality reduction.

Contains brand new material and extended coverage of important topics
Includes a range of over 150 exercises to challenge even the most able student

About the Authors

Jure Leskovec is Assistant Professor of Computer Science at Stanford University. His research focuses on mining large social and information networks. The problems he investigates are motivated by large-scale data, the Web and online media. This research has won several awards, including a Microsoft Research Faculty Fellowship, the Alfred P. Sloan Fellowship, the Okawa Foundation Fellowship, and numerous best paper awards. His research has also been featured in popular press outlets such as the New York Times, the Wall Street Journal, the Washington Post, MIT Technology Review, NBC, BBC, CBC and Wired. Leskovec has also authored the Stanford Network Analysis Platform, a general-purpose network analysis and graph mining library that easily scales to massive networks with hundreds of millions of nodes and billions of edges.

Anand Rajaraman is a serial entrepreneur, venture capitalist, and academic based in Silicon Valley. He is a Founding Partner of two early-stage venture capital firms, Milliways Labs and Cambrian Ventures. His investments include Facebook (as one of its earliest angel investors, in 2005), Aster Data Systems (acquired by Teradata), Efficient Frontier (acquired by Adobe), Neoteris (acquired by Juniper), Transformic (acquired by Google), and several others. Anand was, until recently, Senior Vice President at Walmart Global eCommerce and co-head of @WalmartLabs, where he worked at the intersection of social, mobile, and commerce. He came to Walmart when Walmart acquired Kosmix, the startup he co-founded, in 2011. Kosmix pioneered semantic search technology and semantic analysis of social media. In 1996, Anand co-founded Junglee, an e-commerce pioneer. As Chief Technology Officer, he played a key role in developing Junglee's award-winning Virtual Database technology. In 1998, Amazon.com acquired Junglee, and Anand helped launch the transformation of Amazon.com from a retailer into a retail platform, enabling third-party retailers to sell on Amazon.com's website. Anand is also a co-inventor of Amazon Mechanical Turk, which pioneered the concepts of crowdsourcing and hybrid Human-Machine computation. As an academic, Anand has focused his research on the intersection of database systems, the World-Wide Web, and social media. His research publications have won several awards at prestigious academic conferences, including two retrospective 10-year Best Paper awards at ACM SIGMOD and VLDB. In 2012, Fast Company magazine named Anand to its list of '100 Most Creative People in Business'. In 2013, he was named a Distinguished Alumnus by his alma mater, IIT Madras.

Jeffrey David Ullman is the Stanford W. Ascherman Professor of Computer Science (Emeritus) and is currently the CEO of Gradiance. His research interests include database theory, data mining, and education using the information infrastructure. He is one of the founders of the field of database theory, and was the doctoral advisor of an entire generation of students who later became leading database theorists in their own right. He was the Ph.D. advisor of Sergey Brin, one of the co-founders of Google, and served on Google's technical advisory board. Ullman was elected to the National Academy of Engineering in 1989 and the American Academy of Arts and Sciences in 2012, and he has held Guggenheim and Einstein Fellowships. Recent awards include the Knuth Prize (2000) and the SIGMOD E. F. Codd Innovations award (2006). Ullman is also the co-recipient (with John Hopcroft) of the 2010 IEEE John von Neumann Medal, for 'laying the foundations for the fields of automata and language theory and many seminal contributions to theoretical computer science'.

In this Book

Data Mining
MapReduce and the New Software Stack
Finding Similar Items
Mining Data Streams
Link Analysis
Frequent Itemsets
Clustering
Advertising on the Web
Recommendation Systems
Mining Social-Network Graphs
Dimensionality Reduction
Large-Scale Machine Learning

