Principles of Data Integration

  • 10h 25m
  • Alon Halevy, AnHai Doan, Zachary Ives
  • Elsevier Science and Technology Books, Inc.
  • 2012

How do you approach answering queries when your data is stored in multiple databases that were designed independently by different people? This is the first comprehensive book on data integration, written by three of the most respected experts in the field.

This book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed instruction for their application and concrete examples throughout. Data integration is the problem of answering queries that span multiple data sources (e.g., databases, web pages). Data integration problems surface in multiple contexts, including enterprise information integration, query processing on the Web, coordination between government agencies, and collaboration between scientists. In some cases, data integration is the key bottleneck to making progress in a field.

The authors provide a working knowledge of data integration concepts and techniques, giving you the tools you need to develop a complete and concise package of algorithms and applications.

  • Offers a range of data integration solutions enabling you to focus on what is most relevant to the problem at hand.
  • Enables you to build your own algorithms and implement your own data integration applications.

About the Authors

AnHai Doan, Associate Professor of Computer Science at the University of Wisconsin-Madison. His interests cover databases, AI, and the Web, with a current focus on data integration, schema and ontology matching, information extraction, text management, social media, crowdsourcing, and human computation. He was on the Advisory Board of Transformic, and was Chief Scientist of Kosmix, a social media startup acquired by Walmart in 2011. Currently he also works as Chief Scientist of WalmartLabs, a newly formed research and development lab at Walmart, devoted to analyzing and integrating social and mobile data for e-commerce.

Alon Halevy, Head of the Structured Data Management Research group at Google. Prior to that, he was a professor of Computer Science at the University of Washington in Seattle, where he founded the database group. In 1999, Dr. Halevy co-founded Nimble Technology, one of the first companies in the Enterprise Information Integration space. In 2004, he founded Transformic Inc., a company that created search engines for the deep web and was later acquired by Google. Dr. Halevy is a Fellow of the Association for Computing Machinery.

Zachary Ives, Associate Professor at the University of Pennsylvania and a Faculty Member of the Penn Center for Bioinformatics. His research interests include data integration and sharing, data-centric computation, sensor networks, and data provenance and authoritativeness. He has been awarded the Christian R. and Mary F. Lindback Foundation Award for Distinguished Teaching.

In this Book

  • Introduction
  • Manipulating Query Expressions
  • Describing Data Sources
  • String Matching
  • Schema Matching and Mapping
  • General Schema Manipulation Operators
  • Data Matching
  • Query Processing
  • Wrappers
  • Data Warehousing and Caching
  • XML
  • Ontologies and Knowledge Representation
  • Incorporating Uncertainty into Data Integration
  • Data Provenance
  • Data Integration on the Web
  • Keyword Search–Integration on Demand
  • Peer-to-Peer Integration
  • Integration in Support of Collaboration
  • The Future of Data Integration
  • Bibliography