Query Optimization for Multi-Database and Data Warehousing, NSF project, Fereidoon Sadri

Query Optimization for Multi-Database and Data Warehousing

Project Award Number

IIS-0083312

Principal Investigator

Fereidoon Sadri
Department of Mathematical Sciences
University of North Carolina at Greensboro
P.O. Box 26170
Greensboro, NC 27402-6170
Phone: (336) 256-1136, Fax: (336) 334-5949
Email: f_sadri@uncg.edu
Home page: http://www.uncg.edu/~sadrif

Project URL

http://www.uncg.edu/~sadrif/nsf/query-optimization.html

Participating Students (Partially Supported by this grant)

Undergraduate Students: Corey F. Couch, Marian Neacsu, Kristina L. Wood, Helen Peastrel, Douglas Markland, Shelby Soderlund, Armin Kalender

Graduate Students: Kenneth D. Kindsvater, Manjusha Ravindranath, Xiang Zhong, Archana Nayini, Sudarut Anantarak, Adina Ivanica, John Cocking, John Harney

Keywords

Query optimization, restructured views, multi-database systems, interoperability, data warehousing, structural heterogeneity, semantic heterogeneity, SchemaSQL.

Project Summary

In database systems (and other forms of information systems), similar information can be stored under different structures (or schemas). This is known as schematic heterogeneity, and current database languages are very limited in coping with it. Many contemporary applications, in particular multi-database systems, nevertheless require the ability to handle schematic heterogeneity. Higher-order database languages have been proposed recently to remedy this shortcoming.

This project is concerned with query optimization for such higher-order languages. It has been shown that these languages are also useful in other important applications, such as data mining and data warehousing. We propose to study the following problems.

Cost-based optimization for higher-order languages. This is the well-known query optimization technique that generates alternative execution plans and selects the best plan among them. We need to extend this approach to higher-order languages.
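
The plan-generation and plan-selection steps described above can be illustrated with a minimal sketch for ordinary joins. The table names, row counts, uniform selectivity, and cost formula below are all assumptions made for illustration; they are not the project's cost model, and the sketch does not cover higher-order languages.

```python
from itertools import permutations

# Hypothetical table sizes in rows; not taken from the project.
CARDINALITY = {"orders": 10_000, "customers": 500, "regions": 10}
# Assumed uniform join selectivity, purely for illustration.
SELECTIVITY = 0.001


def plan_cost(join_order):
    """Estimate a left-deep plan's cost as the sum of its intermediate result sizes."""
    rows = CARDINALITY[join_order[0]]
    cost = 0.0
    for table in join_order[1:]:
        # Size of the intermediate result after joining in the next table.
        rows = rows * CARDINALITY[table] * SELECTIVITY
        cost += rows
    return cost


def best_plan(tables):
    """Generate every join order and return the one with the lowest estimated cost."""
    return min(permutations(tables), key=plan_cost)
```

Calling `best_plan(["orders", "customers", "regions"])` enumerates all six join orders and, under these assumed numbers, picks one that joins the two small tables first. Exhaustive enumeration is only workable for a handful of tables; it is used here because it makes the generate-then-select structure easy to see.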

Using materialized views for query optimization. A materialized view (known as a summary table in data warehousing applications) is information obtained (possibly summarized and refined) from the base data. Researchers have studied the problem of using materialized views for query optimization in database applications. Higher-order languages can generate and handle restructured views, namely, views that cast the information in a different structure than the original. This capability opens up a broad new spectrum for query optimization in database systems.
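
As a rough illustration of the idea, the sketch below materializes a view that both summarizes the base data and restructures it, then answers the same query from the base data and from the view. The store and product data and all function names are hypothetical, and the in-memory dictionaries only stand in for relations; this is not the project's method.

```python
from collections import defaultdict

# Hypothetical base data: one row per sale (store, product, units sold).
BASE_ROWS = [
    ("north", "apples", 30),
    ("north", "pears", 12),
    ("south", "apples", 25),
    ("north", "apples", 5),
]


def materialize_restructured_view(rows):
    """Summarize units per store, with product names acting as column labels."""
    view = defaultdict(lambda: defaultdict(int))
    for store, product, units in rows:
        view[store][product] += units
    return {store: dict(columns) for store, columns in view.items()}


def total_from_base(rows, store, product):
    """Answer the query by scanning every base row."""
    return sum(u for s, p, u in rows if s == store and p == product)


def total_from_view(view, store, product):
    """Answer the same query with two lookups in the restructured view."""
    return view.get(store, {}).get(product, 0)


VIEW = materialize_restructured_view(BASE_ROWS)
```

Both `total_from_base(BASE_ROWS, "north", "apples")` and `total_from_view(VIEW, "north", "apples")` return the same total, but the second avoids a scan of the base data. An optimizer that knows the view exists can choose the cheaper of the two.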

Handling semantic heterogeneity. Another key issue in multi-database and data transmission applications is that of semantic heterogeneity. Raw data should be accompanied by additional information to indicate its nature and properties. For example, a numeric value by itself does not have any meaning, but once we know it represents a salary in US dollars the information becomes meaningful. We would like to integrate semantic as well as schematic capabilities into a single database language.
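
The salary example can be sketched as a value that travels together with a description of what it means. The `AnnotatedValue` type, its field names, and the exchange rate below are hypothetical and chosen only for illustration; the project does not prescribe this representation.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical conversion rates to US dollars, for illustration only.
RATE_TO_USD = {"USD": Decimal("1"), "EUR": Decimal("1.10")}


@dataclass(frozen=True)
class AnnotatedValue:
    amount: Decimal
    meaning: str   # what the number represents, e.g. "salary"
    currency: str  # unit in which the amount is expressed


def to_usd(value):
    """Re-express an annotated amount in US dollars using the assumed rates."""
    rate = RATE_TO_USD[value.currency]
    return AnnotatedValue(value.amount * rate, value.meaning, "USD")


def larger(a, b):
    """Compare two values only if they mean the same thing, after unit conversion."""
    if a.meaning != b.meaning:
        raise ValueError("values with different meanings are not comparable")
    return a if to_usd(a).amount >= to_usd(b).amount else b
```

Without the annotations, comparing 50000 with 48000 would favor the first number. With them, a salary of 48000 EUR is converted before the comparison, and a salary is never compared with, say, a budget.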

Physical structure of data. If prior knowledge is available regarding the types and frequency of queries to a database, then it should be possible to structure the data in a way that optimizes the overall performance of the system. A study of higher-order languages and their optimization also sheds light on the important problem of structuring data for optimum system performance.

Publications

Please refer to http://www.uncg.edu/~sadrif/papers.html for a complete publications list.

L. V. S. Lakshmanan and F. Sadri, “On the Information Content of an XML Database.” Manuscript. 2004.
F. Sadri, “Optimization of Queries Using Restructured Views.” Manuscript.
L. V. S. Lakshmanan and F. Sadri, “Interoperability on XML Data.” 2nd International Semantic Web Conference (ISWC'03), October 20-23, 2003, Sanibel Island, Florida, USA.
L. V. S. Lakshmanan, F. Sadri, and S. N. Subramanian, “SchemaSQL - An Extension to SQL for Multi-database Interoperability.” ACM Transactions on Database Systems (ACM TODS), Vol. 26, No. 4, December 2001, pages 476-519.
K. B. Davis and F. Sadri, “Optimization of SchemaSQL Queries.” Proceedings of International Database Engineering and Applications Symposium (IDEAS), 2001, pages 111-116.
L. V. S. Lakshmanan, F. Sadri, and S. N. Subramanian, “On Efficiently Implementing SchemaSQL on an SQL Database System.” Proceedings of the 1999 International Conference on Very Large Databases (VLDB'99), pages 471-482.
F. Sadri and P. L. Shouse, “A Graphical Language for Relational Multi-Database Querying and Restructuring.” Proceedings of International Conference on Computing and Information (ICCI), 1998, pages 61-68.
L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian, “Logic and Algebraic Languages for Interoperability in Multidatabase Systems.” Journal of Logic Programming, Vol. 33, No. 2, pages 101-149, November 1997.
F. Sadri and S. B. Wilson, “Implementation of SchemaSQL - A Language for Relational Multi-Database Systems.” Manuscript, 1997.
L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian, “SchemaSQL - A Language for Interoperability in Multi-database Systems.” Proceedings of the 1996 International Conference on Very Large Databases (VLDB'96), pages 239-250.
L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian, “A Declarative Language for Querying and Restructuring the Web.” Sixth Int'l Workshop on Research Issues in Data Engineering, RIDE'96, February 1996, pages 12-19.
L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian, “On the Logical Foundations of Schema Integration and Evolution in Heterogeneous Database Systems.” International Conference on Deductive and Object-Oriented Databases (DOOD), December 1993, pages 81-100.

Project Impact

Human Resources:

Seven undergraduate and eight graduate students have participated at different stages of this project in the form of directed study (project) courses, Master’s projects and Master’s theses. Please see the list of participating students above.

Goals, Objectives, and Targeted Activities

For the remaining period of this project, I intend to concentrate mainly on expanding our investigation into the problem of interoperability among sources of data (XML, relational, ...). We will further develop algorithms for query processing and optimization in a multi-source system and study their performance. We will also continue our work on query optimization using restructured views. Problems to study include (1) a simulation study of performance and (2) physical data structuring and view/index selection for optimum performance when the workload is known.

Area Background

Different structures can be used to represent the same (or similar) information. In databases, in addition to the information stored within tables, schema objects such as relation names and column labels can also store information. The situation is similar in XML data, where tags and elements can both carry information. Further, the structure of XML data also carries important information.
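
To make this concrete, the sketch below stores the same facts in two structures: one where every piece of information is a data value, and one where some of it has moved into column labels. The stock-price data and the function names are hypothetical and serve only as an illustration.

```python
# Hypothetical data. Structure A: all information is stored as data values.
ROWS = [
    {"ticker": "abc", "exchange": "nyse", "price": 10},
    {"ticker": "abc", "exchange": "tse", "price": 11},
    {"ticker": "xyz", "exchange": "nyse", "price": 20},
]


def values_to_column_labels(rows):
    """Structure B: one row per ticker, with exchange names as column labels."""
    by_ticker = {}
    for row in rows:
        wide = by_ticker.setdefault(row["ticker"], {"ticker": row["ticker"]})
        wide[row["exchange"]] = row["price"]
    return list(by_ticker.values())


def column_labels_to_values(wide_rows):
    """Inverse restructuring: turn column labels back into data values."""
    rows = []
    for wide in wide_rows:
        for label, price in wide.items():
            if label != "ticker":
                rows.append({"ticker": wide["ticker"], "exchange": label, "price": price})
    return rows
```

In structure B the exchange name is no longer a value that a conventional query can select on; it is part of the schema. Note that the inverse function has to iterate over column labels, which is the kind of access to schema objects that conventional query languages do not offer directly.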

Such "restructuring" of information appears naturally in many applications. In multi-database applications, similar data is often represented in different structures. In data warehousing applications, summarizing data may be combined with restructuring for additional efficiency. In database systems, restructured views can prove very effective in query optimization.

Current database and web systems have very limited capability to deal with and relate information in such different structures. In this project we address the issues involved in coping with structural heterogeneity, and develop techniques and algorithms to benefit from information restructuring in, for example, database and data warehousing applications.

Acknowledgement: This material is based upon work supported by the National Science Foundation under grant No. 0083312.

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.