Department of Mathematical Sciences
University of North Carolina at Greensboro
P.O. Box 26170
Greensboro, NC 27402-6170
Phone: (336) 256-1136, Fax: (336) 334-5949
Home page: http://www.uncg.edu/~sadrif
Keywords: Query optimization, restructured views, multi-database systems, interoperability, data warehousing, structural heterogeneity, semantic heterogeneity, SchemaSQL.
In database systems (and other forms of information systems), similar information can be stored under different structures (or schemas). This is known as schematic heterogeneity, and current database languages are very limited in their ability to cope with it. However, many contemporary applications, in particular multi-database systems, require the ability to handle schematic heterogeneity. Higher-order database languages have recently been proposed to remedy this shortcoming.
This project is concerned with query optimization for such higher-order languages. These languages have also been shown to be useful in other important applications, such as data mining and data warehousing. We propose to study the following problems.
Cost-based optimization for higher-order languages. This well-known query optimization technique generates alternative execution plans, estimates the cost of each, and selects the cheapest plan among the alternatives. We need to extend this approach to higher-order languages.
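The classical (first-order) form of this technique can be sketched in a few lines of Python. Everything here is hypothetical: the relation names R, S and T, their cardinalities, the join selectivities, and the cost model (the sum of estimated intermediate result sizes over left-deep join orders). The sketch does not show the higher-order extension this project targets. It only shows the "generate alternatives, then pick the cheapest" loop.

```python
from itertools import permutations

# Hypothetical statistics: base-table cardinalities and join-predicate selectivities.
CARD = {"R": 1000, "S": 100, "T": 10}
SEL = {frozenset("RS"): 0.01, frozenset("ST"): 0.1, frozenset("RT"): 0.05}


def join_selectivity(left, right):
    """Product of the selectivities of all join predicates linking the two sides."""
    s = 1.0
    for a in left:
        for b in right:
            s *= SEL.get(frozenset((a, b)), 1.0)  # 1.0 = no predicate (cross product)
    return s


def plan_cost(order):
    """Hypothetical cost model: sum of estimated intermediate sizes of a left-deep plan."""
    joined, size, cost = [order[0]], CARD[order[0]], 0.0
    for rel in order[1:]:
        size = size * CARD[rel] * join_selectivity(joined, [rel])
        cost += size
        joined.append(rel)
    return cost


def best_plan(relations):
    """Generate every left-deep join order (the alternatives) and keep the cheapest."""
    return min(permutations(relations), key=plan_cost)


print(best_plan(["R", "S", "T"]))  # ('S', 'T', 'R')
```

Exhaustive enumeration is used only for brevity. Practical optimizers, starting with System R, typically use dynamic programming over subsets of relations and much richer cost models.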
Using materialized views for query optimization. A materialized view (known as a summary table in data warehousing applications) is information obtained (possibly summarized and refined) from the base data. Researchers have studied the problem of using materialized views for query optimization in database applications. Higher-order languages can generate and handle restructured views, namely, views that cast the information in a different structure than the original. This capability opens up a broad new range of possibilities for query optimization in database systems.
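The following Python sketch illustrates the idea on hypothetical sales data. A summary table that is also restructured is materialized once, with month values turned into column labels. A query for total sales per region is then answered from that view instead of from the base rows. The data, the function names, and the pivoted layout are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical base table sales(region, month, amount).
SALES = [
    ("east", "jan", 100), ("east", "jan", 50), ("east", "feb", 70),
    ("west", "jan", 30), ("west", "feb", 20), ("west", "feb", 40),
]


def materialize_pivoted_summary(rows):
    """Summarize and restructure: {region: {month: total}}, so that month
    values become the column labels of the materialized view."""
    view = defaultdict(lambda: defaultdict(int))
    for region, month, amount in rows:
        view[region][month] += amount
    return {region: dict(cols) for region, cols in view.items()}


def totals_from_base(rows):
    """The query answered directly from the base table."""
    totals = defaultdict(int)
    for region, _month, amount in rows:
        totals[region] += amount
    return dict(totals)


def totals_from_view(view):
    """The same query answered from the smaller, restructured view."""
    return {region: sum(cols.values()) for region, cols in view.items()}


VIEW = materialize_pivoted_summary(SALES)
print(VIEW)                    # {'east': {'jan': 150, 'feb': 70}, 'west': {'jan': 30, 'feb': 60}}
print(totals_from_view(VIEW))  # {'east': 220, 'west': 90}
```

Both functions return the same answer. The optimizer's task is to recognize when such a rewrite is valid and cheaper.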
Handling semantic heterogeneity. Another key issue in multi-database and data transmission applications is semantic heterogeneity. Raw data should be accompanied by additional information indicating its nature and properties. For example, a numeric value by itself has no meaning, but once we know it represents a salary in US dollars, the information becomes meaningful. We would like to integrate semantic as well as schematic capabilities into a single database language.
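The following is a minimal sketch of attaching semantic information to raw values. It assumes a hypothetical `Annotated` type and made-up exchange rates. Two values are compared only after being converted to a common currency, so a salary of 1000 EUR and a salary of 1000 USD are not mistaken for the same amount. The type, its fields and the rates are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical exchange rates (USD per unit of currency), for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "CAD": 0.75}


@dataclass(frozen=True)
class Annotated:
    """A raw number plus semantic metadata: what it measures and in which currency."""
    amount: float
    currency: str
    kind: str = "salary"

    def to(self, currency):
        """Return the same value expressed in another currency."""
        usd = self.amount * RATES_TO_USD[self.currency]
        return Annotated(usd / RATES_TO_USD[currency], currency, self.kind)


def same_value(a, b, tol=1e-9):
    """Compare two annotated values only after converting them to a common currency."""
    if a.kind != b.kind:
        raise ValueError("the values describe different kinds of information")
    return abs(a.to("USD").amount - b.to("USD").amount) <= tol


print(same_value(Annotated(1000, "USD"), Annotated(1000, "EUR")))  # False
print(same_value(Annotated(1100, "USD"), Annotated(1000, "EUR")))  # True
```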
Physical structure of data. If prior knowledge is available regarding the types and frequency of queries to a database, it should be possible to structure the data in a way that optimizes the overall performance of the system. A study of higher-order languages and their optimization also sheds light on the important problem of structuring data for optimal system performance.
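When the workload is known, one common heuristic for choosing auxiliary structures such as indexes or materialized views is greedy selection by cost reduction per unit of space. The Python sketch below applies that heuristic to a hypothetical workload. The query frequencies, costs, sizes and candidate names are made up, and this is a generic technique, not necessarily the method developed in this project.

```python
# Hypothetical workload: query -> frequency, and cost of each query with no extra structures.
WORKLOAD = {"q1": 100, "q2": 10, "q3": 50}
BASE_COST = {"q1": 50.0, "q2": 80.0, "q3": 40.0}
# Hypothetical candidates: name -> (size, {query: cost when this structure is available}).
CANDIDATES = {
    "idx_a": (10, {"q1": 5.0}),
    "view_b": (40, {"q2": 10.0, "q3": 10.0}),
    "idx_c": (15, {"q3": 20.0}),
}


def workload_cost(chosen):
    """Frequency-weighted cost; each query uses the cheapest available structure."""
    total = 0.0
    for q, freq in WORKLOAD.items():
        cost = BASE_COST[q]
        for name in chosen:
            cost = min(cost, CANDIDATES[name][1].get(q, cost))
        total += freq * cost
    return total


def greedy_select(budget):
    """Repeatedly add the structure with the best cost reduction per unit of space."""
    chosen, used = [], 0
    while True:
        current = workload_cost(chosen)
        best, best_ratio = None, 0.0
        for name, (size, _) in CANDIDATES.items():
            if name in chosen or used + size > budget:
                continue
            ratio = (current - workload_cost(chosen + [name])) / size
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            return chosen
        chosen.append(best)
        used += CANDIDATES[best][0]


print(greedy_select(30))  # ['idx_a', 'idx_c']
```

The rule is only a heuristic. With a budget of 60 it still returns `['idx_a', 'idx_c']` (weighted cost 2300), although `idx_a` plus `view_b` would also fit and cost 1100.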
Please refer to http://www.uncg.edu/~sadrif/papers.html for a complete publications list.
Seven undergraduate and eight graduate students have participated in different stages of this project through directed study (project) courses, Master’s projects and Master’s theses. Please see the list of participating students above.
For the remaining period of this project, I intend to concentrate mainly on expanding our investigation of interoperability among sources of data (XML, relational, ...). We will further develop algorithms for query processing and optimization in a multi-source system and study their performance. We will also continue our work on query optimization using restructured views. Problems to study include (1) a simulation study of performance, and (2) physical data structuring and view/index selection for optimum performance when the workload is known.
Different structures can be used to represent the same (or similar) information. In databases, in addition to the information stored within tables, schema objects such as relation names and column labels can also store information. The situation is similar in XML data, where tags and elements can both carry information. Further, the structure of XML data also carries important information.
Such "restructuring" of information arises naturally in many applications. In multi-database applications, similar data is often represented in different structures. In data warehousing applications, summarization may be combined with restructuring for additional efficiency. In database systems, restructured views can be very effective in query optimization.
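To make this concrete, the following Python sketch stores the same hypothetical stock-price data in three structures. In the first, stock names are data values. In the second, they are column labels. In the third, they are relation names. The data and function names are illustrative assumptions. In a first-order query language, "list all stocks" is a simple projection over the first schema but requires access to schema names in the other two. Higher-order languages such as SchemaSQL allow queries to range over such names directly.

```python
from collections import defaultdict

# Schema 1 (hypothetical data): prices(date, stock, price); stock names are data values.
FLAT = [
    ("d1", "ibm", 10), ("d1", "msft", 20),
    ("d2", "ibm", 11), ("d2", "msft", 19),
]


def to_columns(rows):
    """Schema 2: one row per date; stock names become column labels."""
    table = defaultdict(dict)
    for date, stock, price in rows:
        table[date][stock] = price
    return dict(table)


def to_relations(rows):
    """Schema 3: one relation per stock; stock names become relation names."""
    db = defaultdict(list)
    for date, stock, price in rows:
        db[stock].append((date, price))
    return dict(db)


def from_columns(table):
    """Demote column labels back to data values (recovers schema 1)."""
    return sorted((d, s, p) for d, cols in table.items() for s, p in cols.items())


def from_relations(db):
    """Demote relation names back to data values (recovers schema 1)."""
    return sorted((d, s, p) for s, rows in db.items() for d, p in rows)


print(to_columns(FLAT))    # {'d1': {'ibm': 10, 'msft': 20}, 'd2': {'ibm': 11, 'msft': 19}}
print(to_relations(FLAT))  # {'ibm': [('d1', 10), ('d2', 11)], 'msft': [('d1', 20), ('d2', 19)]}
```

The round trips lose nothing. Each structure carries the same information, but part of it lives in the schema rather than in the data.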
Current database and web systems have very limited capability to deal with and relate information stored in such different structures. In this project we address the issues involved in coping with structural heterogeneity, and develop techniques and algorithms that exploit information restructuring in, for example, database and data warehousing applications.
Acknowledgement: This material is based upon work supported by the National Science Foundation under grant No. 0083312.
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.