IIS-0083312
Fereidoon Sadri
Department of Mathematical Sciences
University of North Carolina at Greensboro
P.O. Box 26170
Greensboro, NC 27402-6170
Phone: (336) 256-1136, Fax: (336) 334-5949
Email: f_sadri@uncg.edu
Home page: http://www.uncg.edu/~sadrif
http://www.uncg.edu/~sadrif/nsf/query-optimization.html
Undergraduate Students
Graduate Students
Query optimization, restructured views, multi-database
systems, interoperability, data warehousing, structural heterogeneity, semantic
heterogeneity, SchemaSQL.
In database systems (and other forms of information systems),
similar information can be stored under different structures (or schemas).
This is known as schematic heterogeneity, and current database languages
are very limited in coping with it. Yet many contemporary applications, in
particular multi-database systems, require the ability to handle schematic
heterogeneity. Higher-order database languages have recently been proposed to remedy this shortcoming.
This project is concerned with query optimization for such higher-order
languages. These languages have also been shown to be useful in other
important applications, such as data mining and
data warehousing. We propose to study the following problems.
Cost-based optimization for higher-order languages.
This is the well-known query optimization technique that involves
generating alternative execution plans and selecting the best plan among
them. We need to extend this approach to higher-order languages.
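For reference, the classical (first-order) form of cost-based optimization that would need to be extended can be sketched as dynamic programming over join orders in the style of System R. This is a minimal sketch, not the project's optimizer: the relations `R`, `S`, `T`, their cardinalities and selectivities are hypothetical, and the cost model (sum of intermediate result sizes, independent predicates) is an assumption.

```python
from itertools import combinations

# Hypothetical statistics (not from the project): base-table cardinalities
# and join selectivities for each pair of relations.
CARD = {"R": 1000, "S": 200, "T": 50}
SEL = {
    frozenset({"R", "S"}): 0.01,
    frozenset({"S", "T"}): 0.1,
    frozenset({"R", "T"}): 1.0,  # no join predicate: cross product
}


def join_size(left_rels, right_rels, left_size, right_size):
    """Estimate the result size of joining two sub-plans.

    Assumes predicates are independent, so selectivities simply multiply.
    """
    size = left_size * right_size
    for a in left_rels:
        for b in right_rels:
            size *= SEL.get(frozenset({a, b}), 1.0)
    return size


def best_plan(relations):
    """Dynamic programming over subsets of relations.

    best[subset] holds (cost, estimated size, plan); the assumed cost model
    is the sum of all intermediate result sizes a plan produces.
    """
    best = {frozenset({r}): (0.0, float(CARD[r]), r) for r in relations}
    for k in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, k)):
            # Generate every alternative: split the subset into two non-empty parts.
            for i in range(1, k):
                for left in map(frozenset, combinations(sorted(subset), i)):
                    right = subset - left
                    lcost, lsize, lplan = best[left]
                    rcost, rsize, rplan = best[right]
                    size = join_size(left, right, lsize, rsize)
                    cost = lcost + rcost + size
                    # Keep only the cheapest plan found so far for this subset.
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, size, (lplan, rplan))
    return best[frozenset(relations)]


cost, size, plan = best_plan(["R", "S", "T"])
print(plan, cost)
```

With these toy statistics the cheapest plan joins `S` with `T` first and then joins the result with `R`. A higher-order language adds alternatives this enumeration does not consider, such as plans that restructure data between schema-level and data-level representations.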
Using materialized views for query optimization. A materialized view
(known as a summary table in data warehousing applications) stores information
obtained (possibly summarized and refined) from the base data. Researchers have
studied the problem of using materialized views for query optimization in
database applications. Higher-order languages can generate and handle restructured
views, that is, views that cast the information in a structure different
from that of the original. This capability opens up a broad range of new
opportunities for query optimization in database systems.
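To illustrate why a restructured view can help, consider a hypothetical relation `stocks(date, stock, price)` (toy values, not project data) and a view that moves stock names from data values into column labels. A query comparing two stocks on the same date needs a self-join on the base relation but only a single scan over the restructured view. This is a sketch under those assumptions; a Python dictionary stands in for the materialized view.

```python
from collections import defaultdict

# Hypothetical base relation stocks(date, stock, price) with toy values:
# stock names are stored as data values.
STOCKS = [
    ("day1", "A", 94.0), ("day1", "B", 31.0),
    ("day2", "A", 101.0), ("day2", "B", 35.0),
    ("day3", "A", 99.0), ("day3", "B", 102.0),
]


def pivot(rows):
    """Materialize a restructured view: one row per date, one column per stock.

    Stock names move from data values into the schema (column labels).
    """
    view = defaultdict(dict)
    for date, stock, price in rows:
        view[date][stock] = price
    return dict(view)


def dates_higher_base(rows, s1, s2):
    """Query on the base relation: requires a self-join on date."""
    return sorted({d1 for (d1, a, p1) in rows for (d2, b, p2) in rows
                   if d1 == d2 and a == s1 and b == s2 and p1 > p2})


def dates_higher_view(view, s1, s2):
    """Same query on the restructured view: one scan comparing two columns."""
    return sorted(d for d, row in view.items()
                  if s1 in row and s2 in row and row[s1] > row[s2])


VIEW = pivot(STOCKS)
print(dates_higher_view(VIEW, "A", "B"))
```

Both functions return the same answer; an optimizer aware of the restructured view could rewrite the self-join query into the cheaper scan.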
Handling semantic heterogeneity. Another key
issue in multi-database and data transmission applications is semantic
heterogeneity. Raw data should be accompanied by additional information that
indicates its nature and properties. For example, a numeric value by itself
has no meaning, but once we know that it represents a salary in US
dollars, the information becomes meaningful. We would like to integrate
semantic as well as schematic capabilities into a single database language.
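A minimal sketch of the idea of attaching meaning to raw values follows. The class `SemanticValue` and its attributes (`quantity`, `currency`, `scale`) are hypothetical choices for illustration, not a design taken from the project. Comparisons are permitted only when the metadata shows that two values mean the same thing, and differing scale factors are reconciled automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticValue:
    """A raw number plus metadata describing what it means.

    The attribute set (quantity, currency, scale) is a hypothetical choice.
    """
    value: float
    quantity: str  # what is measured, e.g. "salary"
    currency: str  # e.g. "USD"
    scale: int     # 1 = units, 1000 = thousands

    def with_scale(self, new_scale: int) -> "SemanticValue":
        """Re-express the same amount using a different scale factor."""
        return SemanticValue(self.value * self.scale / new_scale,
                             self.quantity, self.currency, new_scale)


def less_than(a: SemanticValue, b: SemanticValue) -> bool:
    """Compare two values only if their metadata says they mean the same thing."""
    if a.quantity != b.quantity or a.currency != b.currency:
        # A currency conversion would need external rules (e.g. exchange rates).
        raise ValueError("incompatible semantics: a conversion rule is required")
    return a.value * a.scale < b.value * b.scale


# Source 1 reports salaries in thousands of USD, source 2 in USD.
s1 = SemanticValue(52, "salary", "USD", 1000)
s2 = SemanticValue(48000, "salary", "USD", 1)
print(less_than(s2, s1))  # True, although the raw numbers 48000 and 52 suggest the opposite
```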
Physical structure of data. If prior
knowledge is available regarding the types and frequency of queries to a
database, it should be possible to structure the data in a way that
optimizes the overall performance of the system. A study of higher-order
languages and their optimization problems also sheds light on the important
problem of structuring data for optimum system performance.
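When the workload is known, the simplest version of this problem is to pick, among candidate layouts, the one with the lowest frequency-weighted cost. The sketch below assumes three hypothetical layouts of the same stock-price data (stock names as data values, as column labels, or as relation names) and made-up per-query cost estimates; a real system would obtain these estimates from its cost model.

```python
# Hypothetical per-query cost estimates (e.g. pages read) for three candidate
# layouts of the same stock-price data; the numbers are illustrative only.
COST = {
    "stock_as_value":    {"price_on_date": 1, "stock_history": 50, "compare_stocks": 40},
    "stock_as_column":   {"price_on_date": 1, "stock_history": 10, "compare_stocks": 10},
    "stock_as_relation": {"price_on_date": 5, "stock_history": 2,  "compare_stocks": 30},
}


def workload_cost(layout, workload):
    """Expected cost of a workload given as {query type: frequency}."""
    return sum(freq * COST[layout][query] for query, freq in workload.items())


def choose_layout(workload):
    """Pick the layout with the lowest frequency-weighted cost."""
    return min(COST, key=lambda layout: workload_cost(layout, workload))


print(choose_layout({"stock_history": 90, "price_on_date": 10}))
print(choose_layout({"compare_stocks": 80, "price_on_date": 20}))
```

Under these assumed costs, a history-heavy workload favors one relation per stock, while a comparison-heavy workload favors one column per stock; the best structure depends on the workload.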
Please refer to http://www.uncg.edu/~sadrif/papers.html
for a complete publications list.
Seven undergraduate and eight graduate students have
participated in different stages of this project in the form of directed study
(project) courses, Master’s projects and Master’s theses. Please
see the list of participating students above.
For the remaining period of this project, I intend to
concentrate mainly on expanding our investigation into the problem of interoperability
among data sources (XML, relational, ...). We will
further develop algorithms for query processing and optimization in a
multi-source system and study their performance. We will also continue our
work on query optimization using restructured views. Problems to study include
(1) a simulation study of performance and (2) physical data structuring and
view/index selection for optimum performance when the workload is known.
Different structures can be used to represent the same (or
similar) information. In databases, in addition to the information stored
within tables, schema objects such as relation names and column labels can also
store information. The situation is similar in XML data, where both tags
(element names) and element content can carry information. Further, the
structure of XML data also carries important information.
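The following sketch shows the same hypothetical facts (stock `A` priced 10, stock `B` priced 12; toy data) stored in three XML structures: the stock name as an attribute value, as the element tag itself, and as content inside a nesting structure. Each structure needs its own extraction logic to recover the same relation, which is the kind of structural heterogeneity described above. It uses only Python's standard `xml.etree.ElementTree` module; the document shapes are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents holding the same facts in different structures.
DOC_ATTR = '<prices><price stock="A">10</price><price stock="B">12</price></prices>'
DOC_TAGS = '<prices><A>10</A><B>12</B></prices>'
DOC_NESTED = ('<prices><stock><name>A</name><price>10</price></stock>'
              '<stock><name>B</name><price>12</price></stock></prices>')


def from_attributes(doc):
    # The stock name is an attribute value; the tag "price" is fixed.
    return sorted((e.get("stock"), float(e.text))
                  for e in ET.fromstring(doc).iter("price"))


def from_tags(doc):
    # The stock name is the element tag itself, i.e. schema-level information.
    return sorted((e.tag, float(e.text)) for e in ET.fromstring(doc))


def from_nested(doc):
    # The stock name is the text of a child element; grouping is carried by nesting.
    return sorted((s.findtext("name"), float(s.findtext("price")))
                  for s in ET.fromstring(doc).iter("stock"))


print(from_attributes(DOC_ATTR))
print(from_tags(DOC_TAGS))
print(from_nested(DOC_NESTED))
```

All three calls recover the same list of (stock, price) pairs, even though a conventional query written against one structure would not apply to the others.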
Such "restructuring" of information appears naturally in many
applications. In multi-database applications, similar data is often represented
in different structures. In data warehousing applications, summarization may
be combined with restructuring for additional efficiency. In database systems,
restructured views can be very effective for query optimization.
Current database and web systems have very limited capability to deal with and
relate information stored in such different structures. In this project we address the
issues involved in coping with structural heterogeneity and develop techniques
and algorithms that benefit from information restructuring in, for example,
database and data warehousing applications.
Acknowledgement: This material is based upon work supported by the National
Science Foundation under grant No. 0083312.
Disclaimer: Any opinions, findings, and conclusions or recommendations
expressed in this material are those of the author and do not necessarily
reflect the views of the National Science Foundation.