Modeling Uncertainty in Database and Knowledge-Base Systems

Fereidoon (Fred) Sadri
Department of Mathematical Sciences
University of North Carolina at Greensboro

Contact Information

Department of Mathematical Sciences
383, Bryan Building, UNCG
Greensboro, NC 27402
Phone: (336) 334-5836
Fax: (336) 334-5949
Email: sadri@uncg.edu

Keywords

Uncertain data, inaccurate data, probabilistic approach, relational model, deductive databases, Information Source Tracking Method (IST)

Project Award Information

Project Summary

This research is concerned with the modeling and manipulation of uncertain and inaccurate information in database systems. Database systems are evolving into knowledge-base systems, and are increasingly used in applications where handling uncertain and inaccurate data is essential. The research objectives include: (1) Extending existing methods for the modeling and management of uncertain and inaccurate data to contemporary database models such as object-oriented and deductive databases, (2) Query optimization in uncertain databases, (3) Complexity of query processing in uncertain databases: effect of "combination modes" on complexity, and (4) A study of equivalence in uncertain deductive databases: When does classical equivalence coincide with equivalence under uncertainty? The results of this research can be used in realizing more intelligent database and knowledge-base systems that are suitable for applications that require handling uncertain and inaccurate information such as medical, legal and military applications.

Goals, Objectives, and Targeted Activities

This is the last year of this three-year project. Some of our contributions are listed in the next section. We will continue our investigation into some remaining issues, such as uncertainty in information obtained through data mining. The techniques we have developed also appear to be applicable to the problem of integrating data from multiple sources. Our future research will address querying the web and other kinds of information sources, with simplicity of interaction, efficiency of processing, and consistency of data as the main concerns to be investigated.

Indication of Success

We have made significant contributions to the area of uncertainty modeling and management in database and knowledge-base systems. I believe the time to incorporate uncertainty management into commercial databases has arrived, and I will try to promote this idea during our forthcoming tutorial at IEEE ICDE. Some of our results that are particularly important to the commercialization of these ideas include:
  • A study of different combination functions, and their effect on the complexity of query processing in uncertain databases,
  • Query optimization in uncertain databases,
  • A study of equivalence in uncertain deductive databases: When does classical equivalence coincide with equivalence under uncertainty?

    Project Success and Impact

    This project is funded under the RUI (Research in Undergraduate Institutions) program. Currently, we have an undergraduate program in Computer Science, offering a B.S. degree (recently accredited by CSAB). Our proposal to establish an M.S. degree in Computer Science is in the final step of approval, and we are looking forward to starting our M.S. program in Computer Science in Fall 1998.
    The following undergraduate students have worked with me on research in the past three years:
    (1) Mr. Jonathan Blakely (now a master's student at Duke University),
    (2) Mr. Bryan Marsh (now with Intelligent Information Systems; he is planning to attend the University of North Carolina at Chapel Hill for graduate studies),
    (3) Mr. Stephen Slocum (now a master's student at NC State University), and
    (4) Ms. Salley Wilson (now with IBM Global Services).
    I also supervised a master's thesis in Mathematics. Mr. Patrick Shouse defended his thesis in April 1997. He was with US Airways and has recently joined Aon Consulting.

    I will present a tutorial on "Uncertainty Management in Database and Knowledge-base Systems", jointly with Dr. V. S. Lakshmanan, at the 1998 IEEE International Conference on Data Engineering, February 23-27, 1998, Orlando, Florida.

    Project References

    (Please refer to http://www.uncg.edu/~sadrif/papers.html for a more complete listing.)

    V. S. Lakshmanan and F. Sadri, "On A Theory of Probabilistic Deductive Databases." October 1997. Submitted to JLP.

    V. S. Lakshmanan and F. Sadri, "Uncertain Deductive Databases: A Hybrid Approach." Information Systems, Vol. 22, No. 8, December 1997, pp. 483-508.

    V. S. Alagar, F. Sadri, and J. N. Said, "Semantics of an Extended Relational Model for Managing Uncertain Information." Proceedings of the Fourth International Conference on Information and Knowledge Management (CIKM'95), 1995, pp. 234-240.

    F. Sadri, "Information Source Tracking Method: Efficiency Issues." IEEE Transactions on Knowledge and Data Engineering, Vol. 7, No. 6, December 1995, pp. 947-954.

    F. Sadri, "Integrity Constraints in the Information Source Tracking Method." IEEE Transactions on Knowledge and Data Engineering, Vol. 7, No. 1, February 1995, pp. 106-119.

    Area Background

    Approaches to the modeling and management of uncertainty and inaccuracy in database and knowledge-base systems fall into two broad categories: quantitative and qualitative. Quantitative techniques attach numerical factors to uncertain data and manipulate these factors to obtain numerical measures of the uncertainty of derived data. Numerous methods have been developed, based on mathematical frameworks such as probability theory, fuzzy sets and fuzzy logic, and the Dempster-Shafer theory of evidence. Qualitative techniques are often based on partitioning the data into "definite" and "indefinite" components, and extend classical query processing techniques to manipulate these components. Disjunctive logic programming and disjunctive databases are also examples of qualitative approaches to the modeling of uncertainty.

    We advocate a hybrid approach. In the Information Source Tracking (IST) method, the certainty of data is modeled by the reliability of the sources of the data. The system keeps track of the association between data and sources, and computes this association for derived data (such as answers to queries). Then, if desired, a numerical certainty factor can be calculated for derived data as a function of the reliabilities of the contributing sources and the nature of their contributions. This two-phase approach also makes it possible to use different paradigms, such as probability theory and fuzzy set theory, for the numeric phase while keeping the non-numeric phase intact.
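
    The two-phase idea described above can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the IST algorithms from the papers listed on this page: it assumes each tuple is annotated with a set of alternative "supports" (each a set of source names that jointly confirm the tuple), that sources are independent in the probabilistic numeric phase, and that min/max is used in the fuzzy numeric phase. All function names, source names, and reliability values are hypothetical.

```python
from itertools import combinations

# Assumption: a "support" is a frozenset of source names that jointly confirm
# a tuple; a "lineage" is a set of alternative supports (any one suffices).

def join_lineage(a, b):
    """Non-numeric phase: a joined tuple needs a support from each input."""
    return {s | t for s in a for t in b}

def union_lineage(a, b):
    """Non-numeric phase: a tuple obtainable from either input keeps both."""
    return a | b

def probability(lineage, reliability):
    """Numeric phase (probabilistic), assuming independent sources.

    Computes P(at least one support is fully reliable) by
    inclusion-exclusion over the alternative supports.
    """
    supports = list(lineage)
    total = 0.0
    for k in range(1, len(supports) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for group in combinations(supports, k):
            p = 1.0
            for source in frozenset().union(*group):
                p *= reliability[source]
            total += sign * p
    return total

def fuzzy_degree(lineage, reliability):
    """Numeric phase (fuzzy): min within a support, max across supports."""
    return max(
        (min((reliability[s] for s in support), default=1.0)
         for support in lineage),
        default=0.0,
    )

# Hypothetical example: t1 comes from source A, t2 from B, t3 from C.
t1 = {frozenset({"A"})}
t2 = {frozenset({"B"})}
t3 = {frozenset({"C"})}

# The derived tuple is obtained either by joining t1 with t2, or from t3.
derived = union_lineage(join_lineage(t1, t2), t3)

reliability = {"A": 0.9, "B": 0.8, "C": 0.5}  # made-up reliabilities

print(round(probability(derived, reliability), 2))  # 0.86
print(fuzzy_degree(derived, reliability))           # 0.8
```

    Note that the lineage of the derived tuple is computed once, without any numbers; only the last step changes when the numeric paradigm is swapped, which is the point of separating the two phases.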

    Area References

    A recent survey appears in Zaniolo et al., Advanced Database Systems, Morgan Kaufmann, 1997 (Part V: Uncertainty in Databases and Knowledge Bases).

    We recently gave a tutorial at the 14th IEEE International Conference on Data Engineering, Orlando, FL, February 23-27, 1998: V. S. Lakshmanan and F. Sadri, "Uncertainty Management in Database and Knowledge-Base Systems." Slides are available from http://www.uncg.edu/~sadrif/papers/icde98tute.ps.

    Our work on Information Source Tracking and Probabilistic Deductive Databases is listed at http://www.uncg.edu/~sadrif/papers.html. Some papers are available online.

    A bibliography of recent publications on uncertainty in database and knowledge-base systems is available from http://www.uncg.edu/~sadrif/papers/uncer-biblio.ps.

    Potential Related Projects

    The relationship between the ideas from this project and the integration of information from multiple sources, and the application of these ideas to that problem, merit further investigation.


    This page was last modified on March 24, 98