Publications of Thalheim, B.

On the number of independent functional dependencies

We will investigate the following question: what can be the maximum number of independent functional dependencies in a database of $n$ attributes, that is, the maximum cardinality of a system of dependencies which do not follow from the Armstrong axioms and none of which can be derived from the remaining ones using the Armstrong axioms. An easy construction, for a long time believed to be the best, is the following: take the maximum possible number of subsets of the attributes such that none of them contains another (by the well-known theorem of Sperner [8] their number is $\binom{n}{\lfloor n/2 \rfloor}$) and let each of them determine all the remaining attributes. However, we will show by a specific construction that it is possible to give more than $\binom{n}{\lfloor n/2 \rfloor}$ independent dependencies (the construction yields $\left(1 + \frac{1}{n^2}\right)\binom{n}{\lfloor n/2 \rfloor}$ of them) and, on the other hand, the upper bound is $2^n - 1$, which is roughly $\sqrt{n}\,\binom{n}{\lfloor n/2 \rfloor}$.
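
A small Python sketch (not part of the paper) that tabulates the three quantities quoted above, making the gap between the Sperner-type lower bound and the $2^n - 1$ upper bound visible numerically; the function name sperner_bound is introduced here only for illustration:

```python
from math import comb, sqrt

def sperner_bound(n: int) -> int:
    """Largest antichain on n attributes: C(n, floor(n/2)) (Sperner's theorem)."""
    return comb(n, n // 2)

for n in range(4, 21, 4):
    s = sperner_bound(n)
    lower = (1 + 1 / n**2) * s   # lower bound given by the paper's construction
    upper = 2**n - 1             # upper bound, roughly sqrt(n) * s
    print(f"n={n:2d}  Sperner={s:7d}  construction>{lower:9.1f}  "
          f"upper={upper:8d}  sqrt(n)*Sperner={sqrt(n) * s:9.1f}")
```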

Bertossi L, Katona GO, Schewe KD, Thalheim B. Semantics in databases. In: Lecture notes in computer science. Vol 2582. Berlin: Springer; 2003. p. 1-6.

Semantics in databases

Semantics in databases: second international workshop, Dagstuhl Castle, Germany, January 7-12, 2001: revised papers

Demetrovics J, Katona GO, Miklos D. Error-correcting keys in relational databases. In: Schewe KD, Thalheim B, editors. Foundations of Information and Knowledge Systems. Vol 1762. Berlin: Springer; 2000. p. 88-93.

Error-correcting keys in relational databases

Suppose that the entries of a relational database are collected in an unreliable way, that is, the actual database may differ from the true database in at most one entry for each individual. An error-correcting key is a set of attributes such that the knowledge of the actual data of an individual on this set of attributes uniquely determines the individual. It is shown that if the minimal keys are of size at most $k$, then the smallest size of a minimal error-correcting key can be $ck^3$, and this is best possible: all minimal error-correcting keys have size at most $3k^3$.
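
One natural formalization of this definition (an assumption of the sketch below, not taken from the paper) is the classic minimum-distance condition: if each record may carry at most one wrong entry, an attribute set corrects that error when the projections of any two distinct tuples differ in at least three positions. A brute-force Python check over a toy relation:

```python
from itertools import combinations

def hamming(u, v, key):
    """Number of attributes in `key` on which tuples u and v differ."""
    return sum(u[a] != v[a] for a in key)

def is_error_correcting_key(relation, key, errors=1):
    """Assumed reading: the set corrects `errors` wrong entries per record
    when every pair of distinct tuples differs in at least 2*errors + 1 of
    its attributes (classic minimum-distance argument)."""
    d = 2 * errors + 1
    return all(hamming(u, v, key) >= d for u, v in combinations(relation, 2))

# Toy relation: each tuple is a dict attribute -> value.
rows = [
    {"A": 0, "B": 0, "C": 0, "D": 0},
    {"A": 1, "B": 1, "C": 1, "D": 0},
    {"A": 2, "B": 2, "C": 0, "D": 1},
]
print(is_error_correcting_key(rows, ["A", "B", "C"]))       # False: rows 1 and 3 differ only twice
print(is_error_correcting_key(rows, ["A", "B", "C", "D"]))  # True: all pairs differ >= 3 times
```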

Anstee R, Demetrovics J, Katona GO, Sali A. Low discrepancy allocation of two-dimensional data. In: Schewe KD, Thalheim B, editors. Foundations of Information and Knowledge Systems. Vol 1762. Berlin: Springer; 2000. p. 1-12.

Low discrepancy allocation of two-dimensional data

Fast browsing and retrieval of geographically referenced information requires the allocation of data on different storage devices for concurrent retrieval. By dividing the two-dimensional space into tiles, a system can allow users to specify regions of interest using a query rectangle and then retrieve the information related to the tiles covered by the query. Suppose that there are m I/O devices. A tile is labeled by i if the data corresponding to this area is stored on the i-th I/O device. A labeling is efficient if the difference between the numbers of occurrences of distinct labels in any given rectangle is small. Except for some simple cases this discrepancy must exceed 1. In the present paper constructions are given that make this discrepancy small relative to m. The constructions use Latin squares, and a lower bound is given which shows that the constructions are best possible under certain conditions.
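
The following Python sketch uses the simplest Latin-square-style labeling, a cyclic shift per row (an illustration of the idea only; the paper's actual constructions are more refined), and measures the discrepancy over all tile rectangles of a small grid:

```python
from itertools import combinations

def label(x: int, y: int, m: int) -> int:
    """Cyclic (Latin-square row) labeling: each m x m block of tiles
    contains every label the same number of times."""
    return (x + y) % m

def max_rectangle_discrepancy(width: int, height: int, m: int) -> int:
    """Max difference between occurrence counts of any two labels over
    all axis-aligned rectangles of tiles in a width x height grid."""
    worst = 0
    for x0, x1 in combinations(range(width + 1), 2):
        for y0, y1 in combinations(range(height + 1), 2):
            counts = [0] * m
            for x in range(x0, x1):
                for y in range(y0, y1):
                    counts[label(x, y, m)] += 1
            worst = max(worst, max(counts) - min(counts))
    return worst

print(max_rectangle_discrepancy(8, 8, 4))
```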

Asymptotic properties of keys and functional dependencies in random databases

Practical database applications give the impression that sets of constraints are rather small and that large sets are unusual and caused by bad design decisions. Theoretical investigations, however, show that minimal constraint sets are potentially very large: their size can be estimated to be exponential in terms of the number of attributes. This gap between practical observation and theory leads to the rejection of theoretical results. However, practice reflects average cases rather than worst cases, while the theory used until now considered worst-case complexity. This paper aims to develop a theory of average-case complexity. Several probabilistic models and asymptotics of the corresponding probabilities are investigated for random databases formed by independent random tuples with a common discrete distribution. Poisson approximations are studied for the distributions of certain characteristics of such databases when the number of tuples is sufficiently large. We show that the exponential complexity of key sets and sets of functional dependencies is rather unusual and that almost all minimal keys in a relation have a length which depends mainly on the size of the relation.
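
As an illustration of the probabilistic model and the Poisson (birthday-type) approximation mentioned above, a Monte Carlo sketch in Python (parameters and function name are chosen here for illustration, not taken from the paper): a fixed set of k attributes with i.i.d. uniform values is a key exactly when all tuple projections are distinct.

```python
import random
from math import exp

def p_key_monte_carlo(n_tuples, domain, k, trials=2000):
    """Estimate P[a fixed k-attribute set is a key] for a relation of
    n_tuples i.i.d. uniform tuples over `domain` values per attribute."""
    hits = 0
    for _ in range(trials):
        projections = {tuple(random.randrange(domain) for _ in range(k))
                       for _ in range(n_tuples)}
        hits += len(projections) == n_tuples   # key iff no collision
    return hits / trials

n, d, k = 100, 4, 7                        # 4^7 = 16384 possible projections
poisson = exp(-n * (n - 1) / (2 * d**k))   # birthday/Poisson approximation
print(p_key_monte_carlo(n, d, k), poisson) # the two values should be close
```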

The average length of keys and functional dependencies in (random) databases

Practical database applications engender the impression that sets of constraints are rather small and that large sets are unusual and caused by bad design decisions. Theoretical investigations show, however, that minimal constraint sets are potentially very large: their size can be estimated to be exponential in terms of the number of attributes. The gap between belief and theory causes non-acceptance of theoretical results. However, beliefs are related to average cases, whereas the theory known so far considered worst-case complexity. This paper aims at developing a theory of average-case complexity. Several statistical models and asymptotics of the corresponding probabilities are investigated for random databases. We show that exponential complexity of independent key sets and independent sets of functional dependencies is rather unusual. Almost all minimal keys have a length which depends mainly on the size of the relation; the number of minimal keys of any other length is exponentially small compared with the number of minimal keys of this derived length. Further, if a key is valid in a relation, then it is probably a minimal key. The same results hold for functional dependencies.
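
The concentration phenomenon claimed in the last sentences can be made plausible with a short birthday-problem heuristic (a sketch under a uniform-i.i.d. assumption, not the paper's exact statement):

```latex
% With N i.i.d. tuples, uniform over d values per attribute, a fixed set
% of k attributes is a key iff all N projections are distinct, so
\[
  \Pr[\text{a fixed $k$-set is a key}]
    \approx \exp\!\Bigl(-\frac{N(N-1)}{2\,d^{k}}\Bigr),
\]
% which switches from near 0 to near 1 around d^k ~ N^2, that is, at
\[
  k^{*} \approx 2\log_{d} N .
\]
% Minimal keys therefore concentrate around a length determined by the
% size N of the relation, matching the statement of the abstract.
```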