Title | On the security of individual data |
Publication Type | Conference Paper |
Authors | Demetrovics, J., G. O. H. Katona, and D. Miklos |
Year | 2004 |
Pages | 49 - 58 |
Conference Name | Lecture notes in computer science |
Language | eng |
ISBN Number | 0302-9743 |
Notes | On the security of individual data; 3540209654 |
Abstract | We will consider the following problem in this paper: Assume there are n numeric data {x(1), x(2),..., x(n)} (like salaries of n individuals) stored in a database and some subsums of these numbers axe disclosed by making public or just available for persons not eligible to learn the original data. Our motivating question is: at most how many of these subsums may be disclosed such that none of the numbers x(1), x(2),...,x(n) can be uniquely determined from these sums. These types of problems arise in the cases when certain tasks concerning a database are done by subcontractors who are not eligible to learn the elements of the database, but naturally should be given some data to fulfill there task. In database theory such examples are called statistical databases as they are used for statistical purposes and no individual data are supposed to be obtained using a restricted list of SUM queries. This problem was originally introduced by Chin and Ozsoyoglu [1], originally solved by Miller et al. [5] and revisited by Griggs [4]. It turned out [5] that the problem is equivalent to the following question: If there are n real, non-zero numbers X = {x(1), x(2),..., x(n)} given, what is the maximum number of 0 subsums of it, that is, what is the maximum number of the subsets of X whose elements sum up to 0. This approach, together with the Sperner theorem shows that no more than ((n)(n/2)) subsums of a given set of secure data may be disclosed without disclosing at least one of the data, which upper bound is sharp as well. However, it is natural to assume that the disclosed subsums of the original elements of the database will contain only a limited number of elements, say at most k (in the applications databases are usually huge, while the number of operations is in most of the cases limited). We have now the same question: at most how many of these subsums of at most k members may be given such that none of the numbers x(1), x(2),...x(n) can be uniquely determined from these sums. The main result of this paper gives an upper bound on this number, which turns out to be sharp if we allow subsums of only k or k – 1 members and asymptotically sharp in case of subsums of at most k members. |