This article may be too technical for most readers to understand.(February 2015) |
A set intersection oracle (SIO) is a data structure which represents a collection of sets and can quickly answer queries about whether the set intersection of two given sets is non-empty.
The input to the problem is n finite sets. The sum of the sizes of all sets is N (which also means that there are at most N distinct elements). The SIO should quickly answer any query of the form:
Without any pre-processing, a query can be answered by inserting the elements of Si into a temporary hash table and then checking for each element of Sk whether it is in the hash table. The query time is .
Alternatively, we can pre-process the sets and create an n-by-n table where the intersection information is already entered. Then the query time is , but the memory required is .
Define a "large set" as a set with at least elements. Obviously there are at most such sets. Create a table of intersection data between every large set to every other large set. This requires memory. Additionally, for each large set, keep a hash table of all its elements. This requires additional memory.
Given two sets, there are three possible cases:
In general, if we define a "large set" as a set with at least elements, then the number of large set is at most so the memory required is , and the query time is .
The SIO problem can be reduced to the approximate distance oracle (DO) problem, in the following way. [1]
This graph has the following properties:
So, with a DO whose approximation factor of less than 2, we can solve the SIO problem.
It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires space to answer queries in time . If this conjecture is true, this implies that there is no DO with an approximation factor of less than 2 and a constant query time. [1]
This article may be too technical for most readers to understand.(February 2015) |
A set intersection oracle (SIO) is a data structure which represents a collection of sets and can quickly answer queries about whether the set intersection of two given sets is non-empty.
The input to the problem is n finite sets. The sum of the sizes of all sets is N (which also means that there are at most N distinct elements). The SIO should quickly answer any query of the form:
Without any pre-processing, a query can be answered by inserting the elements of Si into a temporary hash table and then checking for each element of Sk whether it is in the hash table. The query time is .
Alternatively, we can pre-process the sets and create an n-by-n table where the intersection information is already entered. Then the query time is , but the memory required is .
Define a "large set" as a set with at least elements. Obviously there are at most such sets. Create a table of intersection data between every large set to every other large set. This requires memory. Additionally, for each large set, keep a hash table of all its elements. This requires additional memory.
Given two sets, there are three possible cases:
In general, if we define a "large set" as a set with at least elements, then the number of large set is at most so the memory required is , and the query time is .
The SIO problem can be reduced to the approximate distance oracle (DO) problem, in the following way. [1]
This graph has the following properties:
So, with a DO whose approximation factor of less than 2, we can solve the SIO problem.
It is believed that the SIO problem does not have a non-trivial solution. I.e., it requires space to answer queries in time . If this conjecture is true, this implies that there is no DO with an approximation factor of less than 2 and a constant query time. [1]