Uniform convergence in probability is a form of convergence in probability in statistical asymptotic theory and probability theory. It means that, under certain conditions, the empirical frequencies of all events in a certain event-family converge to their theoretical probabilities. Uniform convergence in probability has applications to statistics as well as machine learning as part of statistical learning theory.
The law of large numbers says that, for each single event , its empirical frequency in a sequence of independent trials converges (with high probability) to its theoretical probability. In many application however, the need arises to judge simultaneously the probabilities of events of an entire class from one and the same sample. Moreover it, is required that the relative frequency of the events converge to the probability uniformly over the entire class of events [1] The Uniform Convergence Theorem gives a sufficient condition for this convergence to hold. Roughly, if the event-family is sufficiently simple (its VC dimension is sufficiently small) then uniform convergence holds.
For a class of predicates defined on a set and a set of samples , where , the empirical frequency of on is
The theoretical probability of is defined as
The Uniform Convergence Theorem states, roughly, that if is "simple" and we draw samples independently (with replacement) from according to any distribution , then with high probability, the empirical frequency will be close to its expected value, which is the theoretical probability. [2]
Here "simple" means that the Vapnik–Chervonenkis dimension of the class is small relative to the size of the sample. In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
The Uniform Convergence Theorem was first proved by Vapnik and Chervonenkis [1] using the concept of growth function.
The statement of the uniform convergence theorem is as follows: [3]
If is a set of -valued functions defined on a set and is a probability distribution on then for and a positive integer, we have:
And for any natural number , the shattering number is defined as:
From the point of Learning Theory one can consider to be the Concept/Hypothesis class defined over the instance set . Before getting into the details of the proof of the theorem we will state Sauer's Lemma which we will need in our proof.
The Sauer–Shelah lemma [4] relates the shattering number to the VC Dimension.
Lemma: , where is the VC Dimension of the concept class .
Corollary: .
[1] and [3] are the sources of the proof below. Before we get into the details of the proof of the Uniform Convergence Theorem we will present a high level overview of the proof.
We present the technical details of the proof.
Lemma: Let and
Then for , .
Proof:
By the triangle inequality,
if and then .
Therefore,
since and are independent.
Now for fix an such that . For this , we shall show that
Thus for any , and hence . And hence we perform the first step of our high level idea.
Notice, is a binomial random variable with expectation and variance . By Chebyshev's inequality we get
for the mentioned bound on . Here we use the fact that for .
Let be the set of all permutations of that swaps and in some subset of .
Lemma: Let be any subset of and any probability distribution on . Then,
where the expectation is over chosen according to , and the probability is over chosen uniformly from .
Proof: For any
(since coordinate permutations preserve the product distribution .)
The maximum is guaranteed to exist since there is only a finite set of values that probability under a random permutation can take.
Lemma: Basing on the previous lemma,
Proof: Let us define and which is at most . This means there are functions such that for any between and with for
We see that iff for some in satisfies, . Hence if we define if and otherwise.
For and , we have that iff for some in satisfies . By union bound we get
Since, the distribution over the permutations is uniform for each , so equals , with equal probability.
Thus,
where the probability on the right is over and both the possibilities are equally likely. By Hoeffding's inequality, this is at most .
Finally, combining all the three parts of the proof we get the Uniform Convergence Theorem.
Uniform convergence in probability is a form of convergence in probability in statistical asymptotic theory and probability theory. It means that, under certain conditions, the empirical frequencies of all events in a certain event-family converge to their theoretical probabilities. Uniform convergence in probability has applications to statistics as well as machine learning as part of statistical learning theory.
The law of large numbers says that, for each single event , its empirical frequency in a sequence of independent trials converges (with high probability) to its theoretical probability. In many application however, the need arises to judge simultaneously the probabilities of events of an entire class from one and the same sample. Moreover it, is required that the relative frequency of the events converge to the probability uniformly over the entire class of events [1] The Uniform Convergence Theorem gives a sufficient condition for this convergence to hold. Roughly, if the event-family is sufficiently simple (its VC dimension is sufficiently small) then uniform convergence holds.
For a class of predicates defined on a set and a set of samples , where , the empirical frequency of on is
The theoretical probability of is defined as
The Uniform Convergence Theorem states, roughly, that if is "simple" and we draw samples independently (with replacement) from according to any distribution , then with high probability, the empirical frequency will be close to its expected value, which is the theoretical probability. [2]
Here "simple" means that the Vapnik–Chervonenkis dimension of the class is small relative to the size of the sample. In other words, a sufficiently simple collection of functions behaves roughly the same on a small random sample as it does on the distribution as a whole.
The Uniform Convergence Theorem was first proved by Vapnik and Chervonenkis [1] using the concept of growth function.
The statement of the uniform convergence theorem is as follows: [3]
If is a set of -valued functions defined on a set and is a probability distribution on then for and a positive integer, we have:
And for any natural number , the shattering number is defined as:
From the point of Learning Theory one can consider to be the Concept/Hypothesis class defined over the instance set . Before getting into the details of the proof of the theorem we will state Sauer's Lemma which we will need in our proof.
The Sauer–Shelah lemma [4] relates the shattering number to the VC Dimension.
Lemma: , where is the VC Dimension of the concept class .
Corollary: .
[1] and [3] are the sources of the proof below. Before we get into the details of the proof of the Uniform Convergence Theorem we will present a high level overview of the proof.
We present the technical details of the proof.
Lemma: Let and
Then for , .
Proof:
By the triangle inequality,
if and then .
Therefore,
since and are independent.
Now for fix an such that . For this , we shall show that
Thus for any , and hence . And hence we perform the first step of our high level idea.
Notice, is a binomial random variable with expectation and variance . By Chebyshev's inequality we get
for the mentioned bound on . Here we use the fact that for .
Let be the set of all permutations of that swaps and in some subset of .
Lemma: Let be any subset of and any probability distribution on . Then,
where the expectation is over chosen according to , and the probability is over chosen uniformly from .
Proof: For any
(since coordinate permutations preserve the product distribution .)
The maximum is guaranteed to exist since there is only a finite set of values that probability under a random permutation can take.
Lemma: Basing on the previous lemma,
Proof: Let us define and which is at most . This means there are functions such that for any between and with for
We see that iff for some in satisfies, . Hence if we define if and otherwise.
For and , we have that iff for some in satisfies . By union bound we get
Since, the distribution over the permutations is uniform for each , so equals , with equal probability.
Thus,
where the probability on the right is over and both the possibilities are equally likely. By Hoeffding's inequality, this is at most .
Finally, combining all the three parts of the proof we get the Uniform Convergence Theorem.