Set of random variables of which any two are independent
In probability theory, a pairwise independent collection of random variables is a set of random variables any two of which are independent.[1] Any collection of mutually independent random variables is pairwise independent, but some pairwise independent collections are not mutually independent. Pairwise independent random variables with finite variance are uncorrelated.
A pair of random variables X and Y are independent if and only if the random vector (X, Y) with joint cumulative distribution function (CDF) F_{X,Y}(x, y) satisfies

F_{X,Y}(x, y) = F_X(x) F_Y(y) for all x, y,

or equivalently, their joint density f_{X,Y}(x, y) satisfies

f_{X,Y}(x, y) = f_X(x) f_Y(y) for all x, y.

That is, the joint distribution is equal to the product of the marginal distributions.[2]
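For discrete variables, the factorization criterion can be checked directly by enumeration. Below is a minimal sketch (not from the article; the helper name `is_independent` is chosen here for illustration) that tests whether a finite joint pmf equals the product of its marginals:

```python
from itertools import product

def is_independent(joint, tol=1e-12):
    """Check whether a discrete joint pmf {(x, y): p} factorizes
    into the product of its marginal pmfs."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return all(
        abs(joint.get((x, y), 0.0) - px[x] * py[y]) < tol
        for x, y in product(px, py)
    )

# Two fair coins: the joint pmf is uniform on {0,1}^2, so X and Y are independent.
fair = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(is_independent(fair))   # True

# Perfectly correlated coins: mass only on (0,0) and (1,1) -- not independent.
corr = {(0, 0): 0.5, (1, 1): 0.5}
print(is_independent(corr))   # False
```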
In practice, unless the context requires otherwise, the modifier "mutual" is usually dropped, so that independence means mutual independence. A statement such as "X, Y, Z are independent random variables" means that X, Y, Z are mutually independent.
Example
Pairwise independence does not imply mutual independence, as shown by the following example attributed to S. Bernstein.[3]
Suppose X and Y are two independent tosses of a fair coin, where we designate 1 for heads and 0 for tails. Let the third random variable Z be equal to 1 if exactly one of those coin tosses resulted in "heads", and 0 otherwise (i.e., Z = X ⊕ Y, the exclusive or of X and Y). Then jointly the triple (X, Y, Z) has the following probability distribution:

(X, Y, Z) = (0, 0, 0) with probability 1/4,
(X, Y, Z) = (0, 1, 1) with probability 1/4,
(X, Y, Z) = (1, 0, 1) with probability 1/4,
(X, Y, Z) = (1, 1, 0) with probability 1/4.
Each pair among (X, Y), (X, Z), (Y, Z) is uniformly distributed on {0,1}^2, so each of the pairwise joint distributions equals the product of the respective marginal distributions (e.g., P(X = x, Z = z) = 1/4 = P(X = x) P(Z = z)), and the variables are pairwise independent:
X and Y are independent, and
X and Z are independent, and
Y and Z are independent.
However, X, Y, and Z are not mutually independent, since P(X = x, Y = y, Z = z) ≠ P(X = x) P(Y = y) P(Z = z); for example, the left side equals 1/4 for (x, y, z) = (0, 0, 0) while the right side equals 1/8. In fact, any one of X, Y, Z is completely determined by the other two (any of X, Y, Z is the sum, modulo 2, of the other two). That is as far from independence as random variables can get.
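Bernstein's construction is small enough to verify exhaustively. A sketch (using the encoding above, 1 for heads and 0 for tails):

```python
from itertools import product

# Enumerate Bernstein's example: X, Y are independent fair bits, Z = X XOR Y.
# Each of the four (x, y) outcomes has probability 1/4.
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

def prob(predicate):
    """Probability of an event under the uniform distribution on outcomes."""
    return sum(1 for o in outcomes if predicate(o)) / len(outcomes)

# Pairwise independence: every pair of coordinates factorizes.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    for a, b in product((0, 1), repeat=2):
        joint = prob(lambda o: o[i] == a and o[j] == b)
        assert joint == prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)

# But not mutual independence: P(X=0, Y=0, Z=0) = 1/4, not 1/8.
print(prob(lambda o: o == (0, 0, 0)))  # 0.25
print(prob(lambda o: o[0] == 0) * prob(lambda o: o[1] == 0)
      * prob(lambda o: o[2] == 0))     # 0.125
```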
Probability of the union of pairwise independent events
Bounds on the probability that the sum of Bernoulli random variables is at least one, commonly known as the union bound, are provided by the Boole–Fréchet[4][5] inequalities. While these bounds assume only univariate information, several bounds with knowledge of general bivariate probabilities have been proposed too. Denote by {A_i, i ∈ {1, 2, ..., n}} a set of n Bernoulli events with probability of occurrence P(A_i) = p_i for each i. The univariate Boole upper bound is

P(A_1 ∪ A_2 ∪ ... ∪ A_n) ≤ min{ p_1 + p_2 + ... + p_n, 1 }.   (2)

Suppose the bivariate probabilities are given by p_{ij} = P(A_i ∩ A_j) for every pair of indices (i, j). Kounias[6] derived the following upper bound:

P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n p_i − max_{j ∈ {1,...,n}} Σ_{i ≠ j} p_{ij},

which subtracts the maximum weight of a star spanning tree on a complete graph with n nodes (where the edge weights are given by p_{ij}) from the sum of the marginal probabilities Σ_{i=1}^n p_i.
Hunter-Worsley[7][8] tightened this upper bound by optimizing over all spanning trees as follows:

P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n p_i − max_{τ ∈ T} Σ_{(i,j) ∈ τ} p_{ij},

where T is the set of all spanning trees on the graph. These bounds are not the tightest possible with general bivariates p_{ij}, even when feasibility is guaranteed, as shown in Boros et al.[9] However, when the variables are pairwise independent (p_{ij} = p_i p_j), Ramachandra-Natarajan[10] showed that the Kounias-Hunter-Worsley[6][7][8] bound is tight, by proving that the maximum probability of the union of events admits a closed-form expression given as:

P(⋃_{i=1}^n A_i) ≤ min{ (Σ_{i=1}^{n−1} p_i)(1 − p_n) + p_n, 1 }   (1)
where the probabilities are sorted in increasing order as 0 ≤ p_1 ≤ p_2 ≤ ... ≤ p_n ≤ 1. The tight bound in Eq. 1 depends only on the sum of the n − 1 smallest probabilities, Σ_{i=1}^{n−1} p_i, and the largest probability, p_n. Thus, while the ordering of the probabilities plays a role in the derivation of the bound, the ordering among the n − 1 smallest probabilities is inconsequential since only their sum is used.
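Under the pairwise-independence assumption p_{ij} = p_i p_j, the Hunter-Worsley bound and the closed form of Eq. 1 can be compared numerically. The sketch below is illustrative (the marginals are arbitrary, and the maximum-weight spanning tree is computed with a simple Prim-style pass over the complete graph):

```python
def max_spanning_tree_weight(n, w):
    """Prim's algorithm on the complete graph with edge weights w[i][j],
    maximizing total weight instead of minimizing it."""
    in_tree = {0}
    best = [w[0][j] for j in range(n)]  # best edge weight into each node
    total = 0.0
    while len(in_tree) < n:
        j = max((j for j in range(n) if j not in in_tree), key=lambda j: best[j])
        total += best[j]
        in_tree.add(j)
        for k in range(n):
            best[k] = max(best[k], w[j][k])
    return total

p = [0.1, 0.2, 0.3, 0.5]                   # marginals, sorted increasing
pij = [[pi * pj for pj in p] for pi in p]  # pairwise independence: p_ij = p_i p_j
n = len(p)

univariate = min(sum(p), 1.0)                               # Eq. 2
hunter_worsley = sum(p) - max_spanning_tree_weight(n, pij)  # Hunter-Worsley bound
closed_form = min(sum(p[:-1]) * (1 - p[-1]) + p[-1], 1.0)   # Eq. 1

# The last two coincide: with p_ij = p_i p_j the heaviest spanning tree
# is the star centered at the largest probability p_n.
print(univariate, hunter_worsley, closed_form)
```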
As shown in Ramachandra-Natarajan,[10] it can be easily verified that the ratio of the two tight bounds in Eq. 2 and Eq. 1 is upper bounded by 4/3, where the maximum value of 4/3 is attained when

Σ_{i=1}^{n−1} p_i = 1/2 and p_n = 1/2,

where the probabilities are sorted in increasing order as 0 ≤ p_1 ≤ ... ≤ p_n ≤ 1. In other words, in the best-case scenario the pairwise independence bound in Eq. 1 provides an improvement of 25% over the univariate bound in Eq. 2.
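The 4/3 worst case is easy to reproduce numerically. A quick sketch with n = 5 and probabilities chosen (for illustration) so that the n − 1 smallest sum to 1/2 and the largest is 1/2:

```python
# Worst case for the univariate bound relative to Eq. 1: the n-1 smallest
# probabilities sum to 1/2 and the largest probability is 1/2.
p = [0.125, 0.125, 0.125, 0.125, 0.5]

univariate = min(sum(p), 1.0)                           # Eq. 2 -> 1.0
pairwise = min(sum(p[:-1]) * (1 - p[-1]) + p[-1], 1.0)  # Eq. 1 -> 0.75
print(univariate / pairwise)  # 4/3 = 1.3333...
```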
Generalization
More generally, we can talk about k-wise independence, for any k ≥ 2. The idea is similar: a set of random variables is k-wise independent if every subset of size k of those variables is independent. k-wise independence has been used in theoretical computer science, for example to prove a theorem about the problem MAXEkSAT.
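A standard construction of k-wise independent variables (a textbook technique, not taken from this article's text) evaluates a uniformly random polynomial of degree less than k over a finite field at distinct points. For k = 2 this yields pairwise independent variables, which the sketch below verifies exhaustively for a small prime field:

```python
from itertools import product

P = 5  # a small prime; arithmetic is over the field GF(P)

def evaluations(k, n):
    """Evaluate every polynomial of degree < k over GF(P) at points 0..n-1.
    Picking one of these tuples uniformly at random gives n random
    variables that are k-wise independent."""
    for coeffs in product(range(P), repeat=k):
        yield tuple(sum(c * x**i for i, c in enumerate(coeffs)) % P
                    for x in range(n))

# k = 2: the family {x -> a + b*x mod P} is pairwise independent --
# for any two distinct points, the pair of values is uniform on GF(P)^2.
samples = list(evaluations(2, P))
for x1, x2 in [(0, 1), (1, 3), (2, 4)]:
    pairs = {(s[x1], s[x2]) for s in samples}
    assert len(pairs) == P * P  # every value pair occurs exactly once
print("pairwise independent:", len(samples), "polynomials")
```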
^ Gut, A. (2005). Probability: A Graduate Course. Springer-Verlag. ISBN 0-387-27332-8. pp. 71–72.
^ Hogg, R. V.; McKean, J. W.; Craig, A. T. (2005). Introduction to Mathematical Statistics (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall. ISBN 0-13-008507-3. Definition 2.5.1, p. 109.
^ Hogg, R. V.; McKean, J. W.; Craig, A. T. (2005). Introduction to Mathematical Statistics (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall. ISBN 0-13-008507-3. Remark 2.6.1, p. 120.
^Boole, G. (1854). An Investigation of the Laws of Thought, On Which Are Founded the Mathematical Theories of Logic and Probability. Walton and Maberly, London. See Boole's "major" and "minor" limits of a conjunction on page 299.
^ Fréchet, M. (1935). "Généralisations du théorème des probabilités totales". Fundamenta Mathematicae. 25: 379–387.
^ Kounias, E. G. (1968). "Bounds for the probability of a union, with applications". The Annals of Mathematical Statistics. 39 (6): 2154–2158.
^ D. Hunter (1976). "An upper bound for the probability of a union". Journal of Applied Probability. 13 (3): 597–603. doi:10.2307/3212481. JSTOR 3212481.
^ K. J. Worsley (1982). "An improved Bonferroni inequality and applications". Biometrika. 69 (2): 297–302. doi:10.1093/biomet/69.2.297.