A.10 Dirichlet distributions
A.10.1 Beta distribution
Suppose we have a collection of agents of various types in a model, and we know the total number of types, K. Then we often describe the demographic composition of the collection of agents by its empirical distribution, that is, by a K-dimensional vector of the fractions of agents of each type.
The fractions are distributed on a finite-dimensional simplex. The simplest nontrivial probability distribution on it is the Dirichlet distribution. This distribution arises naturally whenever we deal with models of agents of several types or with a finite number of choices.6 Here we proceed without going into the reasons. With only two types or choices, the distribution is particularly simple, because the vector of fractions is (p, 1 - p), where p has a Beta distribution.
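For reference, the Beta density presumably intended here, with parameters a, b > 0, is

```latex
f(p) = \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, p^{\,a-1} (1-p)^{\,b-1},
\qquad 0 < p < 1 .
```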
A.10.2 Dirichlet distribution
With K ≥ 2, the simplest probability distribution on the simplex ∆K is defined by the density
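The density referred to is presumably the symmetric Dirichlet density on ∆K, which in the notation D(α, K) used below reads

```latex
f(p_1,\dots,p_{K-1})
= \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K}\,
\prod_{j=1}^{K} p_j^{\,\alpha-1},
\qquad p_K = 1 - p_1 - \cdots - p_{K-1}.
```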
6 There are actually deeper reasons than technical convenience to use the Dirichlet distributions, as discussed in Zabell (1992), for example, in the case of exchangeable partitions induced by agents.
We can directly manipulate the joint density expression as follows: first, write the product of the densities for Yj = yj, j = 1,..., K, with S = s = y1 + ∙ ∙ ∙ + yK; next, write the product of the y's in terms of the p's and s, and note that the expression separates into the product of two factors, one involving only s and the other only the p's.
Then, making use of the Jacobian of the transformation, we end up with
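Assuming, as the surrounding text suggests, that the Yj are i.i.d. Gamma(α) variables with Pj = Yj/S, the computation sketched above is presumably

```latex
\prod_{j=1}^{K} \frac{y_j^{\,\alpha-1} e^{-y_j}}{\Gamma(\alpha)}
\;\xrightarrow{\;y_j = s\,p_j,\ \text{Jacobian } s^{K-1}\;}\;
\underbrace{\frac{s^{\,K\alpha-1} e^{-s}}{\Gamma(K\alpha)}}_{\text{Gamma}(K\alpha)}
\times
\underbrace{\frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K}\,
\prod_{j=1}^{K} p_j^{\,\alpha-1}}_{D(\alpha,\,K)} .
```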
This factored form shows that the sum S of the K i.i.d. Gamma random variables is also a Gamma random variable, with parameter Kα, and that the fractions Pj are independent of S and have the density called the symmetric Dirichlet distribution D(α, K), where the first argument denotes the parameter and the second indicates that it is the density for p1,..., pK, where we use pK = 1 - p1 - ∙ ∙ ∙ - pK-1.
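As a numerical sketch of this gamma-normalization construction, one can sample D(α, K) by normalizing i.i.d. Gamma(α) draws; the parameter values below are illustrative choices, not taken from the text.

```python
import random

def dirichlet_sample(alpha, K, rng=random):
    """Draw one point of the simplex from the symmetric Dirichlet D(alpha, K)
    by normalizing K i.i.d. Gamma(alpha) variables."""
    y = [rng.gammavariate(alpha, 1.0) for _ in range(K)]
    s = sum(y)
    return [yj / s for yj in y]

random.seed(0)
alpha, K, n = 2.0, 4, 20000  # illustrative values
samples = [dirichlet_sample(alpha, K) for _ in range(n)]

# By symmetry, each fraction P_j has mean 1/K.
mean_p1 = sum(p[0] for p in samples) / n
print(round(mean_p1, 2))
```

The draw sums to one by construction, and the empirical mean of each component is close to 1/K, as the symmetry of D(α, K) requires.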
We also note that the Laplace transform (or the moment generating function) of the gamma distribution is
for θ > -1. This shows that the Gamma distribution is infinitely divisible, because (1 + θ)^{-αt} is the Laplace transform of the Gamma distribution with parameter αt for every positive t.
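Explicitly, for Y with the Gamma(α) density, the transform and the divisibility property presumably read

```latex
E\!\left[e^{-\theta Y}\right] = (1+\theta)^{-\alpha},
\qquad \theta > -1,
\qquad\text{so that}\qquad
(1+\theta)^{-\alpha} = \left[(1+\theta)^{-\alpha/n}\right]^{n}
\quad\text{for every } n \ge 1 .
```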
Hence, we have the Lévy-Khinchin representation
This identifies γ(dz) = z^{-1}e^{-z} dz as the measure for the gamma process.
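The representation in question is presumably the Frullani-integral identity

```latex
(1+\theta)^{-\alpha}
= \exp\!\left( -\alpha \int_0^{\infty}
\left(1 - e^{-\theta z}\right) z^{-1} e^{-z}\, dz \right),
```

since the inner integral equals \(\log(1+\theta)\).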
Let Π be a Poisson process on the real half-line S = (0, ∞). The count function is defined as
for every measurable A ⊂ S. This function is such that
for disjoint Aj in S. This is a completely random measure with integer values.
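In symbols, the count function and its additivity are presumably

```latex
N(A) = \#\{\Pi \cap A\},
\qquad
N\Big(\bigcup_{j} A_j\Big) = \sum_{j} N(A_j),
```

with the \(N(A_j)\) mutually independent for disjoint \(A_j\).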
These normalized components, j = 1,..., K, define the K components of a random vector in ∆K that has the density of the Dirichlet distribution D(α, K).
A.10.3 Marginal Dirichlet distributions
To obtain the marginal distributions, take the joint density and integrate qK-1 out. What is left is the Dirichlet distribution D(a1, a2,..., aK-2, aK-1 + aK).
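This merging property can be checked numerically through the gamma representation; here we only verify the implied mean E[qK-1 + qK] = (aK-1 + aK)/(a1 + ∙ ∙ ∙ + aK), with illustrative parameter values not taken from the text.

```python
import random

def dirichlet(a, rng=random):
    """Draw from the general Dirichlet D(a_1,...,a_K) via Gamma normalization."""
    y = [rng.gammavariate(ai, 1.0) for ai in a]
    s = sum(y)
    return [yi / s for yi in y]

random.seed(1)
a = [1.0, 2.0, 3.0, 4.0]  # illustrative parameters
n = 20000

# Mean of the merged last two coordinates under D(a_1,...,a_K).
merged_mean = sum(q[-2] + q[-1] for q in (dirichlet(a) for _ in range(n))) / n
expected = (a[-2] + a[-1]) / sum(a)
print(round(merged_mean, 2), expected)
```

The empirical mean of qK-1 + qK matches the mean of the merged coordinate of D(a1,..., aK-2, aK-1 + aK), as the marginalization property predicts.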
A.10.4 Poisson-Dirichlet distribution
Given a Poisson process with the above rate function, the Poisson-Dirichlet process can be constructed as shown by Kingman (1993, Chap. 9).
A.10.5 Size-biased sampling
We follow Kingman (1993, p. 98) to show that size-biased samples of Dirichlet distributions have the same distribution as those due to the residual allocation process. See also Pitman (1996).
Suppose that a vector p = (p1, p2,..., pK) with exchangeable components has the symmetric Dirichlet distribution D(a, K). Let ν be a random variable on {1, 2,..., K} with the conditional probabilities P(ν = j | p) = pj.
In sampling from a population of agents of K types with fractions pj, j = 1,..., K, type j will be drawn with probability pj; that is, the first sample is of type j with probability pj. For this reason pν is said to be obtained by size-biased sampling. If all agents of the sampled type are removed, then the remaining agents have fractions p1,..., pν-1, pν+1,..., pK. We can renormalize the fractions by dividing the components of this vector by 1 - pν. Denote the renormalized vector by q(1).
The joint density for (pν, p1,..., pν-1, pν+1,..., pK) is D(a + 1, a, a,..., a).
This can be seen by noting that, by symmetry, the vector (pj, p1,..., pj-1, pj+1,..., pK) has the same density as the original Dirichlet distribution, and the event ν = j occurs with probability pj. Hence, the density of (pν, p1,..., pν-1, pν+1,..., pK) is Kp1 times the density of the original Dirichlet distribution, which is the density for the distribution D(a + 1, a,..., a), where a is repeated K - 1 times. From this, the marginal density for pν is seen to be
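Given the form D(a + 1, a,..., a), the marginal in question is presumably the Beta(a + 1, (K - 1)a) density

```latex
f(p_\nu)
= \frac{\Gamma(Ka+1)}{\Gamma(a+1)\,\Gamma\big((K-1)a\big)}\,
p_\nu^{\,a}\,(1-p_\nu)^{(K-1)a-1},
\qquad 0 < p_\nu < 1 .
```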
If we let a go to zero, while letting Ka approach θ, then the marginal density approaches Beta(1, θ).
Also, given pν, the sum of the remaining components is p1 + ∙ ∙ ∙ + pν-1 + pν+1 + ∙ ∙ ∙ + pK = 1 - pν. Conditional on pν, the renormalized vector q(1) of the remaining components has the same distribution as p(1), where p(1) has the (K - 1)-dimensional Dirichlet distribution D(a, K - 1). Now, apply size-biased sampling to q(1) to produce q(2), and so on.
At the end of this process, we obtain the rearranged vector q of p, such that
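Presumably the intended display is the residual-allocation (stick-breaking) form

```latex
q_1 = v_1,
\qquad
q_j = v_j \prod_{i=1}^{j-1} (1 - v_i),
\qquad j = 2, \dots, K,
```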
where the v's are independent, and vj has the density of B(a + 1, (K - j)a). This density approaches B(1, θ) as a approaches zero and K approaches infinity in such a way that Ka approaches θ.
This is the reverse of starting from random variables distributed as B(1, θ), forming the q's by residual allocation, and denoting the kth largest of the q's by pk. The latter process produces (p1, p2,...), which has the Poisson-Dirichlet distribution with parameter θ as its limiting distribution. See Kingman (1993).
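The limiting residual-allocation process can be sketched numerically: break a unit stick with independent Beta(1, θ) proportions and sort the pieces in decreasing order to approximate a Poisson-Dirichlet draw. The value of θ and the truncation level below are illustrative choices, not from the text.

```python
import random

def residual_allocation(theta, n_sticks, rng=random):
    """Truncated residual-allocation vector with independent B(1, theta) breaks:
    q_j = v_j * prod_{i<j} (1 - v_i)."""
    q, remaining = [], 1.0
    for _ in range(n_sticks):
        v = rng.betavariate(1.0, theta)
        q.append(v * remaining)
        remaining *= 1.0 - v
    return q

random.seed(2)
theta = 1.5  # illustrative parameter
q = residual_allocation(theta, 200)

# Sorting in decreasing order gives an approximate Poisson-Dirichlet draw.
p = sorted(q, reverse=True)
print(round(sum(q), 6))  # the pieces nearly exhaust the unit stick
```

With 200 sticks the leftover mass is negligible, so (p1, p2,...) approximates a draw from the Poisson-Dirichlet distribution with parameter θ.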