A.12 GEM and size-biased distributions
Ewens (1990, Sec. 13) defines a distribution for random variables $x_1, x_2, \dots$ that is a special case of the residual allocation model introduced by Halmos (1944), and is called the GEM distribution, after Griffiths, Engen, and McCloskey.
It is defined by $x_1 = z_1$, $x_i = z_i(1 - z_{i-1}) \cdots (1 - z_1)$, $i = 2, 3, \dots$, where the $z$'s are i.i.d. with density $\theta(1 - z)^{\theta - 1}$, $0 < z < 1$, $0 < \theta < \infty$. A finite version is for $x_1, \dots, x_n$, with $z_j$ having the Beta$(\varepsilon + 1, (n - j)\varepsilon)$ density
$$f(z_j) = \frac{\Gamma((n - j + 1)\varepsilon + 1)}{\Gamma(\varepsilon + 1)\,\Gamma((n - j)\varepsilon)}\, z_j^{\varepsilon}(1 - z_j)^{(n - j)\varepsilon - 1}.$$
See Donnelly and Joyce (1989). As $n$ goes to infinity while $n\varepsilon$ goes to $\theta$, this density converges to that of Beta$(1, \theta)$.
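As a numerical illustration of the stick-breaking construction, the following Python sketch (our own; the helper name `gem_sample` and the use of numpy are not from the text) draws the first $m$ components of a GEM($\theta$) vector:

```python
import numpy as np

def gem_sample(theta, m, rng):
    """First m components of a GEM(theta) vector by stick-breaking:
    z_i i.i.d. Beta(1, theta), x_i = z_i * (1 - z_1) * ... * (1 - z_{i-1})."""
    z = rng.beta(1.0, theta, size=m)
    residual = np.concatenate(([1.0], np.cumprod(1.0 - z)[:-1]))
    return z * residual

rng = np.random.default_rng(0)
x = gem_sample(theta=2.0, m=1000, rng=rng)
print(x[:3], x.sum())  # partial sums approach 1 as m grows
```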
Let $x$ have the GEM distribution, and let $y$ be defined as follows: $y_1 = x_i$ with probability $x_i$; given $y_1 = x_i$, $y_2$ is defined to be $x_j$, $j \neq i$, with probability
$$\Pr(y_2 = x_j \mid y_1 = x_i) = \frac{x_j}{1 - x_i},$$
and so on. Then $y$ has the same distribution as $x$. We say that the GEM distribution is invariant under size-biasing. Since this notion is important, we summarize the steps involved in size-biasing in a context of sampling. First, pick one agent at random out of $n$ agents. Label his type $i_1$, and remove from the sample all agents of the same type. Second, choose an agent at random from the remaining agents, and label his type $i_2$. Remove all agents of this second type from the remaining agents, and continue.
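The removal procedure just described translates directly into code. A minimal sketch (the helper name `size_biased_order` is our own), taking the count of agents of each type and returning the types in size-biased order:

```python
import numpy as np

def size_biased_order(counts, rng):
    """Types in size-biased order: repeatedly pick one remaining agent
    uniformly at random, record its type, and remove all agents of
    that type from the sample."""
    remaining = np.asarray(counts, dtype=float)
    order = []
    while remaining.sum() > 0:
        probs = remaining / remaining.sum()
        j = int(rng.choice(len(remaining), p=probs))
        order.append(j)
        remaining[j] = 0.0  # remove every agent of the chosen type
    return order

rng = np.random.default_rng(1)
print(size_biased_order([5, 3, 2], rng))  # e.g. [0, 2, 1]
```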
Fix the number of types of agents in the sample of size $n$ at $K$. Size-biasing permutes the types $\{1, 2, \dots, K\}$ into $\pi^* = (i_1, i_2, \dots, i_K)$. We have, letting $n_j$ denote the number of agents of type $j$,
$$\Pr(\pi^* = (i_1, \dots, i_K) \mid n_1, \dots, n_K) = \prod_{j=1}^{K} \frac{n_{i_j}}{n - n_{i_1} - \cdots - n_{i_{j-1}}} = \prod_{j=1}^{K} \frac{n_{i_j}}{n_{i_j} + n_{i_{j+1}} + \cdots + n_{i_K}}.$$
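Each factor in this product is the chance that type $i_j$ is the next type drawn among the agents still in the sample, so the formula can be evaluated step by step. A small sketch (names are ours):

```python
def perm_prob(counts, order):
    """prod_j n_{i_j} / (n - n_{i_1} - ... - n_{i_{j-1}})."""
    n, removed, p = sum(counts), 0, 1.0
    for j in order:
        p *= counts[j] / (n - removed)
        removed += counts[j]
    return p

# with counts (5, 3, 2): 5/10 * 3/5 * 2/2 = 0.3
print(perm_prob([5, 3, 2], [0, 1, 2]))
```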
Average this probability over the Ewens sampling formula expressed in terms of the numbers of each type,
$$\pi(n_1, \dots, n_K) = \frac{n!}{K!}\,\frac{\theta^K}{\theta^{(n)}}\,\prod_{j=1}^{K} \frac{1}{n_j},$$
where $\theta^{(n)} = \theta(\theta + 1) \cdots (\theta + n - 1)$. The result is the size-biased partition probability
$$\Pr(n_{i_1}, \dots, n_{i_K}) = \frac{n!\,\theta^K}{\theta^{(n)}}\,\prod_{j=1}^{K} \frac{1}{n_{i_j} + n_{i_{j+1}} + \cdots + n_{i_K}}.$$
Here we use a lemma in Donnelly and Joyce (1989).
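As a consistency check on these two displays, summing the size-biased partition probability over all distinct orderings of the block sizes should recover the Ewens probability of the underlying partition. A sketch under the formulas above (helper names ours):

```python
from math import factorial, prod
from itertools import permutations

def rising(theta, n):
    """theta^(n) = theta (theta + 1) ... (theta + n - 1)."""
    out = 1.0
    for i in range(n):
        out *= theta + i
    return out

def size_biased_prob(sizes, theta):
    """n! theta^K / theta^(n) * prod_j 1 / (n_j + ... + n_K)."""
    n, K = sum(sizes), len(sizes)
    tails = [sum(sizes[j:]) for j in range(K)]
    return factorial(n) * theta**K / rising(theta, n) / prod(tails)

theta, sizes = 1.5, (2, 1, 1)
total = sum(size_biased_prob(p, theta) for p in set(permutations(sizes)))
print(total)  # matches the Ewens probability of the partition {2, 1, 1}
```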
Kingman (1993, Sec. 9.6) illustrates the process of size-biasing on the Dirichlet distribution and shows that the size-biased version of the Dirichlet distribution is the GEM distribution. As $n$ goes to infinity while $n\varepsilon$ goes to $\theta$, this shows that the Poisson–Dirichlet distribution has the GEM distribution as its size-biased version. We give a heuristic derivation here. For a rigorous demonstration of this fact, see Donnelly and Joyce (1989, Theorem 5).
Start with a vector $x = (x_1, x_2, \dots, x_n)$ whose components are all positive and sum to one. The components are distributed as the symmetric Dirichlet distribution $D(\varepsilon, n)$. A random variable $\nu$ picks an integer out of $\{1, 2, \dots, n\}$ with
$$\Pr(\nu = i \mid x) = x_i, \qquad i = 1, \dots, n.$$
Define a new vector $x'$ by rearranging the components of $x$, putting $x_\nu$ as the first component. The components of this new vector are distributed as the Dirichlet distribution $D(\varepsilon + 1, \varepsilon, \dots, \varepsilon)$. By integrating out the variables from the second to the $n$th, we see that $x_\nu$ has the density
$$\frac{\Gamma(n\varepsilon + 1)}{\Gamma(\varepsilon + 1)\,\Gamma((n - 1)\varepsilon)}\, z^{\varepsilon}(1 - z)^{(n - 1)\varepsilon - 1}, \qquad 0 < z < 1.$$
Name this random variable $z_1$; that is, $z_1$ has the density displayed above.
The remaining $n - 1$ components of $x$ sum to $1 - x_\nu$, i.e., $1 - z_1$. Renormalize these components by dividing them by $1 - z_1$, and name the resultant vector $x^{(1)}$. The components of this new vector are positive and sum to one. Their joint distribution is $D(\varepsilon, n - 1)$. We repeat the above process of selecting a component at random from this new vector, $x^{(1)}_\nu$ say. It has the density
$$\frac{\Gamma((n - 1)\varepsilon + 1)}{\Gamma(\varepsilon + 1)\,\Gamma((n - 2)\varepsilon)}\, z^{\varepsilon}(1 - z)^{(n - 2)\varepsilon - 1}.$$
Name this random variable $z_2$.
By repeating the above process, the components of the original vector $x$ are rearranged into $(y_1, y_2, \dots)$ with
$$y_j = z_j(1 - z_{j-1}) \cdots (1 - z_1), \qquad j = 1, 2, \dots,$$
where $z_j$ has exactly the density shown at the beginning of this section.
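The heuristic derivation can be checked by simulation: draw $x$ from the symmetric Dirichlet $D(\varepsilon, n)$, make one size-biased pick, and compare the picked component with Beta$(\varepsilon + 1, (n - 1)\varepsilon)$. A minimal sketch, assuming numpy and arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps, reps = 50, 0.1, 20000

x = rng.dirichlet(np.full(n, eps), size=reps)   # rows ~ D(eps, n)
z1 = np.empty(reps)
for r in range(reps):
    row = x[r] / x[r].sum()                     # guard against rounding
    z1[r] = row[rng.choice(n, p=row)]           # size-biased pick

a, b = eps + 1.0, (n - 1) * eps                 # Beta(eps+1, (n-1)eps)
print(z1.mean(), a / (a + b))                   # empirical vs. exact mean
```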
Hoppe (1987) establishes the relationship between the residual allocation model and size-biased sampling as follows. If a type is deleted at random (i.e., by a size-biased pick) and the remaining population is rescaled so that the rescaled fractions sum to one, and if the rescaled population has the same distribution as the original one except for the numbering of the types, then the population is Poisson–Dirichlet; conversely, if a type is randomly deleted from a Poisson–Dirichlet population, then the distribution of the rescaled residual population is the same as the Poisson–Dirichlet distribution except for the numbering.
Pitman (1996) has allowed residual allocation models to have dependent residual fractions. We next describe his characterizations of residual allocation models that are invariant with respect to size-biased permutations.
Construct a sequence $\Pi_n$ of random partitions of $[n]$ as follows. We think of them in terms of sequential sampling, without replacement, of $n$ agents from a large population of agents.
Given a sequence of random variables $(P_1, P_2, \dots)$ such that
$$P_i \ge 0, \qquad \sum_i P_i = 1$$
with probability one, the first block, $A_1$, consists of the first agent sampled. With probability $P_1$ the next agent sampled is of the same type as the first, and with probability $1 - P_1$ he is of a new type and is put in a new block. Thus the probability that the block containing the first agent, $A_1$, is of size $n_1$ is
$$\Pr(|A_1| = n_1) = \binom{n - 1}{n_1 - 1}\, E\!\left[P_1^{n_1 - 1}(1 - P_1)^{n - n_1}\right].$$
With two types of agents in the sample, and with the size of $A_i$ being $n_i$, $i = 1, 2$, where $n_1 + n_2 = n$, we have
$$\Pr(|A_1| = n_1,\ |A_2| = n_2) = \binom{n - 1}{n_1 - 1}\, E\!\left[P_1^{n_1 - 1}(1 - P_1)\, P_2^{n_2 - 1}\right].$$
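The first of these formulas is easy to check by simulation when the residual fractions are the GEM sticks, so that $P_1 \sim$ Beta$(1, \theta)$: each of the remaining $n - 1$ agents joins $A_1$ independently with probability $P_1$. A sketch (all names and parameter values are ours):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
theta, n, n1, reps = 2.0, 10, 4, 200000

p1 = rng.beta(1.0, theta, size=reps)    # P_1 for a GEM(theta) residual allocation
sizes = 1 + rng.binomial(n - 1, p1)     # agents 2..n join A_1 w.p. P_1 each
lhs = (sizes == n1).mean()

# binom(n-1, n1-1) * E[ P_1^(n1-1) (1-P_1)^(n-n1) ]
rhs = comb(n - 1, n1 - 1) * np.mean(p1 ** (n1 - 1) * (1 - p1) ** (n - n1))
print(lhs, rhs)                         # the two estimates agree
```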
Let $Q = [0, 1]^\infty$ denote the infinite-dimensional unit cube, and denote the infinite-dimensional simplex by
$$\Delta = \Big\{ x = (x_1, x_2, \dots) : x_i \ge 0,\ \sum_i x_i = 1 \Big\}.$$
Define its subset of decreasingly ordered points,
$$\nabla = \{ x \in \Delta : x_1 \ge x_2 \ge \cdots \}.$$
Denoting a point in the cube by $y$, we define a map $T$ from $Q$ into $\Delta$ by $x_1 = y_1$, and for $i \ge 2$ by
$$x_i = y_i(1 - y_1)(1 - y_2) \cdots (1 - y_{i-1}).$$
Given a probability measure $m$ on $Q$, let $\kappa_m$ denote its image on $\Delta$ under $T$. Taking $m$ to be the uniform measure on $Q$, we recognize that the measure $\kappa_m$ is constructed by size-biased sampling as follows: first pick $x_1$, which is uniformly distributed on $[0, 1]$. Next, pick $x_2$, which is uniformly distributed on $[0, 1 - x_1]$. Point $x_i$ is thus uniformly distributed on the residual interval $[0, 1 - x_1 - x_2 - \cdots - x_{i-1}]$.
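In code, the map $T$ and the uniform-residual description of $\kappa_m$ look as follows (a sketch; the function `T` is our own implementation of the displayed map):

```python
import numpy as np

def T(y):
    """x_1 = y_1, x_i = y_i (1-y_1)...(1-y_{i-1}): maps a point of Q into Delta."""
    y = np.asarray(y, dtype=float)
    residual = np.concatenate(([1.0], np.cumprod(1.0 - y)[:-1]))
    return y * residual

rng = np.random.default_rng(4)
x = T(rng.uniform(size=30))        # m uniform on Q gives kappa_m on Delta
print(x[:3], 1.0 - x.sum())        # x_2 is uniform on [0, 1 - x_1], etc.
```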
Next, we follow Ignatov (1982) and introduce a one-to-one continuous map of $Q$ into itself by
$$L_\theta(y) = \big(1 - (1 - y_1)^{1/\theta},\ 1 - (1 - y_2)^{1/\theta},\ \dots\big)$$
for some positive $\theta$. For simpler notation we drop $\theta$ from $L$ from now on. If $y_i$ is uniform on $[0, 1]$, then $1 - (1 - y_i)^{1/\theta}$ has the density $\theta(1 - z)^{\theta - 1}$; hence the transformation $T(L(y))$ carries the uniform measure $m$ on $Q$ into the GEM distribution $\kappa_\theta$ on $\Delta$, and its image on $\nabla$ under the decreasing rearrangement of coordinates is the Poisson–Dirichlet distribution $\mu_\theta$, where we drop the subscript $\theta$. We examine the first coordinate of $\mu$ and denote its distribution by $h$:
$$h(A) = \mu\{ x \in \nabla : x_1 \in A \}.$$
To proceed further we note that $h$ is absolutely continuous with respect to the Lebesgue measure on $[0, 1]$. Denote the density by $g(u)$. The integral equation for $h$ can then be expressed conveniently as
$$g(u) = \frac{\theta(1 - u)^{\theta - 1}}{u}\left[1 - \int_{u/(1-u)}^{\infty} g(v)\, dv\right].$$
In the range $u \in [1/2, 1]$, the integral on the right-hand side is zero, since $g(v)$ is. Therefore, we obtain
$$g(u) = \frac{\theta}{u}\,(1 - u)^{\theta - 1}$$
for $u$ in $[1/2, 1]$. This result has also been obtained by Watterson (1976) and Watterson and Guess (1977).
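This closed form is easy to confirm numerically: apply $L_\theta$ and then $T$ to uniform coordinates to obtain GEM($\theta$), rank the components, and compare the empirical density of the largest one on $[1/2, 1]$ with $\theta(1 - u)^{\theta - 1}/u$. A sketch, truncating to the first $m$ sticks (which leaves a negligible residual):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, m, reps = 2.0, 100, 50000

y = rng.uniform(size=(reps, m))
z = 1.0 - (1.0 - y) ** (1.0 / theta)           # L: z_i has density theta(1-z)^(theta-1)
residual = np.hstack([np.ones((reps, 1)), np.cumprod(1.0 - z, axis=1)[:, :-1]])
largest = (z * residual).max(axis=1)           # first coordinate after ranking

u, h = 0.6, 0.02                               # crude density estimate at u
emp = (np.abs(largest - u) < h).mean() / (2 * h)
print(emp, theta * (1 - u) ** (theta - 1) / u) # ~ 1.333 for theta = 2
```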