A.12 GEM and size-biased distributions
Ewens (1990, Sec. 13) defines a distribution for random variables $x_1, x_2, \dots$ that is a special case of the residual allocation model introduced by Halmos (1944), and is called the GEM distribution, after Griffiths, Engen, and McCloskey.
It is defined by $x_1 = z_1$, $x_i = z_i(1 - z_{i-1}) \cdots (1 - z_1)$, $i = 2, 3, \dots$, where the $z$'s are i.i.d. with density $\theta(1 - z)^{\theta - 1}$, $0 < z < 1$, $0 < \theta < \infty$. A finite version is for $x_1, \dots, x_n$, with $z_j$ having the Beta$(\varepsilon + 1, (n - j)\varepsilon)$ density
$$f(z_j) = \frac{\Gamma((n - j + 1)\varepsilon + 1)}{\Gamma(\varepsilon + 1)\,\Gamma((n - j)\varepsilon)}\, z_j^{\varepsilon}(1 - z_j)^{(n - j)\varepsilon - 1}.$$
See Donnelly and Joyce (1989). As $n$ goes to infinity while $n\varepsilon$ goes to $\theta$, this density converges to that of Beta$(1, \theta)$.
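As a numerical illustration of the stick-breaking construction, the following Python sketch (our own; the helper name `gem_sample` and the use of numpy are not from the text) draws the first $m$ components of a GEM($\theta$) vector:

```python
import numpy as np

def gem_sample(theta, m, rng):
    """First m components of a GEM(theta) vector by stick-breaking:
    z_i i.i.d. Beta(1, theta), x_i = z_i * (1 - z_1) * ... * (1 - z_{i-1})."""
    z = rng.beta(1.0, theta, size=m)
    residual = np.concatenate(([1.0], np.cumprod(1.0 - z)[:-1]))
    return z * residual

rng = np.random.default_rng(0)
x = gem_sample(theta=2.0, m=1000, rng=rng)
print(x[:3], x.sum())  # partial sums approach 1 as m grows
```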
Let $x$ have the GEM distribution, and let $y$ be defined as follows: $y_1 = x_i$ with probability $x_i$; given $y_1 = x_i$, $y_2$ is defined to be $x_j$, $j \neq i$, with probability
$$\Pr(y_2 = x_j \mid y_1 = x_i) = \frac{x_j}{1 - x_i},$$
and so on. Then $y$ has the same distribution as $x$. We say that the GEM distribution is invariant under size-biasing. Since this notion is important, we summarize the steps involved in size-biasing in a context of sampling. First, pick one agent at random out of $n$ agents. Label his type $i_1$, and remove from the sample all agents of the same type. Second, choose an agent at random from the remaining agents, and label his type $i_2$. Remove all agents of this second type from the remaining agents, and continue.
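The removal procedure just described translates directly into code. A minimal sketch (the helper name `size_biased_order` is our own), taking the count of agents of each type and returning the types in size-biased order:

```python
import numpy as np

def size_biased_order(counts, rng):
    """Types in size-biased order: repeatedly pick one remaining agent
    uniformly at random, record its type, and remove all agents of
    that type from the sample."""
    remaining = np.asarray(counts, dtype=float)
    order = []
    while remaining.sum() > 0:
        probs = remaining / remaining.sum()
        j = int(rng.choice(len(remaining), p=probs))
        order.append(j)
        remaining[j] = 0.0  # remove every agent of the chosen type
    return order

rng = np.random.default_rng(1)
print(size_biased_order([5, 3, 2], rng))  # e.g. [0, 2, 1]
```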
Fix the number of types of agents in the sample of size $n$ at $K$. Size-biasing permutes the types $\{1, 2, \dots, K\}$ into $\pi^* = (i_1, i_2, \dots, i_K)$. We have, letting $n_j$ denote the number of agents of type $j$,
$$\Pr(\pi^* = (i_1, \dots, i_K) \mid n_1, \dots, n_K) = \prod_{j=1}^{K} \frac{n_{i_j}}{n - n_{i_1} - \cdots - n_{i_{j-1}}} = \prod_{j=1}^{K} \frac{n_{i_j}}{n_{i_j} + n_{i_{j+1}} + \cdots + n_{i_K}}.$$
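Each factor in this product is the chance that type $i_j$ is the next type drawn among the agents still in the sample, so the formula can be evaluated step by step. A small sketch (names are ours):

```python
def perm_prob(counts, order):
    """prod_j n_{i_j} / (n - n_{i_1} - ... - n_{i_{j-1}})."""
    n, removed, p = sum(counts), 0, 1.0
    for j in order:
        p *= counts[j] / (n - removed)
        removed += counts[j]
    return p

# with counts (5, 3, 2): 5/10 * 3/5 * 2/2 = 0.3
print(perm_prob([5, 3, 2], [0, 1, 2]))
```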
Average this probability over the Ewens sampling formula expressed in terms of the numbers of each type,
$$\pi(n_1, \dots, n_K) = \frac{n!}{K!}\,\frac{\theta^K}{\theta^{(n)}}\,\prod_{j=1}^{K} \frac{1}{n_j},$$
where $\theta^{(n)} = \theta(\theta + 1) \cdots (\theta + n - 1)$. The result is the size-biased partition probability
$$\Pr(n_{i_1}, \dots, n_{i_K}) = \frac{n!\,\theta^K}{\theta^{(n)}}\,\prod_{j=1}^{K} \frac{1}{n_{i_j} + n_{i_{j+1}} + \cdots + n_{i_K}}.$$
Here we use a lemma in Donnelly and Joyce (1989).
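As a consistency check on these two displays, summing the size-biased partition probability over all distinct orderings of the block sizes should recover the Ewens probability of the underlying partition. A sketch under the formulas above (helper names ours):

```python
from math import factorial, prod
from itertools import permutations

def rising(theta, n):
    """theta^(n) = theta (theta + 1) ... (theta + n - 1)."""
    out = 1.0
    for i in range(n):
        out *= theta + i
    return out

def size_biased_prob(sizes, theta):
    """n! theta^K / theta^(n) * prod_j 1 / (n_j + ... + n_K)."""
    n, K = sum(sizes), len(sizes)
    tails = [sum(sizes[j:]) for j in range(K)]
    return factorial(n) * theta**K / rising(theta, n) / prod(tails)

theta, sizes = 1.5, (2, 1, 1)
total = sum(size_biased_prob(p, theta) for p in set(permutations(sizes)))
print(total)  # matches the Ewens probability of the partition {2, 1, 1}
```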
Kingman (1993, Sec. 9.6) illustrates the process of size-biasing on the Dirichlet distribution and shows that the size-biased version of the Dirichlet distribution is the GEM distribution. As $n$ goes to infinity while $n\varepsilon$ goes to $\theta$, this shows that the Poisson–Dirichlet distribution has the GEM distribution as its size-biased version. We give a heuristic derivation here. For a rigorous demonstration of this fact, see Donnelly and Joyce (1989, Theorem 5).
Start with a vector $x = (x_1, x_2, \dots, x_n)$ whose components are all positive and sum to one. The components are distributed as the symmetric Dirichlet distribution $D(\varepsilon, n)$. A random variable $\nu$ picks an integer out of $\{1, 2, \dots, n\}$ with
$$\Pr(\nu = i \mid x) = x_i, \qquad i = 1, \dots, n.$$
Define a new vector $x'$ by rearranging the components of $x$, putting $x_\nu$ as the first component. The components of this new vector are distributed as the Dirichlet distribution $D(\varepsilon + 1, \varepsilon, \dots, \varepsilon)$. By integrating out the variables from the second to the $n$th, we see that $x_\nu$ has the density
$$\frac{\Gamma(n\varepsilon + 1)}{\Gamma(\varepsilon + 1)\,\Gamma((n - 1)\varepsilon)}\, z^{\varepsilon}(1 - z)^{(n - 1)\varepsilon - 1}, \qquad 0 < z < 1.$$
Name this random variable $z_1$; that is, $z_1$ has the density displayed above.
The remaining $n - 1$ components of $x$ sum to $1 - x_\nu$, i.e., $1 - z_1$. Renormalize these components by dividing them by $1 - z_1$, and name the resultant vector $x^{(1)}$. The components of this new vector are positive and sum to one. Their joint distribution is $D(\varepsilon, n - 1)$. We repeat the above process of selecting a component at random from this new vector, $x^{(1)}_\nu$ say. It has the density
$$\frac{\Gamma((n - 1)\varepsilon + 1)}{\Gamma(\varepsilon + 1)\,\Gamma((n - 2)\varepsilon)}\, z^{\varepsilon}(1 - z)^{(n - 2)\varepsilon - 1}.$$
Name this random variable $z_2$.
By repeating the above process, the components of the original vector $x$ are rearranged into $(y_1, y_2, \dots)$ with
$$y_j = z_j(1 - z_{j-1}) \cdots (1 - z_1), \qquad j = 1, 2, \dots,$$
where $z_j$ has exactly the density shown at the beginning of this section.
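The heuristic derivation can be checked by simulation: draw $x$ from the symmetric Dirichlet $D(\varepsilon, n)$, make one size-biased pick, and compare the picked component with Beta$(\varepsilon + 1, (n - 1)\varepsilon)$. A minimal sketch, assuming numpy and arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps, reps = 50, 0.1, 20000

x = rng.dirichlet(np.full(n, eps), size=reps)   # rows ~ D(eps, n)
z1 = np.empty(reps)
for r in range(reps):
    row = x[r] / x[r].sum()                     # guard against rounding
    z1[r] = row[rng.choice(n, p=row)]           # size-biased pick

a, b = eps + 1.0, (n - 1) * eps                 # Beta(eps+1, (n-1)eps)
print(z1.mean(), a / (a + b))                   # empirical vs. exact mean
```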
Hoppe (1987) establishes the relationship between the residual allocation model and size-biased sampling as follows. If a type is deleted at random (i.e., by a size-biased pick) and the remaining population is rescaled so that the rescaled fractions sum to one, and if the rescaled population has the same distribution as the original one except for the numbering of the types, then the population is Poisson–Dirichlet; conversely, if a type is randomly deleted from a Poisson–Dirichlet population, then the distribution of the rescaled residual population is the same as the Poisson–Dirichlet distribution except for the numbering.
Pitman (1996) has allowed residual allocation models to have dependent residual fractions. We next describe his characterizations of residual allocation models that are invariant with respect to size-biased permutations.
Construct a sequence $\Pi_n$ of random partitions of $[n]$ as follows. We think of them in terms of sequential sampling, without replacement, of $n$ agents from a large population of agents.
Given a sequence of random variables $(P_1, P_2, \dots)$ such that
$$P_i \ge 0, \qquad \sum_i P_i = 1$$
with probability one, the first block, $A_1$, consists of the first agent sampled. With probability $P_1$ the next agent sampled is of the same type as the first, and with probability $1 - P_1$ he is of a new type and is put in a new block. Thus the probability that the block containing the first agent, $A_1$, is of size $n_1$ is
$$\Pr(|A_1| = n_1) = \binom{n - 1}{n_1 - 1}\, E\!\left[P_1^{n_1 - 1}(1 - P_1)^{n - n_1}\right].$$
With two types of agents in the sample, and with the size of $A_i$ being $n_i$, $i = 1, 2$, where $n_1 + n_2 = n$, we have
$$\Pr(|A_1| = n_1,\ |A_2| = n_2) = \binom{n - 1}{n_1 - 1}\, E\!\left[P_1^{n_1 - 1}(1 - P_1)\, P_2^{n_2 - 1}\right].$$
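The first of these formulas is easy to check by simulation when the residual fractions are the GEM sticks, so that $P_1 \sim$ Beta$(1, \theta)$: each of the remaining $n - 1$ agents joins $A_1$ independently with probability $P_1$. A sketch (all names and parameter values are ours):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(3)
theta, n, n1, reps = 2.0, 10, 4, 200000

p1 = rng.beta(1.0, theta, size=reps)    # P_1 for a GEM(theta) residual allocation
sizes = 1 + rng.binomial(n - 1, p1)     # agents 2..n join A_1 w.p. P_1 each
lhs = (sizes == n1).mean()

# binom(n-1, n1-1) * E[ P_1^(n1-1) (1-P_1)^(n-n1) ]
rhs = comb(n - 1, n1 - 1) * np.mean(p1 ** (n1 - 1) * (1 - p1) ** (n - n1))
print(lhs, rhs)                         # the two estimates agree
```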
Let $Q = [0, 1]^\infty$ denote the infinite-dimensional unit cube, and denote the infinite-dimensional simplex by
$$\Delta = \Big\{ x = (x_1, x_2, \dots) : x_i \ge 0,\ \sum_i x_i = 1 \Big\}.$$
Define its subset of decreasingly ordered points,
$$\nabla = \{ x \in \Delta : x_1 \ge x_2 \ge \cdots \}.$$
Denoting a point in the cube by $y$, we define a map $T$ from $Q$ into $\Delta$ by $x_1 = y_1$, and for $i \ge 2$ by
$$x_i = y_i(1 - y_1)(1 - y_2) \cdots (1 - y_{i-1}).$$
Given a probability measure $m$ on $Q$, let $\kappa_m$ denote its image on $\Delta$ under $T$. Taking $m$ to be the uniform measure on $Q$, we recognize that the measure $\kappa_m$ is constructed by size-biased sampling as follows: first pick $x_1$, which is uniformly distributed on $[0, 1]$. Next, pick $x_2$, which is uniformly distributed on $[0, 1 - x_1]$. Point $x_i$ is thus uniformly distributed on the residual interval $[0, 1 - x_1 - x_2 - \cdots - x_{i-1}]$.
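In code, the map $T$ and the uniform-residual description of $\kappa_m$ look as follows (a sketch; the function `T` is our own implementation of the displayed map):

```python
import numpy as np

def T(y):
    """x_1 = y_1, x_i = y_i (1-y_1)...(1-y_{i-1}): maps a point of Q into Delta."""
    y = np.asarray(y, dtype=float)
    residual = np.concatenate(([1.0], np.cumprod(1.0 - y)[:-1]))
    return y * residual

rng = np.random.default_rng(4)
x = T(rng.uniform(size=30))        # m uniform on Q gives kappa_m on Delta
print(x[:3], 1.0 - x.sum())        # x_2 is uniform on [0, 1 - x_1], etc.
```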
Next, we follow Ignatov (1982) and introduce a one-to-one continuous map of $Q$ into itself by
$$L_\theta(y) = \big(1 - (1 - y_1)^{1/\theta},\ 1 - (1 - y_2)^{1/\theta},\ \dots\big)$$
for some positive $\theta$. For simpler notation we drop $\theta$ from $L$ from now on. If $y_i$ is uniform on $[0, 1]$, then $1 - (1 - y_i)^{1/\theta}$ has the density $\theta(1 - z)^{\theta - 1}$; hence the transformation $T(L(y))$ carries the uniform measure $m$ on $Q$ into the GEM distribution $\kappa_\theta$ on $\Delta$, and its image on $\nabla$ under the decreasing rearrangement of coordinates is the Poisson–Dirichlet distribution $\mu_\theta$, where we drop the subscript $\theta$. We examine the first coordinate of $\mu$ and denote its distribution by $h$:
$$h(A) = \mu\{ x \in \nabla : x_1 \in A \}.$$
To proceed further we note that $h$ is absolutely continuous with respect to the Lebesgue measure on $[0, 1]$. Denote the density by $g(u)$. The integral equation for $h$ can then be expressed conveniently as
$$g(u) = \frac{\theta(1 - u)^{\theta - 1}}{u}\left[1 - \int_{u/(1-u)}^{\infty} g(v)\, dv\right].$$
In the range $u \in [1/2, 1]$, the integral on the right-hand side is zero, since $g(v)$ is. Therefore, we obtain
$$g(u) = \frac{\theta}{u}\,(1 - u)^{\theta - 1}$$
for $u$ in $[1/2, 1]$. This result has also been obtained by Watterson (1976) and Watterson and Guess (1977).
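This closed form is easy to confirm numerically: apply $L_\theta$ and then $T$ to uniform coordinates to obtain GEM($\theta$), rank the components, and compare the empirical density of the largest one on $[1/2, 1]$ with $\theta(1 - u)^{\theta - 1}/u$. A sketch, truncating to the first $m$ sticks (which leaves a negligible residual):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, m, reps = 2.0, 100, 50000

y = rng.uniform(size=(reps, m))
z = 1.0 - (1.0 - y) ** (1.0 / theta)           # L: z_i has density theta(1-z)^(theta-1)
residual = np.hstack([np.ones((reps, 1)), np.cumprod(1.0 - z, axis=1)[:, :-1]])
largest = (z * residual).max(axis=1)           # first coordinate after ranking

u, h = 0.6, 0.02                               # crude density estimate at u
emp = (np.abs(largest - u) < h).mean() / (2 * h)
print(emp, theta * (1 - u) ** (theta - 1) / u) # ~ 1.333 for theta = 2
```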