
11.2 Ewens distribution

Now we introduce the partition vector a = (a_1, a_2, ..., a_n), where a_k is the number of types or clusters with exactly k agents. Consequently, we have the accounting identities

a_1 + a_2 + ∙∙∙ + a_n = K_n  and  a_1 + 2a_2 + ∙∙∙ + na_n = n,

where K_n is the number of groups or clusters formed by the n agents.
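As a small numerical illustration (the cluster sizes below are hypothetical, chosen only for this sketch), the partition vector and both accounting identities can be checked directly:

```python
from collections import Counter

def partition_vector(cluster_sizes, n):
    """Return a = (a_1, ..., a_n): a_k = number of clusters with exactly k agents."""
    counts = Counter(cluster_sizes)
    return [counts.get(k, 0) for k in range(1, n + 1)]

# Hypothetical sample: 7 agents split into clusters of sizes 3, 2, 1, 1.
sizes = [3, 2, 1, 1]
n = sum(sizes)                      # n = 7 agents
a = partition_vector(sizes, n)
K_n = sum(a)                        # number of clusters

# Accounting identities: sum_k a_k = K_n and sum_k k*a_k = n.
assert K_n == len(sizes)
assert sum(k * a_k for k, a_k in enumerate(a, start=1)) == n
print(a)  # [2, 1, 1, 0, 0, 0, 0]
```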

To further simplify our presentation, let us suppose that h_j = h for all j in (11.1). Then

This is so because there are a_j of the n_j's that equal j.

Now let K become very large to allow for the possibility of indefinitely many types. To keep the mean finite we make h very small, so that the product Kh approaches a positive constant θ. We note that the negative-binomial expression allows many ways of realizing the vector a. Hence,

Noting that K(K − 1) ∙∙∙ (K − K_n + 1) h^{K_n} approaches θ^{K_n} in the limit of K becoming infinite and h approaching 0 while keeping Kh at θ, we arrive at the Ewens distribution

π(a) = (n!/θ^[n]) ∏_{j=1}^{n} (θ/j)^{a_j} (1/a_j!),

where θ^[n] := θ(θ + 1) ∙∙∙ (θ + n − 1). This distribution is very well known in the genetics literature; see Ewens (1972), Kingman (1978a,b), or Johnson et al. (1997). It has been investigated by Arratia and Tavaré (1992) and Hoppe (1987), among several others. Kingman (1980) states that this distribution arises in many applications. There are other ways of deriving it; see Costantini and Garibaldi (1999). We next examine some of its properties, following Watterson (1976).
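As a numerical sanity check of the Ewens sampling formula (a sketch; the function names are ours), the probabilities it assigns to all partition vectors of a small n should sum to one:

```python
from math import factorial
from itertools import product

def rising_factorial(theta, n):
    """theta^[n] = theta(theta + 1)...(theta + n - 1)."""
    out = 1.0
    for i in range(n):
        out *= theta + i
    return out

def ewens_prob(a, theta):
    """Ewens sampling formula: P(a) = n!/theta^[n] * prod_j (theta/j)^{a_j} / a_j!."""
    n = sum(j * a_j for j, a_j in enumerate(a, start=1))
    p = factorial(n) / rising_factorial(theta, n)
    for j, a_j in enumerate(a, start=1):
        p *= (theta / j) ** a_j / factorial(a_j)
    return p

# Enumerate all vectors a with sum_j j*a_j = n and add up their probabilities.
n, theta = 4, 0.7
total = sum(ewens_prob(a, theta)
            for a in product(range(n + 1), repeat=n)
            if sum(j * a_j for j, a_j in enumerate(a, start=1)) == n)
print(round(total, 10))  # 1.0
```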

11.2.1 The number of clusters and value of θ

The Ewens sampling formula has a single parameter θ, which was introduced in the previous section as the limit of Kh as K goes to infinity while h goes to zero. We introduce another interpretation here. Its value influences the number of clusters formed by the agents. Smaller values of θ tend to produce a few large clusters, while larger values produce a large number of smaller clusters.

To obtain quickly some intuitive understanding of the effects of the value of θ on the cluster size distributions, take n = 2 and a_2 = 1; all other a's are zero. Then

π(0, 1) = (2!/[θ(θ + 1)]) (θ/2) = 1/(1 + θ).

This shows that two randomly chosen agents are of the same type with large probability when θ is small, and with small probability when θ is large.
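Substituting n = 2 and a_2 = 1 into the Ewens formula gives 1/(1 + θ) for the probability that two randomly chosen agents are of the same type; a quick check (helper name ours):

```python
def rising(theta, n):
    """theta^[n] = theta(theta + 1)...(theta + n - 1)."""
    out = 1.0
    for i in range(n):
        out *= theta + i
    return out

# n = 2, a_2 = 1 (both agents in one cluster):
# P = 2!/theta^[2] * (theta/2), which simplifies to 1/(1 + theta).
for theta in (0.1, 1.0, 10.0):
    p_same = 2 / rising(theta, 2) * (theta / 2)
    print(theta, round(p_same, 4), round(1 / (1 + theta), 4))  # two columns agree
```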

Two extreme situations also reveal connections between the value of θ and the number of clusters. We note that the probability of n agents forming a single cluster is given by

π(0, ..., 0, 1) = (n − 1)! θ/θ^[n],

while the probability that n agents form n singletons is given by

π(n, 0, ..., 0) = θ^n/θ^[n].

With θ much smaller than one, the former probability is approximately equal to 1, while the latter is approximately equal to zero. When θ is much larger than n the opposite is true.
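From the Ewens formula, the single-cluster probability is (n − 1)! θ/θ^[n] and the all-singletons probability is θ^n/θ^[n]. A sketch (helper names and parameter values ours) comparing the two across θ:

```python
from math import factorial

def rising(theta, n):
    """theta^[n] = theta(theta + 1)...(theta + n - 1)."""
    out = 1.0
    for i in range(n):
        out *= theta + i
    return out

n = 10
for theta in (0.01, 1.0, 100.0):
    p_one = factorial(n - 1) * theta / rising(theta, n)   # single cluster (a_n = 1)
    p_all = theta ** n / rising(theta, n)                 # n singletons (a_1 = n)
    print(theta, round(p_one, 4), round(p_all, 4))
# For theta << 1 the single-cluster probability dominates;
# for theta >> n the all-singletons probability does.
```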

We can show that the probability of n agents forming k clusters is given by

Pr(K_n = k) = c(n, k) θ^k/θ^[n],

where c(n, k) is a signless Stirling number of the first kind, introduced in Appendix A.5, and is defined by

θ^[n] = Σ_{k=1}^{n} c(n, k) θ^k.
See Hoppe (1987) for the derivation. This number is the number of permutations of n symbols with exactly k cycles.
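A minimal sketch (function names ours) that builds c(n, k) by the standard recurrence c(n, k) = c(n − 1, k − 1) + (n − 1) c(n − 1, k) and verifies both the defining identity and that the resulting cluster-count probabilities sum to one:

```python
def stirling1_unsigned(n):
    """Table c[m][k] of signless Stirling numbers of the first kind, 0 <= m, k <= n."""
    c = [[0] * (n + 1) for _ in range(n + 1)]
    c[0][0] = 1
    for m in range(1, n + 1):
        for k in range(1, m + 1):
            c[m][k] = c[m - 1][k - 1] + (m - 1) * c[m - 1][k]
    return c

def rising(theta, n):
    """theta^[n] = theta(theta + 1)...(theta + n - 1)."""
    out = 1.0
    for i in range(n):
        out *= theta + i
    return out

n, theta = 5, 1.5
c = stirling1_unsigned(n)
# Defining identity: theta^[n] = sum_k c(n, k) theta^k.
assert abs(sum(c[n][k] * theta ** k for k in range(n + 1)) - rising(theta, n)) < 1e-9
# Pr(K_n = k) = c(n, k) theta^k / theta^[n]; these probabilities sum to 1.
probs = [c[n][k] * theta ** k / rising(theta, n) for k in range(1, n + 1)]
print(round(sum(probs), 10))  # 1.0
```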

Hoppe's urn model of the Ewens distribution makes the occurrence of this number natural.
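Hoppe's urn can be simulated directly: the urn holds a black ball of weight θ plus one unit-weight ball per seated agent, so at step i a new cluster starts with probability θ/(θ + i). A sketch (names and parameter choices ours) comparing the simulated mean number of clusters with the standard expression E K_n = Σ_{i=0}^{n−1} θ/(θ + i):

```python
import random

def hoppe_urn(n, theta, rng):
    """One draw of cluster sizes for n agents from Hoppe's urn."""
    clusters = []
    for i in range(n):
        # With probability theta/(theta + i) draw the black ball (new cluster);
        # otherwise join an existing cluster, chosen proportionally to its size.
        if rng.random() < theta / (theta + i):
            clusters.append(1)
        else:
            j = rng.choices(range(len(clusters)), weights=clusters)[0]
            clusters[j] += 1
    return clusters

rng = random.Random(0)
n, theta, trials = 50, 1.0, 10000
mean_K = sum(len(hoppe_urn(n, theta, rng)) for _ in range(trials)) / trials
exact = sum(theta / (theta + i) for i in range(n))   # E K_n
print(round(mean_K, 2), round(exact, 2))             # the two should be close
```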

We can use this formula to verify that the expected number of types increases with θ. As θ goes to infinity, the expected number of types approaches n, namely, total fragmentation of the agents in the sample by types. For small values of θ, Ewens has shown that

where γ = 0.577 is Euler's constant.
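Using the standard expression E K_n = Σ_{i=0}^{n−1} θ/(θ + i) (a known fact for this distribution, not derived in this section), one can verify numerically that the expected number of types increases in θ and approaches n:

```python
n = 30
prev = 0.0
for theta in (0.1, 1.0, 10.0, 1000.0):
    EK = sum(theta / (theta + i) for i in range(n))  # E K_n
    assert EK > prev    # monotone increasing in theta
    prev = EK
    print(theta, round(EK, 2))
# For theta = 1000 >> n = 30, E K_n is close to n: near-total fragmentation.
```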

11.2.2 Expected values of the fractions

The expected value of a_j is given earlier, in Section 10.8.3, by

We can evaluate the effects of increasing correlations or mutual dependence on the size of Ea_j by taking the partial derivative of it with respect to θ: as θ increases, Ea_j for j much smaller than n increases linearly in θ.

We explain how to calculate moments in Section 10.7, following Watterson (1976). For example, the variance and covariances are computed by using the relation

Note that the standard deviations of the a_j's are of the same order of magnitude as the means.

The order statistics of the fractions, x_(1) ≥ x_(2) ≥ ∙∙∙, are important in markets with highly correlated agents. With θ smaller than 1, the sum of the two or three largest fractions can be shown to be nearly one. See Table III of Watterson and Guess (1977), where numerical values of the expected value of the largest fraction are listed for different values of θ. For example, with θ = 0.3, 0.4, and 0.5, the expected value of the largest fraction is E(x_(1)) = 0.84, 0.79, and 0.76, respectively. They calculated these figures numerically. We describe some theoretical background in the next section.
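The figures from Table III can be reproduced approximately by simulation. In the large-n limit the ranked fractions follow the Poisson–Dirichlet distribution, which can be sampled by GEM stick-breaking (a standard representation; this sketch and its parameter choices are ours, not from Watterson and Guess):

```python
import random

def largest_share(theta, rng):
    """Largest atom of a Poisson-Dirichlet(theta) vector via GEM stick-breaking.

    Stop once the leftover stick mass cannot exceed the best atom found."""
    remaining, best = 1.0, 0.0
    while remaining > best:
        w = rng.betavariate(1.0, theta)   # stick-breaking weight ~ Beta(1, theta)
        best = max(best, remaining * w)
        remaining *= 1.0 - w
    return best

rng = random.Random(1)
for theta in (0.3, 0.4, 0.5):
    est = sum(largest_share(theta, rng) for _ in range(20000)) / 20000
    print(theta, round(est, 2))   # roughly 0.84, 0.79, 0.76
```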

11.2.3 The largest two shares

Next, we calculate the joint distribution of the largest two shares, by specializing the result of Section 10.6.2 to the two largest shares, as outlined in Watterson and Guess (1977).

Let x and y be the largest two fractions. Their joint density is

in the region x + 2y ≥ 1 and x + y ≤ 1. Its partial derivative with respect to y vanishes on the line 2x + 3y = 2, which lies in the region where the expression given above holds. This line is a ridge along which the most probable values of y, given x, are located. With θ = 0.5, E(x) = 0.758 by Table III of Watterson and Guess. Approximating y by the most probable value, y ≈ 2/3 − 2x/3, we calculate Ey. We also know that Ey ≈ Ex θ(ln 2 − θ/2). Both give the value Ey ≈ 0.16, so we may approximate y by the equation for the most probable y without too much error.
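The two approximations for Ey at θ = 0.5 each take one line of arithmetic (the input values are those quoted in the text):

```python
from math import log

Ex, theta = 0.758, 0.5   # E(x) for theta = 0.5, from Watterson and Guess, Table III
# Ridge approximation: y ≈ 2/3 - 2x/3 evaluated at x = Ex.
y_ridge = 2 / 3 - 2 * Ex / 3
# Alternative approximation quoted in the text: Ey ≈ Ex * theta * (ln 2 - theta/2).
y_alt = Ex * theta * (log(2) - theta / 2)
print(round(y_ridge, 3), round(y_alt, 3))  # both close to 0.16
```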

The marginal probability density of the largest fraction is

for x greater than 1/2. When x is not greater than 1/2, the expression is more complex:

where g(∙) is the density of the random variable Z introduced in Section 10.6.2, and is characterized in terms of its Laplace transform.


Source: Aoki, M. Modeling Aggregate Behaviour and Fluctuations in Economics. Cambridge: Cambridge University Press, 2002. 281 p.
