Large clusters
When a large number of agents interact in a market and form clusters or groups, the number of groups formed often depends crucially on the correlation among agents. As the level of aggregation increases, namely, as the number of agents over which averages are being formed increases, the range of correlations increases.
The higher the probability that two randomly chosen agents are of the same type (use the same strategy, share the same view of the future, and so on), the smaller the number of groups in general. A simple scalar parameter θ specifies this degree of correlatedness of two randomly chosen agents in the Ewens distribution, which we use in some of our later analysis. The closer the value of θ is to zero, the larger the probability that two randomly chosen agents are of the same type. The larger the value of θ, the more likely it is that two randomly chosen agents are not of the same type. We derive expressions for the expected number of clusters as a function of this parameter θ.
In the rest of this section, we treat θ as exogenously fixed, although it may quite possibly be endogenously generated in some models.
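As a rough numerical illustration of how θ controls the number of clusters, the sketch below relies on two standard consequences of the Ewens sampling formula. Neither is derived in this section, and both are taken here as assumptions: two randomly chosen agents are of the same type with probability 1/(1 + θ), and the expected number of clusters among n agents is the sum of θ/(θ + j) over j = 0, ..., n − 1. The function names are illustrative only.

```python
import random


def prob_same_type(theta):
    """Probability that two randomly chosen agents share a type (assumed Ewens result)."""
    return 1.0 / (1.0 + theta)


def expected_clusters(n, theta):
    """Expected number of clusters among n agents: sum of theta / (theta + j), j = 0..n-1."""
    return sum(theta / (theta + j) for j in range(n))


def simulate_clusters(n, theta, rng):
    """One draw of the cluster count by sequential sampling.

    When i agents are already placed, the next agent starts a new cluster
    with probability theta / (theta + i); otherwise it joins an existing one.
    """
    return sum(rng.random() < theta / (theta + i) for i in range(n))


def mean_simulated_clusters(n, theta, trials, seed=0):
    """Monte Carlo average of the cluster count over independent draws."""
    rng = random.Random(seed)
    return sum(simulate_clusters(n, theta, rng) for _ in range(trials)) / trials
```

Calling expected_clusters(100, 0.1) and expected_clusters(100, 10.0) shows the qualitative point of the text: a small θ yields few clusters, a large θ yields many.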
10.6.1 Expected value of the largest cluster size
Now, we assume that exchangeable agents have many choices. Denote the number of agents by n, and by K the number of choices, types, categories, or subgroups as the modeling context dictates. We regard both n and K as large. Here, they are kept fixed for ease of explanation, even though they are actually random variables in many applications.
Suppose that fractions X_1, X_2, ... describe the population composition by types of exchangeable agents. Subscripts 1, 2, and so on have no intrinsic meaning, but are a mere convenience in referring to different clusters. Denote the largest fraction by X(1). Here, we follow Watterson and Guess (1977) to show how to calculate its expected value.
An entirely analogous procedure, given later, yields the joint probability density for r order statistics of the fractions. From now on we use x's for realized values instead of X's. By exchangeability we can assume that x_K is X(1), i.e., assume without loss of generality that
x_i ≤ x_K
for i = 1, ..., K − 1. We assume that the x's are jointly distributed on the K-dimensional simplex with a symmetric Dirichlet distribution. Readers may be puzzled by the sudden introduction of this distribution here. Its use rests on the deep connection between the Dirichlet distribution and the representation of random exchangeable partitions introduced by Kingman (1978a,b), and later expounded by Zabell (1992). We do not stop here to explain these facts, but proceed directly to calculating the expected size of the largest fraction governed by the Dirichlet distribution. See also Costantini and Garibaldi (2000) as well as Appendices A.7 and A.10.4 for further explanations of the connections.
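The expected size of the largest fraction can also be explored by simulation. The following is a minimal sketch, assuming a symmetric Dirichlet distribution with parameter ε = θ/K that is sampled by normalizing independent gamma variates. The function names, and any particular choice of K and number of trials, are illustrative and not taken from the text.

```python
import random


def largest_fraction(K, eps, rng):
    """Draw one symmetric Dirichlet(eps) vector of length K; return its largest entry."""
    while True:
        # Normalized independent Gamma(eps, 1) variates are Dirichlet distributed.
        draws = [rng.gammavariate(eps, 1.0) for _ in range(K)]
        total = sum(draws)
        if total > 0.0:  # guard against every draw underflowing to zero
            return max(draws) / total


def mean_largest_fraction(theta, K, trials, seed=0):
    """Monte Carlo estimate of the expected largest fraction with eps = theta / K."""
    rng = random.Random(seed)
    eps = theta / K
    return sum(largest_fraction(K, eps, rng) for _ in range(trials)) / trials
```

With θ = 1 and a moderately large K (say mean_largest_fraction(1.0, 200, 1000)), the estimate should fall near 0.62, the known limiting value for this case, although a finite K and sampling noise shift it slightly.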
Change variables from the x's to
Then, we have
where φ is the symmetric Dirichlet distribution with parameter ε,
See Appendix A.10 for details.
and let K go to infinity and ε to zero while the product Kε goes to a nonnegative value θ. We then see

10.6.2 Joint probability density for the largest r fractions
We next derive the joint probability density for the largest r fractions on the K-dimensional simplex x(1) ≥ x(2) ≥ ... ≥ x(r), where x_i, i = 1, 2, ..., K, are the fractions.
Denote the Dirichlet probability density on the simplex by φ(x_1, x_2, ..., x_K) = D(ε, K). Then the probability density for the first r order statistics is given by
in which the integration is carried out over the area
where a := x_1 + x_2 + ... + x_r.
As in the case of the largest fraction, introduce a random variable Z with the density function g_{K−r−1}, which is the (K − r − 1)-fold convolution of the density ε y^(ε−1), j = r + 1, ..., K − 1. The integral is approximately given by
where γ is Euler's constant, γ = 0.5772.... Putting it all together, we arrive at
for x_1 between 1/2 and 1. To obtain the expression for the density in the range
Differentiating the integral equation with respect to z, we derive the differential equation that determines the function recursively:
where z ≥ 0.
In the range z ∈ [0, 1), this integro-differential equation yields the result we obtained above. In the next range z ∈ [1, 2) we have
Changing variable of integration to
we note that the integration above
becomes
The joint density for the two largest fractions is given by
This expression is valid for the range 0 < y < x < 1, 0 < x + y < 1, and x + 2y > 1, that is, y > (1 - x)/2.
We know that
for z between 0 and 1. For other values of z, we have a recursion
see Watterson and Guess (1977). Alternatively put, we have
in the range n ≤ z < n + 1. This can be verified by direct substitution into the differential equation for g_θ.
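The recursive structure can be checked numerically. The sketch below does not reproduce the book's own display equations, which are missing from this text. It assumes the standard form of the limiting density, g_θ(z) = e^(−γθ) z^(θ−1)/Γ(θ) for 0 < z < 1, together with the differential-difference equation z g_θ'(z) = (θ − 1) g_θ(z) − θ g_θ(z − 1). Integrating that equation once over [1, z] gives g_θ on 1 ≤ z < 2, and for θ = 1 the result reduces to e^(−γ)(1 − ln z). All function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant; the math module does not provide it


def g_first_range(z, theta):
    """Assumed closed form of g_theta on 0 < z <= 1."""
    return math.exp(-EULER_GAMMA * theta) * z ** (theta - 1.0) / math.gamma(theta)


def g_second_range(z, theta, steps=20000):
    """g_theta(z) for 1 <= z < 2, obtained by integrating the assumed equation once.

    With c = exp(-gamma * theta) / Gamma(theta), the integrating factor z**(1 - theta) gives
        g(z) = c * z**(theta - 1) * (1 - theta * I(z)),
        I(z) = integral from 1 to z of u**(-theta) * (u - 1)**(theta - 1) du.
    The integral is approximated with the midpoint rule, which never evaluates
    the integrand at the endpoint u = 1.
    """
    c = math.exp(-EULER_GAMMA * theta) / math.gamma(theta)
    h = (z - 1.0) / steps
    integral = 0.0
    for k in range(steps):
        u = 1.0 + (k + 0.5) * h
        integral += u ** (-theta) * (u - 1.0) ** (theta - 1.0) * h
    return c * z ** (theta - 1.0) * (1.0 - theta * integral)
```

Repeating the same integration step on [2, 3), [3, 4), and so on, each time using the values from the previous range, would extend the function range by range, which is the sense in which the equation determines it recursively.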
10.7