Frequency spectrum

The expression of the marginal density for x larger than 1/2 appears often in dealing with distributions of fractions, such as the Ewens distribution. The notion of frequency spectrum was introduced by Ewens (1972) as a way of expressing the average number of clusters of sizes between two specified limits, and is quite useful in calculating some possibly complicated expressions of cluster sizes in a straightforward way.

Here is a typical example of how this notion arises. Suppose we have a function of fractions

$$S = \sum_{i=1}^{K} h(x_i),$$

say, where h is some bounded and continuous function, and wish to evaluate its expected value. Its expectation is

$$E(S) = \sum_{i=1}^{K} E[h(x_i)] = K\,E[h(x_1)],$$

where the last equality follows by exchangeability of the random variables. Since the marginal distribution of x_j of a Dirichlet distribution D(α_1, ..., α_K) is Beta(α_j, Σ_{i≠j} α_i), we have the marginal density for x_1, in the symmetric case α_1 = ... = α_K = ε, as

$$\frac{\Gamma(K\epsilon)}{\Gamma(\epsilon)\,\Gamma((K-1)\epsilon)}\,x^{\epsilon-1}(1-x)^{(K-1)\epsilon-1}.$$

We use it to evaluate the expected value in a straightforward calculation. Using the relation Γ(ε) = Γ(1 + ε)/ε, and letting Kε go to θ as ε goes to zero and K to infinity, we see that the expected value of S is evaluated by

$$E(S) \to \int_0^1 h(x)\,\theta x^{-1}(1-x)^{\theta-1}\,dx.$$

This clearly exhibits θx^{-1}(1 - x)^{θ-1} as the density of x in h(x).
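The limiting argument can be checked numerically. The following Python sketch assumes a symmetric Dirichlet distribution with all parameters equal to ε = θ/K, so that x_1 has a Beta(ε, (K - 1)ε) marginal; the function names are illustrative only. It compares K times that marginal density with θx^{-1}(1 - x)^{θ-1} as K grows.

```python
import math


def scaled_marginal_density(x, K, theta):
    """K times the Beta(eps, (K - 1) * eps) density at x, with eps = theta / K.

    Assumption of this sketch: the Dirichlet distribution is symmetric.
    Log-gamma is used because Gamma(eps) is huge for small eps.
    """
    eps = theta / K
    log_norm = math.lgamma(K * eps) - math.lgamma(eps) - math.lgamma((K - 1) * eps)
    log_kernel = (eps - 1.0) * math.log(x) + ((K - 1) * eps - 1.0) * math.log(1.0 - x)
    return K * math.exp(log_norm + log_kernel)


def frequency_spectrum(x, theta):
    """Candidate limit: theta * x**(-1) * (1 - x)**(theta - 1)."""
    return theta * (1.0 - x) ** (theta - 1.0) / x


if __name__ == "__main__":
    theta, x = 1.5, 0.3
    for K in (10, 100, 10_000, 1_000_000):
        print(K, scaled_marginal_density(x, K, theta), frequency_spectrum(x, theta))
```

The two printed columns should approach each other as K increases, for any fixed x strictly between 0 and 1.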

It is known as the frequency spectrum in the literature of population genetics. The same expression appears in discussions of relative sizes of basins of attraction of certain random dynamics in the physics literature with θ = 0.5 (Derrida and Flyvbjerg 1987) and in the statistics literature (Aldous 1985), apparently independently.

Donnelly et al. (1991) show how to obtain this as the limit of a discrete frequency spectrum with n points in random mappings of {1, 2, ..., n} into itself, where the discrete frequency spectrum is the expected number of clusters of a specified size. They show that the random mappings and the Ewens sampling formula with θ = 0.5 give nearly the same order statistics for large n.

10.8.1 Definition

The expected number of types with fractions in the interval (x, x + dx) is called the frequency spectrum. It is important in that the expected values of many variables of interest can be calculated readily in terms of it.

The mean number of types with fractions between α and β then is given by

$$\int_\alpha^\beta f(x)\,dx,$$

where f(x) denotes the frequency spectrum.

The particular frequency spectrum we have seen above is

$$f(x) = \theta x^{-1}(1-x)^{\theta-1}.$$

For small x, it behaves like θ/x, which indicates that there are many types with small fractions. This function is not normalizable, but

$$x f(x) = \theta(1-x)^{\theta-1}$$

is normalizable. It is interpreted as the probability that a randomly chosen sample is of a type with fraction between x and x + dx, and it integrates to one:

$$\int_0^1 \theta(1-x)^{\theta-1}\,dx = 1.$$
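As a numerical illustration of the definition, the sketch below integrates a frequency spectrum over an interval to obtain the mean number of types with fractions in that interval. It assumes the spectrum θx^{-1}(1 - x)^{θ-1} and uses a plain midpoint rule; the function names and the step count are choices made for this example.

```python
import math


def frequency_spectrum(x, theta):
    """Assumed spectrum: theta * x**(-1) * (1 - x)**(theta - 1)."""
    return theta * (1.0 - x) ** (theta - 1.0) / x


def midpoint_integral(func, lower, upper, steps=100_000):
    """Midpoint rule; it never evaluates func at the endpoints themselves."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (k + 0.5) * width) for k in range(steps))


def mean_number_of_types(alpha, beta, theta):
    """Mean number of types with fractions between alpha and beta."""
    return midpoint_integral(lambda x: frequency_spectrum(x, theta), alpha, beta)


def size_biased_mass(theta):
    """Integral of x * f(x) over (0, 1); expected to be one.

    For theta < 1 the integrand is singular at x = 1 and the midpoint
    rule converges slowly, so this check is most reliable for theta >= 1.
    """
    return midpoint_integral(lambda x: x * frequency_spectrum(x, theta), 0.0, 1.0)


if __name__ == "__main__":
    print(mean_number_of_types(0.1, 0.5, 1.0), math.log(5.0))
    print(size_biased_mass(2.0))
```

With θ = 1 the spectrum is 1/x, so the mean number of types between 0.1 and 0.5 is ln 5, which gives a simple check of the quadrature.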

10.8.2 Herfindahl index of concentration

In many situations, a set of fractions {x_i}, that is, a set of positive-valued variables that sum to one, arises naturally. For example, x_i could be the “share,” broadly interpreted, of firm i in some industrial sector, or the relative size of a basin of attraction.

We discuss the distribution Π(Y) of the quantity

$$Y = \sum_i x_i^2.$$

This is called the Herfindahl index of concentration in the older industrial-organization literature. See Scherer (1980), for example. In the population-genetics literature, it is called the homozygosity and is the probability that two randomly selected genes at a given locus are identical. In ecology, the relative abundances of different species within a community are of interest. Here Y is the probability that two randomly selected individuals are of the same species. A similar interpretation is available in the industrial-organization literature: the probability that two randomly selected economic agents are of the same type is given by Y. Recall that agent types may refer to size of firms, size of shares, or other characteristics. It is also interesting that this same variable Y is treated in the physics literature by Derrida and Flyvbjerg (1987) when they discuss distributions of the relative sizes of basins of attraction.
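For a concrete set of shares the index is a one-line computation. The sketch below takes the index to be the sum of squared shares and validates that the input is a set of fractions; the function name and the tolerance are hypothetical choices for this example.

```python
def herfindahl_index(shares, tol=1e-9):
    """Sum of squared shares for nonnegative shares that sum to one.

    This is also the probability that two independent draws, each
    selecting type i with probability shares[i], give the same type.
    """
    if any(s < 0 for s in shares):
        raise ValueError("shares must be nonnegative")
    if abs(sum(shares) - 1.0) > tol:
        raise ValueError("shares must sum to one")
    return sum(s * s for s in shares)


if __name__ == "__main__":
    print(herfindahl_index([0.5, 0.3, 0.2]))   # concentrated sector
    print(herfindahl_index([0.25] * 4))        # four equal firms
```

With K equal shares the index is 1/K, its minimum for K types; it approaches one as a single type comes to dominate.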

A simple heuristic derivation of the notion of frequency spectrum is given first. See Kingman (1978b) or Aldous (1985) for rigorous derivations.

10.8.3 A heuristic derivation

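Suppose that a partition vector for a sample of size n is distributed as the Ewens distribution. The expected value of a_j is

$$E(a_j) = \frac{\theta}{j}\,\frac{n!}{(n-j)!}\,\frac{\Gamma(n+\theta-j)}{\Gamma(n+\theta)} = \int_0^1 \binom{n}{j} x^{j}(1-x)^{n-j}\,\theta x^{-1}(1-x)^{\theta-1}\,dx,$$

and the frequency spectrum f(x) = θx^{-1}(1 - x)^{θ-1} appears as the weight in this integral.

We can justify our interpretation of the frequency spectrum given in the definition above by calculating the expected number of types in the population

$$\sum_{j=1}^{n} E(a_j) = \int_0^1 \Bigl[\sum_{j=1}^{n}\binom{n}{j} x^{j}(1-x)^{n-j}\Bigr] f(x)\,dx \approx \int_\epsilon^1 f(x)\,dx,$$

where the lower limit of integration is denoted by ε = 1/n.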

Note also that the summation over j in the integrand is approximately one for x above ε. In words, the integral of the frequency spectrum from ε to 1 gives the expected (i.e., average) number of types. We may thus interpret the expression f(p) dp as the probability that a type exists in the population with relative frequency (fraction) between p and p + dp, or m(p, p + dp). The expression pf(p) = θ(1 - p)^{θ-1} may be interpreted as the probability that an agent drawn at random from the population is of a type with fraction in (p, p + dp).
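The heuristic can be compared with exact values. The sketch below assumes the standard closed form for the expected partition counts under the Ewens sampling formula, E(a_j) = (θ/j) n!/(n - j)! Γ(n + θ - j)/Γ(n + θ), which is stated here as an assumption of the example. It checks two identities that must hold, namely that the counts weighted by size add up to n and that the expected number of types equals the sum of θ/(θ + i) over i = 0, ..., n - 1, and it evaluates the truncated integral of the spectrum from 1/n to 1 for comparison.

```python
import math


def expected_count_of_size(j, n, theta):
    """Assumed closed form for E(a_j) in a sample of size n (log-space)."""
    log_value = (
        math.log(theta) - math.log(j)
        + math.lgamma(n + 1) - math.lgamma(n - j + 1)
        + math.lgamma(n + theta - j) - math.lgamma(n + theta)
    )
    return math.exp(log_value)


def expected_number_of_types(n, theta):
    """Sum of E(a_j) over all cluster sizes j = 1, ..., n."""
    return sum(expected_count_of_size(j, n, theta) for j in range(1, n + 1))


def truncated_spectrum_integral(n, theta, steps=100_000):
    """Midpoint-rule integral of theta/x * (1 - x)**(theta - 1) over [1/n, 1]."""
    lower = 1.0 / n
    width = (1.0 - lower) / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        total += theta * (1.0 - x) ** (theta - 1.0) / x
    return total * width


if __name__ == "__main__":
    n, theta = 1000, 1.0
    print(expected_number_of_types(n, theta))     # exact expectation
    print(truncated_spectrum_integral(n, theta))  # heuristic value, ln(n) here
```

For θ = 1 the truncated integral is exactly ln n, while the exact expectation is the harmonic number, so the two agree in their leading logarithmic term and differ by a bounded amount.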

To illustrate the use of this notion, suppose that type i agents constitute a fraction p_i of the whole population of agents. Then, the probability that the (j + 1)th draw of the agents from the population is a new type not so far sampled is given by

$$E\Bigl[\sum_i p_i(1-p_i)^{j}\Bigr] = \int_0^1 x(1-x)^{j}\,\theta x^{-1}(1-x)^{\theta-1}\,dx = \frac{\theta}{\theta+j}.$$

This gives an interpretation of the parameter θ: The probability that the next draw from the population is a new type is smaller, the smaller the value of θ.

We describe several ways this frequency spectrum arises. One way is in connection with the residual allocation process.
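The residual allocation process also gives a convenient way to simulate the fractions p_i and to check the new-type probability by Monte Carlo. The sketch assumes the standard stick-breaking construction with independent Beta(1, θ) cuts and takes the target value to be θ/(θ + j); both are assumptions of this example, and the function names, seed, and truncation level are arbitrary.

```python
import random


def stick_breaking_fractions(theta, rng, tail=1e-12):
    """Fractions from residual allocation with Beta(1, theta) cuts.

    The stick is broken until the unallocated remainder drops below
    `tail`, so the returned fractions sum to one up to that remainder.
    """
    fractions = []
    remaining = 1.0
    while remaining > tail:
        cut = rng.betavariate(1.0, theta)
        fractions.append(remaining * cut)
        remaining *= 1.0 - cut
    return fractions


def new_type_probability(j, theta, replications=2000, seed=0):
    """Monte Carlo estimate of E[sum_i p_i * (1 - p_i)**j].

    Given the fractions, p_i * (1 - p_i)**j is the chance that draw
    j + 1 is of type i while none of the first j draws were.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        fractions = stick_breaking_fractions(theta, rng)
        total += sum(p * (1.0 - p) ** j for p in fractions)
    return total / replications


if __name__ == "__main__":
    theta = 2.0
    for j in (1, 3, 10):
        print(j, new_type_probability(j, theta), theta / (theta + j))
```

The estimates should fall as j grows and should be smaller for smaller θ, in line with the interpretation of θ given above.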

To interpret the parameter θ in the frequency spectrum, we introduce sequential sampling into the relationship between the sample sizes and the numbers of different types of agents contained in the samples. Suppose we take two samples. The probability that they are of the same type is given by

$$\int_0^1 x^{2}\,\theta x^{-1}(1-x)^{\theta-1}\,dx = \frac{1}{1+\theta}.$$

Thus, the larger the value of θ, the smaller the probability that two samples are of the same type. In this sense, the parameter θ represents correlatedness of samples. For k > 1, we compute

$$\int_0^1 x^{k}\,\theta x^{-1}(1-x)^{\theta-1}\,dx = \frac{(k-1)!}{(1+\theta)(2+\theta)\cdots(k-1+\theta)}.$$

This is the probability that first k samples are all of the same type. The next expression,

10.8.4 Recursion relations

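Let

$$q_{n,i} = \Pr(K_n = i),$$

where K_n is the number of different types among the first n draws.

We know from the above that the probability that the first j draws produce the same type is

$$q_{j,1} = \frac{(j-1)!\,\theta}{\theta^{[j]}},$$

where θ^{[j]} = θ(θ + 1)···(θ + j - 1). The probability that the first j samples are all of different types is

$$q_{j,j} = \frac{\theta^{j}}{\theta^{[j]}}.$$

The probabilities q_{n,i} are governed by the recursion relation

$$q_{n+1,i} = \frac{n}{n+\theta}\,q_{n,i} + \frac{\theta}{n+\theta}\,q_{n,i-1}.$$

It can be represented by

$$q_{n,i} = \frac{c(n,i)\,\theta^{i}}{\theta^{[n]}},$$

where c(n, i) is the unsigned Stirling number of the first kind, because

$$c(n+1,i) = n\,c(n,i) + c(n,i-1),$$

which agrees with the recursion for the qs.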

It is the number of permutations of n symbols with exactly i cycles.
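The link between the sampling recursion and the Stirling numbers can be verified directly. The sketch below assumes that draw n + 1 yields a new type with probability θ/(n + θ) and an existing type otherwise, builds the distribution of the number of types from that rule, and compares it with c(n, i)θ^i divided by the rising factorial θ(θ + 1)···(θ + n - 1). The function names are illustrative.

```python
def rising_factorial(theta, n):
    """theta * (theta + 1) * ... * (theta + n - 1); equals 1 for n = 0."""
    result = 1.0
    for k in range(n):
        result *= theta + k
    return result


def stirling_first_unsigned(n_max):
    """Table c[n][i] built from c(n+1, i) = n * c(n, i) + c(n, i-1)."""
    c = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    c[0][0] = 1
    for n in range(n_max):
        for i in range(1, n + 2):
            c[n + 1][i] = n * c[n][i] + c[n][i - 1]
    return c


def type_count_distribution(n_max, theta):
    """Table q[n][i] = P(i types among the first n draws).

    Assumed sampling rule: draw n + 1 is a new type with probability
    theta / (n + theta), otherwise it repeats an existing type.
    """
    q = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    q[0][0] = 1.0
    for n in range(n_max):
        for i in range(1, n + 2):
            stay = n / (n + theta) * q[n][i]
            grow = theta / (n + theta) * q[n][i - 1]
            q[n + 1][i] = stay + grow
    return q


if __name__ == "__main__":
    theta, n = 0.5, 6
    c = stirling_first_unsigned(n)
    q = type_count_distribution(n, theta)
    for i in range(1, n + 1):
        print(i, q[n][i], c[n][i] * theta ** i / rising_factorial(theta, n))
```

The two printed columns should coincide, and each row of the table q should sum to one.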

10.8.5 Examples of applications

Take a sample of size n, and let K_n be the number of different types in the sample. This number may be represented as

$$K_n = \sum_i \xi_i,$$

where the random variable ξ_i is one if type i is present in the sample and zero otherwise.

Denote by p_i the relative frequency, or the fraction, of type i in the population. Then the expected number of types present in the sample of size n is

$$E(K_n \mid p) = \sum_i \bigl[1 - (1-p_i)^{n}\bigr],$$

where p stands for the vector with components p_i. This is evaluated in terms of the frequency spectrum as

$$E(K_n) = \int_0^1 \bigl[1 - (1-x)^{n}\bigr] f(x)\,dx.$$

We return to this expression later.
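A short numerical sketch may help fix ideas. It assumes that each type i is absent from a sample of size n with probability (1 - p_i)^n, that the frequency spectrum is θx^{-1}(1 - x)^{θ-1}, and that the resulting integral has the closed form given by the sum of θ/(θ + j) over j = 0, ..., n - 1; these are assumptions of the example, and the function names are hypothetical.

```python
def expected_types_given_fractions(fractions, n):
    """Expected number of types seen in n draws, for known fractions."""
    return sum(1.0 - (1.0 - p) ** n for p in fractions)


def expected_types_closed_form(n, theta):
    """Assumed closed form: sum of theta / (theta + j), j = 0..n-1."""
    return sum(theta / (theta + j) for j in range(n))


def expected_types_by_quadrature(n, theta, steps=20_000):
    """Midpoint-rule integral of [1 - (1 - x)**n] * f(x) over (0, 1).

    The integrand is bounded near x = 0; for theta < 1 it is singular
    at x = 1 and more steps are needed for comparable accuracy.
    """
    width = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * width
        spectrum = theta * (1.0 - x) ** (theta - 1.0) / x
        total += (1.0 - (1.0 - x) ** n) * spectrum
    return total * width


if __name__ == "__main__":
    print(expected_types_given_fractions([0.5, 0.3, 0.2], 5))
    print(expected_types_by_quadrature(10, 2.0), expected_types_closed_form(10, 2.0))
```

The first function needs the fractions themselves, while the other two need only θ, which is the practical advantage of working with the frequency spectrum.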

10.8.6 Discrete frequency spectrum

Donnelly et al. (1991) showed that the expected number of components of random mappings from [n] = {1, 2,..., n} to [n] is

where

is the probability that a random mapping from [j] to itself is indecomposable, i.e., has a single component. See Katz (1955, p. 515).

As n goes to infinity, we see that

Note that jr(j) is the probability of a Poisson random variable with mean j having values less than its mean.

If we keep x = j/n fixed, and let j and n go to infinity, then

which shows that the frequency spectrum of the random map has θ = 1/2. This has been derived by Aldous (1985) and is noted also by Derrida and Flyvbjerg (1987).
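The θ = 1/2 limit can be explored numerically. The sketch below is built on two assumptions that are not quoted from Donnelly et al.: that the number of indecomposable mappings of j points is (j - 1)! times the sum of j^k/k! for k = 0, ..., j - 1, and that the expected number of components of size j in a uniform random mapping of [n] follows by direct counting (choose the j points, map them indecomposably, map the rest among themselves). A brute-force enumeration for small n is included as a check on that counting formula.

```python
import math
from itertools import product


def log_connected_count(j):
    """Log of the assumed number of indecomposable mappings of j points."""
    log_terms = [k * math.log(j) - math.lgamma(k + 1) for k in range(j)]
    peak = max(log_terms)
    log_sum = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return math.lgamma(j) + log_sum  # lgamma(j) = log((j - 1)!)


def expected_components_of_size(j, n):
    """Expected number of size-j components of a uniform random mapping."""
    log_choose = math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
    log_rest = (n - j) * math.log(n - j) if n > j else 0.0
    return math.exp(log_choose + log_connected_count(j) + log_rest - n * math.log(n))


def limiting_spectrum(x, theta=0.5):
    """theta * x**(-1) * (1 - x)**(theta - 1)."""
    return theta * (1.0 - x) ** (theta - 1.0) / x


def brute_force_expected_components(n):
    """Average component counts by size over all n**n mappings (small n only)."""
    totals = [0] * (n + 1)
    for mapping in product(range(n), repeat=n):
        parent = list(range(n))

        def find(a):
            while parent[a] != a:
                a = parent[a]
            return a

        for point, image in enumerate(mapping):
            parent[find(point)] = find(image)  # merge point with its image
        sizes = {}
        for point in range(n):
            root = find(point)
            sizes[root] = sizes.get(root, 0) + 1
        for size in sizes.values():
            totals[size] += 1
    return [t / n ** n for t in totals]


if __name__ == "__main__":
    n = 20_000
    for j in (2_000, 6_000, 12_000):
        x = j / n
        print(x, n * expected_components_of_size(j, n), limiting_spectrum(x))
```

Multiplying the expected count by n converts it to a density in x = j/n, which is the quantity compared with the limiting spectrum.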


Source: Aoki, M. Modeling Aggregate Behaviour & Fluctuations in Economics. Cambridge: Cambridge University Press, 2002. 281 p.
