The Hopkins statistic (introduced by Brian Hopkins and John Gordon Skellam) is a way of measuring the cluster tendency of a data set.[1] It belongs to the family of sparse sampling tests. It acts as a statistical hypothesis test where the null hypothesis is that the data are generated by a Poisson point process and are thus uniformly randomly distributed.[2] When the statistic is computed with the distances from the uniformly generated points in the numerator, its value approaches 1 if individuals are aggregated and tends to 0.5 if they are randomly distributed; some sources instead report the complement, under which aggregation gives values near 0.[3]
Preliminaries
A typical formulation of the Hopkins statistic follows.[2]
Let $X$ be the set of $n$ data points.
Generate a random sample $\tilde{X}$ of $m \ll n$ data points sampled without replacement from $X$.
Generate a set $Y$ of $m$ uniformly randomly distributed data points.
Define two distance measures,
$u_i$, the minimum distance (given some suitable metric) of $y_i \in Y$ to its nearest neighbour in $X$, and
$w_i$, the minimum distance of $\tilde{x}_i \in \tilde{X} \subseteq X$ to its nearest neighbour $x_j \in X$, $\tilde{x}_i \neq x_j$.
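The sampling steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the metric is Euclidean, the uniform points are drawn from the axis-aligned bounding box of the data (the source does not fix a sampling window), and the names `nearest_distance`, `hopkins_distances`, `u` and `w` are hypothetical labels for the two distance measures.

```python
import math
import random


def nearest_distance(point, points, skip_index=None):
    """Euclidean distance from `point` to its nearest neighbour in `points`.

    `skip_index` excludes one entry of `points`, so that a sampled data
    point is not matched with itself.
    """
    return min(
        math.dist(point, other)
        for j, other in enumerate(points)
        if j != skip_index
    )


def hopkins_distances(X, m, rng=None):
    """Return (u, w): two lists of m nearest-neighbour distances.

    u: distances from m uniformly generated points to the data X.
    w: distances from m sampled data points to the rest of X.
    """
    rng = rng or random.Random()
    d = len(X[0])

    # Assumption: the sampling window is the bounding box of the data.
    lows = [min(x[k] for x in X) for k in range(d)]
    highs = [max(x[k] for x in X) for k in range(d)]

    # Sample m data points without replacement (by index).
    sample_idx = rng.sample(range(len(X)), m)

    # Generate m uniformly distributed points inside the window.
    Y = [
        tuple(rng.uniform(lows[k], highs[k]) for k in range(d))
        for _ in range(m)
    ]

    u = [nearest_distance(y, X) for y in Y]
    w = [nearest_distance(X[i], X, skip_index=i) for i in sample_idx]
    return u, w
```

The brute-force search costs on the order of the number of data points per query, which is acceptable for small data sets; a spatial index would be the usual replacement for larger ones. Excluding the sampled point by index means exact duplicates in the data yield a distance of zero.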
Definition
With the above notation, if the data is $d$-dimensional, then the Hopkins statistic is defined as:[4]
$$H = \frac{\sum_{i=1}^{m} u_i^d}{\sum_{i=1}^{m} u_i^d + \sum_{i=1}^{m} w_i^d}$$
Under the null hypothesis, this statistic has a $\mathrm{Beta}(m, m)$ distribution.
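As a rough illustration, the statistic and its null distribution can be evaluated as follows. The sketch assumes the common form in which the distances from the uniformly generated points (`u`) appear in the numerator and the distances from the sampled data points (`w`) only in the denominator, each raised to the power of the dimension `d`. The function names are hypothetical. For integer m, the Beta(m, m) distribution function is computed through the standard identity between the regularised incomplete beta function and a binomial tail sum, so no external library is needed.

```python
import math


def hopkins_statistic(u, w, d):
    """H = sum(u_i^d) / (sum(u_i^d) + sum(w_i^d)) for d-dimensional data."""
    su = sum(ui ** d for ui in u)
    sw = sum(wi ** d for wi in w)
    return su / (su + sw)


def beta_mm_cdf(x, m):
    """CDF of Beta(m, m) at x, for a positive integer m.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a)
    with a = b = m.
    """
    n = 2 * m - 1
    return sum(
        math.comb(n, j) * x ** j * (1 - x) ** (n - j)
        for j in range(m, n + 1)
    )


def clustering_p_value(h, m):
    """Upper-tail probability P(H >= h) under the null hypothesis."""
    return 1.0 - beta_mm_cdf(h, m)
```

With this convention a small value of `clustering_p_value(h, m)` is evidence against uniform randomness in the direction of clustering; the sample size `m` is the number of distances in each list.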
Notes and references
Hopkins, Brian; Skellam, John Gordon (1954). "A new method for determining the type of distribution of plant individuals". Annals of Botany. 18 (2). Annals Botany Co: 213–227. doi:10.1093/oxfordjournals.aob.a083391.
Cross, G.R.; Jain, A.K. (1982). "Measurement of clustering tendency". Theory and Application of Digital Control: 315–320. doi:10.1016/B978-0-08-027618-2.50054-1.