From Wikipedia, the free encyclopedia
Rocchio Classification

In machine learning, a nearest centroid classifier or nearest prototype classifier is a classification model that assigns to each observation the label of the class whose training samples have the mean (centroid) closest to that observation. When applied to text classification using word vectors containing tf-idf weights to represent documents, the nearest centroid classifier is known as the Rocchio classifier because of its similarity to the Rocchio algorithm for relevance feedback.[1]

An extended version of the nearest centroid classifier has found applications in the medical domain, specifically in the classification of tumors.[2]

Algorithm

Training

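Given labeled training samples $\{(\vec{x}_1, y_1), \dots, (\vec{x}_n, y_n)\}$ with class labels $y_i \in \mathbf{Y}$, compute the per-class centroids $\vec{\mu}_\ell = \frac{1}{|C_\ell|} \sum_{i \in C_\ell} \vec{x}_i$, where $C_\ell$ is the set of indices of samples belonging to class $\ell \in \mathbf{Y}$.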
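
The training step can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name fit_centroids, the list-of-lists data layout, and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def fit_centroids(samples, labels):
    """Return a dict mapping each class label to the mean of its samples.

    `samples` is a list of equal-length numeric vectors and `labels` is a
    list of hashable class labels, one per sample (hypothetical layout).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")

    # Group the sample vectors by class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Each centroid is the component-wise mean of the vectors in one class.
    centroids = {}
    for label, vectors in by_class.items():
        n = len(vectors)
        centroids[label] = [sum(column) / n for column in zip(*vectors)]
    return centroids


# Toy data: two classes in two dimensions.
samples = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 14.0]]
labels = ["a", "a", "b", "b"]
print(fit_centroids(samples, labels))
# {'a': [1.0, 0.0], 'b': [11.0, 12.0]}
```

Training amounts to a single pass over the data, and the fitted model consists of one vector per class.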

Prediction

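The class assigned to an observation $\vec{x}$ is $\hat{y} = \arg\min_{\ell \in \mathbf{Y}} \lVert \vec{\mu}_\ell - \vec{x} \rVert$, that is, the class whose centroid $\vec{\mu}_\ell$ lies nearest to $\vec{x}$.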
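
The prediction rule can be sketched as a minimum over class centroids. The example below assumes Euclidean distance and a dictionary that maps each label to its centroid. The function name predict and the centroid values are hypothetical, chosen only for illustration.

```python
import math


def predict(centroids, x):
    """Return the label whose centroid is nearest to x (Euclidean distance)."""
    if not centroids:
        raise ValueError("at least one centroid is required")
    # Take the minimum over labels, keyed by the distance from each
    # label's centroid to the observation x.
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Hypothetical centroids for two classes, as a training step might produce.
centroids = {"a": [1.0, 0.0], "b": [11.0, 12.0]}
print(predict(centroids, [2.0, 1.0]))  # a
print(predict(centroids, [9.0, 9.0]))  # b
```

If two centroids are equally close, min returns the label that comes first in the dictionary's iteration order. An application that needs a different tie-breaking rule would have to state it explicitly.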

References

  1. Manning, Christopher; Raghavan, Prabhakar; Schütze, Hinrich (2008). "Vector space classification". Introduction to Information Retrieval. Cambridge University Press.
  2. Tibshirani, Robert; Hastie, Trevor; Narasimhan, Balasubramanian; Chu, Gilbert (2002). "Diagnosis of multiple cancer types by shrunken centroids of gene expression". Proceedings of the National Academy of Sciences. 99 (10): 6567–6572. doi:10.1073/pnas.082099299. PMC 124443. PMID 12011421.