A large-scale evaluation was conducted by Google in 2006[2] to compare the performance of the Minhash and Simhash[3] algorithms. In 2007 Google reported using Simhash for duplicate detection in web crawling[4] and using Minhash and LSH for Google News personalization.[5]
Charikar, Moses S. (2002), "Similarity estimation techniques from rounding algorithms", Proceedings of the 34th Annual ACM Symposium on Theory of Computing, p. 380, doi:10.1145/509907.509965, ISBN 978-1581134957, S2CID 4229473.