Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
Finding nearest neighbors is an important topic that has attracted much attention over the years and has applications in many fields, such as market basket analysis, plagiarism and anomaly detection, community detection, and ligand-based virtual screening. As data become ever easier to collect, finding neighbors has become a potential bottleneck in analysis pipelines. Performing all pairwise comparisons on today's massive datasets is no longer feasible. The high computational complexity of the task has led researchers to develop approximate methods, which find many but not all of the nearest neighbors. Yet, for some types of data, efficient exact solutions have been found by carefully partitioning or filtering the search space in a way that avoids most unnecessary comparisons.
In recent years, there have been several fundamental advances in our ability to efficiently identify appropriate neighbors, especially in non-traditional data, such as graphs or document collections. In this tutorial, we provide an in-depth overview of recent methods for finding (nearest) neighbors, focusing on the intuition behind the choices made in the design of those algorithms and on the utility of the methods in real-world applications. Our tutorial aims to provide a unifying view of "neighbor computing" problems, spanning numerical, graph, categorical, and sequential data, along with related application scenarios. For each type of data, we will review the current state-of-the-art approaches used to identify neighbors and discuss how neighbor search methods are used to solve important problems.
David Anastasiu, Huzefa Rangwala, and Andrea Tagarelli. "Tutorial: Are You My Neighbor?: Bringing Order to Neighbor Computing Problems." Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2019). https://doi.org/10.1145/3292500.3332292
This article was originally presented at the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019. © 2019 Copyright held by the owner/author(s).