ICDE Influential Paper Awards
Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer and Muthuramakrishnan Venkitasubramaniam
Citation: L-diversity is a method for sharing sensitive data in a privacy-preserving way. The method sanitizes the dataset to protect the confidentiality of individuals in the data while still preserving aggregate statistics. L-diversity was one of the first methods to demonstrate that user-level information can be protected from attackers without having to precisely specify or know the attackers' background knowledge. The work started a line of foundational research into formal privacy definitions and algorithms for privacy-preserving data publication. Today, l-diversity is often used when solutions with stronger privacy constraints do not leave sufficient utility in the published data.
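As a rough illustration of the idea, the sketch below checks the simplest reading of the property, often called distinct l-diversity: every group of records that share the same quasi-identifier values must contain at least l distinct sensitive values. The function name, the record layout (a list of dicts), and the column names are assumptions made for this example; the paper itself also defines stronger variants that are not shown here.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """Return True if every quasi-identifier group has >= l distinct sensitive values.

    records: list of dicts (hypothetical layout chosen for this sketch).
    quasi_identifiers: column names that an attacker could link on.
    sensitive: name of the sensitive column.
    """
    groups = defaultdict(set)
    for row in records:
        # Records with identical quasi-identifier values fall into one group.
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].add(row[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical, already generalized table: zip codes and ages are coarsened.
table = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cancer"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
    {"zip": "148**", "age": ">=40", "disease": "flu"},
]
```

In this made-up table the second group holds only one distinct disease, so the table is 1-diverse but not 2-diverse.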
Ninghui Li, Tiancheng Li and Suresh Venkatasubramanian
Citation: This paper introduces a novel privacy notion called t-closeness for preventing the disclosure of a sensitive attribute. It takes as its privacy-preserving baseline the hypothetical situation where all potentially identifying attributes are removed and only the distribution of the sensitive attribute in the overall population is published; t-closeness then limits any additional information one can learn beyond that baseline. The proposed privacy notion is elegant and thought-provoking, and has significantly influenced subsequent research in data privacy.
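A minimal sketch of the check this implies: compare the distribution of the sensitive attribute inside each group of records with its distribution in the whole table, and require the distance to stay within t. The sketch assumes a categorical sensitive attribute and uses the variational distance (half the L1 distance) as a stand-in for the paper's distance measure; the function names and the record layout are hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_class_distance(records, quasi_identifiers, sensitive):
    """Largest distance between any group's distribution and the overall one."""
    overall = distribution([row[sensitive] for row in records])
    groups = defaultdict(list)
    for row in records:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row[sensitive])
    return max(
        variational_distance(distribution(values), overall)
        for values in groups.values()
    )


def is_t_close(records, quasi_identifiers, sensitive, t):
    """True if every group's distribution is within distance t of the overall one."""
    return max_class_distance(records, quasi_identifiers, sensitive) <= t
```

A table whose groups each mirror the overall distribution has distance 0 and satisfies t-closeness for any t, while a group containing a single sensitive value drifts far from the overall distribution.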
Michael Stonebraker, Ugur Çetintemel
Citation: This paper asks whether we should continue to build general-purpose database systems or start building special-purpose systems that address a specific class of workloads. This question has sparked intense, ongoing debate in both industry and academia since 2005. The paper makes a case for special-purpose systems because they can achieve orders of magnitude better performance for their specific target workload.
Jeffrey Considine, Feifei Li, George Kollios, John W. Byers
Citation: The paper describes novel methods to handle duplicate-sensitive aggregates over distributed datasets. It carefully extends the duplicate-insensitive Flajolet-Martin method, adapting it to require little computation and communication effort and making it robust to link losses. This work has been highly impactful in the area of sensor networks, and has been shown to be applicable to any setting with multiple data sources that may suffer network failures, such as distributed data centers of today.
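To see why a Flajolet-Martin sketch tolerates duplicates, consider the simplified single-bitmap version below. It shows only the basic duplicate-insensitive counting sketch, not the paper's extensions. Each item sets one bit chosen by its hash, and sketches merge by bitwise OR, so an item that arrives twice (for example over two network paths) changes nothing. The hash choice, the 32-bit width, and the function names are assumptions for this example; a single bitmap gives only a coarse estimate.

```python
import hashlib

NUM_BITS = 32  # bitmap width, an arbitrary choice for this sketch


def _hash(item):
    """Map an item to a 32-bit integer (SHA-256 is just a convenient hash here)."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def _lowest_set_bit(x):
    """Index of the least significant 1 bit; the top index if x is 0."""
    if x == 0:
        return NUM_BITS - 1
    return (x & -x).bit_length() - 1


def fm_insert(sketch, item):
    """Return the sketch with the bit for this item set."""
    return sketch | (1 << _lowest_set_bit(_hash(item)))


def fm_merge(a, b):
    """Combine two sketches; OR makes the merge idempotent and order-free."""
    return a | b


def fm_estimate(sketch):
    """Rough distinct-count estimate from the position of the lowest unset bit."""
    r = 0
    while sketch & (1 << r):
        r += 1
    return (2 ** r) / 0.77351  # correction factor from the Flajolet-Martin analysis


def fm_build(items):
    """Build a sketch from an iterable of items."""
    sketch = 0
    for item in items:
        sketch = fm_insert(sketch, item)
    return sketch
```

Because both insertion and merging reduce to OR, two nodes that each saw overlapping readings produce the same merged sketch as a single node that saw the union of the readings.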
Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov
Sergey Melnik, Hector Garcia-Molina, Erhard Rahm: Similarity Flooding
Citation: Together, these two papers describe techniques to match and mediate schemas. They show how to exploit schema structures for matching, how peer data management forms a next logical step for data integration research, and how to mediate among schemas in peer-to-peer settings. The proposed techniques are scalable and elegant, and have significantly influenced subsequent research in schema matching and peer data management.
Sanjay Agrawal, Surajit Chaudhuri, Gautam Das: DBXplorer
Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan
Citation: Together, these two papers from ICDE 2002 laid the foundations for keyword search over relational databases, paving the way for a significant body of follow-on work in the area of Information Retrieval and Databases. The solutions presented in these papers are elegant and highly effective.
Stephan Börzsönyi, Donald Kossmann, Konrad Stocker
Citation: Skyline computation (a.k.a. the maximum vector problem) is a fundamental concept in multi-criteria decision making. This highly influential paper opened a new research topic in the database community. It framed the skyline concept in a database setting and offered a study of fundamental techniques for skyline query processing. The paper laid a solid foundation for a multitude of studies that have refined the concept of skylining and proposed efficient implementations in a variety of settings.
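The skyline of a set of points can be sketched with a naive quadratic scan: a point belongs to the skyline if no other point dominates it, that is, no other point is at least as good in every dimension and strictly better in at least one. The code assumes smaller values are better in every dimension; the function names and the sample data are made up for illustration, and the paper studies more efficient algorithms than this one.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumes smaller is better in every dimension.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to the beach in km).
hotels = [(50, 8), (80, 2), (60, 9), (100, 1), (90, 3)]
```

Here (60, 9) is dominated by (50, 8) and (90, 3) is dominated by (80, 2), so the remaining three hotels form the skyline.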
Kin-pong Chan, Ada Wai-Chee Fu
Citation: This paper proposed the first efficient time-series indexing method based on the discrete wavelet transform (DWT) and greatly influenced subsequent work on indexing of time series. It also showed that the discrete Fourier transform (DFT) may not be the best representation for dimensionality reduction in time series, leading to significant research into alternative representations as well as wavelet-based scalable data analysis.
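The dimensionality-reduction idea can be sketched with an orthonormal Haar wavelet transform: keep only the first few (coarsest) coefficients of each series and compare those. Because the transform preserves Euclidean distance, the distance over the truncated coefficients never exceeds the true distance, which is what lets an index prune candidates without missing matches. This is an illustrative sketch that assumes series lengths are powers of two; the function names are not taken from the paper.

```python
import math


def haar_transform(series):
    """Orthonormal Haar wavelet transform; coarsest coefficients come first."""
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(x) for x in series]
    length = n
    while length > 1:
        half = length // 2
        # Scaled pairwise sums (averages) and differences (details).
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:length] = averages + details
        length = half
    return coeffs


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduced_distance(a, b, k):
    """Distance over the first k Haar coefficients; a lower bound on euclidean(a, b)."""
    return euclidean(haar_transform(a)[:k], haar_transform(b)[:k])
```

An index built this way would store only the k leading coefficients per series and verify surviving candidates against the full data.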
Rakesh Agrawal and Ramakrishnan Srikant
Citation: This paper launched a new area in data mining. Sequential pattern mining has since become an important and active area with a variety of applications and much published work. The paper is a milestone in the field of data mining.
Kenneth Salem and Hector Garcia-Molina
Citation: This early paper on disk striping significantly influenced subsequent work on RAID storage.
Jim Gray, Adam Bosworth, Andrew Layman, Hamid Pirahesh
Citation: This seminal paper defined a simple SQL construct that enables one to efficiently compute aggregations over all combinations of group-by columns in a single query, where previous approaches required multiple queries. This feature has had significant impact on industry and is now incorporated in all major database systems.
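What the construct computes can be sketched in a few lines: one aggregate for every subset of the group-by columns, with a placeholder value standing in for each column that has been rolled up. The sketch below does this for a SUM in plain Python rather than SQL; the function name, the row layout, and the sample data are assumptions for illustration, and it loops over the data once per subset rather than using the efficient strategies a database engine would apply.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a rolled-up column; assumed not to occur in the data


def cube_sum(rows, dims, measure):
    """Sum `measure` for every combination of the grouping columns in `dims`."""
    result = defaultdict(float)
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):
            for row in rows:
                # Columns not in `kept` are collapsed into the ALL placeholder.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sales rows.
sales = [
    {"model": "Chevy", "year": 1994, "units": 10},
    {"model": "Chevy", "year": 1995, "units": 20},
    {"model": "Ford", "year": 1994, "units": 5},
]
cube = cube_sum(sales, ["model", "year"], "units")
```

The result holds the grand total under ("ALL", "ALL"), per-model and per-year subtotals, and the fully grouped cells, all from a single call.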
Goetz Graefe, William J. McKenna
Citation: This seminal paper laid the foundation for transformation-based query optimizers. Volcano was the first optimizer framework based on this approach and inspired several others. Multiple commercial database systems rely on transformation-based query optimizers.