Random sampling with a reservoir

Article Properties
Abstract
Cite
Vitter, Jeffrey S. “Random Sampling With a Reservoir”. ACM Transactions on Mathematical Software, vol. 11, no. 1, 1985, pp. 37-57, https://doi.org/10.1145/3147.3165.
Vitter, J. S. (1985). Random sampling with a reservoir. ACM Transactions on Mathematical Software, 11(1), 37-57. https://doi.org/10.1145/3147.3165
Vitter JS. Random sampling with a reservoir. ACM Transactions on Mathematical Software. 1985;11(1):37-5.
Journal Categories
Science
Mathematics
Instruments and machines
Electronic computers
Computer science
Science
Mathematics
Instruments and machines
Electronic computers
Computer science
Computer software
Technology
Electrical engineering
Electronics
Nuclear engineering
Electronics
Computer engineering
Computer hardware
Technology
Technology (General)
Industrial engineering
Management engineering
Applied mathematics
Quantitative methods
Description

Need to pick a random sample from a stream of data without knowing the total size? This paper introduces efficient algorithms for selecting a random sample of *n* records from a pool of *N* records when *N* is unknown in advance. The core contribution is Algorithm Z, which achieves this in one pass using constant space and *O*(*n*(1 + log(*N/n*))) expected time. The authors explore several optimizations that significantly improve the algorithm's speed. Theoretical and empirical results demonstrate that Algorithm Z outperforms existing methods, offering a substantial improvement in efficiency. This research provides a practical and highly optimized solution for a common problem in computer science and statistics. The efficient Pascal-like implementation makes Algorithm Z readily applicable in various data processing scenarios. Algorithm Z is the gold standard for random sampling.

Published in ACM Transactions on Mathematical Software, this paper fits within the journal's focus on algorithms and software for mathematical problems. The paper is a significant result for software designers.

Refrences
Citations
Citations Analysis
The first research to cite this article was titled Random sampling from hash files and was published in 1990. The most recent citation comes from a 2024 study titled Random sampling from hash files . This article reached its peak citation in 2023 , with 45 citations.It has been cited in 202 different journals, 13% of which are open access. Among related journals, the IEEE Transactions on Knowledge and Data Engineering cited this research the most, with 22 citations. The chart below illustrates the annual citation trends for this article.
Citations used this article by year