Need to pick a random sample from a stream of data without knowing its total size? This paper introduces efficient algorithms for selecting a random sample of *n* records, without replacement, from a pool of *N* records when *N* is not known in advance. The core contribution is Algorithm Z, which does the sampling in one pass using constant space and *O*(*n*(1 + log(*N*/*n*))) expected time. The paper studies several optimizations that together improve the algorithm's speed, and it gives an efficient Pascal-like implementation that incorporates them. Theoretical and empirical results indicate that Algorithm Z outperforms the methods it is compared against, which makes it a practical option for one-pass sampling in data processing.
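To make the one-pass setting concrete, here is a minimal Python sketch of the basic reservoir method, usually called Algorithm R. It is not Algorithm Z: it draws one random number per record and so takes time proportional to *N*, whereas Algorithm Z gets its speed by generating how many records to skip. The function name `reservoir_sample` and its parameters are illustrative assumptions, not taken from the paper's Pascal-like code.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], n: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return a uniform random sample of up to n records from a stream.

    Illustrative baseline (Algorithm R), not Algorithm Z. The stream is
    read exactly once and its length is never needed in advance.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for t, record in enumerate(stream):
        if t < n:
            # The first n records fill the reservoir directly.
            reservoir.append(record)
        else:
            # Record number t + 1 enters with probability n / (t + 1);
            # randint is inclusive at both ends, so j is uniform on 0..t.
            j = rng.randint(0, t)
            if j < n:
                reservoir[j] = record
    return reservoir
```

For example, `reservoir_sample(open("records.txt"), 100)` would keep 100 lines of a hypothetical file without counting its lines first. If the stream holds fewer than *n* records, the sketch simply returns all of them.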
Published in ACM Transactions on Mathematical Software, this paper fits within the journal's focus on algorithms and software for mathematical problems. Its main relevance for software designers is a sampling routine that works in a single pass over input whose length is not known beforehand.