Faced with too much information, how do we find the databases most relevant to a query? This paper introduces query-based sampling, a technique for acquiring the resource descriptions on which text database selection depends. Existing methods rely on resource providers to cooperate by supplying descriptions of their databases. Query-based sampling requires no such cooperation, which makes it suitable for wide-area networks where providers cannot be expected to coordinate. The study shows that the technique builds accurate resource descriptions efficiently, and that these descriptions support automatic database selection and improve retrieval performance. It thereby removes a key limitation of existing techniques: their dependence on cooperation from resource providers.
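The core idea can be sketched as a loop: query a database through its ordinary search interface, examine the documents it returns, fold their terms into a learned resource description, and choose the next query term from the vocabulary learned so far. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. The `search` callable, the whitespace tokenizer, the random choice of the next query term, the description format (per-term document frequency and total term frequency), and the defaults `docs_per_query=4` and `target_docs=300` are all illustrative assumptions.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical search interface (an assumption): given a one-term query,
# return up to n matching document texts. No statistics are exported by
# the provider; only ordinary query access is used.
SearchFn = Callable[[str, int], List[str]]


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def query_based_sample(search: SearchFn, seed_term: str,
                       docs_per_query: int = 4, target_docs: int = 300,
                       max_queries: int = 1000,
                       rng: Optional[random.Random] = None
                       ) -> Tuple[Counter, Counter, int]:
    """Learn a resource description by querying a database and
    examining the documents it returns."""
    rng = rng or random.Random(0)
    seen = set()      # documents already sampled (identified by text)
    df = Counter()    # learned document frequency per term
    ctf = Counter()   # learned total (collection) term frequency per term
    tried = set()     # query terms already issued
    term = seed_term
    for _ in range(max_queries):
        tried.add(term)
        for doc in search(term, docs_per_query):
            if doc in seen:
                continue  # skip documents sampled by an earlier query
            seen.add(doc)
            tokens = tokenize(doc)
            ctf.update(tokens)
            df.update(set(tokens))
        if len(seen) >= target_docs:
            break  # stopping rule: enough documents sampled
        # Next query: a random, not-yet-tried term from the learned vocabulary.
        candidates = sorted(t for t in df if t not in tried)
        if not candidates:
            break  # nothing new left to query with
        term = rng.choice(candidates)
    return df, ctf, len(seen)


# Toy usage: a hypothetical in-memory "database" behind the search interface.
CORPUS = [
    "query based sampling of text databases",
    "database selection uses resource descriptions",
    "resource descriptions list terms and frequencies",
    "wide area networks lack cooperation",
]


def toy_search(term: str, n: int) -> List[str]:
    return [d for d in CORPUS if term in tokenize(d)][:n]


df, ctf, n_docs = query_based_sample(toy_search, "database", target_docs=10)
print(n_docs, sorted(df))
```

In this toy run, sampling reaches only the two documents linked by shared vocabulary to the seed term "database"; the other two are never retrieved. The learned description therefore reflects only what the queries can reach, which is why the choice of query terms and the stopping rule matter.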
Published in ACM Transactions on Information Systems, this paper aligns with the journal's focus on information retrieval, database management, and information systems architecture. The proposed query-based sampling technique directly addresses the problem of resource discovery in large-scale information systems, a core area of interest for the journal.