Conference Paper (published)

What to Read Next? Challenges and Preliminary Results in Selecting Representative Documents

Details

Citation

Beck T, Böschen F & Scherp A (2018) What to Read Next? Challenges and Preliminary Results in Selecting Representative Documents. In: Elloumi M, Granitzer M, Hameurlain A, Seifert C, Stein B, Tjoa A & Wagner R (eds.) Database and Expert Systems Applications. DEXA 2018. Communications in Computer and Information Science, 903. 29th International Conference on Database and Expert Systems Applications, DEXA 2018, Regensburg, Germany, 03.09.2018-06.09.2018. Cham, Switzerland: Springer International Publishing, pp. 230-242. https://doi.org/10.1007/978-3-319-99133-7_19

Abstract
The vast amount of scientific literature poses a challenge when one is trying to understand a previously unknown topic. Selecting a representative subset of documents that covers most of the desired content can solve this challenge by presenting the user a small subset of documents. We build on existing research on representative subset extraction and apply it in an information retrieval setting. Our document selection process consists of three steps: computation of the document representations, clustering, and selection of documents. We implement and compare two different document representations, two different clustering algorithms, and three different selection methods using a coverage and a redundancy metric. We execute our 36 experiments on two datasets, with 10 sample queries each, from different domains. The results show that there is no clear favorite and that we need to ask the question whether coverage and redundancy are sufficient for evaluating representative subsets.
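The three-step process described in the abstract (document representation, clustering, selection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the TF-IDF representation, the deterministic spherical k-means variant, the "closest to centroid" selection rule, and the simple term-coverage and pairwise-similarity redundancy measures are stand-ins, and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Step 1 (assumed representation): sparse, L2-normalised TF-IDF dicts."""
    tokenised = [doc.lower().split() for doc in docs]
    df = Counter(term for tokens in tokenised for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors (cosine similarity if both are unit length)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def mean_vector(vectors):
    """Unit-length mean direction of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def kmeans(vectors, k, iterations=20):
    """Step 2 (assumed clustering): spherical k-means with farthest-first seeding."""
    seeds = [0]
    while len(seeds) < k:
        # Next seed: the document least similar to its most similar existing seed.
        seeds.append(min(range(len(vectors)),
                         key=lambda i: max(dot(vectors[i], vectors[s]) for s in seeds)))
    centroids = [vectors[s] for s in seeds]
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        updated = []
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            updated.append(mean_vector(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def select_representatives(vectors, labels, centroids):
    """Step 3 (assumed selection): per cluster, the document closest to the centroid."""
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            chosen.append(max(members, key=lambda i: dot(vectors[i], centroid)))
    return sorted(chosen)


def coverage(vectors, selected):
    """Illustrative coverage: share of all terms that occur in the selected documents."""
    vocab = set().union(*vectors)
    covered = set().union(*(vectors[i] for i in selected))
    return len(covered) / len(vocab) if vocab else 0.0


def redundancy(vectors, selected):
    """Illustrative redundancy: mean pairwise cosine similarity of the selection."""
    pairs = [(i, j) for a, i in enumerate(selected) for j in selected[a + 1:]]
    if not pairs:
        return 0.0
    return sum(dot(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)


# Toy corpus with two obvious topics (made-up data, not from the paper's datasets).
docs = [
    "graph clustering algorithm",
    "clustering algorithm for graph data",
    "graph algorithm clustering nodes",
    "protein folding simulation",
    "protein simulation folding energy",
    "folding protein structure simulation",
]
vectors = tfidf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
selected = select_representatives(vectors, labels, centroids)
```

Swapping any one of the three functions (for example a different representation or selection rule) while keeping the other two fixed mirrors the kind of comparison the abstract describes, although the concrete variants and metrics evaluated in the paper may differ from these stand-ins.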

Keywords
Representative document selection; Document clustering

Status: Published
Funders: European Commission
Title of series: Communications in Computer and Information Science
Number in series: 903
Publication date: 31/12/2018
Publication date online: 07/08/2018
URL: http://hdl.handle.net/1893/27854
Publisher: Springer International Publishing
Place of publication: Cham, Switzerland
eISSN: 1865-0937
ISSN of series: 1865-0929
ISBN: 9783319991320
eISBN: 9783319991337
Conference: 29th International Conference on Database and Expert Systems Applications, DEXA 2018
Conference location: Regensburg, Germany