Conference Paper (published)

Using Titles vs. Full-text as source for automated semantic document annotation

Details

Citation

Galke L, Mai F, Schelten A, Brunsch D & Scherp A (2017) Using Titles vs. Full-text as source for automated semantic document annotation. In: Proceedings of the Knowledge Capture Conference K-CAP 2017. Knowledge Capture Conference 2017, Austin, TX, USA, 04.12.2017-06.12.2017. New York: ACM, Article 20. https://doi.org/10.1145/3148011.3148039

Abstract
We conduct the first systematic comparison of automated semantic annotation based on either the full-text or only on the title metadata of documents. Apart from the prominent text classification baselines kNN and SVM, we also compare recent techniques of Learning to Rank and neural networks and revisit the traditional methods logistic regression, Rocchio, and Naive Bayes. Across three of our four datasets, the performance of the classifications using only titles reaches over 90% of the quality compared to the performance when using the full-text.

Keywords
Multi-label classification; document analysis; semantic annotation

Journal
Proceedings of the Knowledge Capture Conference, K-CAP 2017

Status: Published
Funders: European Commission
Publication date: 31/12/2017
URL: http://hdl.handle.net/1893/28018
Publisher: ACM
Place of publication: New York
ISBN: 9781450355537
Conference: Knowledge Capture Conference 2017
Conference location: Austin, TX, USA
Dates