Conference Paper (published)
Details
Citation
Böschen F & Scherp A (2015) Multi-oriented text extraction from information graphics. In: Proceedings of the 2015 ACM Symposium on Document Engineering (DocEng '15). 2015 ACM Symposium on Document Engineering, Lausanne, Switzerland, 08.09.2015-11.09.2015. New York: ACM, pp. 35-38. https://doi.org/10.1145/2682571.2797092
Abstract
Existing research on analyzing information graphics assumes that perfect text detection and extraction are available. However, text extraction from information graphics is far from solved. To fill this gap, we propose a novel processing pipeline for multi-oriented text extraction from infographics. The pipeline applies a combination of data mining and computer vision techniques to identify text elements, cluster them into text lines, and compute their orientation, and it uses a state-of-the-art open source OCR engine to perform the text recognition. We evaluate our method on 121 infographics extracted from an open access corpus of scientific publications. The results show that our approach is effective and significantly outperforms a state-of-the-art baseline.
Keywords
Infographics; OCR; multi-oriented text extraction
Journal
DocEng 2015 - Proceedings of the 2015 ACM Symposium on Document Engineering
Status | Published |
---|---|
Publication date | 31/12/2015 |
URL | http://hdl.handle.net/1893/28052 |
Publisher | ACM |
Place of publication | New York |
ISBN | 9781450333078 |
Conference | 2015 ACM Symposium on Document Engineering |
Conference location | Lausanne, Switzerland |
Dates | 08/09/2015 – 11/09/2015 |