Parameter-free table detection method

Laiphangbam Melinda and Chakravarthy Bhagvati

School of Computer and Information Sciences, University of Hyderabad, India

Abstract

In this paper, we propose two parameter-free table detection methods: one for closed tables and the other for open tables. The unifying idea is multi-Gaussian analysis. Multi-Gaussian analysis of text-height histograms classifies the document content into text and non-text blocks. Closed tables are classified as non-text, and identifying them among the non-text blocks is similar to many earlier methods that remove the separators. Because of multi-Gaussian analysis, we do not need any parameters to identify rows and columns or to discriminate them from text blocks. Open tables are initially classified as text blocks and are detected by extending the multi-Gaussian analysis to the heights and widths of text blocks. Multi-Gaussian analysis groups the text blocks into three categories, which are then used to classify table cells and distinguish them from text blocks. Table blocks are merged to obtain the table region. Evaluation on various Indic-script newspapers and the ICDAR 2013 table competition dataset shows that our methods score above 90% in table recognition. The strength of our approach is that it is parameter-free and requires no training dataset.
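The text/non-text split from height histograms can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a list of per-component heights (for example, connected-component heights) as input, fits a 1-D Gaussian mixture by plain expectation-maximization with the number of Gaussians fixed at k = 2, and assumes the heaviest Gaussian models body text. Fixing k is itself a parameter, which the original method avoids; how the paper chooses the number of Gaussians is not described in this summary. The names `fit_gmm_1d`, `classify_heights` and the toy heights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch, not the authors' code. k is fixed here, unlike the
# parameter-free method described in the abstract.
Component = Tuple[float, float, float]  # (weight, mean, variance)


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log-density of the 1-D normal distribution N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm_1d(data: Sequence[float], k: int = 2, iters: int = 100) -> List[Component]:
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Returns (weight, mean, variance) tuples sorted by weight, heaviest first.
    """
    xs = list(data)
    n = len(xs)
    lo, hi = min(xs), max(xs)
    # Deterministic init: means evenly spaced across the observed range.
    means = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n or 1.0] * k
    weights = [1.0 / k] * k
    var_floor = 1e-3  # stops a component collapsing onto a single value
    for _ in range(iters):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for x in xs:
            logs = [math.log(w) + log_gaussian(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            p = [math.exp(l - top) for l in logs]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor)
    return sorted(zip(weights, means, variances), reverse=True)


def classify_heights(heights: Sequence[float], k: int = 2) -> List[str]:
    """Label a height 'text' if the heaviest Gaussian is its most likely source.

    Assumption: body-text characters are the majority, so the heaviest
    component models text height; all other components count as non-text.
    """
    comps = fit_gmm_1d(heights, k)

    def best_component(h: float) -> int:
        # Compare weighted log-likelihoods so far-away heights do not underflow.
        return max(range(len(comps)),
                   key=lambda j: math.log(comps[j][0])
                   + log_gaussian(h, comps[j][1], comps[j][2]))

    return ["text" if best_component(h) == 0 else "non-text" for h in heights]


if __name__ == "__main__":
    # Toy data: ten body-text heights and three much taller non-text items.
    toy_heights = [10, 11, 12, 10, 11, 12, 11, 10, 12, 11, 48, 52, 60]
    print(classify_heights(toy_heights))
```

On the toy list the ten small heights should fall in the heavy component and the three tall items in the other. The same fitting could plausibly be run with k = 3 on text-block heights and widths for the open-table step, but how those two measurements are combined is not stated here, so that extension is left out.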

Results

Figures 1(a-d) show some of the tables correctly detected on the ICDAR 2013 table competition dataset. Orange and pink outlines indicate the final closed-table and open-table detections, respectively.

Figures 2(a-d) show some of the tables correctly detected on the newspaper dataset.

Citation

L. Melinda, R. Ghanapuram, and C. Bhagvati, “Document layout analysis using multigaussian fitting,” in 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 747–752, 2017.