Parameter-free table detection method
Laiphangbam Melinda and Chakravarthy Bhagvati
School of Computer and Information Sciences, University of Hyderabad, India
Abstract
In this paper, we propose two parameter-free table detection methods: one for closed tables and the other for open tables. The unifying idea is multi-Gaussian analysis. Multi-Gaussian analysis of text height histograms classifies the document content into text and non-text blocks. Closed tables are classified as non-text, and they are identified among the non-text blocks in a way similar to many earlier methods that remove the separators. Because of the multi-Gaussian analysis, no parameters are needed to identify rows and columns or to discriminate them from text blocks. Open tables are initially classified as text blocks and are detected by extending the multi-Gaussian analysis to the heights and widths of text blocks. This analysis groups the text blocks into three categories, which are then used to classify table cells and distinguish them from ordinary text blocks. Finally, the table blocks are merged to obtain the table region. Evaluation on newspapers in various Indic scripts and on the ICDAR2013 table competition dataset shows that our methods achieve a table recognition rate of more than 90%. The main strength of our algorithm is that it is parameter-free and requires no training dataset.
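The multi-Gaussian idea can be illustrated with a minimal sketch: fit a one-dimensional Gaussian mixture to block heights by expectation-maximization (EM), then label each block with its most probable component. This is an illustration under stated assumptions, not the authors' procedure. The function names `fit_gmm_1d` and `assign_components`, the evenly spaced initialisation, the fixed iteration count, the variance floor, and the caller-supplied number of components `k` are all hypothetical choices of this sketch; the abstract does not say how the number of Gaussians is chosen for the text/non-text split.

```python
import math
from typing import List, Tuple


def _log_density(x: float, w: float, m: float, v: float) -> float:
    """Log of w * N(x | m, v) for a single mixture component."""
    return math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)


def fit_gmm_1d(xs: List[float], k: int, iters: int = 200) -> Tuple[List[float], List[float], List[float]]:
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, variances).

    Hypothetical sketch: k, iters and the initialisation are assumptions, not the paper's.
    """
    n = len(xs)
    lo, hi = min(xs), max(xs)
    # Assumed initialisation: means evenly spaced over the data range.
    means = [lo + (j + 0.5) * (hi - lo) / k for j in range(k)]
    mu = sum(xs) / n
    var0 = sum((x - mu) ** 2 for x in xs) / n or 1.0
    variances = [var0] * k
    weights = [1.0 / k] * k
    floor = 1e-6  # keeps a variance from collapsing to zero
    for _ in range(iters):
        # E-step: responsibilities computed with log-sum-exp for numerical stability.
        resp = []
        for x in xs:
            logs = [_log_density(x, w, m, v) for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            ps = [math.exp(l - top) for l in logs]
            total = sum(ps)
            resp.append([p / total for p in ps])
        # M-step: re-estimate each component from its soft assignments.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, floor)
    return weights, means, variances


def assign_components(xs: List[float], weights: List[float], means: List[float],
                      variances: List[float]) -> List[int]:
    """Label each value with the index of its most probable mixture component."""
    return [
        max(range(len(means)), key=lambda j: _log_density(x, weights[j], means[j], variances[j]))
        for x in xs
    ]


if __name__ == "__main__":
    # Toy block heights in pixels (invented): ten body-text lines, three large non-text blocks.
    heights = [18, 19, 20, 20, 21, 22, 19, 20, 21, 20, 120, 150, 135]
    w, m, v = fit_gmm_1d(heights, k=2)
    print(assign_components(heights, w, m, v))
```

With `k=2`, the toy heights separate into a body-text component with mean near 20 and a large-block component with mean near 135. The abstract applies the analysis to both heights and widths and groups text blocks into three categories; one could run this routine with `k=3` on each feature, but how the authors combine the two features is not described here, so that step is left out.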