Laiphangbam Melinda

Research Scholar

Dataset:

In my research, I have used publicly available ICDAR competition datasets for layout analysis and for table detection. The competition details are listed below:

  • A. Antonacopoulos, C. Clausner, C. Papadopoulos, and S. Pletschacher, “ICDAR2015 competition on recognition of documents with complex layouts-RDCL2015,” in Document Analysis and Recognition (ICDAR), 2015 13th International Conference on, pp. 1151–1155, IEEE, 2015.
  • M. Göbel, T. Hassan, E. Oro, and G. Orsi, “ICDAR 2013 table competition,” in 2013 12th International Conference on Document Analysis and Recognition, pp. 1449–1453, Aug 2013.

I have also generated my own layout analysis dataset (700 document pages in total) covering six Indic scripts (Meetei Mayek, Telugu, Odia, Tamil, Malayalam and Kannada) as well as English-language documents, using Aletheia, PRImA's ground-truth generation tool.

Ground-truth generation tool:

Aletheia is an advanced system for accurate yet cost-effective analysis, recognition and annotation of scanned documents. It aids the user with a number of automated and semi-automated tools that were developed and fine-tuned based on feedback.

  • C. Clausner, S. Pletschacher, and A. Antonacopoulos, “Aletheia-an advanced document layout and text ground-truthing system for production environments,” in 2011 International Conference on Document Analysis and Recognition, pp. 48–52, 2011.
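
Reading the ground truth:

The sketch below shows one way layout ground truth of this kind could be read for training or evaluation. It assumes the annotations are exported in PRImA's PAGE XML format, with each region carrying a Coords element whose points attribute lists the polygon vertices; the export format of my dataset is not stated above, so treat this as an assumption. The sample document, the file name page_001.tif, and the helper names read_regions and bounding_box are hypothetical and only serve the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the style of PAGE XML (not taken from the dataset).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_001.tif" imageWidth="1000" imageHeight="1400">
    <TextRegion id="r0" type="paragraph">
      <Coords points="100,120 900,120 900,400 100,400"/>
    </TextRegion>
    <TableRegion id="r1">
      <Coords points="150,500 850,500 850,900 150,900"/>
    </TableRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across schema versions."""
    return tag.rsplit("}", 1)[-1]


def read_regions(xml_text):
    """Return a list of (region_type, polygon) pairs found in a PAGE XML string.

    Assumes polygons are stored as Coords/@points ("x,y x,y ...").
    Regions without such an attribute are skipped.
    """
    root = ET.fromstring(xml_text)
    regions = []
    for elem in root.iter():
        name = local_name(elem.tag)
        if not name.endswith("Region"):
            continue
        coords = next((c for c in elem if local_name(c.tag) == "Coords"), None)
        if coords is None or not coords.get("points"):
            continue
        polygon = [
            tuple(int(v) for v in pair.split(","))
            for pair in coords.get("points").split()
        ]
        regions.append((name, polygon))
    return regions


def bounding_box(polygon):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) of a polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    for region_type, polygon in read_regions(SAMPLE):
        print(region_type, bounding_box(polygon))
```

Matching on the local tag name rather than a fixed namespace is a deliberate choice here, since the namespace URI changes between PAGE schema versions.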